Nautical Terms – A Partial List

Greetings, Internet!

I’m back now, and ready to narrate my experiences during my hiatus.

First off, my paper on OCR optimization of public signage photos has been published (finally!). Here is the link:

Let’s now step back by a semester, to last year’s June, when I took on a month-long internship at Bharat Electronics Limited (BEL). While there, I came face-to-face with various technologies encountered only in Naval contexts. The experience was an exciting one, and it gave me a chance to interact with the minds who work to keep the Indian Navy combat-ready, and understand how to deal with problems faced on-board its ships and submarines.

In the coming set of posts, I shall attempt to explain each technology I studied and worked with. Before doing that, I need to elaborate on some of the nautical terms which I shall use in my explanations. Time to dive in!

Parts of a Marine Vessel

  1. Port – This is the left side of a marine vessel.
  2. Starboard (pronounced stah-bud) – This is the right side of the vessel.
  3. Bow – This is the front side of the ship.
  4. Stern – This is the back side of the ship.
  5. Bridge – It is a room or platform from which the ship can be commanded.
    The bridge of a civilian ship [Credits: Unique Infinity]
  6. Control Room – Also called the Ops Room (short for Operations Room), it is where the captain’s commands are carried out.
  7. Conning Tower – An armoured platform that allows the officer-in-charge to control the vessel’s movements. In submarines, this is also where the periscope is located.
    Parts of a Submarine [Source:]
  8. Superstructure – This is the part of the ship above the deck. It includes the Bridge, Conning Tower and Control Room.
  9. Hull – It is the main body of the vessel, on top of which the superstructure is built.
  10. Quarterdeck – A raised deck present at the stern of the ship.
  11. Forecastle – It is the upper deck of the ship, and is present at the bow.
  12. Periscope – It allows a submarine to visually search for nearby threats and targets on the water surface and in the air. It retracts into the submarine’s hull when not in use.
  13. Gyro – Gyros are used to stabilize roll motions on a marine vessel while at sea.
  14. Galley – It is the kitchen of the marine vessel, as food is prepared here.
  15. Turret – Short for gun turret, it is a weapon mount that houses the crew or mechanism of a projectile-firing weapon. It also allows the weapon to be aimed and fired at a particular azimuth and elevation.
  16. Rudder – It is a flat plane attached to the stern of the vessel with hinges. A rudder operates by redirecting the water past the hull, imparting a yawing motion to the vessel.
  17. Gangway – This is a narrow walkway used to join the quarterdeck to the forecastle of the ship, or to board and disembark ships.

Direction, Distance and Speed

Following are the nautical terms for direction, distance and speed:

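  1. Roll, Pitch, Yaw – These are the three rotational motions of a marine vessel: roll is rotation about the bow-to-stern axis, pitch is rotation about the port-to-starboard axis, and yaw is rotation about the vertical axis.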
  2. Fore – It is the direction towards the bow of the ship.
  3. Aft – It is the direction towards the stern of the ship.
  4. Heading – This is the direction of the vessel’s bow.
  5. Bearing – This is the direction to an object, described as an angle measured from magnetic North or South towards magnetic East or West (for example, S 45° E).
  6. Azimuth – It is the horizontal angle (in degrees) measured clockwise from magnetic North, and is an equivalent way of writing a bearing. For example, a bearing of S 45° E is equivalent to an azimuth of 135°.
  7. Elevation – The elevation of a target or satellite is the angle between itself and the local horizontal plane of the vessel.
  8. Course – This is the intended path of travel by the vessel.
  9. Nautical Mile – This is the standard for distance measurements at sea. One nautical mile equals 1.852 kilometres (or about 1.15 miles).
  10. Knot – This is the standard unit of speed for watercraft. One knot is the equivalent of one nautical mile per hour.
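
The bearing-to-azimuth relationship and the knot can be made concrete with a minimal Python sketch. The function names and the accepted input format (such as 'S 45° E') are assumptions made for illustration, not part of any naval standard.

```python
KM_PER_NAUTICAL_MILE = 1.852  # exact, by international definition


def quadrant_bearing_to_azimuth(bearing: str) -> float:
    """Convert a quadrant bearing such as 'S 45° E' to an azimuth in degrees."""
    start, angle_text, toward = bearing.upper().replace("°", " ").split()
    angle = float(angle_text)
    if start not in ("N", "S") or toward not in ("E", "W") or not 0 <= angle <= 90:
        raise ValueError(f"not a quadrant bearing: {bearing!r}")
    if start == "N":
        azimuth = angle if toward == "E" else 360 - angle
    else:
        azimuth = 180 - angle if toward == "E" else 180 + angle
    return azimuth % 360  # keeps 'N 0 W' at 0 rather than 360


def knots_to_kmph(knots: float) -> float:
    """One knot is one nautical mile per hour."""
    return knots * KM_PER_NAUTICAL_MILE
```

With these definitions, the example from the list (S 45° E) maps to an azimuth of 135°, and a vessel doing 10 knots covers roughly 18.5 kilometres in an hour.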

This concludes my list of nautical terms. In my next post, I shall explain the working of Fire Control Systems (FCSes).

Hope you found this informative!

Related Links:

  1. Classifications of Naval Vessels; an article:
  2. Conning Towers, Bridges and Periscopes; an article:
  3. Inside an Indian Submarine (INS Sindhughosh); a video:
  4. Difference between an Azimuth and Bearing; an explanation:
  5. Understanding Azimuth and Elevation; an article:
  6. Why is a ship’s speed measured in knots; an article:

FFTs And Their Usage

It was around two centuries ago, when eminent mathematician Joseph Fourier demonstrated that some functions could be represented as an infinite sum of harmonics. Named the Fourier series in his honour, this technique was employed thereafter on other mathematical functions as well. Over the decades, these functions have been discretized, and methods such as FFTs (Fast Fourier Transforms) have been developed, and used to quickly find their Fourier transforms.

Naturally, the question that pops to mind is this – what are Fourier transforms, and how do they concern us?

Fourier Transforms – what are they?

In layman’s terms, a Fourier transform is simply a way of representing a given signal in the form of sinusoidal waves – the kind of waves scientists and engineers encounter on an almost daily basis.

At first, it might not appear to be very useful. After all, Fourier transforms merely convert signals from the time domain into components in the frequency domain. Is there any point in performing any more computations on a given signal, that is already defined in the time domain?

However, given that all real-world signals are prone to noise, it becomes essential to find a simple representation of such signals. As it turns out, Fourier transformations help accomplish that.

This GIF explains the concept more clearly, showing how a complex signal may be analyzed on the basis of its frequencies, thanks to the Fourier transform:

Image Credits: Lucas V. Barbosa

This signal is still a relatively simple one. Once signals become more complex, calculating their Fourier transforms becomes a tedious process. To simplify these calculations, DFTs (Discrete Fourier Transforms) are used in place of regular Fourier transforms – more specifically, N-point DFTs.

DFTs and FFTs

Here is the formula for an N-point DFT:

\displaystyle X[k] = \sum_{n=0}^{N-1} x[n]\, w_N^{kn}

\displaystyle k = 0, 1, \ldots, N-1

\displaystyle w_N = e^{-j\frac{2\pi}{N}}
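
The formula translates almost directly into code. Here is a minimal Python sketch of a naive N-point DFT, assuming a non-empty input list; the function name `dft` is chosen only for illustration. Each of the N outputs needs N multiplications, which is where the N² cost comes from.

```python
import cmath


def dft(x):
    """Naive N-point DFT: X[k] = sum over n of x[n] * w_N**(k*n)."""
    N = len(x)  # assumed to be at least 1
    w = cmath.exp(-2j * cmath.pi / N)  # the twiddle factor w_N
    return [sum(x[n] * w ** (k * n) for n in range(N)) for k in range(N)]
```

For instance, a unit impulse `[1, 0, 0, 0]` has a flat spectrum (every X[k] equals 1), while a constant signal `[1, 1, 1, 1]` puts all of its energy into X[0].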

Despite the merits this method had to offer, DFTs were still found to be computationally intensive. Hence, to reduce the calculation time, James Cooley and John Tukey proposed a new technique in their research paper titled “An Algorithm for the Machine Calculation of Complex Fourier Series”, in 1965.

Although the main motive behind the Cooley-Tukey algorithm was to detect nuclear tests in the Soviet Union through sensors planted in surrounding countries, it soon found use across digital signal processing in general. In fact, it is the most commonly used method to calculate FFTs even today.

The Butterfly Structure

The Cooley-Tukey algorithm makes use of a ‘butterfly structure’, in which the ‘butterfly’ is the basic computational element of the FFT, that converts two complex points into two other complex points. Here is what it looks like:

Image Credits:

This structure helps visualize the working of the FFT. It explains the reduction in the number of computations, from the N² multiplications and N(N-1) additions in an N-point DFT, to (N/2)log₂N multiplications and Nlog₂N additions in an N-point radix-2 FFT.
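
To see where the butterfly appears in code, here is a minimal recursive sketch of a radix-2 decimation-in-time FFT in Python. It assumes the input length is a power of two, and the function name `fft` is illustrative; production libraries use more elaborate, iterative variants.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return list(x)
    even = fft(x[0::2])  # N/2-point FFT of the even-indexed samples
    odd = fft(x[1::2])   # N/2-point FFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # one twiddle multiplication
        out[k] = even[k] + t           # butterfly: upper output
        out[k + N // 2] = even[k] - t  # butterfly: lower output
    return out
```

Each butterfly turns two complex points into two others using a single multiplication. There are N/2 butterflies per stage and log₂N stages, which gives the (N/2)log₂N multiplications quoted above.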

Why FFTs?

Because of their ability to represent complex information compactly, along with being reversible (through Inverse Fast Fourier Transforms, or IFFTs), FFTs and closely related transforms are widely used by modern-day devices to process large amounts of data, especially in audio (MP3), video (MP4) and image (JPEG) files. Moreover, they are used in many of the communication technologies in use today, such as Ethernet, Wi-Fi, 4G and Bluetooth, where signals have to be moved between the time and frequency domains.

Hence, we cannot deny the importance of FFTs in our daily lives. With more appliances and electronic devices turning into ‘smart’ devices, we can only expect the implementation of FFTs to become more commonplace. In fact, you would not have read this article, had it not been for the Cooley-Tukey algorithm scrambling and unscrambling the information bits comprising this blog post!

Related Links

  1. The Fourier Transform; a website:
  2. An Algorithm for the Machine Calculation of Complex Fourier Series; the original paper by James W. Cooley and John W. Tukey:
  3. Computing FFT Twiddle Factors; an article:
  4. Cooley-Tukey FFT algorithm; an article:
  5. How the FFT Works; an article:
  6. Fourier analysis and applications to sound processing; a PDF with MATLAB examples:
  7. FFT; an old article:
  8. Amateur-radio Applications of Fast Fourier Transform; an old PDF:

Python Codes and Matrices

This has been quite a busy month, especially on the programming front. On the good side of things, I have made some headway in the image-to-text project I previously mentioned. On the other hand, my teammate and I really need to hasten our work, and complete our project within the next two months. There is much to do, in so little time!

This project has also opened my mind to Python, and has given me compelling reasons to continue learning more about it. Being a staunch C++ programmer since my high school days, it took me quite some time to realize Python’s immense value in today’s computer systems, and gain confidence in it.

Python – an introduction

Python was first released by Guido van Rossum in 1991. An ardent fan of Monty Python’s Flying Circus, Van Rossum started it as a hobby project in December 1989, as a successor to ABC, another programming language at the time. Python eventually grew into a widely used interpreted language in its own right, and now has a large developer base, spread all over the planet.

One reason for its global appeal is its strong emphasis on keeping lines of code as readable and clutter-free as possible. This emphasis is reflected in its PEPs (Python Enhancement Proposals) – design documents that provide the rationale behind a feature, and its technical specifications.

The strict adherence to readability in Python is obvious from the fact that it does not use curly braces or semicolons to delimit blocks of code, and relies completely on indentation (tabs and spaces) instead. This, unfortunately, acts as a double-edged sword, since Python is unforgiving when programmers add too many or too few spaces, or use a mix of spaces and tabs in their code. More often than not, Python code written by rookie programmers will not run, due to incorrect indentation. Hence, it helps to go through the PEP 8 guide, and correct all indentation errors in the code.
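
The two kinds of indentation mistakes mentioned above can be reproduced with the built-in `compile()` function. This is a small illustrative sketch; the helper name `indentation_problem` and the sample snippets are made up for the demonstration.

```python
def indentation_problem(source: str):
    """Return the name of the indentation error in `source`, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:  # TabError is a subclass of IndentationError
        return type(err).__name__
    return None


good = "if True:\n    x = 1\n    y = 2\n"         # consistent four-space indent
too_few = "if True:\nx = 1\n"                      # body is not indented at all
mixed = "if True:\n\tx = 1\n        y = 2\n"       # a tab on one line, spaces on the next
```

Here `indentation_problem(good)` returns `None`, while the other two snippets are rejected before a single line of them runs.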

Nevertheless, Python has become immensely popular over the last few decades, and continues to go mainstream, so much so that de facto standard libraries and frameworks like OpenGL, OpenCL, Unity and TensorFlow, all developed largely in C or C++, are available for integration with Python, using wrappers.

What are wrappers?

Wrappers are interfaces that allow a program to build on to an existing piece of code or program, without disturbing it. This allows the programmer to extend the capabilities of a program, or a portion of it, while hiding a few features of the original program (abstraction). Hence, the portability of the program code is increased.

Through wrappers, Python lifts a big burden off the programmer’s shoulders, since he or she no longer has to translate the entire code into another language, and simply has to decide what features to include, and which ones to exclude, while designing the new interface.
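
The same idea can be shown in miniature within Python itself. The sketch below is not a C++ binding. It is a plain Python wrapper (a decorator) that extends an existing function, `math.sqrt`, without touching its code; the names `logged` and `logged_sqrt` are hypothetical.

```python
import functools
import math


def logged(func):
    """Wrap `func` so that every call is recorded, without modifying `func` itself."""
    calls = []

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        calls.append((args, kwargs, result))
        return result

    wrapper.calls = calls  # expose the call log on the new interface
    return wrapper


logged_sqrt = logged(math.sqrt)
```

The caller uses `logged_sqrt` exactly like `math.sqrt`, while the wrapper decides what extra behaviour to add and what to hide. Wrappers around C or C++ libraries follow the same principle, only across a language boundary.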

Python wheels – libraries installed using pip

Another reason why I am impressed by Python modules is their ease of installation. With just a single ‘pip install’ command in Terminal, a user can download and install any Python library he or she needs, and deploy it immediately.

This has been made possible through wheels – a packaging format created by the Python community. These are maintained on PyPI (Python Package Index), Python’s official repository for third-party software. With its origins dating back to September 2000, it continues to grow and improve to this day. It currently houses over 100,000 packages, enough for a programmer to be spoilt for choice.

On the other hand, a C/C++ user, more often than not, is expected to manually download a zipped file from the Internet, decompress it, and build it from source – an unpleasant and tiresome experience in my past programming ventures. Undoubtedly, Python programmers are at an advantage over here!


One data type that piqued my interest was the ‘tuple’, a concept borrowed from mathematics. It is a finite sequence of elements. It is extremely valuable in creating CSV files, and computational mathematics, especially matrix operations.
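
As a quick illustration of the CSV use case, here is a minimal sketch using only the standard library. The sample rows are invented, and the output is written to an in-memory buffer rather than a real file.

```python
import csv
import io

# Each row of the CSV file is one tuple; the first tuple holds the column names.
rows = [("name", "x", "y"), ("origin", 0, 0), ("corner", 3, 4)]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
```

Because tuples cannot be modified after creation, a row built this way stays intact between the moment it is assembled and the moment it is written out.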

At this point, many of you might be wondering, “Even if all its elements are numbers, a tuple is just a row matrix. How is computational mathematics going to benefit from tuples?”

Enter the matrix

To counter this thought, I shall now share an example, from my project. Remember the colour images I wished to convert to grayscale? Well, that requires each pixel of the image to be read as a matrix, and what better way to do that than by tuples!

A cat photo from Tumblr.

Here is some of the pixel information of this image, in matrix notation:

[[[ 71  65  53]
  [ 73  67  55]
  [ 76  70  58]
  [168 178 143]
  [166 176 139]
  [164 174 137]]

 [[ 58  48  38]
  [ 57  47  37]
  [ 55  45  35]
  [221 215 199]
  [221 215 199]
  [221 215 199]]]

Though it is only part of the whole image, it is clear that each pixel is represented as a tuple of three elements. These elements are the red (R), green (G) and blue (B) values of the corresponding pixel. Python needs to read the image as a matrix of three-element tuples, before any kind of image manipulation can be done.

Taking this matrix of tuples, the image gets converted to grayscale using a simple formula that computes the average of the R, G and B values of each pixel.
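
The averaging step can be sketched in a few lines of plain Python, using some of the pixel values listed above. This is an illustration of the formula and not the complete program; the function name `to_grayscale` and the use of integer (floor) division are assumptions.

```python
def to_grayscale(image):
    """Replace each (R, G, B) tuple with the integer average of its three values."""
    return [[sum(pixel) // 3 for pixel in row] for row in image]


image = [
    [(71, 65, 53), (73, 67, 55), (76, 70, 58)],
    [(58, 48, 38), (57, 47, 37), (55, 45, 35)],
]
gray = to_grayscale(image)  # [[63, 65, 68], [48, 47, 45]]
```

Each three-element tuple collapses into a single intensity value, so the grayscale image is a plain matrix of numbers with the same number of rows and columns as the original.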

This causes the program to generate this output:

The cat, in grayscale.

By the way, here is the complete Python program, if you wish to have a look:

Note: Although the saved grayscale image has the same resolution as the colour input, the pyplot function is generating this output on my machine, for some unknown reason.

I suspect it is caused by a missing module from my machine, since it works fine on my friend’s laptop. Hopefully, I shall fix this bug soon!

External Links

  1. PEP 8 – the Style Guide for Python code:
  2. Function wrapper and python decorator; a blog post:
  3. Tuples; a chapter in How to Think Like a Computer Scientist: Learning with Python 3:
  4. RGB to Grayscale Conversion; a tutorial:


The Processing IDE continues to impress me with its visual capabilities and intuitive functions. In fact, I was so surprised to find that I could program a side-scroller – a hitherto complex interface – within 90 lines of code, that I decided to write about it.

The term ‘side-scroller’ now encompasses all videogames that utilize a side-view angle of the camera in the gameplay. The common element in all of these games is a display that scrolls in response to the player’s input.

This month’s discussion is dedicated to BikeRoller – an endlessly scrolling sketch. Here, the player controls the biker’s direction (along with the background behind it) using the LEFT and RIGHT arrow keys. 

A brief history  

Side-scrollers have been in existence since the early 1970s, with Speed Race (1974) being the first game to use a scrolling display, albeit a vertical one. This technological feat was achieved by Tomohiro Nishikado, the game’s designer, who also incorporated sprites and collision detection in this racing arcade game.

The most iconic game title in this genre is Super Mario Bros., a platformer game that was developed and published by Nintendo for the NES (Nintendo Entertainment System) platform. Here, the player controls the protagonist Mario, who must race through all the stages in Mushroom Kingdom to save Princess Peach (US version: Princess Toadstool), while smashing bricks, collecting coins and power-ups, and defeating the antagonist Goombas (mushrooms) and Koopa Troopas (turtles).

This game has laid the foundation for subsequent (as well as modern) videogames, whether side-scrolling or not, by incorporating secret levels to discover, power-ups to collect, and enemies to defeat – all within a fixed time frame and limited number of lives.

While side-scrollers have given way to FPSes (first person shooters) in the past two decades, they continue to be popular on smart-phones and other handheld devices.

Scrolling the display

The display may be scrolled in several different ways – horizontally, vertically, or a combination of both. In this sketch, the background elements (the road, the river, and the sky) move in response to the arrow keys pressed by the user. Hence, the biker (controlled by the player) appears to move due to the foreground and background elements moving behind it.

To give the player a sense of depth, a parallax effect has been created by moving foreground and background elements at different speeds.
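
The parallax idea can be illustrated with a short sketch. It is written in Python rather than Processing for brevity, and the layer names and speed factors are hypothetical values, not the ones used in BikeRoller.

```python
def parallax_offsets(camera_x, speed_factors):
    """Return the horizontal offset of each layer for a given camera position."""
    return {layer: -camera_x * factor for layer, factor in speed_factors.items()}


# Hypothetical speed factors: distant layers move less than near ones.
LAYERS = {"sky": 0.2, "river": 0.5, "road": 1.0}
offsets = parallax_offsets(100, LAYERS)
```

When the camera has moved 100 pixels, the road has shifted by the full 100 pixels, the river by about 50 and the sky by only about 20. This difference in apparent speed is what the eye reads as depth.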

Though the sketch is quite basic, it helps to illustrate the basic working of a side-scroller. Currently using only LEFT and RIGHT arrow keys as its input, its scope may be expanded with more keyboard and mouse controls.

Technical explanation

As can be seen from the algorithm, three inputs are required to execute the scrolling loop: 

1. img – an image file of the desired element, 

2. step – an increment value by which the element is moved or scrolled, and

3. keyCode – the value of the pressed key. 

Apart from these, two more integers are declared within this function, which are as follows:

1. x – the x-position of the element on the screen. It is used in the incrementing process (updating the rendered element’s position).

2. dir – used to store either the value of -1 or 1, depending on the value of keyCode.

In the first run, the scrolling function will render the element by positioning one copy each at (0, 0), (-imgWd, 0) and (imgWd, 0). [Note: imgWd is the pixel width of the element.]

After this, x is incremented by a value of -dir*step, followed by the value of dir being checked. 

From here on, depending upon the conditions mentioned in the algorithm, the flow of control will move either towards termination of the loop, or continuation of the same set of steps detailed above.
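
The steps above can be sketched roughly as follows, in Python rather than Processing. Several details are assumptions, since they are not spelled out in the text: that RIGHT maps to dir = 1 and LEFT to dir = -1, that the three copies are drawn at x - imgWd, x and x + imgWd, and that x wraps around once a full image width has scrolled past. The key codes 37 and 39 are the values Processing uses for LEFT and RIGHT.

```python
LEFT, RIGHT = 37, 39  # Processing's key codes for the arrow keys


def scroll_step(x, step, key_code, img_wd):
    """Advance the scroll position by one frame; return (new_x, draw_positions)."""
    if key_code == RIGHT:
        direction = 1       # assumed mapping: RIGHT scrolls the element leftwards
    elif key_code == LEFT:
        direction = -1
    else:
        direction = 0       # any other key: the element stays where it is
    x += -direction * step
    # Assumed wrap-around rule, so that the three copies always cover the screen.
    if x <= -img_wd:
        x += img_wd
    elif x >= img_wd:
        x -= img_wd
    return x, [x - img_wd, x, x + img_wd]
```

Calling `scroll_step` once per frame with the latest key code keeps the element scrolling for as long as an arrow key is held down, with the side copies filling the gap that opens up as the middle copy slides off the screen.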

Hopefully, you have now learnt a little more about the technicalities of a scrolling background.

Related Links        

1. BikeRoller; the project on GitHub:          

2. Side Scrollers: A Planar Odyssey; a historical narrative:

3. Forerunners : The History Of The PC Side-Scroller; a documentary:

4. Endless Runner Games: Evolution and Future; an article:

Processing Visuals

Of late, I have been tinkering around with a new, Java-based IDE (integrated development environment) – Processing 3. Also offered as Python and JavaScript libraries (as Processing.py and p5.js respectively), Processing is an open-source initiative by Benjamin Fry and Casey Reas to automate visuals.

Having seen several output demos and YouTube tutorials about this platform, I decided to try it out myself. After downloading it, and going through several Processing sketches – both on GitHub and elsewhere – I came up with a sketch of my own. Here is its GIF:

Hypnotic-Spiral (my first Processing sketch)

The Interface

Several factors have influenced my latest IDE choice, one of them being the sheer abundance of tutorials and sketch ideas on the Internet. This, along with its simple sketchbook interface, enhances its usability.

What really encouraged me to install Processing in the first place was its integration with Arduino’s IDE. For quite some time, I had been looking for ways to create some visually useful programs that would interface with my Arduino board. To be more specific, I wanted better control of the output received by the Morse Code machine I built using this tutorial:

I was pleasantly surprised to find that the Arduino sketchbook was inspired by Processing’s interface itself. Here is a side-by-side comparison of both –

Processing and Arduino – note the similarities.

Processing – which has inspired several more projects, apart from Arduino – is rightfully credited to the painstaking efforts put in by its community of developers. Hats off to them!

Graphics Programming – Not a New Venture

For me, at least, it’s not a new thing. In fact, the sole reason I picked Computer Science as my optional subject in high school was to understand computer graphics better. At that time, I had read ‘Masters of Doom’ by David Kushner, and was particularly interested in the programming of Doom – a game that paved the way for FPSes (first person shooters) in the DOS era, partly due to the revolutionizing effect of its (pseudo) 3D graphics.

The Computer Science classes were quite useful, since they taught me about the basics of programming in Turbo C++ – flow of control, classes, constructors and destructors, pointers, arrays, read – write sequences etc. To my dismay, all the graphic functions for this IDE were stowed away in the elusive <graphics.h> file, which was never invoked even once in our lessons.

From there, I embarked on a solo mission to educate myself about the same. Equipped with a book on Borland Graphics that I found in my school library, I installed the required software – Turbo C++ 4.5 (the IDE) and DOSBox (the DOS emulator) – and taught myself C code, a process that took me at least two months.

Having experienced this, I find Processing a much simpler IDE, both in installation and usage. In fact, I won’t be surprised if Processing becomes the de facto standard for teaching programming in schools and pre-university courses, in the coming years.

I look forward to creating more Processing and Arduino sketches in the future.

External Links

  1. Processing (programming language); a Wikipedia article:
  2. Download \; the download page:
  3. Hypnotic-Spiral; the GitHub repo:

PDF Conversions – Today’s Necessity

Being a college student, I often find myself at the print shop, carrying with me all kinds of documents to be printed – fee slips, academic transcripts, scanned copies of handwritten notes etc. While apps like CamScanner help in creating PDF copies of class notes, their functionality is limited to images that are directly captured by the app(s). Furthermore, the only file formats recognized at the print shop are JPEG, DOC(X) and PDF.

That’s why I have been scouring Google’s Play Store – in the pursuit of an app that can convert all my files to PDF copies, and into other formats, as and when required. One such app that fits the bill is PDF Convertor, developed by Cometdocs.

Before reviewing the app, it is imperative to expound a little on the history and advantages of the file format this app is built around – the PDF.

The emergence of PDF

Short for Portable Document Format, it has a legacy spanning more than two decades, with its first version released on 15 June 1993 by Adobe as a proprietary file format. What made its popularity soar to new heights was its standardization as ISO 32000-1, accompanied by a Public Patent License from Adobe, which allowed anyone to make, use, sell and distribute PDF-compliant implementations, without paying any royalties to Adobe.

What makes PDF so special today?

There are practical reasons for PDF being the de facto standard for electronic file types. Its capability to convert itself to print-ready graphics on paper, while preserving hyperlinks, images and text embedded within it makes it a versatile format. The cherry on top is its file size, which is much smaller than its JPEG counterpart, thanks to the data compression algorithms it uses.

Another factor is its OS independence, which allows it to look the same across all operating systems, making it more portable. Further, with recent versions of Android supporting PDF, its user base has expanded even more.

Having explained the PDF a little, it’s time to focus on the app itself.

The app’s interface (UI)

On opening PDF Convertor for the first time, the user is greeted with a blank screen, to which files can be added for conversion. There are a total of 24 conversion types to choose from, with 7 of them available as paid features. As of 25th November 2017, the full pack is worth 790 INR, while a la carte conversions are 250 INR each. I was especially interested in its capability to convert XPS to PDF, a hitherto locked feature for me. (XPS is the file format for the output plots generated by OrCAD PSpice, software I use for circuit simulations, as part of my undergraduate course.)

Having unlocked the full pack, I set forth to use the app for converting the documents at my disposal.

Some of the in-app file conversions available.

There is also a batch conversion option that allows you to generate a multi-page PDF, or vice versa, depending upon the conversion options at your disposal. I didn’t unlock this feature, since my conversions never exceed a page or two.

An experience limited by Wi-Fi

Despite the well-laid design of the app with its easy to find menus, buttons and notifications, along with the slew of conversion options it has, I was unable to enjoy it to the fullest. The main reason for this is the Wi-Fi connection at my residence, where signal strength is pretty erratic. More often than not, when trying to convert any file, I get the following message:

Check your connection and try again.

Though I couldn’t carry out conversions at all times, it has been a satisfactory experience. All the conversions worked, whenever the Wi-Fi signal was strong enough.

My thoughts and suggestions

Having used different methods of PDF conversion for a while now, I have come to realize that every file conversion requires 3 steps –

  1. Upload files to a server
  2. Wait for the server to convert the files
  3. Download the converted files

The reason most conversion apps draw flak from users is that they falter in step 1 itself. Not everyone has access to dedicated, high-speed Internet – especially users from developing and underdeveloped nations – making it a huge obstacle that developers need to overcome.

A related moot point is the use of browser web pages for the same task. For most users, who generally have to convert only a file or two, it seems more fitting to convert in this manner, rather than use a dedicated app for the same.

Keeping this in mind, PDF Convertor can incentivize its users into continued usage, by allowing them to create an offline queue for the files to be uploaded. As an analogy, we have YouTube Offline, a feature that allows users to create an offline queue of videos, which are downloaded as and when signal strength is sufficient.
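
Such an offline queue could look something like the Python sketch below. This is a hypothetical design and says nothing about how PDF Convertor is actually built; the class name, the `upload` callable and the `is_online` check are all placeholders.

```python
from collections import deque


class OfflineUploadQueue:
    """Holds files until a connection is available, then uploads them in order."""

    def __init__(self, upload):
        self._upload = upload    # hypothetical callable that sends one file
        self._pending = deque()

    def add(self, path):
        self._pending.append(path)

    def flush(self, is_online):
        """Upload queued files while `is_online()` is true; return those sent."""
        sent = []
        while self._pending and is_online():
            path = self._pending[0]
            self._upload(path)       # if this raises, the file stays queued
            self._pending.popleft()
            sent.append(path)
        return sent

    def __len__(self):
        return len(self._pending)
```

The user would add files at any time, and the app would call `flush` whenever the signal improves. A file leaves the queue only after its upload succeeds, so an erratic connection delays a conversion instead of cancelling it.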

Overall, I find this app an impressive one, and look forward to improvements in its UX.

External Links

  1. PDF Convertor on Google Play; the app:
  2. PDF, What is it FOR?; a video:
  3. PDF, Version 1.7 (ISO 32000-1:2008); a technical description:
  4. Document Management – Portable document format – Part 1: PDF 1.7; the 2008 documentation:
  5. Knowing When to Use Which File Format; an article:

Duolingo – an App Review

I recently acquired a brand-new phone – a Samsung Galaxy J7, as a replacement for my previous Nokia C6-01 smart-phone. The reason is pretty simple – I wasn’t able to install any apps on my Nokia phone, since its Symbian OS is not compatible with .apk files (the file extension for Android apps).

The first thing I did with my new phone was to install a few apps – Duolingo being one of them. Since I had come across multiple recommendations for this app, I decided to give it a try. Besides, I was looking for ways to improve my language proficiency in Urdu and Japanese.

Having used the app for a little while now, I feel that it deserves a review of its own – hence this article!

The interface – first impressions

One feature I really admire about Duolingo is its UI (user interface) – clean, simple and intuitive. When the app is opened the first time, the user is greeted with a plethora of options to choose from – German, Korean, English, Russian, and Japanese, to name a few. Depending upon the user’s language preferences, it offers these languages in different instruction modes.

Since my preferred language is English, I scrolled through the section for English speakers. To my dismay, I couldn’t find Urdu listed under any section, let alone the English section. However, it did list Japanese, which I decided to try out.

The UX (user experience)

Once a course is selected, the user is redirected to a test pertaining to the language. This is completed only after correctly answering a certain number of questions, following which the user earns some XP and a few ‘lingots’ – the currency used for purchases from the ‘Shop’.

Each ‘skill’, indicated by an egg icon, comprises a number of tests, which must be completed in a similar fashion. Each test has multiple choice questions, translation tasks (audio and/or text), and word-match questions. The more questions the user answers correctly in a row, the more XP and lingots he or she earns.

While it may be used without registration, things get a little tricky when the user wishes to save his or her progress. In that case, app registration is required.

However, once registered, users are allowed to join a language club. These clubs have weekly leaderboards, which effectively gamify the app by creating an atmosphere of competitiveness.

Improving the app

If you’re looking for an app to learn languages in the form of a casual ‘game’, then Duolingo is the way to go. However, I wasn’t quite satisfied with the app, and probably had unrealistically high expectations from it.

In order to truly learn a language, one must not only read and listen to it, but also write it, and speak it. While I don’t mind jotting down words in a notebook, I don’t know whether my handwriting is legible or not. If there were a ‘capture’ feature in Duolingo to detect and identify text, it would be a big help in improving my Japanese handwriting.

When it comes to speaking the language, it is tough to comprehend the pronunciations correctly, even with audio read-outs of displayed words. For this, I suggest that IPA transcriptions be added to every word, and that the app read out those transcriptions. This will go a long way in making the app’s experience more fulfilling.

Edit: After publishing this post, I came across TinyCards, which is another app developed by Duolingo. Its feature of allowing the creation of custom decks by users really impressed me.

In fact, I would go so far as to say that TinyCards is the best learning aid I have come across, for teachers and students alike.

Here is a deck of Urdu words I created, using this app:

Related links

  1. Duolingo on Google Play; the app:
  2. IPA transcriptions in Duolingo; a GitHub repo:
  3. Recognizing handwritten glyphs; a research paper:

Text Detection using Tesseract

For the past couple of months, my colleague and I have been working on a research project.

The goal is simple – detect characters from a real-world image. However, the intermediate steps involved don’t make the task as straightforward as you might think!

Before discussing the technicalities of the project, it’s important to know what OCR is.

OCR – the heart of text detection

Short for Optical Character Recognition, it is used to identify glyphs – be it handwritten or printed. This way, all glyphs are detected and are separately assigned a character by the computer.

While OCR has gained traction in recent times, it is not a new concept. In fact, it is this very technology that bank employees use to read cheques and bank statements.

For this project we chose Tesseract as our OCR engine. Originally developed at Hewlett-Packard, it has been sponsored by Google since 2006, and is reportedly what is used in their Google Keep app to convert images to text.

The project’s nitty-gritty

We have limited our scope to printed text – specifically, street signs – and are attempting to convert the captured images to .txt files. This is how our code is intended to work:

If it works, a photograph of a sign can be reduced to a few bytes of text, which is very handy for storing names of places on smartphones, which always come equipped with a camera these days. Ideally, such a task would be easy to accomplish, with perfect lighting, no perspective distortions or warping, and no background noise.

Reality, unsurprisingly, is quite the opposite. Hence, we are trying to process the images before feeding them to Tesseract, which is known to work best with binary (black and white) images.

According to our plan, we shall implement a three-step method:

  1. remove perspective distortion from the image
  2. binarize the image
  3. pass the image through Tesseract
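To make step 2 a little more concrete, here is a minimal sketch of binarization in plain Python. It uses Otsu's global threshold purely as a simple stand-in, not the technique our project actually relies on, and the function names and the tiny hand-made `sign` grid are assumptions for illustration only.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark group
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: large when the two groups are well separated.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(image):
    """Map a 2-D list of grey levels to 0 (black) or 255 (white)."""
    flat = [value for row in image for value in row]
    threshold = otsu_threshold(flat)
    return [[255 if value > threshold else 0 for value in row] for row in image]


# Hypothetical 4x4 greyscale patch: dark strokes on a light background.
sign = [
    [200, 210, 205, 198],
    [30, 215, 40, 202],
    [35, 208, 45, 199],
    [201, 212, 207, 203],
]
binary = binarize(sign)
for row in binary:
    print(row)
```

A single global threshold like this tends to break down under uneven lighting, which is one reason real-world signage needs something more elaborate.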

Training the Tesseract engine

Before processing the images, the OCR engine needs to be ‘trained’ in order to work properly. For this reason, I downloaded jTessBoxEditor – a Java program for editing boxfiles (files generated by Tesseract when detecting glyphs). Since the project runs on Ubuntu, I had to download and install the Java Runtime Environment (JRE) to run jTessBoxEditor.

Since my portion of the project involves training the engine, I need to generate sample data for it. The engine needs to be fed samples of Times New Roman, Calibri, and Arial – the three fonts we came across in our images.

Our progress so far

Tesseract is still being trained, and the sample data is yet to be generated. After a while, I realized that these fonts would be available in my Windows installation, so I copied the font files over to Ubuntu and successfully installed them. One step down, several more to go!

On the image processing side, we are currently evaluating a Python implementation of ‘font and background colour independent text binarization’, a technique pioneered by T Kasar, J Kumar and A G Ramakrishnan.

I modified the code to work with Python 3, in order to avoid discrepancies between the various modules of our project. Here is the link:

A web forum also suggested that the input images be enlarged or shrunk, in order to make the text legible. This task requires ImageMagick, a software suite that offers a CLI (command line interface) for image manipulation. Therefore, I downloaded a bunch of grayscale text images (with the desired font, of course), and decided to convert all of them to PNG.

For some reason, I have not been able to convert even one of them.

As an example, here is a sample command:

magick convert gray25.gif gray25.png

This is the error message I get in Terminal:

No command 'magick' found, did you mean:

 Command 'magic' from package 'magic' (universe)

magick: command not found

I’ve tried re-installing ImageMagick several times, but to no avail. I need to go through yet more web forums for a solution to this problem.
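One possible explanation worth checking, offered as an assumption since it depends on the installed version: the `magick` command exists only in ImageMagick 7, while Ubuntu's packages have generally shipped ImageMagick 6, where the same tool is invoked as plain `convert`. The sketch below builds the conversion command for whichever of the two is on the PATH. The function name `build_convert_command` and its parameters are hypothetical.

```python
import shutil


def build_convert_command(source, target, resize=None, which=shutil.which):
    """Return an argument list that converts `source` to `target`.

    Prefers ImageMagick 7's `magick`; falls back to ImageMagick 6's `convert`.
    `which` is injectable so the lookup can be faked in tests.
    """
    if which("magick"):
        command = ["magick", source]
    elif which("convert"):
        command = ["convert", source]
    else:
        raise FileNotFoundError("neither 'magick' nor 'convert' is on the PATH")
    if resize:
        command += ["-resize", resize]  # e.g. "200%" to double the image size
    command.append(target)
    return command
```

The resulting list can be handed to `subprocess.run(build_convert_command("gray25.gif", "gray25.png"), check=True)`. On an ImageMagick 6 system this amounts to running `convert gray25.gif gray25.png`.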

What’s the scope?

This is a question almost everyone asks whenever I discuss my project. Indeed, it doesn’t look very promising at first sight, due to the tedious nature of the steps involved.

However, its scope is quite vast – ranging from preservation of ancient texts and languages to translation and transliteration of public signage, and converting street signs to audio for the visually impaired. In fact, it may be used as a last resort for driverless vehicles to navigate an area when GPS fails.

We are only limited by our imaginations. Once merged with technology, they can be used to achieve miracles!

External Links

1. Font and Background Color Independent Text Binarization; a research paper:

2. Perspective rectification of document images using fuzzy set and morphological operations; a research paper:

3. jTessBoxEditor; a how-to guide:

4. AptGet/HowTo; a how-to guide:

Using Oracle’s VirtualBox – A Review

Of late, I have been tinkering around with Ubuntu. The reason? I needed to work on a Python project, and wasn’t making much headway with it.

Being a Windows user, I was finding it difficult to install the required Python modules for my project. This was especially exasperating with SciPy, a library that a great deal of scientific Python code depends on. Its latest release, unfortunately, could be installed without a struggle only on Linux.

At the same time, I was apprehensive of even touching Unix, since it has always spelt doom for my PC. In the past, dual-booting Windows with Ubuntu or any other Linux distro had caused many a computer to crash – right in front of my eyes.

Hence, I had to overcome my apprehensions, tap into the hitherto alien Unix environment, and work on my project from there – whether I enjoyed it or not.

While scouring the internet for solutions, I stumbled upon VirtualBox, a VM (virtual machine) software by Oracle. Upon going through a few tutorials, I decided to give it a go.

What’s a Virtual Machine?

A virtual machine is software that emulates a complete computer, so that one OS (operating system) can run inside another. This way, the user can control one OS, while working within another OS. You may think of it as a case of one OS nested within another.

It’s amusing to think, “What if I run a virtual machine within my virtual installation? Is infinite nesting of OSes allowed?”

Ideally, such an experiment would be possible. In reality, hardware limitations would render it futile, since emulation saps a significant portion of the host’s resources, such as RAM and disk space. The host has to divide its hardware between itself and the nested (also called guest) OS, a situation somewhat similar to a dual-boot setup.

As an explanation, I shall now use this infographic.


It’s clear that as more OSes are nested, each nested OS has very little computational power at its disposal. In fact, OS 7 is a mere shadow of the C64 (Commodore 64), which is itself obsolete by today’s hardware standards and yet requires at least 64 KB of RAM to operate.
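The arithmetic behind this can be sketched in a few lines of Python. The numbers here are assumptions of mine rather than values read off the infographic: the outermost host has 6 GB of RAM, and every OS hands one sixth of its RAM to the guest nested inside it (the same proportion as giving a 1 GB guest to a 6 GB host). Overheads are ignored entirely.

```python
def nested_ram_kb(host_ram_kb, share, depth):
    """RAM in KB available at each nesting level; level 0 is the physical host."""
    levels = [float(host_ram_kb)]
    for _ in range(depth):
        # Each guest receives a fixed share of its own host's RAM.
        levels.append(levels[-1] * share)
    return levels


HOST_RAM_KB = 6 * 1024 * 1024  # assumed 6 GB host
SHARE = 1 / 6                  # assumed fraction passed down to each guest
C64_RAM_KB = 64                # RAM of a Commodore 64

levels = nested_ram_kb(HOST_RAM_KB, SHARE, 7)
for depth, ram in enumerate(levels):
    note = " (less than a C64)" if ram < C64_RAM_KB else ""
    print(f"OS {depth}: {ram:,.1f} KB{note}")
```

Under these assumptions the seventh nested OS is the first to end up with less memory than a Commodore 64. In practice, no modern OS would boot long before that point.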

A review of the installation

Here’s one feature universally appreciated about VirtualBox – it allows hassle-free toggling between the host OS and the guest OS (in my case, Ubuntu) – all with a simple click of the mouse button.

This is especially useful to me, since I’m a staunch Windows user, and can’t stand Ubuntu’s interface for too long. Sure, Ubuntu allows for quick development of program code, but when it comes to good UI (user interface), I feel that its developers should borrow some design tips from Windows 8.1, which is the OS currently installed on my PC.

In fact, here’s what it looks like, along with VirtualBox:

Since my PC has a hard drive of around 500 GB and 6 GB of RAM, I’ve found it convenient to run a fully installed (virtual) version of Ubuntu, with 20 GB of disk space and 1 GB of RAM allocated to it.

So far, it’s working well for me, and I am quite satisfied with it!

Related links 

The VirtualBox website:

Installing Ubuntu within Windows using VirtualBox; a how-to guide:

Sharing files between VirtualBox and host; a how-to guide:

Vintage Devices – The Gramophone

A month ago, my father and I embarked on a mission to restore our gramophone to working condition. Before detailing the process, allow me to explain a little more about this analog device.

Originally known as the phonograph, it is one of the earliest audio devices in the history of civilization. Thomas Edison introduced this ‘voice machine’ in the late nineteenth century, and it was soon seen in the households of the rich and the prosperous.

Eventually, it was discarded, as mass-produced radios were adopted and wireless broadcast systems such as AM and FM were put to use. However, it is important to know the history of this machine, in order to fully understand the working of audio devices.

The phonograph is a mechanical device that records and reproduces sound. It was invented by Thomas Edison in 1877, with his model capable of storing and playing sound from wax cylinders. A decade later, Emile Berliner patented the ‘Gramophone’, the most popular variant of the phonograph to date.

Note: In this article, I shall refer to the device that plays audio from flat discs as a gramophone, and to the device that plays audio from wax cylinders as a phonograph. Since I neither possess a phonograph nor expect my audience to have one, I shall be discussing the gramophone in greater detail. Here’s a diagram of the two devices, side by side:

What makes it spin?

The turntable is basically a metal disc resting on a pivot. The mechanism that rotates it consists of an arrangement of gears, is housed in the gramophone’s wooden box, and uses the energy supplied by a hand crank.

An important part of the driving mechanism is the ‘governor’, which controls the turntable’s speed. This allows it to rotate uniformly at the set RPM (revolutions per minute). The governor consists of a worm gear, to which three weights are attached using steel bands.

The sound box

The sound box is a hollow circular box, with a steel needle attached to its bottom by a screw. The needle, also known as the stylus, moves along the spiral groove of the vinyl disc, generating sound in accordance with the groove’s pits and bumps. The sound box is part of the ‘tone arm’, which allows the stylus to move freely along the groove.

The trumpet, also called the ‘horn’, is seen atop the gramophone, and acts as its amplifier. The device works perfectly fine even without one. In fact, portable models, as well as cabinet-style gramophones, forgo the horn completely.

Where’s the volume button?

One important thing to note is that this device doesn’t have any volume control. The reason is simple – this device was intended to be used in drawing rooms, especially to play music in the presence of company. It was only after radio receivers began appearing in the market that volume controls came into existence, as a way to amplify weak signals.


The first step was to take the machine apart, which involved removing the horn, dismantling the turntable, unscrewing the hand crank, the turntable’s shoe brake and speed setter, and finally retrieving the internal mechanism.

Upon close inspection, we found that one of the governor’s weights had snapped from its steel band. My father decided to change all three weights anyway, in order to evenly distribute the governor’s centrifugal force. Besides, it was only a matter of time before the remaining two would snap as well.

For the gramophone’s exterior, the wooden case was sent for repainting. All the brass parts – the horn, the tone arm and the sound box – were cleaned up with Brasso. The RPM setter and the braking shoe had a thick layer of rust on them, which had to be scraped off using emery paper.

The restoration efforts seem to be worth it, as the gramophone is now in working order. See for yourself!

Let’s play some music!

The only vinyl discs we have run at a much lower RPM – about 33 to 45 RPM. This is unfortunate, since the gramophone we restored has a range of 78 – 86 RPM. Hence, the discs play at about double speed, making the vocals stored on the records sound like squirrel squeaks.
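To put numbers on the squeaking, here is a small Python sketch of the speed-up and the resulting rise in pitch. It assumes our discs are standard 33⅓ RPM and 45 RPM records and that the turntable runs at its lowest setting of 78 RPM. The function name `playback_shift` is made up for this example.

```python
import math


def playback_shift(recorded_rpm, played_rpm):
    """Return (speed ratio, pitch shift in semitones) for a mismatched turntable."""
    ratio = played_rpm / recorded_rpm
    # Doubling the speed raises the pitch by one octave, i.e. 12 semitones.
    semitones = 12 * math.log2(ratio)
    return ratio, semitones


for recorded in (100 / 3, 45):  # assumed disc speeds: 33 1/3 RPM and 45 RPM
    ratio, semitones = playback_shift(recorded, 78)
    print(f"{recorded:.1f} RPM disc at 78 RPM: {ratio:.2f}x speed, "
          f"pitch raised by {semitones:.1f} semitones")
```

A 45 RPM disc thus plays at roughly 1.7 times its intended speed, and a 33⅓ RPM disc at well over double, which pushes the vocals up by more than an octave.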

Here are the vinyl discs, which I attempted to play:

The first record has 2 songs from the film Padosan (1968).

The second record has 3 songs from the film Aandhi (1975), despite being smaller in size.

We had many more discs – probably a dozen of them – which were smashed to smithereens by yours truly in her childhood.

Hopefully, we shall acquire a few more vinyl discs in the future, which are fit for playing on our gramophone.

Related links

Record players and phonographs; an article:

The invention of vinyl records; an article:

How Does the Gramophone Work; a forum post:

How Records Are Made; an article:

“Bob Maffit – Phonograph Restoration” (2000); a video:

Electron microscope slow-motion video of vinyl LP; a video:

Stereo Records vs. Mono Records; an article: