Text Detection using Tesseract

For the past couple of months, my colleague and I have been working on a research project.

The goal is simple – detect characters in a real-world image. However, the intermediate steps make the task less straightforward than you might think!

Before discussing the technicalities of the project, it’s important to know what OCR is.

OCR – the heart of text detection

Short for Optical Character Recognition, OCR is used to identify glyphs, whether handwritten or printed. Each detected glyph is then assigned a character by the computer.

While OCR has gained traction in recent times, it is not a new concept. In fact, it is this very technology that banks use to read cheques and bank statements.

For this project we chose Tesseract as our OCR engine. Originally developed at Hewlett-Packard, it was later open-sourced and its development sponsored by Google, and it is reportedly what the Google Keep app uses to convert images to text.

The project’s nitty-gritties

We have limited our scope to printed text – specifically, street signs – and are attempting to convert the captured images to .txt files. This is how our code is intended to work:

If it works, a photograph could be reduced to a few bytes of text – a very handy way of storing names of places on smartphones, which always come equipped with a camera these days. Ideally, such a task would be easy to accomplish, with perfect lighting, no perspective distortion or warping, and no background noise.

Reality, unsurprisingly, is quite the opposite. Hence, we are trying to process the images before feeding them to Tesseract, which is known to work best with binary (black and white) images.

According to our plan, we shall implement a three-step method:

  1. remove perspective distortion from the image
  2. binarize the image
  3. pass the image through Tesseract
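
Step 2 can be illustrated with a small, library-free sketch. Note that it uses Otsu’s global threshold as a stand-in, not the Kasar et al. technique we are evaluating, and that the function names and the tiny sample image are made up for illustration.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        if weight_dark == 0:
            continue                      # no pixels this dark yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                         # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor); larger is better.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(image):
    """Map a grayscale image (list of rows) to pure black (0) and white (255)."""
    flat = [value for row in image for value in row]
    threshold = otsu_threshold(flat)
    return [[0 if value <= threshold else 255 for value in row] for row in image]


# Hypothetical 2x3 image: dark text pixels on a light background.
sample = [[10, 12, 200],
          [11, 210, 205]]
print(binarize(sample))
```

A single global threshold like this is likely to struggle with uneven lighting, which is one reason we are looking at a more elaborate binarization method.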

Training the Tesseract engine

Before processing the images, the OCR engine needs to be ‘trained’ in order to work properly. For this reason, I downloaded jTessBoxEditor – a Java program for editing box files (files that Tesseract generates when detecting glyphs). Since the project runs on Ubuntu, I had to download and install the Java Runtime Environment (JRE) to run jTessBoxEditor.

Since my portion of the project involves training the engine, I need to generate sample data for it. The engine needs to be fed samples of Times New Roman, Calibri, and Arial – the three fonts we came across in our images.

Our progress so far

Tesseract is still being trained, and the sample data is yet to be generated. It took me a while to realize that these fonts would be available in my Windows installation; I then copied the font files to Ubuntu and installed them successfully. One step down, several more to go!

On the image processing side, we are currently evaluating a Python implementation of ‘font and background colour independent text binarization’, a technique pioneered by T Kasar, J Kumar and A G Ramakrishnan.

I modified the code to work with Python 3, in order to avoid discrepancies between the various modules of our project. Here is the link:


A web forum also suggested that the input images be enlarged or shrunk, in order to make the text legible. This task requires ImageMagick, a software suite that is driven from a CLI (command line interface) for image manipulation. Therefore, I downloaded a bunch of grayscale text images (with the desired fonts, of course), and decided to convert all of them to PNG.

For some reason, I’m not able to do so, and have failed to convert any of them.

As an example, here is a sample command:

magick convert gray25.gif gray25.png

This is the error message I get in Terminal:

No command 'magick' found, did you mean:

 Command 'magic' from package 'magic' (universe)

magick: command not found

I’ve tried re-installing ImageMagick several times, but to no avail. I need to go through yet more web forums for a solution to this problem.
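
One possible explanation, which is an assumption on my part and not something I have verified on this machine: the `magick` command was introduced with ImageMagick 7, whereas ImageMagick 6 (the version Ubuntu’s package repositories have typically shipped) exposes the same conversion as `convert`. The sketch below picks whichever command is present; the function name is made up for illustration.

```python
import shutil


def build_convert_command(source, target, which=shutil.which):
    """Build an ImageMagick conversion command for whichever binary is installed."""
    if which("magick"):
        return ["magick", source, target]    # ImageMagick 7 entry point
    if which("convert"):
        return ["convert", source, target]   # ImageMagick 6 legacy tool
    raise FileNotFoundError("ImageMagick does not appear to be on the PATH")


# To actually run the conversion:
#   import subprocess
#   subprocess.run(build_convert_command("gray25.gif", "gray25.png"), check=True)
```

If this guess is right, the sample command above would simply become `convert gray25.gif gray25.png`.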

What’s the scope?

This is a question almost everyone asks whenever I discuss my project. Indeed, it doesn’t look very promising at first sight, due to the tedious nature of the steps involved.

However, its scope is quite vast – ranging from preservation of ancient texts and languages to translation and transliteration of public signage, and converting street signs to audio for the visually impaired. In fact, it may be used as a last resort for driverless vehicles to navigate an area when GPS fails.

We are only limited by our imaginations. Once merged with technology, they can be used to achieve miracles!

External Links

1. Font and Background Color Independent Text Binarization; a research paper:


2. Perspective rectification of document images using fuzzy set and morphological operations; a research paper:


3. jTessBoxEditor; a how-to guide:


4. AptGet/HowTo; a how-to guide:



Duolingo – an App Review

I recently acquired a brand-new phone – a Samsung Galaxy J7, as a replacement for my previous Nokia C6-01 smartphone. The reason is pretty simple – I wasn’t able to install any of the apps I wanted on my Nokia phone, since its Symbian OS is not compatible with .apk files (the package format for Android apps).

The first thing I did with my new phone was to install a few apps – Duolingo being one of them. Since I had come across multiple recommendations for this app, I decided to give it a try. Besides, I was looking for ways to improve my language proficiency in Urdu and Japanese.

Having used the app for a little while now, I feel that it deserves a review of its own – hence this article!

The interface – first impressions

One feature I really admire about Duolingo is its UI (user interface) – clean, simple and intuitive. When the app is opened for the first time, the user is greeted with a plethora of languages to choose from – German, Korean, English, Russian, and Japanese, to name a few. Which courses are offered depends on the language the user wants to be taught in.

Since my preferred language is English, I scrolled through the section for English speakers. To my dismay, I couldn’t find Urdu listed under any section, let alone the English section. However, it did list Japanese, which I decided to try out.

The UX (user experience)

Once a course is selected, the user is redirected to a test pertaining to the language. The test is completed only after a certain number of questions have been answered correctly, following which the user earns some XP and a few ‘lingots’ – the currency used for purchases from the ‘Shop’.

Each ‘skill’, indicated by an egg icon, comprises a number of tests, which must be completed in a similar fashion. Each test has multiple choice questions, translation tasks (audio and/or text), and word-match questions. The more questions the user answers correctly in a row, the more XP and lingots he or she earns.

While the app may be used without registration, things get a little tricky when the user wishes to save his or her progress. In that case, registration is required.

However, once registered, users are allowed to join a language club. These clubs have weekly leaderboards, which effectively gamify the app by creating an atmosphere of competitiveness.

Improving the app

If you’re looking for an app to learn languages in the form of a casual ‘game’, then Duolingo is the way to go. However, I wasn’t quite satisfied with the app, and probably had unrealistically high expectations of it.

In order to truly learn a language, one must not only read and listen to it, but also write it, and speak it. While I don’t mind jotting down words in a notebook, I don’t know whether my handwriting is legible or not. If there were a ‘capture’ feature in Duolingo to detect and identify handwritten text, it would be a big help in improving my Japanese handwriting.

When it comes to speaking the language, it is tough to grasp the pronunciations correctly, even with audio read-outs of displayed words. For this, I suggest that IPA transcriptions be added to every word, and that the app read out those transcriptions. This would go a long way in making the app’s experience more fulfilling.

Related links

  1. Duolingo on Google Play; the app:


  2. IPA transcriptions in Duolingo; a GitHub repo:


  3. Recognizing handwritten glyphs; a research paper:


Using Oracle’s VirtualBox – A Review

Of late, I have been tinkering around with Ubuntu. The reason? I needed to work on a Python project, and wasn’t making much headway into it.

Being a Windows user, I was finding it difficult to install the required Python modules for my project. This was especially exasperating with SciPy, a library that many scientific Python programs depend on. As far as I could tell, its latest release installed cleanly only on Linux.

At the same time, I was apprehensive of even touching a Unix-like system, since it has always spelt doom for my PC. Dual-booting Windows with any Linux distro, Ubuntu included, had caused, in the past, many a computer to crash – right in front of my eyes.

Hence, I had to overcome my apprehensions, tap into the hitherto alien Unix-like environment, and work on my project from there – whether I enjoyed it or not.

While scouring the internet for solutions, I stumbled upon VirtualBox, a VM (virtual machine) program by Oracle. After going through a few tutorials, I decided to give it a go.

What’s a Virtual Machine?

A virtual machine is software that emulates a computer, on which an OS (operating system) can then be installed. This way, the user can control one OS while working within another OS. You may think of it as a case of one OS nested within another.

It’s amusing to think, “What if I run a virtual machine within my virtual installation? Is infinite nesting of OSes allowed?”

In principle, such an experiment would be possible. In reality, hardware limitations would render it futile, since emulation takes up a significant portion of the host’s resources, such as RAM and disk space. The hardware has to be divided between the host and the nested (also called guest) OS, much as a dual-boot setup divides the hard drive between two OSes.
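
To get a feel for how quickly the resources shrink, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every OS hands half of its RAM to its guest and that the host starts with 6 GB; both figures and the function name are my own choices, not something VirtualBox imposes.

```python
def nested_ram_mb(host_ram_mb, levels, share=0.5):
    """RAM available at each nesting level if every OS gives `share` of its RAM to its guest."""
    ram = [float(host_ram_mb)]
    for _ in range(levels):
        ram.append(ram[-1] * share)
    return ram


# Level 0 is the host; levels 1 to 7 are successively nested guests.
for depth, megabytes in enumerate(nested_ram_mb(6 * 1024, 7)):
    print(f"level {depth}: {megabytes:.0f} MB")
```

A smaller share per level shrinks the figures much faster, and the same reasoning applies to disk space and CPU time.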

As an explanation, I shall now use this infographic.


It’s clear that as more OSes are nested, each one has very little computational power at its disposal. In fact, OS 7 is a mere shadow of the C64 (Commodore 64), which is itself obsolete by today’s hardware standards, having just 64 KB of RAM.

A review of the installation

Here’s one feature universally appreciated about VirtualBox – it allows hassle-free toggling between the guest OS (in my case, Ubuntu) and the host OS – all with a simple click of the mouse button.

This is especially useful to me, since I’m a staunch Windows user, and can’t stand Ubuntu’s interface for too long. Sure, Ubuntu allows for quick development of program code, but when it comes to good UI (user interface), I feel that its developers should borrow some design tips from Windows 8.1, which is the OS currently installed on my PC.

In fact, here’s what it looks like, along with VirtualBox:

Since my PC has a hard drive of around 500 GB and 6 GB of RAM, I’ve found it convenient to run a fully installed (virtual) version of Ubuntu, with 20 GB of disk space and 1 GB of RAM allocated to it.

So far, it’s working well for me, and I am quite satisfied with it!

Related links 

The VirtualBox website:


Installing Ubuntu within Windows using VirtualBox; a how-to guide:


Sharing files between VirtualBox and host; a how-to guide:


Vintage Devices – The Gramophone

A month ago, my father and I embarked on a mission to restore our gramophone to working condition. Before detailing the process, allow me to explain a little more about this analog device.

Originally known as the phonograph, it is one of the earliest audio devices in the history of civilization. Thomas Edison introduced this ‘voice machine’ in the late nineteenth century, and it was soon seen in the households of the rich and the prosperous.

Eventually, it was discarded, as mass-produced radios were adopted and wireless broadcast systems such as AM and FM came into use. However, it is important to know the history of this machine, in order to fully understand the working of audio devices.

The phonograph is a mechanical device that records and reproduces sound. It was invented by Thomas Edison in 1877, with his model capable of storing and playing sound from wax cylinders. A decade later, Emile Berliner patented the ‘Gramophone’, the most popular variant of the phonograph to date.

Note: In this article, I shall refer to the device that plays audio from flat discs as a gramophone, and to the device that plays it from wax cylinders as a phonograph. Since I neither possess a phonograph nor expect my audience to have one, I shall be discussing the gramophone in greater detail. Here’s a diagram of the two devices, side by side:

What makes it spin?

The turntable is basically a metal disc resting on a pivot. The mechanism that rotates it consists of an arrangement of gears, is housed in the gramophone’s wooden box, and uses the energy supplied by a hand crank.

An important part of the driving mechanism is the ‘governor’, which controls the turntable’s speed. This allows it to rotate uniformly at the set RPM (revolutions per minute). The governor consists of a worm gear, to which three weights are attached using steel bands.

The sound box

The sound box is a hollow circular box, with a steel needle attached to its bottom by a screw. The needle, also known as the stylus, moves along the spiral groove of the vinyl disc, generating sound in accordance with the groove’s pits and bumps. The sound box is part of the ‘tone arm’, which allows the stylus to move freely along the groove.

The trumpet, also called the ‘horn’, is seen atop the gramophone, and acts as its amplifier. The device works perfectly fine even without one. In fact, portable models and cabinet-style gramophones forgo the external horn entirely.

Where’s the volume button?

One important thing to note is that this device doesn’t have any volume control. The reason is simple – this device was intended to be used in drawing rooms, especially to play music in the presence of company. It was only after radio receivers began appearing in the market that volume controls came into existence, as a way to amplify weak signals.


The first step was to take the machine apart, which involved removing the horn, dismantling the turntable, unscrewing the hand crank, the turntable’s shoe brake and speed setter, and finally retrieving the internal mechanism.

Upon close inspection, it was found that one of the governor’s weights had snapped from its steel band. My father decided to change all three weights anyway, in order to evenly distribute the governor’s centrifugal force. Besides, it was only a matter of time before the remaining two would snap as well.

For the gramophone’s exterior, the wooden case was sent for repainting. All the brass parts – the horn, the tone arm and the sound box – were cleaned up with Brasso. The RPM setter and the braking shoe had a thick layer of rust on them, which had to be scraped off using emery paper.

The restoration efforts seem to be worth it, as the gramophone is now in working order. See for yourself!

Let’s play some music!

The only vinyl discs we have run at a much lower RPM – about 33 to 45 RPM. This is unfortunate, since the gramophone we restored has a range of 78 – 86 RPM. Hence, the discs play at about double speed, making the vocals stored on the records sound like squirrel squeaks.
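
The ‘squirrel squeak’ effect can be put into numbers. The sketch below compares the speed a disc was cut for with the turntable’s lowest setting of 78 RPM; I am assuming the usual 33⅓ RPM for the slower discs, and the function name is made up for illustration. Since pitch rises with playback speed, the shift in semitones is 12 times the base-2 logarithm of the speed ratio.

```python
import math


def playback_shift(recorded_rpm, turntable_rpm):
    """Return (speed ratio, pitch shift in semitones) for a disc played at the wrong speed."""
    ratio = turntable_rpm / recorded_rpm
    semitones = 12 * math.log2(ratio)   # 12 semitones per doubling of frequency
    return ratio, semitones


for recorded in (100 / 3, 45):
    ratio, semitones = playback_shift(recorded, 78)
    print(f"{recorded:.1f} RPM disc at 78 RPM: {ratio:.2f}x speed, +{semitones:.1f} semitones")
```

In other words, the music plays roughly 1.7 to 2.3 times too fast, and the vocals are pushed up by the better part of an octave or more.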

Here are the vinyl discs, which I attempted to play:

The first record has 2 songs from the film Padosan (1968).

The second record has 3 songs from the film Aandhi (1975), despite being smaller in size.

We had many more discs – probably a dozen of them – which were smashed to smithereens by yours truly in her childhood.

Hopefully, we shall acquire a few more vinyl discs in the future, which are fit for playing on our gramophone.

Related links

Record players and phonographs; an article:


The invention of vinyl records; an article:


How Does the Gramophone Work; a forum post:


How Records Are Made; an article:


“Bob Maffit – Phonograph Restoration” (2000); a video:


Electron microscope slow-motion video of vinyl LP; a video:


Stereo Records vs. Mono Records; an article:


Microcontrolling an LCD & LED

Ever wondered how a display system works? Right from the traffic lights seen on busy roads, to the laptop or mobile display on which you are currently reading this blog, the tech involved may seem daunting on the surface, but its logic is relatively simple.

In fact, the simplicity might appeal to your curiosity, setting you on the path towards your very first electronics project, just like me. Well then, let’s dive in!

LCDs and LEDs – what are they?

Note: The LCD discussed here is a screen that displays characters, while the LED I have used is a simple diode, with two terminals. It’s different from LED displays, which are another category of flat-panel screens.

LCD is short for Liquid Crystal Display, and is most commonly seen in handheld calculators. It basically consists of a layer of liquid crystal sandwiched between two polarizing sheets. These sheets must be oriented at right angles to each other, otherwise the display won’t work. Its lowermost layer is either a mirror or an LED panel (if it’s backlit). To avoid confusion, I shall only discuss reflective layer LCDs here.

These displays are divided into cells, whose liquid crystal is individually controlled. In the OFF state, the liquid crystal is in a helical configuration, which rotates the light entering the top polarizer so that it passes through the second polarizer as well, resulting in a blank screen. Once a cell enters the ON state, its liquid crystal begins to ‘untwist’, causing light to get blocked by the second polarizer and making the cell appear black.

On the other hand, the LED (or Light Emitting Diode) is a two-terminal device that is relatively easy to use, and may be plugged into a circuit, just like any other component. As a recent advancement in indoor lighting, it improves significantly on incandescent and fluorescent technologies by being more energy efficient.

Though this project uses both an LCD and an LED, I have laid more emphasis on the former’s functioning, as it requires more inputs for setting the cursor position, and ensuring that the output text is displayed in the way intended. The LED is simply a blinking bulb in this project.

Enter Arduino – the microcontroller

The Arduino project can be traced back to 2003, when Massimo Banzi, along with colleagues and students at the Interaction Design Institute Ivrea, set out to create a range of microcontroller boards that would be economical for students and professionals. Today, it is a leading maker of open source hardware, and has a wide consumer base.

From what I read, it’s useful in creating a large number of projects, a claim I shall put to the test, starting with this project. Here, I have used an Arduino Uno to control the behaviour of the LCD screen and the LED bulb.

Assembling the hardware

The main components used in this project are: an LED (3 V), a 220 Ω resistor, a 16×2 LCD character display (Hitachi HD44780), a solderless breadboard, a 10 kΩ potentiometer (for adjusting the display’s contrast), an Arduino Uno, and an A/B USB 2.0 cable (for connecting the Arduino board to the computer).
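
Why a 220 Ω resistor? A quick Ohm’s law check shows the current it allows through the LED. The sketch assumes that the LED is driven from one of the Uno’s 5 V digital pins and that the 3 V rating is the LED’s forward voltage; the function name is made up for illustration.

```python
def led_current_ma(supply_v, led_forward_v, resistor_ohm):
    """Current through an LED and its series resistor, in milliamperes (Ohm's law)."""
    return (supply_v - led_forward_v) / resistor_ohm * 1000


print(f"{led_current_ma(5.0, 3.0, 220):.1f} mA")   # prints "9.1 mA"
```

That is comfortably below the 20 mA that small indicator LEDs are commonly rated for, so the LED stays safe while still glowing visibly.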

A few optional but useful tools include: a table lamp, a wire cutter, a penknife, and a pair of tweezers (to pull out wire stubs in case a wire snaps).

Here’s the circuit’s breadboard view:

The data inputs of the LCD screen are labelled D0 to D7. In this project, I have used the 4-bit mode of operation, in which only four of these data lines (D4 to D7) are wired to the Arduino.
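
In 4-bit mode, the HD44780 receives every byte as two halves (‘nibbles’), the upper four bits first and the lower four bits second. The sketch below only illustrates that split; it is not the code running on the Arduino, and the function name is made up.

```python
def split_into_nibbles(byte):
    """Split one byte into (high nibble, low nibble), the order used in 4-bit mode."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value between 0 and 255")
    return (byte >> 4) & 0x0F, byte & 0x0F


high, low = split_into_nibbles(ord("A"))   # 'A' is 0x41
print(f"{high:04b} then {low:04b}")        # prints "0100 then 0001"
```

The trade-off is that each character takes two transfers instead of one, in exchange for four fewer wires on the breadboard.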

It is recommended that the wires to the character display be soldered, as it avoids unnecessary data loss at its input pins.

Getting the code right

You may obtain the program code via this link to my GitHub repository:


The Arduino IDE is required to compile the program, and upload it to the microcontroller board. Here is the link to its download page:


Fire up the system!

Finally, the A/B cable is connected to the computer’s USB port. Once the Arduino board has been correctly identified, along with its COM port, the program is uploaded, and the result obtained is something like this:

Notice that the Arduino board’s built-in LED (the dot next to the red LED bulb) flashes at the same frequency as the latter.

The potentiometer’s knob will require some adjusting in order to get the correct contrast for the LCD screen.

Change the arguments of delay(), print() and setCursor(), and see how it alters the output. Is it expected, unusual, or dramatic?
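
If you don’t have the hardware at hand, the effect of the cursor position can be previewed with a toy model of the 16×2 grid. This is a rough simulation written for this post, not the Arduino library itself: the class and method names are made up, and text that runs past the right edge is simply dropped.

```python
class CharDisplay:
    """A toy 16x2 character grid that mimics setting a cursor and printing text."""

    def __init__(self, cols=16, rows=2):
        self.cols, self.rows = cols, rows
        self.grid = [[" "] * cols for _ in range(rows)]
        self.col, self.row = 0, 0

    def set_cursor(self, col, row):
        """Move the cursor; columns and rows are counted from zero."""
        self.col, self.row = col, row

    def write(self, text):
        """Place text at the cursor, dropping anything beyond the right edge."""
        for character in text:
            if self.col >= self.cols:
                break
            self.grid[self.row][self.col] = character
            self.col += 1

    def lines(self):
        return ["".join(row) for row in self.grid]


display = CharDisplay()
display.set_cursor(4, 1)        # fifth column, second row
display.write("Hello")
for line in display.lines():
    print(f"|{line}|")
```

Changing the two arguments of `set_cursor` moves the text around the grid, which is a handy way to plan a layout before uploading anything to the board.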

Further, if I want a scrollable display, what hardware/software modifications do I need? Leave your suggestions in the comment section!

Related Links

Fundamentals of Liquid Crystal Display; a white paper:


The History of the Light Bulb; an article:


LED Basics; a video:


Arduino – Troubleshooting: