Text Detection using Tesseract

For the past couple of months, my colleague and I have been working on a research project.

The goal is simple – detect characters from a real-world image. However, the intermediate steps involved don’t make the task as straightforward as you might think!

Before discussing the technicalities of the project, it’s important to know what OCR is.

OCR – the heart of text detection

Short for Optical Character Recognition, OCR identifies glyphs in an image, whether handwritten or printed. Each detected glyph is then mapped to a character by the computer.

While OCR has gained traction in recent times, it is not a new concept. In fact, it is this very technology that bank employees use to read cheques and bank statements.

For this project we chose Tesseract as our OCR engine. Originally developed at Hewlett-Packard, it has been sponsored by Google since 2006, and is reportedly what their Google Keep app uses to convert images to text.

The project’s nitty-gritties

We have limited our scope to printed text – specifically, street signs – and are attempting to convert the captured images to .txt files. This is how our code is intended to work:

If it works, a photographed sign could be stored as a few bytes of text instead of a full image – very handy for saving place names on smartphones, which nearly always come equipped with a camera these days. Under ideal conditions, with perfect lighting, no perspective distortion or warping, and no background noise, such a task would be easy to accomplish.

Reality, unsurprisingly, is quite the opposite. Hence, we are trying to process the images before feeding them to Tesseract, which is known to work best with binary (black and white) images.

According to our plan, we shall implement a three-step method:

  1. remove perspective distortion from the image
  2. binarize the image
  3. pass the image through Tesseract
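As a rough illustration of step 2, here is a minimal binarization sketch in plain Python. It uses Otsu's global threshold, a much simpler, well-known method that I am swapping in purely for illustration; it is not the technique we are evaluating for the project. The function names and the tiny sample image are assumptions made up for this example.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best splits pixels into dark and light."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no light pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image, threshold):
    """Map every pixel above the threshold to white (255), the rest to black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


# Hypothetical 2x3 grayscale image: three dark pixels, three light ones.
image = [[20, 30, 200], [25, 210, 220]]
t = otsu_threshold([p for row in image for p in row])
print(t)                   # 30
print(binarize(image, t))  # [[0, 0, 255], [0, 255, 255]]
```

A single global threshold like this tends to struggle with uneven lighting, which is part of why real-world signs need something more robust.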

Training the Tesseract engine

Before processing the images, the OCR engine needs to be ‘trained’ in order to work properly. For this reason, I downloaded jTessBoxEditor – a Java program for editing box files (files generated by Tesseract when detecting glyphs). Since the project runs on Ubuntu, I had to download and install the Java Runtime Environment (JRE) to run jTessBoxEditor.
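For reference, each line of a box file describes one glyph as `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image. Below is a minimal sketch of reading such lines in Python. The `Box` type, the `parse_box_line` helper and the sample line are illustrative assumptions of mine, not part of Tesseract or jTessBoxEditor.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One glyph entry from a box file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse a single '<symbol> <left> <bottom> <right> <top> <page>' line."""
    # Split from the right so the five numeric fields are always found,
    # whatever the symbol on the left happens to be.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError("not a box file line: %r" % line)
    symbol, *numbers = parts
    return Box(symbol, *map(int, numbers))


# Made-up sample line, for illustration only.
box = parse_box_line("T 12 30 28 52 0")
print(box.symbol, box.right - box.left, box.top - box.bottom)
```

Correcting a box in jTessBoxEditor amounts to editing these numbers (or the symbol) until each rectangle wraps exactly one glyph.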

Since my portion of the project involves training the engine, I need to generate sample data for it. The engine needs to be fed samples of Times New Roman, Calibri, and Arial – the three fonts we came across in our images.

Our progress so far

Tesseract is still being trained, and the sample data is yet to be generated. It took me a while to realize that these fonts were already available in my Windows installation; I then copied the font files to Ubuntu and installed them successfully. One step down, several more to go!

On the image processing side, we are currently evaluating a Python implementation of ‘font and background colour independent text binarization’, a technique pioneered by T Kasar, J Kumar and A G Ramakrishnan.

I modified the code to work with Python 3, in order to avoid discrepancies between the various modules of our project. Here is the link:


A web forum also suggested enlarging or shrinking the input images in order to make the text legible. This calls for ImageMagick, a software suite that manipulates images from the CLI (command line interface). I therefore downloaded a bunch of grayscale text images (with the desired fonts, of course) and decided to convert all of them to PNG.

So far, I have not been able to convert a single one of them.

As an example, here is a sample command:

magick convert gray25.gif gray25.png

This is the error message I get in Terminal:

No command 'magick' found, did you mean:

 Command 'magic' from package 'magic' (universe)

magick: command not found

I’ve tried re-installing ImageMagick several times, but to no avail. I need to go through yet more web forums for a solution to this problem.
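One possible explanation, which I have not yet confirmed on this machine: the `magick` launcher ships with ImageMagick 7, whereas the ImageMagick 6 packages found on many Ubuntu releases install the older `convert` command instead. The Python sketch below picks whichever of the two is on the PATH. The helper name `build_convert_command` and its `which` parameter are my own assumptions for illustration.

```python
import shutil


def build_convert_command(src, dst, which=shutil.which):
    """Return an argument list that converts src to dst with ImageMagick.

    Prefers the ImageMagick 7 'magick' launcher and falls back to the
    legacy 'convert' command if only that one is installed.
    """
    if which("magick"):
        return ["magick", src, dst]
    if which("convert"):
        return ["convert", src, dst]
    raise FileNotFoundError("neither 'magick' nor 'convert' found on PATH")
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. If the assumption holds, the sample command above would simply become `convert gray25.gif gray25.png`.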

What’s the scope?

This is a question almost everyone asks whenever I discuss my project. Indeed, it doesn’t look very promising at first sight, due to the tedious nature of the steps involved.

However, its scope is quite vast – ranging from preservation of ancient texts and languages to translation and transliteration of public signage, and converting street signs to audio for the visually impaired. In fact, it may be used as a last resort for driverless vehicles to navigate an area when GPS fails.

We are only limited by our imaginations. Once merged with technology, they can be used to achieve miracles!

External Links

1. Font and Background Color Independent Text Binarization; a research paper:


2. Perspective rectification of document images using fuzzy set and morphological operations; a research paper:


3. jTessBoxEditor; a how-to guide:


4. AptGet/HowTo; a how-to guide:



Using Oracle’s VirtualBox – A Review

Of late, I have been tinkering around with Ubuntu. The reason? I needed to work on a Python project, and wasn’t making much headway with it.

Being a Windows user, I was finding it difficult to install the required Python modules for my project. This was especially exasperating with SciPy, a library that a great many scientific Python programs depend on. Its latest release, unfortunately, proved far easier to install on Linux than on Windows.

At the same time, I was apprehensive of even touching Linux, since it’s always spelt doom for my PC. Dual-booting Windows with any Linux distro, Ubuntu included, had caused, in the past, many a computer to crash – right in front of my eyes.

Hence, I had to overcome my apprehensions, tap into the hitherto alien Linux environment, and work on my project from there – whether I enjoyed it or not.

While scouring the internet for solutions, I stumbled upon VirtualBox, virtual machine (VM) software by Oracle. Upon going through a few tutorials, I decided to give it a go.

What’s a Virtual Machine?

A virtual machine is software that emulates a complete computer, so that one OS (operating system) can run inside another. This way, the user can control one OS while working within another. You may think of it as a case of one OS nested within another.

It’s amusing to think, “What if I run a virtual machine within my virtual installation? Is infinite nesting of OSes allowed?”

Ideally, such an experiment would be possible. In reality, hardware limitations would render it futile, since emulation saps a significant portion of the host OS’s resources, such as RAM and disk space. The hardware has to be divided between the host and the nested (also called guest) OS, much as a dual-boot setup divides the hard drive.

As an explanation, I shall now use this infographic.


Clearly, the more OSes are nested, the less computational power each one has at its disposal. In fact, OS 7 is a mere shadow of a C64 (Commodore 64), a machine with just 64 KB of RAM that is itself obsolete by today’s hardware standards.
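The shrinking budget can be put into rough numbers. The sketch below assumes, purely for illustration, that every host hands exactly half of its RAM to its guest and that the outermost machine has 6 GiB. Neither figure comes from the infographic, so the results are not meant to match it.

```python
GIB = 1024 ** 3
KIB = 1024


def guest_ram(host_ram_bytes, depth):
    """RAM left for the guest nested 'depth' levels down, halving at each level."""
    return host_ram_bytes // (2 ** depth)


def max_depth(host_ram_bytes, floor_bytes):
    """Deepest nesting level whose guest still gets at least floor_bytes of RAM."""
    depth = 0
    while guest_ram(host_ram_bytes, depth + 1) >= floor_bytes:
        depth += 1
    return depth


host = 6 * GIB                      # assumed RAM of the outermost machine
print(guest_ram(host, 1) // GIB)    # 3  (GiB for the first guest)
print(max_depth(host, 64 * KIB))    # 16 (levels before dropping below 64 KiB)
```

Under these assumptions the RAM shrinks exponentially, and this ignores the memory each hypervisor itself consumes, so a real chain of nested machines would give out sooner.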

A review of the installation

Here’s one feature universally appreciated about VirtualBox – it allows hassle-free toggling between the guest OS (in my case, Ubuntu) and the host OS (Windows) – all with a simple click of the mouse button.

This is especially useful to me, since I’m a staunch Windows user, and can’t stand Ubuntu’s interface for too long. Sure, Ubuntu allows for quick development of program code, but when it comes to good UI (user interface), I feel that its developers should borrow some design tips from Windows 8.1, which is the OS currently installed on my PC.

In fact, here’s what it looks like, along with VirtualBox:

Since my PC has a hard drive of around 500 GB and 6 GB of RAM, I’ve found it convenient to run a fully installed (virtual) version of Ubuntu, with 20 GB of disk space and 1 GB of RAM allocated to it.

So far, it’s working well for me, and I am quite satisfied with it!

Related links

The VirtualBox website:


Installing Ubuntu within Windows using VirtualBox; a how-to guide:


Sharing files between VirtualBox and host; a how-to guide: