I hear and I forget. I see and I remember. I do and I understand. — Chinese Proverb


OCR Dhiraagu E-Directory Captcha with ImageMagick and tesseract-ocr

Dhiraagu E-Directory requires you to enter the Captcha text from the randomly generated image before it searches the directory.

This is a simple control to ensure that you are a human being and not a nasty little program that automatically queries the directory.

I am aware that a few others have come up with small hacks to bypass this or to search through the directory by other means, so this is not what this post is about.

I simply wanted to check how well this captcha does its job. Its purpose is to pose a challenge that only a human can read and answer.

However, with two very simple tools, it is possible to identify the text automatically, with no human involved.

This can be done in two simple steps: 1) apply a simple threshold to get rid of the noise, and 2) use an OCR engine to read the text.
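
As a rough illustration of those two steps, here is a minimal Python sketch that shells out to ImageMagick and tesseract-ocr. It is not necessarily the exact commands used for the E-Directory captcha: the 50% threshold level, the function names, and the temporary file name are all assumptions made for the example, and the right threshold depends on how the noise in the image looks. It also assumes the ImageMagick `convert` command and the `tesseract` command are installed and on the PATH (newer ImageMagick releases prefer `magick` over `convert`).

```python
import subprocess
import tempfile
from pathlib import Path


def threshold_command(src, dst, level="50%"):
    """Build the ImageMagick call that reduces the image to pure black and white.

    The 50% default is an assumed starting point, not a tuned value.
    """
    return ["convert", str(src), "-threshold", level, str(dst)]


def ocr_command(image):
    """Build the tesseract call; 'stdout' as the output base prints the text."""
    return ["tesseract", str(image), "stdout"]


def read_captcha(src, level="50%"):
    """Step 1: threshold the captcha image. Step 2: OCR the cleaned copy."""
    with tempfile.TemporaryDirectory() as tmp:
        cleaned = Path(tmp) / "cleaned.png"  # hypothetical intermediate file
        subprocess.run(threshold_command(src, cleaned, level), check=True)
        result = subprocess.run(
            ocr_command(cleaned), check=True, capture_output=True, text=True
        )
    # Strip the trailing newline that tesseract adds after the recognised text.
    return result.stdout.strip()
```

Calling `read_captcha("captcha.png")` on a saved copy of the captcha image (the file name is a placeholder) would return whatever text tesseract recognises. If the result is poor, adjusting the `level` argument is the first thing to try.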
