
I hear and I forget. I see and I remember. I do and I understand. — Chinese Proverb

Category: Information Technology

Thaana support for LaTeX

LaTeX is a typesetting system used to produce publication-quality documents. It is predominantly used by academics to write technical and scientific papers for journals and conferences. LaTeX has its roots in TeX, which was designed and developed by Donald Knuth in the late 1970s. You can read more on the history here. LaTeX itself was developed by Leslie Lamport in the early 1980s.

LaTeX is not for everyone. Most of us are used to word processors such as MS Word or LibreOffice Writer, but their “point and click toolbar to get everything done” approach does not apply here. Instead, everything is done with special markup or escape sequences. The benefit is that the output and formatting are consistent every time. The source is a plain text file, and the LaTeX processor handles the processing and generates the output (PDF, PS, etc.). More advantages and disadvantages here.

LaTeX does not support Thaana typesetting natively. LaTeX uses packages to add features to a document, much like including a library at the top of a program, and packages such as bidi or babel provide multilingual and right-to-left language support. However, these were not working out very well for Thaana, so XeTeX was the preferred choice. Since I used XeTeX for the task at hand, you may be wondering why I have been talking about LaTeX all this while. The simple and honest answer is that most of us are familiar with LaTeX and often use the name to refer to TeX-style typesetting in general.

OCR Dhiraagu E-Directory Captcha with ImageMagick and tesseract-ocr

Dhiraagu E-Directory requires you to enter the Captcha text from the randomly generated image before it searches the directory.

This is a simple control to ensure that you are a human being and not a nasty little program that automatically queries the directory.

I am aware that a few others have come up with small hacks to bypass this or to search through the directory by other means, so this is not what this post is about.

I simply wanted to check how well this captcha control fulfils its purpose. The aim is to pose a challenge that only a human can read and answer.

However, using two very simple tools, it is possible to automate the process of identifying the text without any human involvement.

This can be done in two simple steps: 1) apply a simple threshold to get rid of the noise, and 2) use an OCR engine to read the text.
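To make the first step concrete, here is a minimal sketch of what thresholding does. It is a pure-Python stand-in for illustration only, not the ImageMagick command used in the post: the `threshold` function, the cutoff of 128 and the sample `patch` values are all assumptions. Every pixel at or above the cutoff becomes white and everything below becomes black, so light background noise disappears while dark strokes survive. The OCR step is not shown.

```python
def threshold(pixels, cutoff=128):
    """Binarize a grayscale image given as rows of 0-255 intensities.

    Values at or above `cutoff` become white (255), the rest black (0).
    The cutoff of 128 is an arbitrary assumption for this sketch.
    """
    return [[255 if value >= cutoff else 0 for value in row] for row in pixels]


# Hypothetical 3x5 grayscale patch: dark strokes (low values) on a
# light background whose brightness varies because of noise.
patch = [
    [200, 40, 190, 35, 210],
    [180, 30, 150, 45, 170],
    [220, 50, 160, 20, 230],
]

binary = threshold(patch)

# Render black pixels as '#' and white pixels as '.' to inspect the result.
for row in binary:
    print("".join("#" if value == 0 else "." for value in row))
```

In practice the cutoff has to be tuned to the image: too low and thin strokes vanish, too high and noise is kept.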


20 years of Linux

Quran – Text Analysis

I recently wrote a simple Python script to analyze the Quran Text to determine the occurrences of individual Arabic characters.

The Quran Text was obtained from http://tanzil.net and the format used was the “Simple Clean” XML format – excluding the pause marks, sajdhah signs, rub-el-hizb signs, and superscript alefs.

I haven’t done much Python programming before, so I wanted to get more familiar with it. While developing this script, I was amazed at how easy it was to get the job done. The script simply parses the XML and iterates through the chapters, verses and letters to get down to each individual letter. These letters are then added to a Counter (new in Python 2.7), a dictionary-like data structure that stores each distinct element once and increments its count every time the element is seen again. This made it ideal for my task. There were no complexities involved even though all the content was in Unicode. I plan to experiment further, so there could be more versions in the future.
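The approach described above can be sketched roughly as follows. This is not the original script: it is written for Python 3, and the element and attribute names (`sura`, `aya`, `text`) as well as the tiny `SAMPLE_XML` document are assumptions made for illustration, so they should be checked against the actual file from tanzil.net.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for the downloaded XML file; the real layout
# may differ from this assumed structure.
SAMPLE_XML = """<quran>
  <sura index="1" name="sample">
    <aya index="1" text="بسم الله" />
    <aya index="2" text="الحمد لله" />
  </sura>
</quran>"""


def count_letters(xml_text):
    """Count every non-whitespace character in all verse texts."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for sura in root.iter("sura"):        # chapters
        for aya in sura.iter("aya"):      # verses
            text = aya.get("text", "")
            # update() adds new characters and increments existing ones
            counts.update(ch for ch in text if not ch.isspace())
    return counts


counts = count_letters(SAMPLE_XML)

# Print each character with its count, most frequent first.
for letter, n in counts.most_common():
    print(letter, n)
```

Because a Counter behaves like a dictionary, the results can be written out with the standard csv module and loaded into a spreadsheet for graphing.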

The following are the results, excluding the white-space count. I exported to Excel to generate a graph to make it look much prettier :)


HTML DOM USING .NET

By Ahmed Ibrahim

The software development landscape has evolved significantly with the proliferation of web technologies. As a result, the majority of applications developed today have some form of connectivity or integration with another application, web service, web application, remote database, etc.

This article will therefore touch on one specific area: HTML content and the DOM. In doing so, it will investigate two approaches available in .NET that can be used to combine the two for practical purposes.

The examples provided are based on .NET code and libraries. However, the concepts remain the same, since HTML and the DOM are independent of any programming language. This article is by no means exhaustive, but references are provided for those seeking more in-depth coverage.

Demo Application Screenshot

Full Article on Code Project…

Free[dom] Licenses

If you are interested in distributing or uploading code, documents, etc., under the claim that it’s free, you might find the following link interesting.

http://www.codeproject.com/info/Licenses.aspx

My first Py

A simple Python script to retrieve prayer times from islamicfinder.com. Simply change the prayerUrl variable to set the location to your preference.

