I hear and I forget. I see and I remember. I do and I understand. — Chinese Proverb

Thaana Unicode Sheet


Dhivehi ASCII to Dhivehi Unicode


Nowadays Dhivehi ASCII is almost unheard of, as Unicode has been widely adopted for quite some time, especially for content published on the Internet. However, there is still the matter of legacy support and the mountains of content created with Accent Express and Recorder, not to forget the repetitive pressing of the Left-Arrow key. Those were the good times, when typing Dhivehi was an art mastered only by an elite :).

So recently an old document came back to haunt us in a project on which I was collaborating with a close friend of mine. I couldn’t bear to watch my friend retype hundreds of lines all over again in Unicode, so I created a simple Python script to convert the text file to its Unicode equivalent. The input has to be a text file, so the focus is only on converting the text, not on creating a fully word-processed document.

The code is available on GitHub.
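For readers who just want the general idea, here is a minimal sketch of a character-by-character conversion. It is not the script from the repository. The handful of key-to-letter pairings below follow the common phonetic Thaana keyboard layout and are an assumption, as are the names `ASCII_TO_THAANA`, `ascii_to_unicode` and `convert_file`; a real legacy font may use a different table, and this sketch makes no attempt to reorder text that was typed right to left with the Left-Arrow trick.

```python
# Illustrative subset only: these pairings are an assumption based on the
# phonetic Thaana keyboard layout, not the table used by the actual script.
ASCII_TO_THAANA = {
    "h": "\u0780",  # THAANA LETTER HAA
    "n": "\u0782",  # THAANA LETTER NOONU
    "r": "\u0783",  # THAANA LETTER RAA
    "a": "\u0787",  # THAANA LETTER ALIFU
    "w": "\u07A6",  # THAANA ABAFILI (vowel sign)
    "e": "\u07AC",  # THAANA EBEFILI (vowel sign)
    "q": "\u07B0",  # THAANA SUKUN
}

# str.translate expects a table keyed by code point; maketrans builds it.
TRANSLATION_TABLE = str.maketrans(ASCII_TO_THAANA)


def ascii_to_unicode(text):
    """Replace every mapped ASCII character; leave everything else unchanged."""
    return text.translate(TRANSLATION_TABLE)


def convert_file(src_path, dst_path):
    """Read a legacy text file and write its converted form as UTF-8."""
    # latin-1 is assumed here because it accepts any byte value.
    with open(src_path, encoding="latin-1") as src:
        converted = ascii_to_unicode(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

Characters that are not in the table (digits, spaces, punctuation) pass through untouched, so an incomplete table still gives partially converted output that is easy to inspect.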


Quran – Text Analysis

I recently wrote a simple Python script to analyze the text of the Quran and count the occurrences of each individual Arabic character.

The Quran Text was obtained from http://tanzil.net and the format used was the “Simple Clean” XML format – excluding the pause marks, sajdhah signs, rub-el-hizb signs, and superscript alefs.

I haven’t done much Python programming before, so I wanted to get more familiar with it. While developing this script, I was amazed at how easy it was to get the job done. The script simply parses the XML and iterates through the chapters and verses to get down to each individual letter. These letters are then added to a Counter (new in Python 2.7), a dictionary-like data structure that stores each unique element once and increments its count every time that element is seen again. This made it ideal for my task. And there were no complexities involved even though all the content was in Unicode. I plan to experiment further, so there could be more versions in the future.
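The approach described above can be sketched roughly as follows. This is a minimal illustration, not the original script: it assumes the XML nests `aya` elements (verses) inside `sura` elements (chapters) and keeps each verse in a `text` attribute, and the function name `count_letters` and the tiny `SAMPLE_XML` string are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny stand-in for the real file; the element and attribute names
# (sura, aya, text) are an assumption about the downloaded XML layout.
SAMPLE_XML = """<quran>
  <sura index="1" name="sample">
    <aya index="1" text="بسم الله"/>
  </sura>
</quran>"""


def count_letters(xml_text):
    """Count every non-whitespace character found in the verse texts."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for sura in root.iter("sura"):          # chapters
        for aya in sura.iter("aya"):        # verses
            for char in aya.get("text", ""):
                if not char.isspace():      # leave white-space out of the count
                    counts[char] += 1
    return counts


if __name__ == "__main__":
    # most_common() lists the letters from most to least frequent.
    for letter, count in count_letters(SAMPLE_XML).most_common():
        print(letter, count)
```

For the full file, the same function could be given the contents of the downloaded XML instead of the sample string.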

The following are the results, excluding the white-space count. I exported to Excel to generate a graph to make it look much prettier 🙂
