I recently wrote a simple Python script that analyzes the Quran text and counts how often each individual Arabic character occurs.
The Quran text was obtained from http://tanzil.net in the “Simple Clean” XML format, excluding the pause marks, sajdah signs, rub-el-hizb signs, and superscript alefs.
I haven’t done much Python programming before, so I wanted to get more familiar with it. While developing this script, I was amazed at how easy it was to get the job done. The script simply parses the XML and iterates through the chapters and verses, and then through the letters of each verse. Each letter is added to a Counter (new in Python 2.7), a dictionary-like data structure that stores every distinct element once and increments its count each time the element is seen again. This made it ideal for my task. There were no complications either, even though all the content was in Unicode. I plan to experiment further, so there could be more versions in the future.
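The approach described above can be sketched roughly as follows. This is not the original script, just a minimal Python 3 illustration of the same idea. The element and attribute names (`sura`, `aya`, `text`), the tiny `SAMPLE_XML` document, and the `count_letters` helper are all assumptions made for the example; the real file would be read from disk instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for the downloaded file. The layout (sura elements holding
# aya elements with a "text" attribute) is an assumption for this sketch.
SAMPLE_XML = """<quran>
  <sura index="1" name="sample">
    <aya index="1" text="بسم الله" />
    <aya index="2" text="الحمد لله" />
  </sura>
</quran>"""


def count_letters(xml_text):
    """Return a Counter mapping each non-whitespace character to its count."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for sura in root.iter("sura"):        # chapters
        for aya in sura.iter("aya"):      # verses
            text = aya.get("text", "")
            # Counter.update adds new characters and increments existing ones.
            counts.update(ch for ch in text if not ch.isspace())
    return counts


counts = count_letters(SAMPLE_XML)
top_three = counts.most_common(3)  # list of (letter, count), highest first
print(len(counts), "distinct letters,", sum(counts.values()), "letters in total")
```

`most_common()` returns the letters sorted by frequency, which is a convenient shape for writing out to a spreadsheet.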
The following are the results, excluding the white-space count. I exported to Excel to generate a graph to make it look much prettier 🙂