
I hear and I forget. I see and I remember. I do and I understand. — Chinese Proverb



Download YouTube videos

This is not a new topic, but I have recently been looking for a simpler way to download the YouTube videos I have bookmarked over the years (a list that is still growing rapidly). Many useful tools already exist, including a large number of plugins. I also came across mps-youtube and youtube-dl.

It is surprising how easy it is to write your own script using the many open-source libraries already available. mps-youtube is based on Pafy, so I wrote my own Python script that downloads the best-quality video for each YouTube URL listed in a plain text file.

PyYoutubeDL code is available on github (https://github.com/ai8rahim/PyYoutubeDL).
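
The idea can be sketched roughly as follows. This is not the PyYoutubeDL code itself, only a minimal illustration that assumes the script uses Pafy; the function names, the one-URL-per-line file layout, and the treatment of "#" lines as comments are assumptions made for the example.

```python
from pathlib import Path


def read_urls(list_path):
    """Return the non-empty, non-comment lines of a plain text URL list."""
    urls = []
    for line in Path(list_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines; treating '#' lines as comments is an assumption.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def download_best(url, target_dir="."):
    """Download the best available stream for one video using Pafy."""
    import pafy  # third-party; imported lazily so read_urls works without it

    video = pafy.new(url)
    best = video.getbest()  # best-resolution stream that has audio and video
    return best.download(filepath=target_dir)


def download_all(list_path, target_dir="."):
    """Download every video listed in the text file, one after another."""
    for url in read_urls(list_path):
        download_best(url, target_dir)
```

Calling download_all("urls.txt") would then fetch each listed video into the current directory, provided Pafy and its backend are installed.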

 




Dhivehi ASCII to Dhivehi Unicode

Thaana Unicode Sheet


Nowadays Dhivehi ASCII is almost unheard of, as Unicode has been widely adopted for quite some time, especially for content published on the Internet. However, there is still the matter of legacy support and the mountains of content created with Accent Express and Recorder, not to mention the repetitive pressing of the Left-Arrow key. Those were the good times, when typing Dhivehi was an art mastered only by an elite :).

Recently, one old document came back to haunt us in a project on which I was collaborating with a close friend of mine. I couldn’t bear to watch my friend re-type hundreds of lines all over again in Unicode, so I created a simple Python script to convert the text file to its Unicode equivalent. The input has to be a text file, so the focus is only on converting the text, not on creating a fully word-processed document.

The code is available on GitHub.
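
A table-driven conversion of this kind might look like the sketch below. It is not the script from the repository: the four key-to-letter pairs are an illustrative subset assumed for this example, and the function names are hypothetical. A real converter would need the full alphabet plus the vowel signs, and may need to handle the ordering of the legacy text as well.

```python
# Illustrative subset only: these pairs are assumed for demonstration and
# are not taken from the original script's mapping table.
ASCII_TO_THAANA = {
    "h": "\u0780",  # THAANA LETTER HAA
    "n": "\u0782",  # THAANA LETTER NOONU
    "r": "\u0783",  # THAANA LETTER RAA
    "b": "\u0784",  # THAANA LETTER BAA
}


def to_unicode(text, table=ASCII_TO_THAANA):
    """Replace each mapped ASCII character; leave everything else untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def convert_file(src_path, dst_path, table=ASCII_TO_THAANA):
    """Read a legacy text file and write its converted form as UTF-8."""
    with open(src_path, encoding="latin-1") as src:
        converted = to_unicode(src.read(), table)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

Characters without an entry in the table, such as digits and spaces, pass through unchanged.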



Dual Boot and Virtualize same partition on Mac

I recently installed Linux Mint on my old MacBook and set it up to dual boot. I needed this for some tools that I was developing for my research. However, all my documents were still on the Mac installation, and rebooting back and forth would be counterproductive. With some Google magic, I found a guide by Whitson Gordon on how to run the dual-boot installation in a virtual machine. He suggested using Parallels or VMware Fusion (not free) on the Mac; I made some changes and used VirtualBox (free) instead. This is how:

  1. First, you need a working dual-boot installation. There are several guides out there; I used the rEFIt method.
  2. Create the Virtual Machine (vmdk) file. I created this in a separate folder called “vmstuff” in my home directory.
    • You first need the id of the physical partition Linux is installed on. Run “diskutil list” without the quotes. It will be something like “disk0s4”.
    • Run the following command to create the vmdk file (in my case mint.vmdk) using the partition id found in the previous step. Note that “~” refers to your home directory on both Mac and Linux.
    • sudo VBoxManage internalcommands createrawvmdk -filename ~/vmstuff/mint.vmdk -rawdisk /dev/disk0s4
  3. Create the GRUB iso file.
    • You have to copy the required GRUB files into a specific directory structure. Open the terminal and create a directory called “iso”, then a directory called “boot” inside “iso”, then a directory called “grub” inside “boot”. You end up with ~/iso/boot/grub as the directory structure.
    • Now run the following two commands.
    • cp /usr/lib/grub/i386-pc/* ~/iso/boot/grub
    • cp /boot/grub/grub.cfg ~/iso/boot/grub
    • Customize the GRUB menu by editing the grub.cfg file you copied. It’s your preference. Personally, I don’t see any reason why you would even need a GRUB menu for the virtual machine. Changing this file will not affect the original GRUB menu that you see when you dual boot.
    • After editing the menu to your preference, create the iso file.
    • grub-mkrescue -o boot.iso ~/iso/
    • If you get an error saying “xorriso: not found”, use apt-get to install “xorriso”.
    • You will end up with an iso file called “boot.iso”, which you must copy back to the Mac installation. You can either mount the Linux partition from the Mac or just use a flash drive to copy it back. If you choose to mount the Linux partition from the Mac, make sure you unmount it before proceeding with step 4.
  4. Start the new virtual machine in VirtualBox.
    • Start VirtualBox. You need sudo/root privileges for the next steps.
    • Create a new virtual machine and choose the desired RAM, but choose to use an “Existing Virtual Hard Drive File”, select the vmdk file that you created in step 2, and press Create.
    • In the storage settings, add the boot.iso file created in step 3 to the “Controller:IDE”.
    • Start the virtual machine; hopefully it will work.
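
For anyone who prefers to script steps 2 and 3, the commands above can be assembled in Python as in the rough sketch below. The helper names are hypothetical and the paths are the ones from my setup; the sketch only builds the argument lists and creates the directory tree, and leaves running the commands (for example with subprocess.run) to you.

```python
import os
from pathlib import Path


def raw_vmdk_command(vmdk_path, partition):
    """Build the VBoxManage call from step 2 as an argument list."""
    return [
        "sudo", "VBoxManage", "internalcommands", "createrawvmdk",
        "-filename", os.path.expanduser(vmdk_path),
        "-rawdisk", partition,
    ]


def make_iso_tree(iso_root):
    """Create the iso/boot/grub layout from step 3; return the grub directory."""
    grub_dir = Path(iso_root).expanduser() / "boot" / "grub"
    grub_dir.mkdir(parents=True, exist_ok=True)
    return grub_dir


def mkrescue_command(iso_root, output="boot.iso"):
    """Build the grub-mkrescue call that turns the tree into boot.iso."""
    return ["grub-mkrescue", "-o", output, str(Path(iso_root).expanduser())]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is changed.
    print(" ".join(raw_vmdk_command("~/vmstuff/mint.vmdk", "/dev/disk0s4")))
    print(" ".join(mkrescue_command("~/iso")))
```

The first command belongs on the Mac side and the second inside Linux, so in practice the two halves would run on different boots.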

For more details it would be worth having a look at the original article I used as my guide.

 



Thaana support for LaTeX

LaTeX is a typesetting system that is used to produce publication-quality documents. It is predominantly used by academics to produce technical and scientific papers for journals and conferences. LaTeX has its roots in TeX, which Donald Knuth designed and developed in the late 1970s. You can read more on the history here. LaTeX itself was developed by Leslie Lamport in the early 1980s.

LaTeX is not for everyone. Most of us are very much used to word processors such as MS Word or LibreOffice Writer, and their “point and click on the toolbar to get everything done” approach does not apply here. Everything is done using special markup or escape sequences. The benefit, however, is that the output and formatting are consistent every time. The file is basically a plain text file, and the LaTeX processor handles the processing and generates the output (PDF, PS, etc.). More advantages and disadvantages here.

LaTeX does not support Thaana typesetting natively. LaTeX uses packages to add features to a document, similar to including a library in the header when programming, and there are packages such as bidi or babel that can be used for multilingual and right-to-left language support. But they weren’t working out very well for Thaana, so XeTeX was the preferred choice. Since I used XeTeX for the task at hand, you must be wondering why I have been talking about LaTeX all this while. The simple and honest answer is that most of us are familiar with LaTeX and often use the name loosely to refer to TeX-style typesetting in general.



OCR Dhiraagu E-Directory Captcha with ImageMagick and tesseract-ocr

Dhiraagu E-Directory requires you to enter the Captcha text from the randomly generated image before it searches the directory.

This is a simple control to ensure that you are a human being and not a nasty little program that automatically queries the directory.

I am aware that a few others have come up with small hacks to bypass this or to search through the directory by other means, so this is not what this post is about.

I simply wanted to check how well this captcha control does its job. The objective is to pose a challenge that only a human can read and answer.

However, using two very simple tools, it is possible to automate the process of identifying the text without a human.

This can be done in two simple steps: 1) apply a simple threshold to get rid of the noise, and 2) use an OCR engine to read the text.
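
To make the two steps concrete, here is a small sketch. The threshold function shows what thresholding does to a grid of grayscale values, and ocr_commands builds one possible pair of ImageMagick and tesseract command lines. The 50% cutoff, the file names, and the function names are assumptions for illustration, not the exact values used in the full post.

```python
def threshold(pixels, cutoff=128):
    """Map each grayscale value (0-255) to pure black (0) or white (255)."""
    return [[255 if value >= cutoff else 0 for value in row] for row in pixels]


def ocr_commands(image, cleaned="cleaned.png", out_base="captcha"):
    """Build the two command lines: clean the image, then run OCR on it."""
    return [
        # Step 1: ImageMagick turns the image into pure black and white.
        ["convert", image, "-threshold", "50%", cleaned],
        # Step 2: tesseract writes the recognized text to <out_base>.txt.
        ["tesseract", cleaned, out_base],
    ]
```

Faint noise that sits on one side of the cutoff collapses into the background, which leaves the OCR engine with clean character shapes to read.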




Quran – Text Analysis

I recently wrote a simple Python script to analyze the Quran Text to determine the occurrences of individual Arabic characters.

The Quran Text was obtained from http://tanzil.net and the format used was the “Simple Clean” XML format – excluding the pause marks, sajdhah signs, rub-el-hizb signs, and superscript alefs.

I haven’t done much Python programming before, so I wanted to get more familiar with it. While developing this script, I was amazed at how easy it was to get the job done. The script simply parses the XML and iterates through the chapters, verses and letters to narrow down to each letter. These letters are then added to a Counter (new in Python 2.7), a data structure that stores each unique element once and increments its count every time the element is seen again. This made it ideal for my task, and there were no complexities involved even though all the content was in Unicode. I plan to experiment further, so there could be more versions in the future.
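
The approach can be sketched as below. This is a simplified illustration, not the original script: the element and attribute names (sura, aya, text) are my assumption about the XML layout, and SAMPLE is a tiny made-up fragment standing in for the real file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical two-word fragment in the assumed layout, for demonstration.
SAMPLE = '<quran><sura index="1"><aya index="1" text="بسم الله"/></sura></quran>'


def count_letters(xml_text):
    """Count every non-whitespace character across all verses."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for sura in root.iter("sura"):        # chapters
        for aya in sura.iter("aya"):      # verses
            text = aya.get("text", "")
            # update() adds new letters and increments the ones already seen.
            counts.update(ch for ch in text if not ch.isspace())
    return counts


if __name__ == "__main__":
    for letter, count in count_letters(SAMPLE).most_common():
        print(letter, count)
```

Counter.most_common() returns the letters sorted by frequency, which is convenient for exporting the results to a spreadsheet.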

The following are the results, excluding the white-space count. I exported to Excel to generate a graph to make it look much prettier 🙂




HTML DOM USING .NET

By Ahmed Ibrahim

The software development landscape has evolved significantly with the proliferation of Web technologies. As a result, the majority of applications developed today have some form of connectivity or integration with another application, web service, web application, remote database, etc.

This article therefore touches on one specific area: HTML content and the DOM. In doing so, it investigates two approaches available in .NET that can be used to bring the two together for a practical purpose.

The examples provided are based on .NET code and libraries. However, the concepts remain the same, because HTML and the DOM are independent of any programming language. This article is by no means exhaustive, but references are provided for those seeking more in-depth coverage.

Demo Application Screenshot

Full Article on Code Project…