Monday, September 19, 2005

Hindi on the web

My fascination for the last few days has been using my mother tongue, Hindi, on my computer. I've been reading Hindi websites on and off for a long time, especially news sites like BBC Hindi. As for creating documents or typing in Hindi, I had done it only rarely, using hard-coded fonts like Shusha.

The urge to communicate fully in Hindi has never really left my mind. Even after I read this great article on Unicode, though, I still thought that sites like BBC Hindi used fonts like Shusha, which basically map ASCII English letters to Hindi letters. A couple of days ago, though, I clicked to see the HTML source of BBC Hindi and this is what I saw:

The Hindi text was actually part of the plain-text HTML file. That gave me a jolt, and the vague ideas in my head about Unicode slowly started lighting up. I searched furiously for some time and figured out how to enable and use the Devanagari keyboard input system on my PowerBook. Then I tried searching the net for one of my favourite Hindi poems - Basanti Hawa - and the result absolutely delighted me.

So I could now type a Google search in Hindi and get results back - all thanks to Unicode, which gives each character of the Hindi alphabet its own place in the character set, so that Hindi text can be indexed and matched like any other text on a web page. I then discovered the Hindi Wikipedia, which actually has Hindi in its URLs and is, of course, completely searchable. Since then I've been reading up on Hindi efforts, chatting with friends and parents in Hindi and even blogging on the Shaayari blog in actual Hindi, and enjoying myself immensely, much to the bewilderment of many people around me.
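
To make the point about Unicode a little more concrete, here is a small Python sketch. It is only my own illustration, not how BBC Hindi or Google actually work: the sample word, the `page` string and the variable names are all made up for the example. It shows that each character of a Hindi word has its own code point in the Devanagari block (U+0900 to U+097F), and that finding the word in a page is then ordinary text matching.

```python
import unicodedata

# The word "Hindi" in Devanagari (हिन्दी), written with escapes so the
# exact code points are unambiguous.
text = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Each character has its own code point and an official Unicode name.
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Every character of the word falls inside the Devanagari block.
in_block = all(0x0900 <= ord(ch) <= 0x097F for ch in text)

# In a UTF-8 page, each of these characters takes three bytes.
encoded = text.encode("utf-8")

# Hypothetical page content: searching it is plain substring matching,
# with no font-specific ASCII mapping involved.
page = f"<p>BBC {text}</p>"
found = text in page

print(in_block, len(encoded), found)
```

With a font like Shusha, by contrast, the file would contain ordinary ASCII letters, and a search engine would have no way of knowing they were meant to be Hindi.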

There remain some problems for the Devanagari script (which is used to write Hindi, Marathi, Sanskrit and even Nepali). The biggest problem is glyph positioning, that is, displaying the vowel signs in the correct place. Unicode Devanagari strings store each vowel sign, or "matra", after the consonant it belongs to. Some matras, such as the short "i", are typed and stored after the consonant but are (or should be) rendered before it. Your rendering system has to be programmed to recognize this and reorder the glyphs accordingly. Mac OS X and most of its applications do this perfectly, it appears to work quite well under Windows as well (or so my parents report), and I'm pretty sure I've seen Fedora Linux systems that render the script correctly. Even on the Mac, though, there are some programs that don't render Devanagari correctly. One surprising culprit is Microsoft Office for Mac. Here's how MS Word displays the word "Dil":

And this is the correct version (a screenshot from TextEdit):

Another surprising culprit of bad rendering is Mozilla Firefox for Mac. I've filed a bug and it appears that a number of people are looking into fixing the problem (the bug was instantly marked as a duplicate of several related bugs). So for now, I have to use Safari, which renders Hindi websites flawlessly.
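
For the curious, the stored order behind the "Dil" example can be checked with a few lines of Python. This is only a rough sketch under my own assumptions: the function names are made up, and `naive_visual_order` handles just the short "i" matra following a single consonant. A real text layout engine also has to deal with conjuncts and many other cases, so this is not how Safari or TextEdit actually do it.

```python
import unicodedata

VOWEL_SIGN_I = "\u093f"  # the short "i" matra, drawn to the LEFT of its consonant

# "Dil" (दिल) as stored in Unicode: DA, then the matra, then LA.
dil = "\u0926\u093f\u0932"


def logical_order(word):
    """Return the Unicode names of the characters in stored (typed) order."""
    return [unicodedata.name(ch) for ch in word]


def naive_visual_order(word):
    """Move each short-i matra in front of the character stored before it.

    Deliberately simplified: it ignores conjuncts and every other matra.
    """
    out = []
    for ch in word:
        if ch == VOWEL_SIGN_I and out:
            out.insert(len(out) - 1, ch)  # place the matra before the previous character
        else:
            out.append(ch)
    return out


print(logical_order(dil))
print([unicodedata.name(ch) for ch in naive_visual_order(dil)])
```

A program that skips this reordering step simply draws the glyphs in stored order, which would leave the matra on the wrong side of the letter.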

Well, that's another day, another discovery, another language to type in.