[I am still using proxies to access blogspot. So I have designed this logo and decided to carry it in all my future posts as a mark of protest till VSNL gives me the reason for the block or removes it.]

“For me it is a matter of pride to see my language on the web,” thundered a professor when internationalization (i18n) and localization (l10n) were being discussed in the Indian context. I think that sums up why we need l10n and i18n here. But we are failing to understand the complexities involved. Here I try to present some of my concerns regarding these issues.

As I remember, there was a survey done by IIIT Hyderabad (I am not sure whether it was Hyderabad or Bangalore) about the market for Indian language software. This was in 1999. It concluded that the market was so small that commercial players might not be interested in it. The reason quoted was the huge investment needed for research and for standardization efforts, font rendering etc. And it is not that standards were absent then. ISCII was prevalent at that time and provided standards for representing several Indian languages. But even then there were not many takers for Indian language software, except perhaps the DTP guys and universities engaged in linguistic studies.

Compared to those days, today we have a host of Indian language solutions floating around. Though word processors (and office suites generally) are more common than any other class of software, somewhere the wheel has started moving. Does this mean there is a new market now for these things? I would love to think so. Because of the explosion in Indian media combined with increasing access to the Internet, the demand for representing content in Indian languages is rising. And there is a small market out there for Indian languages on mobile phones. I am pretty sure this one will grow in the coming days.

It is true that Indian language computing does not come anywhere near the market for English.
And this is the reason I often hear from private players when asked about their interest in Indian language software. But that does not mean that there is NO market at all. I have an intuitive feeling that this is similar to the “railway paradox”. For the uninitiated, the railway paradox goes like this. The people of a town appealed to the railways to stop the train at their station. The railways sent a team to study whether people were really waiting for that train at the station. Now since the train was not stopping there, nobody was waiting for it. And since nobody was waiting, the railways decided not to stop the train there.

This is how we are dealing with Indian language computing. Like the people in the town, we are waiting for language tools. And software companies are behaving like the railways and claiming that there is no market. I know of a few companies in Bangalore that work extensively in foreign languages. But they don’t seem to care much about our languages.

But there are some organizations which are obligated to try out these things. For example, NIC has done some good work in automating land records. Now it is focusing on land registration. They have made it a point to introduce the local language in their solutions. Some other government portals are also available in Indian languages.

For a long time local language computing has been taken up by research agencies, universities or interested user groups, and sometimes even motivated individuals. And this model has worked most of the time. Some of the tools we have today for Kannada did not come from organized development (except NUDI) but from motivated individuals or groups.

But as with any unmanaged activity, Indian language computing has suffered from problems. The key areas of concern are interoperability and continued support. Largely the problem can be traced to the lack of suitable standards in critical areas like encoding and rendering. Unicode has partially solved the problem.
I too have first-hand experience with using Unicode for Kannada. When I first started writing my Kannada blog, I was using NUDI fonts (since NUDI had the govt. stamp). But then I discovered that Unicode was a far better option. And now I am pretty happy with it, but issues remain. I want to give an example. The first letter of my name is transliterated as “kRu” in Baraha. But when I use “Baraha Direct”, I cannot do this. In fact, key combinations are not easy to use in Direct. The problem is not only with Baraha but has to do partially with browsers and W3C standards also. I do not want to discuss these technical issues here.

Work on the policies and processes for making Indian languages available on the net is already under way. W3C India, the Dept. of IT (Govt. of India) and a lot of institutes like C-DAC and NIC are working on the standardization required for these things. There are many issues surrounding localization that are discussed these days, but I take up one issue. That is providing domain names in Indian languages. Now that I understand this will be a possibility in the near future, I am expressing my concerns regarding it. I hope these issues will be resolved in the coming days.

My first question is whether in future we can see a complete Indian www. That is, can I write the complete URL (including http://www) in the local language? Or is it that I enter only the domain name in Kannada but enter www and http in English? If it is the latter, then sorry, I am not interested in that. Let me explain why.

We should first understand the need for Indian language domain names (iDN). I think the argument is that people who do not know English should also be able to surf, so we need iDN. But see the catch: one has to enter http and www in English, use a keyboard with English letters, and operate a PC whose OS is rendered in English, and yet we are talking of helping people who do not know English! This argument for iDN is flawed.
But that is not the only reason for me to take a tough stand. There is one more serious technical issue. Do you think “xyz.com”, “XyZ.com” and “xyZ.com” should be different websites? Are you thinking that I am a fool, because every kid knows that case differences do not matter in a domain name? But in Indian languages “rAma” and “rama” are different names. So when these two site names are registered, they should be separate. Ways to handle this are being worked out, but the point I am making is that we need to look at things differently.

And there is the problem of having an input engine. The support for iDN must come from many corners. The browser companies have to make some changes. At present the space available for entering a URL in most browsers is not sufficient for displaying Indian scripts. So we need their support for this and many such things. Through synergetic interaction between standardizing agencies and IT companies, such things can become a possibility.

But I have 2 more concerns about these localization efforts in general. This is not to say that I do not support these activities. Being a participant in some areas, I do contribute to these efforts in my own way. But I should spell out some itchy things that trouble me.

My linguistic identity is a thing I am proud of. I speak Tulu at home and consider Kodava equally close to me. Kannada is the language of my mind. Like many, I use English in my profession. Tamil and Malayalam came to me through my surroundings. And Hindi and Sanskrit are what I learnt through formal training. This is not my case alone. Many people have such multiple linguistic identities. When Kannada gets its place on the web, I start thinking about Tulu and Kodava. Imagine that someday even these languages get represented on the web. You may say it is only a matter of time before this gets done, since both these languages share the Kannada script. And since the Kannada script is available in Unicode, the job is simpler. But the issues are much deeper.
There are words in Kannada and, say, Tulu which sound the same (and are written the same way) but mean different things. If iDNs become a reality and everyone stakes a claim to such a name, then we should know how to solve this issue. This is a technical issue and so somehow we can solve it, but my next concern is philosophical.

Remember that China and (somewhat partially) Japan had attempted something similar. They created “their own version” of www. The result was that they were left out of the global view. They are (still) disconnected from the mainstream, if you think the idea of www is seamless exchange of data across the globe. I sometimes wonder whether we will also become “islands” of users in www, disconnected from the “global” www. Again, this is just speculation.

I keep wondering what will happen if one day I am able to open my laptop and see everything in Kannada. Will the local www limit my knowledge bandwidth, or be more useful since I comprehend more easily in Kannada? When I posed this question to a few people, I received varied responses. One interesting possibility someone mentioned was complete translation being in place. Extending the use of locale data, it is predicted that the www will someday look the way you want. So it no longer matters whether the site is in English or Spanish, it all appears to you in Kannada!!

We as technology evangelists have the habit of overestimating the technical leap in 10 years and underestimating its progress in, say, 2 years. The complete semantic, auto translator-enabled web may be far away, but I do not think iDN needs that much time. There are reports that some companies (VeriSign??) are accepting domain names in some Indian languages. So time is running out for us to finalize the standards (for input, rendering and related things) and get those standards accepted by all stakeholders (W3C, Unicode, browser developers, IT solution providers).
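
To make two of the points above a little more concrete (that “rAma” and “rama” are distinct names, and that one spelling shared by Kannada and Tulu can have only one owner), here is a minimal Python sketch. It assumes Python's built-in `idna` codec, which follows the older IDNA 2003 rules and may not match what registries actually deploy. The names `canonical_label`, `codepoints` and `register`, and the two owner names, are hypothetical and made up only for illustration.

```python
# A sketch, not a registry implementation. Assumption: Python's built-in
# "idna" codec (IDNA 2003) stands in for whatever rules a registry uses.

def canonical_label(label: str) -> str:
    """Fold case, then convert one domain label to its ASCII-compatible form."""
    folded = label.casefold()  # changes Latin letters; Kannada has no case
    return folded.encode("idna").decode("ascii")


def codepoints(text: str) -> list[str]:
    """List the Unicode code points of a string, e.g. ['U+0CB0', ...]."""
    return [f"U+{ord(ch):04X}" for ch in text]


def register(registry: dict[str, str], label: str, owner: str) -> bool:
    """Hypothetical first-come registry: one owner per canonical label."""
    key = canonical_label(label)
    if key in registry:
        return False  # the spelling is taken, whatever language was meant
    registry[key] = owner
    return True


# "rAma" and "rama" in Kannada script differ by a vowel sign, not by case.
RAMA_LONG = "\u0CB0\u0CBE\u0CAE"   # ra + vowel sign aa + ma ("rAma")
RAMA_SHORT = "\u0CB0\u0CAE"        # ra + ma ("rama")

if __name__ == "__main__":
    # Latin spellings that differ only in case collapse to a single label.
    print({canonical_label(s) for s in ("xyz", "XyZ", "xyZ")})

    # The two Kannada spellings stay distinct after conversion.
    for name in (RAMA_LONG, RAMA_SHORT):
        print(codepoints(name), canonical_label(name))

    # The label carries no language tag, so a Kannada word and a Tulu word
    # written identically compete for the same entry.
    registry: dict[str, str] = {}
    print(register(registry, RAMA_LONG, "kannada-site"))  # first claim
    print(register(registry, RAMA_LONG, "tulu-site"))     # same spelling again
```

The sketch only shows that the converted label depends on code points and nothing else. How a real registry should decide between two communities claiming the same spelling is a policy question the code does not answer.
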
There are also questions about the availability of development tools (for authoring, validating etc.) in Indian languages.

India has 22 official languages and more than 1000 dialects. Accommodating everyone in the journey is an exciting challenge. Most of the languages have standardized their typefaces. But in a few cases there is still an ongoing debate (Konkani). As I see it, there is not much hope of commercial software houses getting involved in these things. This is still a research area and, as usual, the government agencies and academia may have to involve themselves completely in this interdisciplinary work. But more than anything it is pride in one’s language which will make individuals take up this challenge upfront. Earlier experiments in open source projects have proven that community-based development (the “bazaar model” according to Eric Raymond) can work efficiently compared to managed and centralized development (the “cathedral model”). I do not see any reason why the same cannot happen with respect to Indian language computing.