david on 21 Nov 2000 19:09:06 -0000
Re: <nettime> Asia and domain names, etc. (@)
At 08:59 +0900 00.11.17, Benjamin Geer wrote:

> On Fri, Nov 17, 2000 at 12:25:30AM +0900, david@2dk.net wrote:
> > Unicode, as I understand it, is a project to develop a global
> > localisation standard -- a way to learn how to write one(=uni)
> > source code that will be expedient to localise for any market.
>
> Well, not really. It's actually a single character encoding that
> includes all the characters of nearly all known languages that have
> writing systems.

I really have very little faith that this discussion is serving more than two individuals' egos. We've left the original topic completely and are simply trying to help David and Benjamin find some point where they both understand that they've not been proven 'wrong' in public. How boring. Let's try to take the conversation somewhere where it has some relationship to nettime-like topics. As the 'David' half of the aforementioned ego issue, let me please try to offer a way forward in the following.

Yes, Unicode is a single encoding system with a very large database, including many of the characters necessary to represent most of the languages on earth. We've both addressed that question more than once.

Problem one is that Unicode was made without the participation of those using the languages involved. (Which probably brings us back to my knee-jerk reaction to Benjamin's 'I would hope that' tone.) Unicode includes blatant errors in 'far eastern' languages, such as the use of one character for both the 'Chinese' numeral one and a hyphen, a distinction which is genuinely useful in PRC and ROC Chinese, Japanese and Korean. Omissions like this are alienating and contribute to its lack of adoption in Asia. And gee, isn't the introduction of new cultural spheres into the net (their *intact* introduction) what we're meant to be discussing here anyway?

Anyone can access Unicode's web page and make their own judgements about the general theory behind a universal linguistic encoding, and about whether the world is now suddenly going to become multilingual, using Hebrew and Chinese and French in any given document, or whether the primary advantage of having a Unicode is the conversion of basic protocols such as multinational software, so that Microsoft can monopolise all word-processing software for everyone in the entire world. Yes, I know... I'm not being cool and rational, but I do still believe that it is a valid position.

> > This is a technical issue for software manufacturers who wish to
> > become multinationals, and not one for finding universal ways of
> > integrating living languages onto 'the' net.
>
> I think you've misunderstood it; it's the latter. A Japanese, Thai,
> Russian, or English document that's encoded in Unicode (as opposed to
> one of the many older, language-specific encodings) can be displayed
> without modification in any web browser or word processor that
> understands Unicode and has a Unicode font. This is why all of
> Microsoft's products, the Java platform, and the XML standard use
> Unicode as their underlying encoding. It completely removes the need
> to `translate' documents from one encoding into another in order to
> display them in an environment other than the one they were written
> in.

We're both assuming each other's ignorance too much. That's right, they can be displayed. Displayed, yes, displayed. Because Unicode features a large database. Right. Display is not the problem, once you've got the code written, as in the case of localising multinational software products.
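To make that concrete, here is a rough sketch in Python (the sample string, variable names and output are my own illustration, not anything from the thread) of what 'one encoding for every script' means at the byte level once a document is stored as Unicode:

    # Rough sketch: a single UTF-8 byte stream carrying several scripts at once.
    # The sample string is purely illustrative.
    text = "File / Fichier / ファイル / 파일 / 文件"

    data = text.encode("utf-8")            # one encoding for all of the scripts above
    print(len(text), "characters stored in", len(data), "bytes")
    print(data.decode("utf-8") == text)    # True: the round trip is lossless

Any Unicode-aware mail reader or word processor that reads those bytes back as UTF-8 reproduces the same characters, which is precisely the display question, and only the display question.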
The problem is that the input methods of each nation raise very interesting linguistic questions, which deserve some mention. And having one global standard for anything always makes for tricky political questions about who gets to determine it, and then about who benefits and who doesn't. If all cultures adapted their on-line linguistic culture to the needs of Unicode, the world might be an easier place. But the fact is that the four cultures listed, which include a substantial portion of the world's brainpower, also have other ideas; they may eventually be swayed to the Unicode camp, but they haven't been yet, and the reasons why they might not be could be of some interest to some nettime readers. I'd like to get that discussion started, please.

> > ISO 10646 is an international standard in that somebody recognises
> > that there is an issue here. It isn't a functioning initiative that
> > has been actually globally adopted.
>
> It's been adopted by every major software manufacturer, and my
> impression is that it's pretty well-supported on most operating
> systems. To the best of my knowledge, if you use a recent version of
> Microsoft Word, you're writing documents in Unicode.

Actually, I use Jedit. I avoid Microsoft products whenever possible, because Microsoft products make many of my other software products crash. Their files are often inconvertible. They're huge. They've done a shitty job of integrating double-byte character recognition. They don't modify their software so that word count functions meaningfully in double-byte projects. They're intrusive into other elements of the operating system. Etc. etc. etc. But that's not the point. The point is that we've got a considerable portion of humanity (and many yet-to-be 'major software manufacturers') still to come on line, and they probably have fascinating things to contribute in doing so. If Unicode becomes the de facto standard by popular adoption, that's groovy, but I'm interested in what else is out there, and there still seems to be a bit out there. At least among one third of humankind.

> > I, with my Japanese system, have immense problems sending the exact
> > same 'chinese' characters (though I also have a PRC chinese
> > character OS which I can reboot into) to my friends in Korea or
> > Taiwan. This is not a Unicode problem, nor anything that it will
> > solve in the foreseeable future. Unicode means that all of us in
> > these various countries may be attempting to send these files in
> > various localised versions of MSWord which all function well in our
> > markets.
>
> Not at all. In fact, that's exactly the problem Unicode is meant to
> solve. Localisation and encoding are basically separate issues.
> Localisation means, for example, that a menu is marked `File' in
> the English version of MS Word, and `Fichier' in the French version.
> Encoding, on the other hand, is the way characters are represented as
> bytes in the document produced. The idea of Unicode is to enable
> every language to use the same encoding; therefore, you should be able
> to use any Unicode-compliant version of MS Word, regardless of its
> localisation, to read a document containing, say, a mixture of
> Japanese, Hungarian, Korean, and Arabic.

In single-byte character systems, 'File' can be converted into 'Fichier.' Groovy. Now try converting that into 'ファイル' or '封筒' or '引き出し' or whatever you please. Then try inputting that in Japanese, Korean, ROC and PRC. You'll find that there are cultural relations between the spoken, printed and digitised word.
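For what it's worth, here is a rough sketch in Python of the localisation/encoding distinction quoted above (all names and strings are my own illustration): the interface label changes with the UI language, while the bytes of the document stay the same whichever localisation you happen to be running.

    # Localisation: the user-interface label depends on the UI language.
    menu_label = {"en": "File", "fr": "Fichier", "ja": "ファイル"}

    # Encoding: the document's bytes are fixed, independent of the UI language.
    document = "日本語, magyar, 한국어 and العربية in one file"
    stored_bytes = document.encode("utf-8")

    for ui_language in ("en", "fr", "ja"):
        # Whatever the localisation, the same bytes decode to the same text.
        print(menu_label[ui_language], "->", stored_bytes.decode("utf-8"))

None of which says anything about how those characters get typed in the first place, which is exactly the input question this exchange keeps circling back to.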
There are linguistic conventions. Input cultures. And just having linguistic databases does not yet solve the linguistic needs of each language. But maybe (most probably) we're asking the wrong question. Let me put it this way: does anyone on this list really believe that the Americans will be able to convince the Chinese (whichever Chinese) that they should adapt their on-line conventions to a foreign-developed linguistic database? Would the Americans adopt a language set developed in Beijing?

> > (You should see what a nettime post sent from someone with a French
> > character set looks like when received on a double-byte OS. It's a
> > mess!!)
>
> The problem there is that French (along with some other European
> languages) is traditionally encoded in ISO-8859-1, and Japanese
> traditionally uses JIS or EUC.

Yes, those are two of them (though there is more than one JIS). Does Unicode elegantly solve these things? Is the fact that they are not uni-versally adopted just a matter of everybody in these language spheres being poor sports?

> Most people on Nettime use ISO-8859-1
> (probably without realising it). But if we all use Unicode-compliant
> mail readers, and we all write our French and Japanese emails in
> Unicode, everyone's emails will appear correctly on everyone else's
> computers.

That certainly is one solution. The world could also use one financial instrument or currency... or one language, for that matter. The issue that I initially proposed to Diwakar Agnihotri had to do with the 'one China' issue being highlighted by the introduction of written-script issues (both input and display) on-line. The question posed had to do with differing input methods for the Roman character-based keyboard, and with various encryption methods. There was a great discussion here in Japan, for example, when the Clinton administration passed a 'communications act' in February of 96 or so: some of the encryption methods it considered illegal were in use for the simple transmission of language on-line. This discussion is what I was alluding to in saying that someday each domain might have its own encryption, like secure cyber wallets that exchange information as transactions. Once you're dealing with complex character sets which require encryption, etc., the linguistic questions change. It highlights the fact that there are languages which are needed just to input and display languages. I'm sure that there must be something we can talk about related to this.
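As a footnote to the ISO-8859-1 / JIS point above, here is a rough sketch in Python of the 'mess' in question (the French string and the guessed encodings are my own illustration): bytes written under one legacy encoding and read back under another come out as garbage, while the same text stored as Unicode round-trips cleanly.

    # Legacy mismatch: French text written out as ISO-8859-1...
    french = "déjà café réseau"
    legacy_bytes = french.encode("iso-8859-1")

    # ...read by a mail client that guesses a Japanese legacy encoding instead.
    for wrong_guess in ("euc-jp", "shift_jis"):
        print(wrong_guess, "->", legacy_bytes.decode(wrong_guess, errors="replace"))

    # The same text stored and read as Unicode (UTF-8) survives intact.
    print(french.encode("utf-8").decode("utf-8"))

Which is Benjamin's point; mine is simply that getting everyone to agree to that second step is the political and cultural question, not the technical one.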