From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Tue, 14 May 2024 17:43:43 +0000
Organization: Rocksolid Light
Message-ID: <6124140226e28fd4afec0b435bdbeca1@www.novabbs.org>
References: <v0s17o$2okf4$2@dont-email.me> <4e0557bec2acda4df76f1ed01ebcbdf6@www.novabbs.org> <v1ep4i$1ptf$1@gal.iecc.com> <v1eruj$3o1r8$1@dont-email.me> <v1h8l6$1ttd$1@gal.iecc.com> <v1kifk$17qh0$1@dont-email.me> <2024May10.182047@mips.complang.tuwien.ac.at> <v1ns43$2260p$1@dont-email.me> <2024May11.173149@mips.complang.tuwien.ac.at> <v1preb$2jn47$1@dont-email.me> <2024May12.110053@mips.complang.tuwien.ac.at>

Anton Ertl wrote:
> Thomas Koenig <tkoenig@netcologne.de> writes:
> E.g., consider the following Gforth code (others can tell you how to
> do it in Python):
>
> "Ko\u0308nig" cr type
>
> The output is:
>
> König
>
> That is, the second character consists of two Unicode code points, the
> "o" and the "\u0308" (Combining Diaeresis).
>
> (I think that somewhere along the way from the Forth system to the
> xterm through copying and pasting into Emacs the second character has
> become precomposed, but that's probably just as well, so you can see
> what I see).
>
> If I replace the third code point with an e, I get "Koenig". So by
> overwriting one code point, I insert a character into the string.
>
> If instead I replace the second code point with a "\u0316" (Combining
> Grave Accent Below):
>
> "K\u0316\u0308nig" cr type
>
> I get this (which looks as expected in my xterm, but not in Emacs)
>
> K̖̈nig
>
> The first character is now a K with a diaeresis above and an accent
> grave below and there are now a total of 4 characters, but still 6
> code points in the string; the second character has been deleted by
> this code-point replacement.

It seems to me (in my vast ignorance) that names for things should be
written in the most appropriate set of characters in the language of the
person/thing being named. Then, when such a name is "sent out to be
displayed", it is up to the display, which knows what character set(s)
it can properly emit, to alter the string of characters as appropriate
to its capabilities.

For example, take:

> "K\u0316\u0308nig" cr type ==> K̖̈nig

When displayed on an ASCII-only line printer, it would be written Koenig.
When displayed on an enhanced-ASCII printer, it would be written König.
When displayed on a fully functional printer, it would be written K̖̈nig.

The problem is the mapping function from how a name should be encoded in
its own native language to what can be expressed on a particular device.
Only the display device needs to understand this mapping, NOT the
program/software/device holding the string.

I think people in Japan should be able to use printf by writing it as
プリントフ (printf in katakana).

There is way too much "English" in the way computers are being used.
It is similar to anthropomorphizing animal behavior.
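The two code-point replacements in the quoted Gforth example can be reproduced in Python. This is a minimal sketch under a stated simplification: the hypothetical helper `user_perceived_chars` treats a "character" as one base code point plus any following code points with a nonzero canonical combining class (`unicodedata.combining`). That covers these examples but is not full Unicode grapheme-cluster segmentation (UAX #29). `replace_code_point` and `describe` are likewise illustrative names, not from the thread.

```python
import unicodedata


def user_perceived_chars(s: str) -> list[str]:
    """Split s into approximate user-perceived characters.

    Simplification (assumption): a cluster is one base code point plus the
    following code points whose canonical combining class is nonzero.
    This is NOT full UAX #29 grapheme segmentation.
    """
    clusters: list[str] = []
    for cp in s:
        if clusters and unicodedata.combining(cp) != 0:
            clusters[-1] += cp  # attach the combining mark to the previous base
        else:
            clusters.append(cp)  # start a new cluster
    return clusters


def replace_code_point(s: str, index: int, new: str) -> str:
    """Overwrite the code point at 0-based position index with new."""
    return s[:index] + new + s[index + 1:]


def describe(label: str, s: str) -> None:
    print(f"{label}: {len(s)} code points, "
          f"{len(user_perceived_chars(s))} characters")


koenig = "Ko\u0308nig"                             # K, o, U+0308, n, i, g
describe("original", koenig)                       # 6 code points, 5 characters

inserted = replace_code_point(koenig, 2, "e")      # third code point -> "e"
describe("third replaced", inserted)               # 6 code points, 6 characters

deleted = replace_code_point(koenig, 1, "\u0316")  # second code point -> U+0316
describe("second replaced", deleted)               # 6 code points, 4 characters
```

The code-point count stays at 6 throughout; only the number of user-perceived characters changes, which is exactly the insertion and deletion effect described in the quote.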
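The display-side mapping proposed above, where the stored name is never changed and only the output device adapts it, can be sketched in Python. Everything here is an assumption for illustration: the three capability levels, the hypothetical `render_for_device` and `strip_marks` functions, Latin-1 standing in for an "enhanced ASCII" printer, and a small German transliteration table (ö becomes oe). The input is the first string from the quoted example, Ko\u0308nig.

```python
import unicodedata

# Hypothetical device capability levels (illustrative assumption).
ASCII_ONLY = "ascii"
LATIN_1 = "latin-1"        # stand-in for an "enhanced ASCII" printer
FULL_UNICODE = "full"

# German-style fallbacks (assumption); a real system would need
# language-aware transliteration tables.
GERMAN_FALLBACK = {"ä": "ae", "ö": "oe", "ü": "ue",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def strip_marks(ch: str) -> str:
    """Decompose ch (NFD) and drop its combining marks."""
    return "".join(c for c in unicodedata.normalize("NFD", ch)
                   if not unicodedata.combining(c))


def render_for_device(name: str, capability: str) -> str:
    """Return what a device with the given capability should print.

    The stored name is left untouched; only the output is adapted.
    """
    composed = unicodedata.normalize("NFC", name)  # "o" + U+0308 -> "ö"
    if capability == FULL_UNICODE:
        return composed
    if capability == LATIN_1:
        limit, fallback = 256, {}
    elif capability == ASCII_ONLY:
        limit, fallback = 128, GERMAN_FALLBACK
    else:
        raise ValueError(f"unknown capability: {capability!r}")

    out = []
    for ch in composed:
        if ord(ch) < limit:
            out.append(ch)                    # printable as-is
        elif ch in fallback:
            out.append(fallback[ch])          # language-specific spelling
        else:
            base = strip_marks(ch)            # drop accents the device lacks
            out.append("".join(c if ord(c) < limit else "?" for c in base))
    return "".join(out)


name = "Ko\u0308nig"
for cap in (ASCII_ONLY, LATIN_1, FULL_UNICODE):
    print(f"{cap:>8}: {render_for_device(name, cap)}")
```

Under these assumptions the ASCII device prints Koenig, the Latin-1 device prints König, and a full Unicode device gets the name unchanged apart from NFC composition. The mapping lives entirely in the renderer, so the program holding `name` never has to know which device will show it.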