From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Sat, 18 May 2024 16:25:54 +0200
Message-ID: <v2adpi$2qp3t$1@dont-email.me>
In-Reply-To: <v2acqr$2qj9l$1@dont-email.me>

Thomas Koenig wrote:
> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>> It seems to me (in my vast ignorance) that names for things should be
>>> written in the most appropriate set of characters in the language of
>>> the person/thing being named.
>>>
>>> Then when such a name is "sent out to be displayed", it is a property
>>> of the display what character set(s) it can properly emit, and it can
>>> thereby alter the string of characters as appropriate to its capabilities.
>>>
>>> For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
>>> When displayed on an ASCII-only line printer it would be written Koenig
>>> When displayed on an enhanced ASCII printer it would be written König
>>> When displayed on a fully functional printer it would be written K̖̈nig
>>
>> Why do you think that K̖̈nig should be written as Koenig or König?
>
> On my display, this read K, n with a diacritic and something close to
> a cedilla under the n.
>
>>
>> However, for König
>
> Again, the diaeresis is over the n, not the o.
>
>> Unicode specifies that the precomposed form is
>> König. And if you want a transcription into ASCII with the knowledge
>> that it's German, the result would be Koenig.
>
> This is actually sometimes a (fairly minor) problem because the
> name on my passport actually reads "König" (o-diacritic), but
> people without knowledge of German tend to transcribe this as
> "Konig", whereas I transcribe it as "Koenig" on official forms
> such as the one I need to fill out prior to entering the US.
>
> This is why modern EU passports have a canonical form of the
> name, which then is "KOENIG".
>

Same problem as my wife and kids, who have Norløff either as part of
their surname or (in my wife's case) as the whole surname.

The canonical ASCII simplification of the 'ø' character is either 'o' or
'oe', and passports and airline tickets differ in which one they use,
something that can cause all sorts of issues with US passport control.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
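The thread mixes three different operations: canonical composition (Anton's point that o followed by U+0308 has the precomposed form ö), language-aware transliteration to ASCII (German ö as "oe", giving Koenig or KOENIG), and naive mark-stripping, which produces the "Konig" spelling Thomas complains about. Below is a minimal Python sketch of all three, plus a display-side fallback in the spirit of Mitch's proposal. The tables GERMAN and NORWEGIAN and the helpers to_nfc, strip_marks, transliterate and render are hypothetical illustrations, not taken from any passport or Unicode standard; only unicodedata.normalize, unicodedata.combining and str.encode are real library calls.

```python
from __future__ import annotations

import unicodedata

# Hypothetical transliteration tables, assumed for illustration only
# (not taken from any passport or Unicode standard).
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
NORWEGIAN = {"æ": "ae", "ø": "oe", "å": "aa"}


def to_nfc(name: str) -> str:
    """Compose base letters with combining marks, e.g. 'o' + U+0308 -> U+00F6."""
    return unicodedata.normalize("NFC", name)


def strip_marks(name: str) -> str:
    """Naive fallback: decompose (NFD) and drop every combining mark.

    Letters without a canonical decomposition, such as 'ø', pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def transliterate(name: str, table: dict[str, str]) -> str:
    """Language-aware fallback: replace letters via `table`, then strip leftover marks."""
    out = []
    for c in to_nfc(name):
        rep = table.get(c.lower())  # tables hold lowercase keys only
        if rep is None:
            out.append(c)
        else:
            # Keep an initial capital, e.g. 'Ø' -> 'Oe'.
            out.append(rep.capitalize() if c.isupper() else rep)
    return strip_marks("".join(out))


def render(name: str, display: str, table: dict[str, str] | None = None) -> str:
    """Hypothetical display-side rendering for 'unicode', 'latin1' or 'ascii' devices."""
    name = to_nfc(name)
    if display == "unicode":
        return name
    if display == "latin1":
        try:
            name.encode("latin-1")
            return name
        except UnicodeEncodeError:
            pass  # not representable in Latin-1; fall back to ASCII below
    # ASCII: transliterate, then replace anything still non-ASCII with '?'.
    return transliterate(name, table or {}).encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    print(transliterate("Ko\u0308nig", GERMAN).upper())  # KOENIG
    print(strip_marks("K\u00f6nig"))                     # Konig
    print(render("Norl\u00f8ff", "ascii", NORWEGIAN))    # Norloeff
    print(render("K\u0316\u0308nig", "ascii"))           # Knig
```

Under these assumptions, Mitch's string "K\u0316\u0308nig" falls back to "Knig": it contains no 'o' at all, so no display can recover Koenig from it, which is Anton's objection. Note also that 'ø' has no Unicode decomposition, so strip_marks leaves it untouched and an explicit table has to pick 'o' or 'oe'; that choice is exactly where passports and airline tickets can disagree.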