From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Sat, 18 May 2024 15:48:35 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024May18.174835@mips.complang.tuwien.ac.at>
References: <v0s17o$2okf4$2@dont-email.me> <v1h8l6$1ttd$1@gal.iecc.com>
 <v1kifk$17qh0$1@dont-email.me> <2024May10.182047@mips.complang.tuwien.ac.at>
 <v1ns43$2260p$1@dont-email.me> <2024May11.173149@mips.complang.tuwien.ac.at>
 <v1preb$2jn47$1@dont-email.me> <2024May12.110053@mips.complang.tuwien.ac.at>
 <6124140226e28fd4afec0b435bdbeca1@www.novabbs.org>
 <2024May18.104040@mips.complang.tuwien.ac.at> <v2acqr$2qj9l$1@dont-email.me>
 <v2adpi$2qp3t$1@dont-email.me> <v2aem0$2qubl$1@dont-email.me>

Thomas Koenig <tkoenig@netcologne.de> writes:
>Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>
>> Canonical simplification of the 'ø' character is either 'o' or 'oe', and
>> passports and airline tickets differ, something which can cause all
>> sorts of issues with US passport control.
>
>Reminds me of either "Asterix and the Great Crossing" or "Asterix
>and the Normans", where Viking speech was indicated by having
>slashes through letters (like ø).  When Obelix tries to speak
>their language, he also applies slashes, but does so randomly
>(like through a c) so nobody can understand him.
>
>Hmm... a challenge, can this be represented as Unicode codepoints?

Sure.  See <https://en.wikipedia.org/wiki/Bar_(diacritic)>.

Interestingly, the Obelix character ȼ you mention above has its own
precomposed code point, U+023C (Latin Small Letter C with Stroke), and
its own Wikipedia page: https://en.wikipedia.org/wiki/%C8%BB, but you
can also compose it from c and the combining short solidus overlay
(U+0337): c̷ (this does not display correctly on emacs 27.1, but
composes correctly on an xterm).

There does not seem to be a precomposed d with a diagonal stroke (the
character officially named Latin Small Letter D with Stroke, U+0111 đ,
has a horizontal bar), but you can compose one in the same way: d̷.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
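
The two spellings of the struck-through c discussed above can be compared with a short Python sketch. It is only an illustration, not part of the original post: the helper name `describe` and the variable names are made up here, and the code relies on the standard `unicodedata` module. It lists the code points of each spelling and checks that Unicode normalization does not convert one into the other, since U+023C has no canonical decomposition.

```python
import unicodedata

# Precomposed letter: a single code point.
precomposed_c = "\u023C"   # LATIN SMALL LETTER C WITH STROKE

# Composed sequences: base letter followed by U+0337,
# COMBINING SHORT SOLIDUS OVERLAY.
composed_c = "c\u0337"
composed_d = "d\u0337"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in [("precomposed c", precomposed_c),
                    ("composed c", composed_c),
                    ("composed d", composed_d)]:
    print(label, describe(text))

# Normalization leaves both spellings as they are, so they never
# compare equal as strings even if they render alike.
nfc_of_composed = unicodedata.normalize("NFC", composed_c)
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed_c)
print(nfc_of_composed == precomposed_c)
print(nfd_of_precomposed == composed_c)
```

Because the two forms stay distinct under NFC and NFD, software that compares or searches such text has to treat them as different strings unless it applies its own folding rules.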