Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Sat, 11 May 2024 15:33:55 +0200
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <v1ns43$2260p$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me>
 <4e0557bec2acda4df76f1ed01ebcbdf6@www.novabbs.org>
 <v1ep4i$1ptf$1@gal.iecc.com> <v1eruj$3o1r8$1@dont-email.me>
 <v1h8l6$1ttd$1@gal.iecc.com> <v1kifk$17qh0$1@dont-email.me>
 <2024May10.182047@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 11 May 2024 15:33:56 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e065d27d964d081eeac047b5b066e87e";
 logging-data="2168857"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1+AagMqc+30UBLk1vbo/tO97uurik8wk3I="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Bz7mF3YYluh0V9m2awgfm6vIXK0=
In-Reply-To: <2024May10.182047@mips.complang.tuwien.ac.at>
Content-Language: en-GB
Bytes: 2566

On 10/05/2024 18:20, Anton Ertl wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> UTF-32 is fine for internal use, however - using whatever endianness
>> your processor prefers. The trick is never to let it leave the one
>> computer in any encoding other than UTF-8.
>
> An unnecessary complication.
>
> 1) I only came up with the following use cases where you need to deal
> with individual non-ASCII characters: Palindrome checkers and anagram
> programs; I remember somebody mentioning a third use (which I promptly
> forgot), but anyway, there are few cases.
>
> 2) But even for those few cases, UTF-32 is not good enough, because a
> code point is not a character.
>

I agree that it is usually unnecessary to convert to UTF-32 - I am
merely saying that /if/ you feel you want to expand the code points,
then UTF-32 is fine for the purpose and you should not have to worry
about endianness because you should not be moving it off your computer,
thus native endianness is all you need.

People sometimes say they want to expand to code points to be able to
see the length of the string in characters, or to index characters, or
for easier splicing or joining strings. I don't think these are
particularly useful in practice, but UTF-32 is fine for those that want
it.
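
As a rough illustration of the above (not anything from the thread itself), here is a minimal Python sketch of "UTF-8 at the boundary, native-endian UTF-32 inside". The helper names expand, collapse, code_point_count and code_point_at are made up for this example. It also shows the limit raised in point 2 of the quoted text: the sample string reads as four characters but holds five code points.

```python
import sys

# BOM-less UTF-32 codec matching this machine's byte order (standard Python codecs).
NATIVE_UTF32 = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"

def expand(utf8_bytes: bytes) -> bytes:
    """UTF-8 arriving from outside -> native-endian UTF-32 for internal use."""
    return utf8_bytes.decode("utf-8").encode(NATIVE_UTF32)

def collapse(utf32_bytes: bytes) -> bytes:
    """Native-endian UTF-32 -> UTF-8 before the data leaves the machine."""
    return utf32_bytes.decode(NATIVE_UTF32).encode("utf-8")

def code_point_count(utf32_bytes: bytes) -> int:
    """Length in code points: every UTF-32 code unit is exactly 4 bytes."""
    return len(utf32_bytes) // 4

def code_point_at(utf32_bytes: bytes, i: int) -> int:
    """Constant-time indexing of the i-th code point."""
    return int.from_bytes(utf32_bytes[4 * i:4 * i + 4], sys.byteorder)

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT: displayed as 4 characters.
wire = "cafe\u0301".encode("utf-8")
internal = expand(wire)

print(len(wire))                          # 6 bytes of UTF-8
print(code_point_count(internal))         # 5 code points, not 4 characters
print(hex(code_point_at(internal, 4)))    # 0x301, the combining accent
print(collapse(internal) == wire)         # True: round trip back to UTF-8
```

Counting or indexing what a reader would call characters would need grapheme cluster segmentation on top of this, which the sketch does not attempt.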