From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Fri, 10 May 2024 09:31:00 +0200
Organization: A noiseless patient Spider
Message-ID: <v1kifk$17qh0$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me>
 <4e0557bec2acda4df76f1ed01ebcbdf6@www.novabbs.org>
 <v1ep4i$1ptf$1@gal.iecc.com> <v1eruj$3o1r8$1@dont-email.me>
 <v1h8l6$1ttd$1@gal.iecc.com>
In-Reply-To: <v1h8l6$1ttd$1@gal.iecc.com>

On 09/05/2024 03:24, John Levine wrote:
> According to Lawrence D'Oliveiro  <ldo@nz.invalid>:
>> On Wed, 8 May 2024 02:47:46 -0000 (UTC), John Levine wrote:
>>
>>> It doesn't make sense to say that character strings are big- or little-
>>> endian.
>>
>> Yes it does, for just about any encoding other than UTF-8. Thus, you have
>> UTF16BE, and UTF16LE.
> 
> Not really, those are byte orders within a character, not order of characters.
> 

Or rather, they are byte orders used by different encodings of code 
points.  ("Characters" in Unicode are more complicated - nothing is ever 
simple in Unicode!)  There are no endian issues between code points, and 
a "string" as far as Unicode is concerned would be a sequence of code 
points.  You only have endian issues if you want to store these 21-bit 
integers in a format that is encoded in smaller lumps (like 
byte-addressed memory).
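
To illustrate, here is a minimal Python sketch. Python is used only for illustration, and `text` and `code_points` are illustrative names. The string is the same three code points whichever encoding is chosen. A byte order appears only when each code point is serialised into 16- or 32-bit units stored in byte-addressed memory, and the order of the characters never changes.

```python
# A Unicode string is a sequence of code points, i.e. plain integers.
text = "A\u00e9\U0001F600"              # 'A', 'é', and U+1F600 (outside the BMP)
code_points = [ord(c) for c in text]
print([hex(cp) for cp in code_points])  # ['0x41', '0xe9', '0x1f600']

# Every code point fits in 21 bits (the largest is U+10FFFF).
assert all(cp.bit_length() <= 21 for cp in code_points)

# Byte order only appears once each code point is split into smaller units.
for enc in ("utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le", "utf-8"):
    print(f"{enc:10} {text.encode(enc).hex(' ')}")

# utf-32-be  00 00 00 41 00 00 00 e9 00 01 f6 00
# utf-32-le  41 00 00 00 e9 00 00 00 00 f6 01 00
# utf-16-be  00 41 00 e9 d8 3d de 00
# utf-16-le  41 00 e9 00 3d d8 00 de
# utf-8      41 c3 a9 f0 9f 98 80
```

The BE and LE lines differ only in the byte order within each unit. UTF-8 has no such variant, because its unit is a single byte.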

> If you look at surrogates, you can see UTF16 is big-endian.  First there's the high
> surrogate, then the low one.
> 
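
The surrogate mechanics can be sketched the same way. `to_surrogates` is a hypothetical helper written for this example. It follows the standard UTF-16 rule:

- subtract 0x10000;
- put the top 10 bits in a high surrogate (0xD800 to 0xDBFF);
- put the bottom 10 bits in a low surrogate (0xDC00 to 0xDFFF).

The pair is emitted high first in both UTF16BE and UTF16LE. Only the two bytes inside each 16-bit unit swap.

```python
def to_surrogates(cp: int):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    v = cp - 0x10000              # 20-bit offset
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low

high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))        # 0xd83d 0xde00

# High surrogate first in both byte orders; only the bytes within each unit swap.
print("\U0001F600".encode("utf-16-be").hex(" "))   # d8 3d de 00
print("\U0001F600".encode("utf-16-le").hex(" "))   # 3d d8 00 de
```
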
> There's a reason that every encoding other than UTF-8 is dead.  Who needs the grief?

Indeed.

UTF-32 is fine for internal use, however - using whatever endianness 
your processor prefers.  The trick is never to let it leave the one 
computer in any encoding other than UTF-8.
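
Here is a minimal sketch of that approach. It assumes Python's `utf-32-le`/`utf-32-be` codecs can stand in for "UTF-32 in whatever endianness your processor prefers". `NATIVE_UTF32`, `to_internal`, `code_point_at` and `to_external` are hypothetical names chosen for the example.

UTF-32 is convenient internally because every code point takes exactly four bytes, so code point i sits at byte offset 4*i. The conversion to UTF-8 happens only at the boundary.

```python
import sys

# Native-endian UTF-32 codec name (no BOM), e.g. "utf-32-le" on x86-64.
NATIVE_UTF32 = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"

def to_internal(text: str) -> bytes:
    """In-memory form: fixed 4 bytes per code point, native byte order."""
    return text.encode(NATIVE_UTF32)

def code_point_at(buf: bytes, i: int) -> int:
    """Constant-time indexing: code point i starts at byte offset 4*i."""
    return int.from_bytes(buf[4 * i:4 * i + 4], sys.byteorder)

def to_external(buf: bytes) -> bytes:
    """Convert to UTF-8 before anything leaves the machine (file, socket, pipe)."""
    return buf.decode(NATIVE_UTF32).encode("utf-8")

internal = to_internal("na\u00efve \U0001F600")
print(len(internal) // 4)                # 7
print(hex(code_point_at(internal, 6)))   # 0x1f600
print(to_external(internal))             # b'na\xc3\xafve \xf0\x9f\x98\x80'
```
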