Path: ...!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: John Levine <johnl@taugh.com>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Mon, 27 May 2024 19:09:51 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <v32lpv$1u25$1@gal.iecc.com>
References: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me> <v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 May 2024 19:09:51 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
	logging-data="63557"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me> <v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
Bytes: 2253
Lines: 25

According to EricP  <ThatWouldBeTelling@thevillage.com>:
>John Levine wrote:
>> If you mean an array of pointers to sequences of code points, well
>> sure, but now we're back to variable length encodings. I know that I
>> have no idea how big these fixed size things would have to be and I
>> suspect nobody else does either.
>
>One could have instructions that make it easier to parse the
>variable length UTF-8 sequences into codepoints.

That would be the CU14 instruction on zSeries, to turn UTF-8 into
UTF-32. CU41 goes the other way.
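
For anyone without a mainframe handy, here is a rough software sketch of
the same UTF-8 to UTF-32 conversion in Python. It is not a model of CU14
itself: the instruction's operands, condition codes and exact checking
rules are not represented, and the name utf8_to_code_points is made up
for illustration.

```python
def utf8_to_code_points(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of code points (UTF-32 values)."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # The lead byte gives the sequence length, the payload bits it
        # carries, and the smallest code point allowed at that length.
        if b0 < 0x80:
            length, cp, lowest = 1, b0, 0x0
        elif 0xC2 <= b0 <= 0xDF:
            length, cp, lowest = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, lowest = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, lowest = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        # Never read past the end of the buffer.
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(1, length):
            b = data[i + j]
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + j}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, surrogates and values above U+10FFFF.
        if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        out.append(cp)
        i += length
    return out
```

Note that the output is still one entry per code point, not per
user-visible character, which is the problem discussed next.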

>It would still have to look up whether a codepoint was combining or
>stand alone. I don't see a firm definition whether combining codepoints
>come before or after, with "after" requiring a lookahead parse and so extra
>checks to ensure it doesn't look past the buffer end.

I think they come after but I haven't looked in enough detail. And
then you have all of the issues with precomposed characters: do you
normalize as you go or denormalize, or what?
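
Unicode does place combining marks after the base character they
modify, so the lookahead concern is real: you cannot tell a character is
complete until you see the next code point or hit the end of the buffer.
A minimal Python sketch of both issues, using the standard unicodedata
module. The helper split_clusters is hypothetical, and treating "nonzero
canonical combining class" as "combining" is a simplification of the
real segmentation rules in UAX #29.

```python
import unicodedata

def split_clusters(text: str) -> list[str]:
    """Group each base code point with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # A nonzero combining class means this code point attaches to
        # the preceding base, so append it to the current cluster.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

precomposed = "\u00e9"   # e with acute as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# NFC composes where a precomposed form exists; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
```

Here nfc equals the precomposed form and nfd equals the decomposed one,
so the two spellings only compare equal after both sides are normalized
to the same form.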

-- 
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly