Article <v3v7k7$24548$1@dont-email.me>

Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Fri, 7 Jun 2024 17:05:42 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <v3v7k7$24548$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me>
 <v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
 <v32lpv$1u25$1@gal.iecc.com> <v33bqg$9cst$11@dont-email.me>
 <v34v62$ln01$1@dont-email.me> <v36bva$10k3v$2@dont-email.me>
 <2024May29.090435@mips.complang.tuwien.ac.at>
 <cIG5O.25483$gKW1.4042@fx13.iad> <jwvcyp4veqj.fsf-monnier+comp.arch@gnu.org>
 <I5I5O.9419$czG6.9020@fx02.iad> <jwv1q5kvcnm.fsf-monnier+comp.arch@gnu.org>
 <1uJ5O.2$gn%7.1@fx12.iad> <2024May30.173537@mips.complang.tuwien.ac.at>
 <pbI6O.19524$61Y8.11175@fx15.iad> <jwv7cf4mpug.fsf-monnier+comp.arch@gnu.org>
 <cKE8O.2$bR_f.1@fx07.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Fri, 07 Jun 2024 17:05:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2ae1113e35663d5bd33dd38d87f62943";
	logging-data="2233480"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+JZf0Pw6lzTYhXT7fxWH+DkgzRrEXqOXia/o9/YgTD1A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:hsW0eTJXfVwRRK3XsHZquo5nm3Q=
In-Reply-To: <cKE8O.2$bR_f.1@fx07.iad>
Bytes: 3607

EricP wrote:
> Stefan Monnier wrote:
>>
>> Another issue with Unicode is the so-called "confusables": things that
>> may look identical (or close enough) on screen yet are different (and
>> not just because of normalization). E.g. Β vs B, А vs A, or ∕ vs
>> / vs ⁄.
>> Unicode comes with a 700kB `confusables.txt` listing such issues.
>
> Eeewww... I didn't even think of that.
> What does one do about them? You can't treat them as equivalent in a
> string compare... the user might want the first B and not second B.
>
> I suppose one would want two compare equal functions,
> an exactly equal, and a visually approximately equal.
> Like using a soundex for words to catch misspellings.
>
> But then programmers need to decide when to use each compare.
>
> These character and code attribute lookup tables are looking awkward.
> With up to 2M codes, and some base character codes having multiple
> possible combiners, but very sparse. And links between entries
> for upper and lower case, and now links between confusables.
> And we don't want to roll over the L1 cache just to do a string compare.
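
The two comparisons suggested in the quoted text (one exact, one "visually
approximately equal") could be sketched roughly as below. This is only an
illustration: the names skeleton, exactly_equal and visually_equal are
hypothetical, and the mapping table is a hand-written stand-in holding just
the look-alike pairs mentioned above, not the contents of confusables.txt.
The normalize/map/normalize shape is loosely modeled on the "skeleton" idea
from the Unicode security mechanisms report.

```python
import unicodedata

# Hand-written stand-in (an assumption for this sketch), covering only the
# look-alike pairs mentioned in the thread. A real table would be loaded
# from Unicode's confusables.txt.
CONFUSABLE_TO_PROTOTYPE = {
    "\u0392": "B",  # GREEK CAPITAL LETTER BETA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u2215": "/",  # DIVISION SLASH
    "\u2044": "/",  # FRACTION SLASH
}


def skeleton(s):
    """Normalize, replace each character by its look-alike prototype,
    then normalize again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(CONFUSABLE_TO_PROTOTYPE.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def exactly_equal(a, b):
    """Code point by code point comparison."""
    return a == b


def visually_equal(a, b):
    """True when both strings reduce to the same skeleton."""
    return skeleton(a) == skeleton(b)
```

With this table, "\u0392ob" (Greek Beta) and "Bob" are visually equal but
not exactly equal, which is the distinction the caller has to choose
between.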

Years ago I considered case-insensitive Boyer-Moore text search with a
wide alphabet and found that the only approach that made sense was to
maintain two copies of the string to be searched for, one lower and one
upper case, where each "character" was a length-encoded string. This was
required to handle things like the German double s (ß), whose case
conversion changes the length: it normally uppercases to the two letters
"SS", and can also uppercase into the single letter ẞ.

The lookup table for skip lengths was still far shorter than the
alphabet size, effectively a very short and fast hash of the current
character/codepoint/combined letter.
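
A minimal sketch of that idea, not the original code: it uses the simpler
Horspool variant of Boyer-Moore, keeps a lower-case and an upper-case copy
of the pattern as lists of strings (a Python str carries its own length, so
it stands in for the length-encoded "character"), and indexes a 64-entry
skip table with the low bits of the code point. The table size, the hash and
all function names are assumptions made for illustration.

```python
TABLE_SIZE = 64  # assumed power of two, far smaller than the code space


def slot(ch):
    """Very short hash of a code point: just its low bits."""
    return ord(ch) & (TABLE_SIZE - 1)


def case_variants(pattern):
    """Two copies of the pattern; each element may be longer than one
    code point (for example 'ß'.upper() is 'SS')."""
    lower = [ch.lower() for ch in pattern]
    upper = [ch.upper() for ch in pattern]
    return lower, upper


def build_skip_table(lower, upper, min_len):
    """Horspool shifts, kept conservative: colliding code points and
    ambiguous offsets always take the smaller shift."""
    skip = [min_len] * TABLE_SIZE
    starts = {0}  # possible offsets at which the next element can begin
    for lo, up in zip(lower, upper):
        next_starts = set()
        for start in starts:
            for variant in (lo, up):
                for k, ch in enumerate(variant):
                    offset = start + k
                    if offset < min_len - 1:
                        s = slot(ch)
                        skip[s] = min(skip[s], min_len - 1 - offset)
                next_starts.add(start + len(variant))
        starts = next_starts
    return skip


def match_at(text, pos, lower, upper):
    """Return the end index of a match starting at pos, or None."""
    ends = {pos}
    for lo, up in zip(lower, upper):
        ends = {q + len(v) for q in ends for v in (lo, up)
                if text.startswith(v, q)}
        if not ends:
            return None
    return max(ends)


def find_caseless(text, pattern):
    """Return (start, end) of the first match, or None."""
    if not pattern:
        return (0, 0)
    lower, upper = case_variants(pattern)
    min_len = sum(min(len(lo), len(up)) for lo, up in zip(lower, upper))
    skip = build_skip_table(lower, upper, min_len)
    pos = 0
    while pos + min_len <= len(text):
        end = match_at(text, pos, lower, upper)
        if end is not None:
            return (pos, end)
        # Shift by the hashed entry for the last character of the window.
        pos += skip[slot(text[pos + min_len - 1])]
    return None
```

For example, find_caseless("Die STRASSE ist lang", "straße") returns
(4, 11). The sketch only expands the pattern side, so a pattern written as
"STRASSE" will not find "straße" in the text, and mixed-case expansions such
as "Ss" are not matched; handling those would need more variants per
element.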

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"