Path: ...!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: John Levine <johnl@taugh.com>
Newsgroups: comp.arch
Subject: Re: Character non-equivalence, was Byte Addressability And Beyond
Date: Fri, 7 Jun 2024 21:26:03 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <v3vttb$5tk$1@gal.iecc.com>
References: <v0s17o$2okf4$2@dont-email.me> <pbI6O.19524$61Y8.11175@fx15.iad> <jwv7cf4mpug.fsf-monnier+comp.arch@gnu.org> <cKE8O.2$bR_f.1@fx07.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jun 2024 21:26:03 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="6068"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <v0s17o$2okf4$2@dont-email.me> <pbI6O.19524$61Y8.11175@fx15.iad> <jwv7cf4mpug.fsf-monnier+comp.arch@gnu.org> <cKE8O.2$bR_f.1@fx07.iad>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)
Bytes: 2193
Lines: 22

It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
>Eeewww... I didn't even think of that.
>What does one do about them? You can't treat them as equivalent in a
>string compare... the user might want the first B and not second B.

People keep rediscovering that when you're using Unicode, nothing is simple.

One of its normalization forms is NFKC, which uses composed versions of accented characters and applies compatibility equivalence to turn some kinds of characters that look similar (ligatures, full-width forms, styled mathematical letters) into a single form. That solves some of the problems but not even close to all of them.

The rules about whether two strings are upper/lower case equivalent depend on the language and sometimes even on the local version of the language; e.g., French French and Quebec French have different conventions about accented capital letters.

The only thing I can say with confidence is that any rule that starts with "You can just ..." is wrong.
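To make the NFKC and case-equivalence points concrete, here is a minimal sketch using Python's standard unicodedata module and str.casefold(). The helpers nfkc and strip_marks are hypothetical illustrations, not a recommended matching policy, and nothing here implements any particular locale's rules.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Normalize s to NFKC (compatibility decomposition, then canonical composition)."""
    return unicodedata.normalize("NFKC", s)


def strip_marks(s: str) -> str:
    """Hypothetical helper: decompose with NFD and drop every combining mark."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if unicodedata.combining(c) == 0)


# 1. NFKC composes: "e" + U+0301 COMBINING ACUTE ACCENT becomes U+00E9.
assert "e\u0301" != "\u00e9"
assert nfkc("e\u0301") == "\u00e9"

# 2. NFKC folds compatibility characters to a single form:
#    the "fi" ligature (U+FB01) and MATHEMATICAL BOLD CAPITAL B (U+1D401).
assert nfkc("\ufb01") == "fi"
assert nfkc("\U0001d401") == "B"

# 3. ...but it does not unify look-alikes from different scripts:
#    GREEK CAPITAL LETTER BETA (U+0392) is still not Latin "B".
assert nfkc("\u0392") != "B"

# 4. str.casefold() is a single, language-independent caseless mapping.
assert "Straße".casefold() == "STRASSE".casefold()  # both become "strasse"
assert "I".casefold() == "i"  # Turkish would want dotless U+0131 here

# 5. Whether "ECOLE" should match "École" is a convention question;
#    casefold alone says no.
assert "ECOLE".casefold() != "École".casefold()
# Stripping marks makes them match, but also merges two different French words.
assert strip_marks("École").casefold() == "ECOLE".casefold()
assert strip_marks("côte") == strip_marks("cote")
```

Each "fix" handles one case and breaks or ignores another, which is the point: picking a folding rule is a policy decision that depends on the language and the application.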
-- 
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly