From: Stefan Monnier <monnier@iro.umontreal.ca>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Wed, 29 May 2024 11:55:18 -0400
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <jwv1q5kvcnm.fsf-monnier+comp.arch@gnu.org>
References: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me>
	<v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
	<v32lpv$1u25$1@gal.iecc.com> <v33bqg$9cst$11@dont-email.me>
	<v34v62$ln01$1@dont-email.me> <v36bva$10k3v$2@dont-email.me>
	<2024May29.090435@mips.complang.tuwien.ac.at>
	<cIG5O.25483$gKW1.4042@fx13.iad>
	<jwvcyp4veqj.fsf-monnier+comp.arch@gnu.org>
	<I5I5O.9419$czG6.9020@fx02.iad>
MIME-Version: 1.0
Content-Type: text/plain

>>> I've not dealt with UTF-8 or code points but that's because I've not
>>> written software that interacts with the non 1-byte character markets.
>>> But even something as simple as sanitizing a character string to feed
>>> into SQL will have to.
>> AFAIK you can do that by treating the UTF-8 byte sequence as if it were
>> an ASCII byte-sequence: all the Unicode weirdness is neatly stashed in
>> bytes >127 which aren't used by SQL itself anyway.
>>         Stefan
>
> Of course with apologies to Herr Koenig's umlauts. :-)
>
> And what of all those new Asian customers your company was hoping
> to get by dealing with them in their native written language???
> You could always explain to the company president that
> you only work in ASCII so they should just get used to it.

I think you misunderstand: the code written to sanitize an ASCII string to
feed into SQL will *just work* to sanitize a UTF-8 string to feed
into SQL, no matter how many funny characters and joiners and combiners
and emojis you have in there.
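
To make that concrete, here is a minimal sketch in Python. The function
name escape_sql_ascii and the quote-doubling rule are just assumptions
picked for illustration, not anybody's actual sanitizer. The point is that
it only ever looks at individual bytes and knows nothing about Unicode,
yet the result on UTF-8 input is still valid UTF-8 with exactly the
apostrophes doubled, because every byte of a multi-byte UTF-8 sequence is
>= 0x80 and so can never be mistaken for an ASCII quote.

```python
def escape_sql_ascii(raw: bytes) -> bytes:
    """Double every ASCII apostrophe, byte by byte (hypothetical sanitizer).

    The loop has no idea what encoding the bytes are in; it only
    compares each byte against 0x27.
    """
    out = bytearray()
    for b in raw:
        if b == 0x27:          # ASCII apostrophe
            out.append(0x27)   # emit an extra one to escape it
        out.append(b)
    return bytes(out)


def is_all_high_bytes(ch: str) -> bool:
    """True if every UTF-8 byte of a non-ASCII character is >= 0x80."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))


if __name__ == "__main__":
    text = "König's 日本語 'emoji' 👨‍👩‍👧 e\u0301"
    escaped = escape_sql_ascii(text.encode("utf-8"))
    # The byte-level result decodes cleanly and matches a
    # character-level replacement done on the Unicode string.
    print(escaped.decode("utf-8") == text.replace("'", "''"))
```

The same reasoning applies to any other ASCII byte the sanitizer cares
about (backslash, double quote, NUL, ...).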

That's part of the reason why UTF-8 is so popular: you can surprisingly
often treat it as "good old ASCII".


        Stefan