From: Stefan Monnier
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Wed, 29 May 2024 11:55:18 -0400

>>> I've not dealt with UTF-8 or code points but that's because I've not
>>> written software that interacts with the non 1-byte character markets.
>>> But even something as simple as sanitizing a character string to feed
>>> into SQL will have to.
>> AFAIK you can do that by treating the UTF-8 byte sequence as if it were
>> an ASCII byte-sequence: all the Unicode weirdness is neatly stashed in
>> bytes >127 which aren't used by SQL itself anyway.
>> Stefan
>
> Of course with apologies to Herr Koenig's umlauts. :-)
>
> And what of all those new Asian customers your company was hoping
> to get by dealing with them in their native written language???
> You could always explain to the company president that
> you only work in ASCII so they should just get used to it.

I think you misunderstand: the code written to sanitize an ASCII string
to feed into SQL will *just work* to sanitize a UTF-8 string to feed into
SQL, no matter how many funny characters and joiners and combiners and
emojis you have in there.

That's part of the reason why UTF-8 is so popular: you can surprisingly
often treat it as "good old ASCII".
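To make the "just works" claim concrete, here is a minimal sketch. It assumes a toy sanitizer that only wraps a string in single quotes and doubles embedded apostrophes (the standard SQL string-literal convention). The helper names `escape_sql_literal` and `non_ascii_bytes_are_high` are hypothetical, made up for this illustration. The escaper walks raw bytes and knows nothing about Unicode, yet its output is still valid UTF-8 and matches what a code-point-aware escaper would produce. The reason is that every byte of a multibyte UTF-8 sequence is >127, so none of them can look like an ASCII apostrophe.

```python
# Sketch: an escaper written with only ASCII in mind, applied unchanged to UTF-8.
# escape_sql_literal and non_ascii_bytes_are_high are hypothetical helpers for
# illustration; production code would normally use bound query parameters instead.

def escape_sql_literal(data: bytes) -> bytes:
    """Quote a byte string as an SQL string literal, doubling every
    apostrophe (0x27). Looks at one byte at a time; no UTF-8 awareness."""
    out = bytearray(b"'")
    for b in data:
        if b == 0x27:        # ASCII apostrophe: the only byte this toy sanitizer cares about
            out += b"''"
        else:
            out.append(b)    # every other byte, including those > 127, passes through untouched
    out += b"'"
    return bytes(out)


def non_ascii_bytes_are_high(text: str) -> bool:
    """True if every byte in the UTF-8 encoding of every non-ASCII code point
    is > 127, i.e. no multibyte sequence contains an ASCII-looking byte."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) >= 0x80
        for b in ch.encode("utf-8")
    )


if __name__ == "__main__":
    # An umlaut, a combining acute accent, and a ZWJ emoji sequence (woman + ZWJ + laptop).
    sample = "König's cafe\u0301 \U0001F469\u200D\U0001F4BB"
    escaped = escape_sql_literal(sample.encode("utf-8"))

    # The byte-level result decodes cleanly and equals escaping done per code point.
    assert escaped.decode("utf-8") == "'" + sample.replace("'", "''") + "'"
    assert non_ascii_bytes_are_high(sample)
    print(escaped.decode("utf-8"))
```

This relies on a property of UTF-8 specifically: lead and continuation bytes are always in the 0x80..0xFF range. Encodings such as Shift_JIS, whose multibyte sequences can contain bytes below 128, do not give the same guarantee.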
Stefan