Path: ...!npeer.as286.net!npeer-ng0.as286.net!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Stefan Monnier <monnier@iro.umontreal.ca>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Wed, 29 May 2024 11:55:18 -0400
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <jwv1q5kvcnm.fsf-monnier+comp.arch@gnu.org>
References: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me>
 <v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
 <v32lpv$1u25$1@gal.iecc.com> <v33bqg$9cst$11@dont-email.me>
 <v34v62$ln01$1@dont-email.me> <v36bva$10k3v$2@dont-email.me>
 <2024May29.090435@mips.complang.tuwien.ac.at> <cIG5O.25483$gKW1.4042@fx13.iad>
 <jwvcyp4veqj.fsf-monnier+comp.arch@gnu.org> <I5I5O.9419$czG6.9020@fx02.iad>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Wed, 29 May 2024 17:55:18 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b4f73329fad870866166a1ffc8c05f07";
 logging-data="1278199"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1+Q0ou+fnLe4ty1WQ4aGRRsjjAwm4VAbVo="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:+T7QVkPlbGNosY5zzYkwO0fA+1U= sha1:8VdosCpQbTZ85KofQkzp+FkKO9Q=
Bytes: 2542

>>> I've not dealt with UTF-8 or code points but that's because I've not
>>> written software that interacts with the non 1-byte character markets.
>>> But even something as simple as sanitizing a character string to feed
>>> into SQL will have to.
>> AFAIK you can do that by treating the UTF-8 byte sequence as if it were
>> an ASCII byte-sequence: all the Unicode weirdness is neatly stashed in
>> bytes >127 which aren't used by SQL itself anyway.
>> Stefan
>
> Of course with apologies to Herr Koenig's umlauts. :-)
>
> And what of all those new Asian customers your company was hoping
> to get by dealing with them in their native written language???
> You could always explain to the company president that
> you only work in ASCII so they should just get used to it.

I think you misunderstand: the code written to sanitize an ASCII string
to feed into SQL will *just work* to sanitize a UTF-8 string to feed
into SQL, no matter how many funny characters and joiners and combiners
and emojis you have in there.

That's part of the reason why UTF-8 is so popular: you can surprisingly
often treat it as "good old ASCII".


        Stefan
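
A minimal sketch of the point above, assuming a deliberately simple
sanitizer that only doubles single quotes. The function name
escape_sql_quotes_bytes and the sample strings are hypothetical and not
from the thread. The function works byte by byte and knows nothing about
UTF-8. Because every byte of a multi-byte UTF-8 sequence is >= 0x80, the
ASCII quote byte 0x27 can only ever be a real quote, so the byte-level
result should match what a code-point-aware version produces.

```python
def escape_sql_quotes_bytes(raw: bytes) -> bytes:
    """Double every ASCII single quote (0x27), one byte at a time.

    Illustrative only: the function has no knowledge of UTF-8 and treats
    its input as if it were plain ASCII.
    """
    out = bytearray()
    for b in raw:
        out.append(b)
        if b == 0x27:  # ASCII apostrophe
            out.append(0x27)
    return bytes(out)


# Hypothetical inputs: plain ASCII, umlauts, CJK text, and an emoji
# sequence built with a zero-width joiner (U+200D).
samples = [
    "O'Brien",
    "K\u00f6nig's \u00fcmlauts",
    "\u65e5\u672c\u8a9e\u306e'\u30c6\u30b9\u30c8'",
    "\U0001F469\u200D\U0001F467 it's",
]

for text in samples:
    encoded = text.encode("utf-8")

    # Every byte belonging to a non-ASCII character is >= 0x80, so it can
    # never be mistaken for an ASCII metacharacter such as the apostrophe.
    assert all(
        b >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )

    # The byte-level escaper agrees with escaping at the code-point level,
    # and its output is still valid UTF-8.
    escaped = escape_sql_quotes_bytes(encoded)
    assert escaped.decode("utf-8") == text.replace("'", "''")
```

Quote doubling stands in here for whatever ASCII-era sanitizer is already
in place; the sketch only shows that such code does not need to change
for UTF-8 input, not that quote doubling is a sufficient defense on its
own.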