Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: python text, Byte Addressability And Beyond
Date: Mon, 27 May 2024 06:25:28 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 22
Message-ID: <2024May27.082528@mips.complang.tuwien.ac.at>
References: <v0s17o$2okf4$2@dont-email.me> <2024May10.182047@mips.complang.tuwien.ac.at>
 <v1ns43$2260p$1@dont-email.me> <2024May11.173149@mips.complang.tuwien.ac.at>
 <v1ossl$1ps0$1@gal.iecc.com> <2024May12.074045@mips.complang.tuwien.ac.at>
 <v1q840$2mk58$1@dont-email.me> <2024May12.181226@mips.complang.tuwien.ac.at>
 <v30mjq$3min8$4@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 May 2024 08:44:47 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ea08719dbf87fa9ea1796ef43ff1a7b7";
 logging-data="4120485"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1+PQQcH+oR14rB49CYcmrEM"
Cancel-Lock: sha1:v4uW50xxs/OH+Nz99zG1FQKzGP0=
X-newsreader: xrn 10.11
Bytes: 2264

Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>On Sun, 12 May 2024 16:12:26 GMT, Anton Ertl wrote:
>
>> Plus at some point (not sure when) they decided that characters have to
>> be composable ...
>
>I think that was true right from the beginning. Else you would have had a
>combinatorial explosion of alphabetic characters with diacritic marks.

Unicode has precomposed variants of the Latin characters that are used
in normal text.  It does not have a precomposed character for, e.g.,
K̖̈, but then such a character does not occur in normal text.

Unicode 1.0 with its expansion to 16-bit code units only makes sense
if the resulting code units are characters.  If at that point they had
planned to have variable-width characters, they could have gone with
something like UTF-8 from the start and spared us a lot of pain.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
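
A minimal sketch of the point about precomposed characters, assuming Python 3 and its standard `unicodedata` module. The helper names `nfc_length` and `utf16_units` are hypothetical, and the code points used for the K example (U+0316 and U+0308) are an assumption about which combining marks appear in the post. It shows that a common accented Latin letter composes to one code point under NFC, while the K example stays a sequence, so even a 16-bit code unit does not always correspond to one user-visible character.

```python
import unicodedata


def nfc_length(s: str) -> int:
    """Count code points after NFC (canonical composition)."""
    return len(unicodedata.normalize("NFC", s))


def utf16_units(s: str) -> int:
    """Count 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2


# 'e' + COMBINING ACUTE ACCENT: a precomposed form (U+00E9) exists.
e_acute = "e\u0301"

# 'K' + COMBINING GRAVE ACCENT BELOW + COMBINING DIAERESIS:
# assumed to match the example in the post; no precomposed form exists.
k_marks = "K\u0316\u0308"

# MUSICAL SYMBOL G CLEF lies outside the 16-bit range and needs a surrogate pair.
g_clef = "\U0001D11E"

print(nfc_length(e_acute))   # composes to a single code point
print(nfc_length(k_marks))   # remains three code points
print(utf16_units(k_marks))  # three 16-bit units for one visible character
print(utf16_units(g_clef))   # two 16-bit units for one code point
```

Both the combining sequence and the surrogate pair break the "one 16-bit unit per character" assumption, which is the pain the post refers to.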