Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Wed, 29 May 2024 07:04:35 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 70
Message-ID: <2024May29.090435@mips.complang.tuwien.ac.at>
References: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me>
 <v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
 <v32lpv$1u25$1@gal.iecc.com> <v33bqg$9cst$11@dont-email.me>
 <v34v62$ln01$1@dont-email.me> <v36bva$10k3v$2@dont-email.me>
Injection-Date: Wed, 29 May 2024 09:56:48 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8dde46fdf7008275d2b3739552e1883a";
 logging-data="1123671"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1/xPTSUe3O0RXToLBqJQqYf"
Cancel-Lock: sha1:I1kYMU4BFCfCt+X0HrrJXqGuog4=
X-newsreader: xrn 10.11
Bytes: 4136

Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>On Tue, 28 May 2024 16:02:10 -0000 (UTC), Thomas Koenig wrote:
>
>> Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
>>>
>>> On Mon, 27 May 2024 19:09:51 -0000 (UTC), John Levine wrote:
>>>
>>>> According to EricP <ThatWouldBeTelling@thevillage.com>:
>>>>>
>>>>> One could have instructions that make it easier to parse the variable
>>>>> length UTF-8 sequences into codepoints.

What for?  Dealing with code points is rarely necessary, so adding
instructions for that is a waste (and it's not surprising to me that
neither AMD64 nor ARM A64 have such instructions; IBM z seems to add
special instructions that are rarely useful as a marketing argument).

>>>> That would be the CU14 instruction on zSeries, to turn UTF-8 into
>>>> UTF-32. CU41 goes the other way.
>>>
>>> What is the point, in this day and age, of having special machine
>>> instructions to convert character encodings?
>>
>> Have you looked at decoding algorithms for UTF-8?
>
>Of course. Isn’t the point of RISC that these complex operations are more
>efficiently performed by a sequence of simpler instructions?

The IBM z series are not RISCs.

Anyway, such instructions can be done in a RISCy way (pure
register-to-register instructions) or in a CISCy way
(memory-to-memory).

A RISCy way to do UTF-8 -> UTF-32 would be to have the first 4 bytes
of the remaining string in a register and produce a UTF-32 code point
in another register and a length in a third register (or in the high
part of the destination register to reduce write port requirements).
Similarly for UTF-32->UTF-8, with the length specifying the length of
the result; that would need to be combined with a length-masked store
to make it easy to store the result.  This approach can also be
SIMDified, converting regbits/32 code points in one representation to
the same number of code points in the other representation plus a
length of the UTF-8 representation.

The disadvantage of this approach is most pronounced for
UTF-8->UTF-32, which becomes very sequential and full of dependences:
each use of the conversion instruction is followed by a dependent load
of the next input fragment, and the next use of the conversion
instruction depends on that load.

We have been discussing shift buffers; those would be useful for such
instructions.

A CISCy approach is similar to a block copy: have a source operand in
memory (represented by an address and maybe a length) and a
destination operand (represented by an address and a length), and
start the instruction in a loop until it is finished (the loop is
there to allow interrupting the instruction in the middle, e.g., for
page faults).

According to the description of CU14 on page 7-136 of
<https://www.ibm.com/docs/en/SSQ2R2_15.0.0/com.ibm.tpf.toolkit.hlasm.doc/dz9zr006.pdf>,
CU14 takes the CISCy approach outlined above.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
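
Editorial sketch (not part of the original post): the two instruction
styles discussed above can be modelled in software roughly as follows.
Everything here is an assumption made for illustration: the names
utf8_step, load4, decode_risc, convert_block and decode_cisc are
hypothetical, the register is assumed to hold the first string byte in
its low 8 bits, a length of 0 is used as an ad-hoc error signal, and
the "budget" parameter merely stands in for an interruption such as a
page fault.  The step function checks only lead and continuation
bytes; it does not reject overlong encodings or surrogates, and it
makes no claim to match the actual semantics of CU14.

```python
def utf8_step(reg):
    """Model of a hypothetical register-to-register UTF-8 -> UTF-32 step.

    reg: the next 4 bytes of the string, first byte in the low 8 bits.
    Returns (code_point, length); length 0 means a malformed sequence.
    """
    b0 = reg & 0xFF
    if b0 < 0x80:                      # 1-byte (ASCII) sequence
        return b0, 1
    if 0xC0 <= b0 < 0xE0:              # lead byte of a 2-byte sequence
        n, cp = 2, b0 & 0x1F
    elif 0xE0 <= b0 < 0xF0:            # lead byte of a 3-byte sequence
        n, cp = 3, b0 & 0x0F
    elif 0xF0 <= b0 < 0xF8:            # lead byte of a 4-byte sequence
        n, cp = 4, b0 & 0x07
    else:                              # stray continuation or invalid lead
        return 0, 0
    for i in range(1, n):
        b = (reg >> (8 * i)) & 0xFF
        if b & 0xC0 != 0x80:           # every continuation byte is 10xxxxxx
            return 0, 0
        cp = (cp << 6) | (b & 0x3F)
    return cp, n


def load4(data, pos):
    """Load 4 bytes at pos into a 'register', zero-padded past the end."""
    return int.from_bytes(data[pos:pos + 4].ljust(4, b"\0"), "little")


def decode_risc(data):
    """RISCy usage: load, convert, advance; each load depends on the
    length produced by the previous conversion step."""
    out, pos = [], 0
    while pos < len(data):
        cp, n = utf8_step(load4(data, pos))
        if n == 0:
            raise ValueError("malformed UTF-8 at offset %d" % pos)
        out.append(cp)
        pos += n                       # address of the next load
    return out


def convert_block(src, pos, dst, budget):
    """Model of a hypothetical interruptible memory-to-memory conversion.

    Converts at most `budget` code points, appending them to dst, and
    returns (new_pos, finished) so that the caller can restart it.
    """
    for _ in range(budget):
        if pos >= len(src):
            break
        cp, n = utf8_step(load4(src, pos))
        if n == 0:
            raise ValueError("malformed UTF-8 at offset %d" % pos)
        dst.append(cp)
        pos += n
    return pos, pos >= len(src)


def decode_cisc(data, budget=2):
    """CISCy usage: restart the 'instruction' in a loop until finished."""
    dst, pos, finished = [], 0, False
    while not finished:
        pos, finished = convert_block(data, pos, dst, budget)
    return dst
```

In decode_risc the serial chain is visible directly: pos for the next
load is not known until the previous utf8_step has produced its
length.  decode_cisc hides the same chain inside convert_block and
only exposes the restart loop to the caller.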