Path: ...!news.nobody.at!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Thu, 30 May 2024 12:45:27 +0200
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <v39lc7$1lfii$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <v31c4r$3u28v$1@dont-email.me>
 <v327n3$1use$1@gal.iecc.com> <BM25O.40665$HBac.4762@fx15.iad>
 <v32lpv$1u25$1@gal.iecc.com> <v33bqg$9cst$11@dont-email.me>
 <v34v62$ln01$1@dont-email.me> <v36bva$10k3v$2@dont-email.me>
 <2024May29.090435@mips.complang.tuwien.ac.at>
 <v39dpj$1k4hm$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 30 May 2024 12:45:27 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ce88e1757c152d61772d8ceed59fb007";
 logging-data="1752658"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1/2oPS75ZM5CYV4vQemBxTbCoFC92+jZqNkVWf9O6bfsw=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:zFbvj16qxZJXIt2TQQXdfQlpqmY=
In-Reply-To: <v39dpj$1k4hm$1@dont-email.me>
Bytes: 3160

Terje Mathisen wrote:
> Anton Ertl wrote:
>> This approach can also be SIMDified, converting regbits/32 code points
>> in one representation to the same number of code points in the other
>> representation plus a length of the UTF-8 representation.
>>
>> The disadvantage of this approach exists particularly for
>> UTF-8->UTF-32: this is a very sequential approach full of dependences:
>> each use of the conversion instruction is followed by a dependent load
>> of the next input fragment, and the next use of the conversion
>> instruction depends on that load.
>
> Rather the opposite:
>
> UTF8->UTF32 looks a _lot_ like an easier example of a byte-oriented
> variable length (x86?) instruction decoder, but with the big
> simplification that the first byte directly tells you how long the
> sequence is.
>
> Doing a SIMD version corresponds to a superscalar x86 in that the
> decoder needs to grab a variable number of bytes for each instruction,
> starting the next immediately after.

Even better (compared to a superscalar x86 instruction decoder), _every_
byte uses the top two bits to tell you if this is 7-bit ascii, the start
of a UTF-8 encoded code point, or a follow-on byte inside a UTF-8 code
point.

This means that each decoder can work alone, without having to wait for
the length decoding of the previous code point ("instruction") before
deciding to discard or pass on the results it got from starting where it
did.

It seems like it would be very feasible to have (say) 8 parallel
decoders starting at every corresponding byte offset, and return a SIMD
register with 2-8 32-bit decoded code points, right?

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
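
The "8 parallel decoders" idea in the post can be sketched in scalar C, with one loop iteration standing in for each decoder lane. This is only an illustration under stated assumptions, not anything from the thread: the names is_continuation, seq_len, decode_window8 and utf8_to_utf32 are hypothetical, the input is assumed to be well-formed UTF-8 starting on a code point boundary, and no validation (overlong forms, surrogates, stray bytes) is attempted. Each lane decides from its own byte alone whether to discard or keep its result; only the final compaction into the output array is sequential, which a real SIMD version would presumably do with a compress/shuffle step.

```c
#include <stddef.h>
#include <stdint.h>

/* Top two bits 10xxxxxx mark a follow-on (continuation) byte. */
static int is_continuation(uint8_t b) { return (b & 0xC0) == 0x80; }

/* Sequence length implied by a start byte (ASCII or lead byte).
 * Assumes well-formed input; invalid lead bytes are not rejected. */
static size_t seq_len(uint8_t b) {
    if (b < 0x80) return 1;            /* 0xxxxxxx: 7-bit ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    return 4;                          /* 11110xxx */
}

/* Hypothetical helper: speculatively decode at each of the first 8 byte
 * offsets of src. Lanes that start on a continuation byte discard their
 * result; the others pass it on. Returns the number of code points
 * written to out (at most 8) and stores the bytes used in *consumed. */
size_t decode_window8(const uint8_t *src, size_t len,
                      uint32_t out[8], size_t *consumed) {
    size_t count = 0;
    *consumed = 0;
    for (size_t lane = 0; lane < 8 && lane < len; lane++) {
        uint8_t b = src[lane];
        if (is_continuation(b)) continue;  /* started mid-sequence: discard */
        size_t n = seq_len(b);
        if (lane + n > len) break;         /* truncated tail: leave for later */
        /* Payload bits of the start byte: 7, 5, 4 or 3 of them. */
        uint32_t cp = (n == 1) ? b : (uint32_t)(b & (0x7F >> n));
        for (size_t k = 1; k < n; k++)
            cp = (cp << 6) | (uint32_t)(src[lane + k] & 0x3F);
        out[count++] = cp;                 /* sequential compaction step */
        *consumed = lane + n;
    }
    return count;
}

/* Hypothetical driver: dst must have room for len code points. */
size_t utf8_to_utf32(const uint8_t *src, size_t len, uint32_t *dst) {
    size_t total = 0;
    while (len > 0) {
        size_t used;
        size_t n = decode_window8(src, len, dst + total, &used);
        if (used == 0) break;              /* truncated or malformed input */
        total += n;
        src += used;
        len -= used;
    }
    return total;
}
```

A lane whose sequence starts inside the window may read up to three bytes past it, so a vector version would need a load wider than 8 bytes or a carry between windows. With only 4-byte sequences a window yields two code points, with pure ASCII it yields eight, which matches the 2-8 range mentioned above.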