Path: ...!news.nobody.at!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Byte Addressability And Beyond
Date: Fri, 31 May 2024 14:36:47 +0200
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <v3cg8v$27oob$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <v34v62$ln01$1@dont-email.me>
 <7yn5O.33584$9xU7.29321@fx17.iad> <v39caf$1jtgk$1@dont-email.me>
 <v3a386$14g8$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 31 May 2024 14:36:48 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e440adb04cf0b31541af4bae9d80c849";
 logging-data="2351883"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX18UWBVc/w65uZ/sPeaLi0EwDQTUDAc8EegyROOcipw87A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:92+2PX/gSA8pL2yTTEDqnt52bSY=
In-Reply-To: <v3a386$14g8$1@gal.iecc.com>
Bytes: 3034

John Levine wrote:
> According to Terje Mathisen <terje.mathisen@tmsw.no>:
>>> It's almost like the perfect application of risc instruction design:
>>> a long sequence of individual instructions of conditional branches,
>>> bit field extracts, inserts, and shifts, is replaced in HW by
>>> a small number of muxes that can do the same in one clock.
>>
>> If that CU14 can also return the number of bytes consumed, along with
>> the resulting 32-bit character, then it would be perfect. Is that what
>> it is doing?
>
> You give it registers with two addresses and two lengths, and it
> converts the source UTF-8 code points to destination UTF-32 until it
> runs out of input, fills the output, gets an invalid character, or an
> interrupt. It updates the addresses and lengths.
> Other than optionally checking for invalid UTF-8 it does not interpret
> the code points.
>
> The condition code tells you which it was. If it was an interrupt, you just
> branch back and keep going.
>
> There's an extra cost flag whether to test for invalid UTF-8.
>
> Read all about it: https://www.vm.ibm.com/library/other/22783213.pdf
>
> It's on page 7-251.
>

Thanks!

I did read all of it, and it was pretty close to how I would have
designed a sw function to do the same, except for the very funky ABI:
Both source and destination addresses _must_ be in even-numbered
registers, with the following odd register providing the count/length.

Just from this little snippet I'm pretty sure this instruction has a
sizeable startup overhead. Compiler support is probably in the form of
an intrinsic that knows about the need to allocate two pairs of
registers, each pair starting at an even-numbered register.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
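
For illustration only, here is a rough C sketch of the kind of sw function discussed above: two address/length pairs that are updated in place, plus a return value playing the role of the condition code. Everything named here (utf8_to_utf32, enum cvt_status, the check flag) is made up for the sketch and is not IBM's interface or anyone's actual code. Two further assumptions: a sequence truncated at the end of the input is reported as invalid, and the structural checks (lead byte, continuation bytes) are always done while the flag only adds the range checks. There is no software analogue of the interrupt case, but because the pointers and lengths are left at the stopping point, calling again simply resumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stop reasons standing in for the condition code. */
enum cvt_status {
    CVT_SRC_DONE = 0,   /* all input consumed */
    CVT_DST_FULL = 1,   /* no room left for the next UTF-32 character */
    CVT_INVALID  = 2    /* malformed or truncated UTF-8 at *src */
};

/* Convert UTF-8 at *src (*srclen bytes) into UTF-32 at *dst (*dstlen
 * elements).  Both pointers and both lengths are updated after every
 * complete character, so on any return they describe what is left. */
enum cvt_status utf8_to_utf32(const uint8_t **src, size_t *srclen,
                              uint32_t **dst, size_t *dstlen, int check)
{
    /* Smallest code point that legitimately needs n bytes (index = n). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };

    assert(src && srclen && dst && dstlen);

    while (*srclen > 0) {
        const uint8_t *s = *src;
        uint32_t c = s[0];
        size_t n;                       /* length of this sequence */

        if (*dstlen == 0)
            return CVT_DST_FULL;

        if (c < 0x80)                { n = 1; }
        else if ((c & 0xE0) == 0xC0) { n = 2; c &= 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 3; c &= 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 4; c &= 0x07; }
        else return CVT_INVALID;        /* stray continuation or 0xF8..0xFF */

        if (n > *srclen)
            return CVT_INVALID;         /* truncated (sketch assumption) */

        for (size_t i = 1; i < n; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return CVT_INVALID;     /* not a continuation byte */
            c = (c << 6) | (s[i] & 0x3F);
        }

        /* Optional range checks: overlong forms, surrogates, > U+10FFFF. */
        if (check && (c < min_cp[n] || c > 0x10FFFF ||
                      (c >= 0xD800 && c <= 0xDFFF)))
            return CVT_INVALID;

        **dst = c;
        *dst += 1;  *dstlen -= 1;
        *src += n;  *srclen -= n;
    }
    return CVT_SRC_DONE;
}
```

A caller would loop on the return value: on CVT_DST_FULL, drain or replace the output buffer and call again with the same source pointer and length; on CVT_INVALID, *src points at the offending byte.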