Path: ...!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: is Vax addressing sane today
Date: Sat, 12 Oct 2024 10:23:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 36
Message-ID: <2024Oct12.122318@mips.complang.tuwien.ac.at>
References: <vbd6b9$g147$1@dont-email.me> <2024Sep10.094353@mips.complang.tuwien.ac.at> <vckf9d$178f2$1@dont-email.me> <O2DHO.184073$kxD8.113118@fx11.iad> <vcso7k$2s2da$1@dont-email.me> <efXIO.169388$1m96.45507@fx15.iad> <8f031f2b5082d97582b1231a060f2b9f@www.novabbs.org> <8DgJO.171468$1m96.17060@fx15.iad> <vd7peh$12kpl$2@dont-email.me> <KWUJO.41016$vtH3.33971@fx07.iad> <86msjr2bec.fsf@linuxsc.com> <vdbnlq$1pabr$1@dont-email.me> <20240929180026.00004160@yahoo.com>
Injection-Date: Sat, 12 Oct 2024 12:58:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="29d5d06e203f002c66a9502ffb11d396"; logging-data="109941"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+sApgeVs/qIybhkCiGipCj"
Cancel-Lock: sha1:eVfCMc5YdL2T8cn0idR453coiPg=
X-newsreader: xrn 10.11
Bytes: 3178

Michael S <already5chosen@yahoo.com> writes:
>That's correct about intrinsics, but incorrect about ADCX/ADOX.
>The latter can be moderately helpful in special situations, esp.
>128b * 128b => 256b multiplication, but it is never necessary
>and for addition/subtraction is not needed at all.

They are useful if there are two interleaved chains of additions. This
happens naturally in wide multiplication (also beyond 256b results).
But it also happens when you add three multi-precision numbers (say,
X, Y, Z): you need C (the carry flag, used by ADCX) for the carry of
XYi=X[i]+Y[i]+C, and O (the overflow flag, used by ADOX) for the carry
of XYZ[i]=XYi+Z[i]+O. If you have ADCX/ADOX, you can do both additions
in one loop, so XYi can be kept in a register and does not need to be
stored.
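To make the two-carry-chain argument concrete, here is a minimal portable C sketch (hypothetical, not code from the post). It assumes 64-bit limbs stored least-significant first; the helper add_carry and the names add3_one_loop and add3_two_loops are illustrative inventions. The variables c and o merely play the roles of CF (the ADCX chain) and OF (the ADOX chain); plain C gives no guarantee that a compiler will actually emit ADCX/ADOX for this. The one-loop version keeps the intermediate limb xy in a local variable, while the two-loop version stores X+Y to memory and reloads it, as one has to with only a single carry flag.

```c
#include <stddef.h>
#include <stdint.h>

/* One add-with-carry step: returns a + b + cin (mod 2^64) and writes
   the carry-out (0 or 1) to *cout. Portable stand-in for ADC/ADCX/ADOX. */
static uint64_t add_carry(uint64_t a, uint64_t b, unsigned cin, unsigned *cout)
{
    uint64_t s = a + b;
    unsigned c1 = s < a;          /* carry out of a + b */
    uint64_t r = s + cin;
    unsigned c2 = r < s;          /* carry out of adding cin */
    *cout = c1 | c2;              /* at most one of the two can be set */
    return r;
}

/* r = x + y + z over n limbs, in ONE loop with two independent carry
   chains: c for x + y (think CF/ADCX), o for (x + y) + z (think OF/ADOX).
   The partial sum xy never touches memory.
   Returns the carry out of the top limb (0, 1 or 2). */
static unsigned add3_one_loop(uint64_t *r, const uint64_t *x,
                              const uint64_t *y, const uint64_t *z, size_t n)
{
    unsigned c = 0, o = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t xy = add_carry(x[i], y[i], c, &c); /* first chain  */
        r[i] = add_carry(xy, z[i], o, &o);          /* second chain */
    }
    return c + o;
}

/* Same result with only one carry chain available: the first loop
   stores x + y to r, the second loop reloads r and adds z. */
static unsigned add3_two_loops(uint64_t *r, const uint64_t *x,
                               const uint64_t *y, const uint64_t *z, size_t n)
{
    unsigned c = 0, o = 0;
    for (size_t i = 0; i < n; i++)
        r[i] = add_carry(x[i], y[i], c, &c);        /* pass 1: r = x + y */
    for (size_t i = 0; i < n; i++)
        r[i] = add_carry(r[i], z[i], o, &o);        /* pass 2: r += z    */
    return c + o;
}
```

Both variants compute the same value; the difference is that add3_two_loops writes and rereads n intermediate limbs, which is the extra cost the post attributes to having only ADC.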
If you don't have these instructions, only ADC, you need one loop to
compute X+Y and store the result in memory, and a second loop to compute
XY+Z; i.e., the lack of ADCX/ADOX results in substantial additional
cost.

If you add 4 multi-precision numbers, AMD64 with ADX runs out of carry
bits (it has only CF and OF), so you have to spend the overhead of an
additional loop (but not of two additional loops, as without ADCX/ADOX).

With carry bits in the general-purpose registers
<https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
(one is zero, one is sp), you can add 14 multi-precision numbers per
loop: 14 GPRs for source addresses, 1 GPR for the target address, 1 for
the loop counter, and 13 registers for loop-carried carry flags.

Of course, the question is whether this kind of computation is needed
frequently enough to justify this kind of extension. For
multi-precision multiplication and squaring, Intel considered it
frequent enough to introduce ADCX/ADOX/MULX.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>