Article <2024Oct12.122318@mips.complang.tuwien.ac.at>

Path: ...!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: is Vax addressing sane today
Date: Sat, 12 Oct 2024 10:23:18 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 36
Message-ID: <2024Oct12.122318@mips.complang.tuwien.ac.at>
References: <vbd6b9$g147$1@dont-email.me> <2024Sep10.094353@mips.complang.tuwien.ac.at> <vckf9d$178f2$1@dont-email.me> <O2DHO.184073$kxD8.113118@fx11.iad> <vcso7k$2s2da$1@dont-email.me> <efXIO.169388$1m96.45507@fx15.iad> <8f031f2b5082d97582b1231a060f2b9f@www.novabbs.org> <8DgJO.171468$1m96.17060@fx15.iad> <vd7peh$12kpl$2@dont-email.me> <KWUJO.41016$vtH3.33971@fx07.iad> <86msjr2bec.fsf@linuxsc.com> <vdbnlq$1pabr$1@dont-email.me> <20240929180026.00004160@yahoo.com>
Injection-Date: Sat, 12 Oct 2024 12:58:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="29d5d06e203f002c66a9502ffb11d396";
	logging-data="109941"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+sApgeVs/qIybhkCiGipCj"
Cancel-Lock: sha1:eVfCMc5YdL2T8cn0idR453coiPg=
X-newsreader: xrn 10.11
Bytes: 3178

Michael S <already5chosen@yahoo.com> writes:
>That's correct about intrinsics, but incorrect about ADCX/ADOX.
>The latter can be moderately helpful in special situations, esp.
>128b * 128b => 256b multiplication, but they are never necessary,
>and for addition/subtraction they are not needed at all.

They are useful if there are two strings of additions.  This happens
naturally in wide multiplication (also beyond 256b results).  But it
also happens when you add three multi-precision numbers (say, X, Y,
Z): you need C for the carry of XYi=X[i]+Y[i]+C, and O for the carry
of XYZ[i]=XYi+Z[i]+O.  If you have ADCX/ADOX, you can do both
additions in one loop, so XYi can stay in a register and does not need
to be stored.  If you don't have these instructions, only ADC, you
need one loop to compute XY=X+Y and store it in memory, and a second
loop that reloads XY and computes XY+Z.  So the lack of ADCX/ADOX
costs an extra loop plus a store and a reload of every limb of XY.
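A minimal sketch of the single-loop version in portable C, assuming 64-bit limbs stored least significant first. The variables c and o stand in for the C and O flags of the two carry chains; the names add_carry and add3 are made up for this illustration, and a compiler is not guaranteed to emit ADCX/ADOX for it. The point is the data flow: XYi never has to leave a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one add-with-carry step.
   Returns a + b + *carry and writes the carry-out (0 or 1) to *carry. */
static uint64_t add_carry(uint64_t a, uint64_t b, unsigned *carry)
{
    assert(*carry <= 1);            /* carry-in must be a single bit */
    uint64_t s = a + b;
    unsigned c1 = s < a;            /* carry out of a + b */
    uint64_t t = s + *carry;
    unsigned c2 = t < s;            /* carry out of adding the carry-in */
    *carry = c1 | c2;               /* at most one of c1, c2 is set */
    return t;
}

/* r = x + y + z, n limbs each, in ONE loop with two carry chains.
   c models the C flag (ADCX chain), o models the O flag (ADOX chain).
   Returns the carry-out of the whole sum (0..2). */
static unsigned add3(uint64_t *r, const uint64_t *x, const uint64_t *y,
                     const uint64_t *z, size_t n)
{
    unsigned c = 0, o = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t xy = add_carry(x[i], y[i], &c);  /* XYi = X[i]+Y[i]+C */
        r[i] = add_carry(xy, z[i], &o);           /* XYZ[i] = XYi+Z[i]+O */
    }
    return c + o;
}
```

Without ADCX/ADOX, the same result would take one loop that stores X+Y to a temporary array and a second loop that reloads it and adds Z.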

If you add 4 multi-precision numbers, AMD64 with ADX runs out of carry
bits, so you have to spend the overhead of an additional loop (but not
of two additional loops as without ADCX/ADOX).
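The four-number case can be sketched the same way, again in portable C with hypothetical names (W for the fourth number, add4_two_flags, and a caller-supplied temporary tmp). It models a machine with exactly two carry flags: the first loop runs the C and O chains and stores X+Y+Z to memory, and one additional loop reloads that and adds W.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a + b + *carry, carry-out goes to *carry. */
static uint64_t add_carry(uint64_t a, uint64_t b, unsigned *carry)
{
    assert(*carry <= 1);            /* carry-in must be a single bit */
    uint64_t s = a + b;
    unsigned c1 = s < a;            /* carry out of a + b */
    uint64_t t = s + *carry;
    unsigned c2 = t < s;            /* carry out of adding the carry-in */
    *carry = c1 | c2;
    return t;
}

/* r = x + y + z + w with only two carry flags available.
   Returns the carry-out of the whole sum (0..3). */
static unsigned add4_two_flags(uint64_t *r, uint64_t *tmp,
                               const uint64_t *x, const uint64_t *y,
                               const uint64_t *z, const uint64_t *w,
                               size_t n)
{
    unsigned c = 0, o = 0;
    for (size_t i = 0; i < n; i++) {              /* loop 1: C and O chains */
        uint64_t xy = add_carry(x[i], y[i], &c);
        tmp[i] = add_carry(xy, z[i], &o);         /* X+Y+Z goes to memory */
    }
    unsigned c2 = 0;                              /* C flag is free again */
    for (size_t i = 0; i < n; i++)                /* loop 2: the extra loop */
        r[i] = add_carry(tmp[i], w[i], &c2);
    return c + o + c2;
}
```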

With carry bits in the general purpose registers
<https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf> and 30 GPRs
(one is zero, one is sp), you can add 14 multi-precision numbers per
loop: 14 GPRs for source addresses, 1 GPR for the target address, 1
for the loop counter, 13 registers for loop-carried carry flags.
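With one carry per register, the pattern generalizes to k inputs and k-1 carry chains in a single loop. The sketch below, in portable C, keeps each chain's loop-carried carry in its own variable, standing in for the carry bits attached to GPRs in the proposal linked above; add_k and MAX_OPS are hypothetical names, and MAX_OPS = 14 only mirrors the register budget in the text (14 + 1 + 1 + 13 = 29 registers, within the 30 available).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPS 14  /* hypothetical limit: 14 sources as in the text */

/* Hypothetical helper: returns a + b + *carry, carry-out goes to *carry. */
static uint64_t add_carry(uint64_t a, uint64_t b, unsigned *carry)
{
    assert(*carry <= 1);            /* carry-in must be a single bit */
    uint64_t s = a + b;
    unsigned c1 = s < a;            /* carry out of a + b */
    uint64_t t = s + *carry;
    unsigned c2 = t < s;            /* carry out of adding the carry-in */
    *carry = c1 | c2;
    return t;
}

/* r = src[0] + ... + src[k-1], n limbs each, in a single pass.
   Chain j (1 <= j < k) adds src[j] to the running limb sum and keeps its
   own loop-carried carry in carry[j-1].
   Returns the carry-out of the whole sum (at most k-1). */
static unsigned add_k(uint64_t *r, const uint64_t *const *src,
                      size_t k, size_t n)
{
    assert(k >= 2 && k <= MAX_OPS);
    unsigned carry[MAX_OPS - 1] = {0};  /* one carry per chain */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = src[0][i];
        for (size_t j = 1; j < k; j++)
            s = add_carry(s, src[j][i], &carry[j - 1]);
        r[i] = s;
    }
    unsigned top = 0;
    for (size_t j = 0; j + 1 < k; j++)
        top += carry[j];
    return top;
}
```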

Of course, the question is whether this kind of computation is needed
frequently enough to justify such an extension.  For multi-precision
multiplication and squaring, Intel considered the frequency relevant
enough to introduce ADCX/ADOX/MULX.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>