From: Terje Mathisen
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 19 Sep 2024 08:34:19 +0200

EricP wrote:
> Terje Mathisen wrote:
>>
>> Very nice!
>>
>> This means that you can do integer IMAC(), right?
>>
>> (hi, lo) = imac(a, b, c); // == a*b+c
>>
>> The only thing even nicer from the perspective of writing arbitrary
>> precision library code would be IMAA, i.e. a*b+c+d since that is the
>> largest combination which is guaranteed to never overflow the double
>> register target field.
>>
>
> I thought about IMAC but it was a bit too much.
> And unlike FMA there is no precision gain in IMAC, just convenience.
> IMAC requires 6 register specifiers, 2 dest and 4 source if you don't
> care about overflow/carry on the accumulate.
>   2-wide = 2-wide + narrow * narrow

No, no! IMAC is three in, two out, so in your syntax:

  W = N*N+N

or

  (rhi, rlo) = imac(r0,r1,r2)

> It needs 7 registers, 3 dest and 4 source if you want overflow/carry
> on the accumulate.
>   3-wide = 2-wide + narrow * narrow

Otoh, if you do have all the wide add forms you outlined below,
including the "full adder" with three inputs and a wide/pair output,
then the carry propagations do become easier, and just doing

  (a,b) = muluw(e,f)
  (a,b) = addw1(a,b,g)

would do the same as my suggested

  (a,b) = imac(a,f,g)

Anyway, very nice!

Terje

-- 
- "almost all programming can be viewed as an exercise in caching"
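
For anyone who wants to check the no-overflow claim, here is a minimal C sketch. It assumes a 32-bit "narrow" word and models the "wide" result as a (hi, lo) pair. The `pair32` type, the `split` helper and the exact semantics given to `imac`, `imaa`, `muluw` and `addw1` are my guesses for illustration, not anyone's actual ISA definition. With N-bit operands, a*b+c peaks at 2^(2N) - 2^N and a*b+c+d peaks at exactly 2^(2N) - 1, so both fit in the register pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a "wide" value is a (hi, lo) pair of 32-bit words. */
typedef struct { uint32_t hi, lo; } pair32;

static pair32 split(uint64_t w) {
    pair32 p = { (uint32_t)(w >> 32), (uint32_t)w };
    return p;
}

/* imac: three narrow in, one wide out. Max is 2^64 - 2^32, so no overflow. */
static pair32 imac(uint32_t a, uint32_t b, uint32_t c) {
    return split((uint64_t)a * b + c);
}

/* imaa: four narrow in, one wide out. Max is exactly 2^64 - 1. */
static pair32 imaa(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return split((uint64_t)a * b + c + d);
}

/* muluw: widening unsigned multiply (assumed semantics). */
static pair32 muluw(uint32_t e, uint32_t f) {
    return split((uint64_t)e * f);
}

/* addw1: wide + narrow -> wide, carry from lo propagated into hi
 * (assumed semantics). hi cannot wrap when w is a product of two
 * narrow values; for an arbitrary wide input it could. */
static pair32 addw1(pair32 w, uint32_t g) {
    pair32 r;
    r.lo = w.lo + g;
    r.hi = w.hi + (r.lo < g);   /* unsigned wraparound means a carry out */
    return r;
}

/* Worst-case inputs: every operand is all ones. */
static void check_extremes(void) {
    const uint32_t m = UINT32_MAX;

    pair32 x = imac(m, m, m);          /* 2^64 - 2^32 */
    assert(x.hi == m && x.lo == 0);

    pair32 y = imaa(m, m, m, m);       /* 2^64 - 1 */
    assert(y.hi == m && y.lo == m);

    pair32 z = addw1(muluw(m, m), m);  /* two-step form of e*f+g */
    assert(z.hi == x.hi && z.lo == x.lo);
}
```

Calling `check_extremes()` exercises the all-ones case for each form. The last assertion shows the multiply-then-wide-add sequence giving the same pair as the fused version when fed the same three inputs.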