Path: ...!3.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Short Vectors Versus Long Vectors
Date: Fri, 3 May 2024 10:23:43 +0200
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <v126uf$e819$1@dont-email.me>
References: <v06vdb$17r2v$1@dont-email.me> <0D7YN.12641$oA33.7712@fx34.iad>
 <e9aa636b6b12f1ac0af12946151219f4@www.novabbs.org>
 <pycYN.33914$iMKd.26920@fx12.iad> <v0rtm6$2o1mj$5@dont-email.me>
 <66c323063468ebc28ce3b5ae8d28c2ac@www.novabbs.org>
 <v0su1a$32b6q$1@dont-email.me> <v0vhve$3o6jt$1@dont-email.me>
 <v0vrk4$3qch6$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 03 May 2024 10:23:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="58f9b6eb5362fbb494e46df1fdf360b8";
	logging-data="466985"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/z+4h6rZ+5THJlejdQn6KJ9nDDjmUJB+SAnKtoy6GisA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:EAh99Xoj4pSF3pC+9hjO+bhTDLw=
In-Reply-To: <v0vrk4$3qch6$1@dont-email.me>
Bytes: 2889

Thomas Koenig wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>> Thomas Koenig wrote:
>>> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>>>
>>>> Then contemplate for an instant that one would want SIMD instructions for
>>>> Complex numbers and Hamiltonian Quaternions......
>>>
>>> Quaternions would be a bit over the top, I think.  Complex
>>> multiplication... implementing (e,f) = (a*c-b*d,a*d+b*c) is
>>>
>>>           fmul    Rt1,Rc,Rb
>>>           fmac    Rf,Rd,Ra,Rt1
>>>
>>>           fmul    Rt2,Rd,Rb
>>>           fmac    Re,Rc,Ra,-Rt2
>>>
>>> So, you'd need both operands on both lanes.  Not very SIMD-friendly,
>>> I would assume, but (probably) not impossible, either.
>>>
>> If you have the four operands spread across two SIMD registers, so
>> (Re,Im) in each, then you need an initial pair of permutes to make
>> flipped copies before you can start the fmul/fmac ops, right?
>>
>> This is exactly the kind of code where Mitch's transparent vector
>> processing would be very nice to have.
> 
> I'm actually not sure how that would help.  Could you elaborate?

Just that all his code is scalar. When you have a bunch of these 
complex mul/mac operations in a loop, his hardware will figure out the 
recurrences and run them as fast as possible, so all the (Re,Im) SIMD 
flips become NOPs.
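
As a rough sketch of the kind of loop I have in mind (plain C; the 
names cplx and cmac are just made up for illustration, and this is not 
taken from Mitch's actual code), each iteration is the same scalar 
mul/mac pattern as above, with no permutes anywhere in the source:

```c
#include <stddef.h>

/* Hypothetical layout: interleaved (Re,Im) pairs, as discussed above. */
typedef struct { double re, im; } cplx;

/* acc[i] += x[i] * y[i], written as plain scalar code.
 * (a + bi) * (c + di) = (a*c - b*d) + (a*d + b*c)i */
void cmac(cplx *acc, const cplx *x, const cplx *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double a = x[i].re, b = x[i].im;
        double c = y[i].re, d = y[i].im;

        acc[i].re += a * c - b * d;   /* real part */
        acc[i].im += a * d + b * c;   /* imaginary part */
    }
}
```

A packed SIMD version of the same loop would need a swapped (Im,Re) 
copy of one operand per iteration; here a, b, c and d are simply 
separate scalars, so there is nothing to flip.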

Terje


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"