From: Terje Mathisen
Newsgroups: comp.arch
Subject: Re: Short Vectors Versus Long Vectors
Date: Fri, 3 May 2024 10:23:43 +0200

Thomas Koenig wrote:
> Terje Mathisen wrote:
>> Thomas Koenig wrote:
>>> MitchAlsup1 wrote:
>>>
>>>> Then contemplate for an instant that one would want SIMD instructions for
>>>> Complex numbers and Hamiltonian Quaternions......
>>>
>>> Quaternions would be a bit over the top, I think. Complex
>>> multiplication... implementing (e,f) = (a*c-b*d,a*d+b*c) is
>>>
>>>     fmul Rt1,Rc,Rb
>>>     fmac Re,Rd,Ra,Rt1
>>>
>>>     fmul Rt2,Rd,Rb
>>>     fmac Rf,Rc,Ra,-Rt2
>>>
>>> So, you'd need both operands on both lanes. Not very SIMD-friendly,
>>> I would assume, but (probably) not impossible, either.
>>>
>> If you have the four operands spread across two SIMD registers, so
>> (Re,Im) in each, then you need an initial pair of permutes to make
>> flipped copies before you can start the fmul/fmac ops, right?
>>
>> This is exactly the kind of code where Mitch's transparent vector
>> processing would be very nice to have.
>
> I'm actually not sure how that would help.
> Could you elaborate?

Just that all his code is scalar, but when you have a bunch of these
complex mul/mac operations in a loop, his hw will figure out the
recurrences and run them as fast as possible, with all the (Re,Im) SIMD
flips becoming NOPs.

Terje

--
- "almost all programming can be viewed as an exercise in caching"
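
For readers following the (Re,Im) flip discussion above, here is a minimal Python sketch of the difference between the scalar form and a lane-wise form. It models a two-lane SIMD register as a plain (Re, Im) tuple. The names `cmul_scalar`, `swap_lanes` and `cmul_simd_style` are made up for this illustration; they are not a real ISA, not anyone's actual hardware, and only one of several possible lane arrangements.

```python
def cmul_scalar(x, y):
    """Scalar form: (e, f) = (a*c - b*d, a*d + b*c)."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)


def swap_lanes(v):
    """Hypothetical permute: flip the (Re, Im) lanes of a two-lane register."""
    return (v[1], v[0])


def cmul_simd_style(x, y):
    """Lane-wise form: every arithmetic step works on both lanes at once,
    which is why a flipped copy of one operand is needed first."""
    a, b = x
    x_re = (a, a)            # broadcast Re(x) to both lanes (assumed available)
    x_im = (b, b)            # broadcast Im(x) to both lanes
    y_flip = swap_lanes(y)   # (d, c): the "flipped copy" from the thread

    # Lane-wise multiply: t = (b*d, b*c)
    t = (x_im[0] * y_flip[0], x_im[1] * y_flip[1])

    # Lane-wise multiply-add with per-lane sign (subtract in lane 0, add in lane 1)
    return (x_re[0] * y[0] - t[0], x_re[1] * y[1] + t[1])


if __name__ == "__main__":
    x, y = (1.0, 2.0), (3.0, 4.0)
    print(cmul_scalar(x, y))      # (-5.0, 10.0)
    print(cmul_simd_style(x, y))  # (-5.0, 10.0)
```

In the scalar version no flip exists at all: each product simply reads whichever of a, b, c, d it needs. The swap only appears once the operands are packed into lanes, which is the overhead the reply above says would disappear when the code is written as scalar operations.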