Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Cray style vectors
Date: Fri, 15 Mar 2024 10:33:21 +0100
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <ut14l1$27450$1@dont-email.me>
References: <upq0cr$6b5m$1@dont-email.me>
 <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
 <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me>
 <2024Feb17.190353@mips.complang.tuwien.ac.at>
 <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me>
 <ur1h0v$emi4$1@newsreader4.netcologne.de> <86r0h6wyil.fsf@linuxsc.com>
 <ur7v2r$ipnu$1@newsreader4.netcologne.de> <861q91ulhs.fsf@linuxsc.com>
 <urkbsu$34rpk$1@dont-email.me> <864jdcsqmn.fsf@linuxsc.com>
 <usp8lp$7i96$1@dont-email.me>
 <2024Mar12.232336@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Mar 2024 09:33:21 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="6f1799d9c53815839a977e52488ad30c";
 logging-data="2330784"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1+Aapd2zzykDf4dhyg4CDpq1Jj+yksBmF2DNOQTFghV+A=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0)
 Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.1
Cancel-Lock: sha1:7FfZLJU8dVfS54CQmGx3wHH3L5A=
In-Reply-To: <2024Mar12.232336@mips.complang.tuwien.ac.at>
Bytes: 3438

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>> Tim Rentsch wrote:
>>> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>>
>>>> If I really had to write a 64x64->128 MUL, with no widening MUL or
>>>> MULH which returns the high half, then I would punt and do it using
>>>> 32-bit parts (all variables are u64): [...]
>>>
>>> I wrote some code along the same lines.  A difference is you
>>> are considering unsigned multiplication, and I am considering
>>> signed multiplication.
>>>
>> Signed mul is just a special case of unsigned mul, right?
>>
>> I.e. in case of a signed widening mul, you'd first extract the signs,
>> convert the inputs to unsigned, then do the unsigned widening mul,
>> before finally restoring the sign as the XOR of the input signs?
>
> In Gforth we use:
>
> DCell mmul (Cell a, Cell b) /* signed multiply, mixed precision */
> {
>   DCell res;
>
>   res = UD2D(ummul (a, b));
>   if (a < 0)
>     res.hi -= b;
>   if (b < 0)
>     res.hi -= a;
>   return res;
> }
>
> I have this technique from Andrew Haley.  It relies on twos-complement
> representation.

Nice!

It subtracts out the contribution that comes from having treated the
sign bit as an ordinary value bit in the unsigned multiplication.

Here you can probably schedule the fixup to happen in parallel with the
actual multiplication:

;; inputs in r9 & r10, result in rdx:rax, rbx & rcx as scratch

   mov rax,r9   ;; All these can start in the first cycle
   mul r10
   mov rbx,r9   ;; The MOV can be handled by the renamer
   sar r9,63    ;; r9 = all ones if a < 0, else 0
   mov rcx,r10  ;; Ditto
   sar r10,63   ;; r10 = all ones if b < 0, else 0

   and rbx,r10  ;; Second set of ops: rbx = a if b < 0, else 0
   and rcx,r9   ;; rcx = b if a < 0, else 0

   add rbx,rcx  ;; Third cycle

   sub rdx,rbx  ;; Do a single adjustment as soon as the MUL finishes

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
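
For anyone who wants to try the fixup without a widening multiply, here is a rough portable C sketch of the idea discussed above: an unsigned 64x64->128 multiply built from 32-bit parts, followed by the branch-free sign correction. The names u128_parts, umul64x64 and smul64x64 are made up for this illustration; they are not Gforth's DCell, ummul or mmul, and the 32-bit-parts code is one common way to do it, not necessarily the version elided with [...] earlier in the thread. It assumes twos-complement, as the original technique does.

```c
#include <stdint.h>

/* Hypothetical 128-bit result held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Unsigned 64x64->128 multiply using only 64-bit arithmetic. */
static u128_parts umul64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p00 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p01 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p10 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p11 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Everything that lands in bits 32..63, plus carries out of them.
       Three 32-bit values summed: cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);

    u128_parts r;
    r.lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

/* Signed 64x64->128 multiply: unsigned multiply plus high-half fixup.
   The returned halves are the twos-complement bit pattern of the product. */
static u128_parts smul64x64(int64_t a, int64_t b)
{
    u128_parts r = umul64x64((uint64_t)a, (uint64_t)b);

    /* All-ones mask when the operand is negative, zero otherwise. */
    uint64_t a_neg = (uint64_t)0 - (uint64_t)(a < 0);
    uint64_t b_neg = (uint64_t)0 - (uint64_t)(b < 0);

    /* If a < 0 subtract b from the high half; if b < 0 subtract a. */
    r.hi -= ((uint64_t)b & a_neg) + ((uint64_t)a & b_neg);
    return r;
}
```

Calling smul64x64(-1, 2) should give hi = all ones and lo = 0xFFFFFFFFFFFFFFFE, i.e. -2 as a 128-bit twos-complement value. The two masked terms are summed first so that only one subtraction depends on the multiply result, which mirrors the scheduling idea in the assembly.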