From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Mon, 15 Apr 2024 10:02:46 +0200
Organization: A noiseless patient Spider
Message-ID: <uvimv7$629s$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <2024Apr3.192405@mips.complang.tuwien.ac.at> <86d1dd03deee83e339afa725524ab259@www.novabbs.org>
In-Reply-To: <86d1dd03deee83e339afa725524ab259@www.novabbs.org>

MitchAlsup1 wrote:
> Anton Ertl wrote:
>
>> I have a similar problem for the carry and overflow bits in
>> <http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>, and chose to
>> let those bits not survive across calls; if there was a cheap solution
>> for the problem, it would eliminate this drawback of my idea.
>
> My 66000 ISA can encode the mpn_add_n() inner loop in 5-instructions
> whereas RISC-V encodes the inner loop in 11 instructions.
>
> Source code:
>
> void mpn_add_n( uint64_t sum, uint64_t a, unit64_t b, int n )
> {
>     uint64_t c = 0;
>     for( int i = 0; i < n; i++ )
>     {
>         {c, sum[i]} = a[i] + b[i] + c;
>     }
>     return
> }
>
> Assembly code::
>
>     .global mpn_add_n
> mpn_add_n:
>     MOV   R5,#0     // c
>     MOV   R6,#0     // i
>
>     VEC   R7,{}
>     LDD   R8,[R2,Ri<<3]
>     LDD   R9,[R3,Ri<<3]
>     CARRY R5,{{IO}}
>     ADD   R10,R8,R9
>     STD   R10,[R1,Ri<<3]
>     LOOP  LT,R6,#1,R4
>     RET
>
> So, adding a few "bells and whistles" to RISC-V does give you a
> performance gain (1.38×); using a well designed ISA gives you a
> performance gain of 2.00× !! {{moral: don't stop too early}}
>
> Note that all the register bookkeeping has disappeared !! because
> of the indexed memory reference form.
>
> As I count executing instructions, VEC does not execute, nor does
> CARRY--CARRY causes the subsequent ADD to take C input as carry and
> the carry produced by ADD goes back in C. Loop performs the ADD-CMP-
> BC sequence in a single instruction and in a single clock.

; RSI->a[n], RDX->b[n], RDI->sum[n], RCX=-n
    xor rax,rax             ;; Clear carry
next:
    mov rax,[rsi+rcx*8]
    adc rax,[rdx+rcx*8]
    mov [rdi+rcx*8],rax
    inc rcx
    jnz next

The code above is 5 instructions, or 6 if we avoid the load-op, doing
two loads and one store, so it should only be limited by the latency of
the ADC, i.e. one or two cycles.
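
For anyone who wants to check what that loop computes without reading
the flags, here is a minimal C sketch under stated assumptions. The
name add_n_sketch, the returned carry, and the two unsigned compares
that stand in for the carry flag are illustrative choices, not GMP's
actual mpn_add_n and not what any particular compiler emits. It keeps
the same addressing idea as the asm: pointers to the end of each array
and a negative index counting up to zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's mpn_add_n): adds two n-limb numbers,
 * least significant limb first, and returns the final carry (0 or 1). */
uint64_t add_n_sketch(uint64_t *sum, const uint64_t *a,
                      const uint64_t *b, size_t n)
{
    /* Mirror RSI->a[n], RDX->b[n], RDI->sum[n], RCX=-n. */
    const uint64_t *a_end = a + n;
    const uint64_t *b_end = b + n;
    uint64_t *sum_end = sum + n;
    uint64_t carry = 0;                 /* plays the role of CF, initially clear */

    for (ptrdiff_t i = -(ptrdiff_t)n; i != 0; i++) {
        uint64_t x = a_end[i];
        uint64_t t = x + carry;         /* wraps only when x is all ones and carry is 1 */
        uint64_t c1 = t < x;            /* carry out of the first add */
        uint64_t s = t + b_end[i];
        uint64_t c2 = s < t;            /* carry out of the second add */
        sum_end[i] = s;
        carry = c1 | c2;                /* both cannot be set for the same limb */
    }
    return carry;
}
```

With a = {UINT64_MAX, UINT64_MAX, 5} and b = {1, 0, 7} the carry
ripples through the first two limbs and the result is {0, 0, 13} with
a final carry of 0. The asm version needs none of the compares because
INC leaves the carry flag untouched, so ADC can consume it directly on
the next iteration.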

In the non-OoO (i.e. Pentium) days, I would have inverted the loop in
order to hide the latencies as much as possible, resulting in an inner
loop something like this:

next:
    adc eax,ebx
    mov ebx,[edx+ecx*4]     ; First cycle

    mov [edi+ecx*4],eax
    mov eax,[esi+ecx*4]     ; Second cycle

    inc ecx
    jnz next                ; Third cycle

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"