From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: number of registers
Date: Tue, 20 Aug 2024 21:05:41 +0000
Organization: Rocksolid Light
Message-ID: <2acbec9e370181a0586943e3817141f5@www.novabbs.org>
References: <v98asi$rulo$1@dont-email.me>
 <38055f09c5d32ab77b9e3f1c7b979fb4@www.novabbs.org>
 <v991kh$vu8g$1@dont-email.me>
 <e4352bad7240a6276e453226136ea0b3@www.novabbs.org>
 <va049n$2vnr7$1@dont-email.me>
 <a566ca0c8b5c41f402b60e8bac445e24@www.novabbs.org>
 <2024Aug20.090149@mips.complang.tuwien.ac.at>
 <a3a57791722f7c21c4218f5be6226e97@www.novabbs.org>
 <E65xO.87567$WT8.2770@fx45.iad>

On Tue, 20 Aug 2024 18:18:25 +0000, EricP wrote:

> MitchAlsup1 wrote:
>> On Tue, 20 Aug 2024 7:01:49 +0000, Anton Ertl wrote:
>>
>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>> On Mon, 19 Aug 2024 18:52:39 +0000, Brett wrote:
>>>>
>>>>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>>> The thing is that once you go down the GBOoO route, your lack of
>>>>>> registers "namable in ASM" ceases to become a performance degrader.
>>>>>> With renaming one can have R7 in use 40 times in a 100 instruction
>>>>>> deep execution window.
>>>>>
>>>>> If this was true we would have 16 or even 8 visible registers, and all
>>>>> would be fine.
x86 does mostly fine with 16
>>>
>>> And yet Intel went to 32 SIMD registers with AVX-512 (which admittedly
>>> was first developed for an in-order microarchitecture) and are now
>>> going to 32 GPRs with APX (no in-order excuse here). And IIRC the
>>> announcement of APX says something about 10% fewer memory accesses or
>>> somesuch.
>>>
>>>> Careful, here::
>>>>
>>>> x86 has LD-OPs and LD-OP-STs which makes the 16 register file feel more
>>>> like it has 20-22 registers.
>>>
>>> Your feeling is strong (as shown by your repeatedly ignoring the
>>> counterevidence), but wrong:
>>>
>>> LD-OPs and LD-OP-STs as on AMD64 and PDP-11 make the 16 registers
>>> equivalent to 17 registers on a load/store architecture:
>>>
>>> Let's call the 17th register r16:
>>>
>>> On a load-store architecture you replace "LD-OP dest,src" with:
>>>
>>>   ld r16=src
>>>   op dest,dest,r16
>>>
>>> On a load-store architecture you replace "LD-OP-ST dest,src" with:
>>>
>>>   ld r16=dest
>>>   op r16,r16,src
>>>   st dest=r16
>>>
>>> For a VAX-like three-memory-argument instruction you need two extra
>>> registers, r16 and r17:
>>>
>>> "mem1 = mem2 op mem3" becomes:
>>>
>>>   ld r16=mem2
>>>   ld r17=mem3
>>>   op r16,r16,r17
>>>   st mem1=r16
>>>
>>> - anton
>>
>>
>> That is not what I am talking about::
>>
>>   i = i + 1;
>> as
>>   ADD [&i],#1
>>
>> 1 instruction = 1 add, 1 LD and 1 ST. And
>>
>>   i = i + j;
>> as
>>   ADD Ri,[&j]
>>
>> In neither case is an extra register needed, and you may have
>> several of these in a local sequence of code. ...
>
> On an in-order pipeline you need someplace to stash the temp value.
> If you want, call it a special in-flight pseudo-register that only
> exists for forwarding, it is still an identifier for a value that
> is outside the architectural register set.

The LD-OP-ST machine would have this built into the pipeline, so
that nobody has to name the carrier of the value down the pipeline.
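
To make the disagreement concrete, here is a rough Python sketch of the lowering anton describes in the quoted text. It is not from the thread: the function name `lower`, the string instruction format, and the constant `SCRATCH` are made up for illustration. The point it shows is only that the load/store form has to name the temporary (r16), while the LD-OP-ST form leaves the carrier of the value unnamed.

```python
SCRATCH = "r16"  # the hypothetical 17th register from the quoted text


def lower(kind, op, dest, src):
    """Rewrite one memory-operand instruction as a load/store sequence.

    kind is "LD-OP" (src is a memory operand) or "LD-OP-ST" (dest is a
    memory operand). Operands are plain strings; nothing is executed.
    """
    if kind == "LD-OP":
        return [f"ld {SCRATCH}={src}",
                f"{op} {dest},{dest},{SCRATCH}"]
    if kind == "LD-OP-ST":
        return [f"ld {SCRATCH}={dest}",
                f"{op} {SCRATCH},{SCRATCH},{src}",
                f"st {dest}={SCRATCH}"]
    raise ValueError(f"unknown instruction kind: {kind}")


# ADD [&i],#1 needs three load/store instructions and one named temporary.
for line in lower("LD-OP-ST", "add", "[&i]", "#1"):
    print(line)
```

Both of my examples above (ADD [&i],#1 and ADD Ri,[&j]) go through this function with the same single scratch name, which is anton's "17 registers" point; whether that temporary costs an architectural name is what is in dispute.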
> I think it might need two registers if you can have two such
> instructions in the pipeline back-to-back as there could be
> multiple temp values in-flight at once
>
>   ADD [&i],#1
>   ADD [&j],#1
>
> could have &i doing its store while &j is doing its load.
>
> On OoO, if the reservation stations are valueless, you need a real
> physical register to stash the temp value as there is no guarantee
> the OP part of the uOp will launch just when the LD part finishes
> doing its thing and forwards the value.

In the LD-OP-ST microarchitecture there would be some buffer that
carries the intermediate values through the execution window.

And yes, you can build a LD-OP-ST reservation station (Athlon and
Opteron did). It becomes easier if there is some buffer to carry the
intermediate values {address, operand, result}
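
A toy Python model of such a buffer, for the back-to-back case EricP raises. This is a sketch under loud assumptions, not a description of Athlon or Opteron: the three one-cycle stages LD, OP, ST, the one-instruction-per-cycle issue, and the names `Entry` and `run` are all invented here, and the model requires distinct addresses because it has no store-to-load forwarding.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STAGES = ("LD", "OP", "ST")  # assumption: three one-cycle stages


@dataclass
class Entry:
    """One buffer slot carrying {address, operand, result} for an in-flight op."""
    address: str
    operand: Optional[int] = None
    result: Optional[int] = None


def run(memory: Dict[str, int], program: List[Tuple[str, int]]) -> int:
    """Issue one `ADD [address],#imm` per cycle; return the peak number of live entries."""
    if len({addr for addr, _ in program}) != len(program):
        raise ValueError("sketch assumes distinct addresses (no store-to-load forwarding)")
    in_flight: Dict[int, Entry] = {}
    peak = 0
    for cycle in range(len(program) + len(STAGES) - 1):
        done = []
        for idx, (addr, imm) in enumerate(program):
            stage = cycle - idx  # instruction idx enters LD in cycle idx
            if stage == 0:
                in_flight[idx] = Entry(addr, operand=memory[addr])
            elif stage == 1:
                entry = in_flight[idx]
                entry.result = entry.operand + imm
            elif stage == 2:
                memory[addr] = in_flight[idx].result
                done.append(idx)
        peak = max(peak, len(in_flight))
        for idx in done:
            del in_flight[idx]  # slot is free once the store has been performed
    return peak


memory = {"i": 5, "j": 10}
peak = run(memory, [("i", 1), ("j", 1)])
print(memory, peak)  # {'i': 6, 'j': 11} 2
```

In this model two entries are live at once for the ADD [&i],#1 / ADD [&j],#1 pair, and never more than three for a longer run, one per stage. The entries are indexed by position in the pipeline rather than by any register name, which is the distinction being argued here.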