From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: number of registers (was: My 66000 and High word facility)
Date: Tue, 20 Aug 2024 07:01:49 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Aug20.090149@mips.complang.tuwien.ac.at>
References: <38055f09c5d32ab77b9e3f1c7b979fb4@www.novabbs.org>

mitchalsup@aol.com (MitchAlsup1) writes:
>On Mon, 19 Aug 2024 18:52:39 +0000, Brett wrote:
>
>> MitchAlsup1 wrote:
>>> The thing is that once you go down the GBOoO route, your lack of
>>> registers
>>> "namable in ASM" ceases to become a performance degrader. With renaming
>>> one can have R7 in use 40 times in a 100 instruction deep execution
>>> window.
>>
>> If this was true we would have 16 or even 8 visible registers, and all
>> would be fine.
>x86 does mostly fine with 16

And yet Intel went to 32 SIMD registers with AVX-512 (which admittedly
was first developed for an in-order microarchitecture) and is now going
to 32 GPRs with APX (no in-order excuse here).  And IIRC the
announcement of APX says something about 10% fewer memory accesses or
somesuch.

>Careful, here::
>
>x86 has LD-OPs and LD-OP-STs which makes the 16 register file feel more
>like it has 20-22 registers.

Your feeling is strong (as shown by your repeatedly ignoring the
counterevidence), but wrong: LD-OPs and LD-OP-STs as on AMD64 and
PDP-11 make the 16 registers equivalent to 17 registers on a load/store
architecture.  Let's call the 17th register r16.

On a load/store architecture you replace "LD-OP dest,src" with:

  ld r16=src
  op dest,dest,r16

On a load/store architecture you replace "LD-OP-ST dest,src" with:

  ld r16=dest
  op r16,r16,src
  st dest=r16

Since r16 is dead right after each of these sequences, this one extra
register serves every LD-OP and LD-OP-ST in the program.

For a VAX-like three-memory-argument instruction you need two extra
registers, r16 and r17: "mem1 = mem2 op mem3" becomes:

  ld r16=mem2
  ld r17=mem3
  op r16,r16,r17
  st mem1=r16

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup,
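
The register accounting in the post can be checked mechanically. What follows is a minimal sketch in Python, assuming a hypothetical toy ISA: the instruction tuples, the op names, and the functions run, expand and scratch_used are all made up for illustration and do not model AMD64, PDP-11 or VAX encodings. It rewrites each memory-operand instruction into load/store form using the scratch registers r16 and r17, checks that both forms leave the same machine state, and counts how many scratch registers each rewrite needs.

```python
import operator
from typing import Dict, List, Tuple

# Hypothetical toy ISA, for illustration only (not any real instruction set).
# Architectural registers are r0..r15; r16 and r17 are the extra scratch
# registers a load/store machine needs to emulate memory-operand instructions.
OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_}
SCRATCH = ("r16", "r17")
Instr = Tuple[str, ...]


def run(program: List[Instr], regs: Dict[str, int], mem: Dict[str, int]) -> None:
    """Execute memory-operand and load/store instructions, updating regs/mem in place."""
    for ins in program:
        kind = ins[0]
        if kind == "ldop":      # LD-OP:    dest = dest op [src]
            _, op, dest, src = ins
            regs[dest] = OPS[op](regs[dest], mem[src])
        elif kind == "ldopst":  # LD-OP-ST: [dest] = [dest] op src
            _, op, dest, src = ins
            mem[dest] = OPS[op](mem[dest], regs[src])
        elif kind == "mem3":    # VAX-like: [m1] = [m2] op [m3]
            _, op, m1, m2, m3 = ins
            mem[m1] = OPS[op](mem[m2], mem[m3])
        elif kind == "ld":      # reg = [addr]
            _, reg, addr = ins
            regs[reg] = mem[addr]
        elif kind == "st":      # [addr] = reg
            _, addr, reg = ins
            mem[addr] = regs[reg]
        elif kind == "op":      # d = a op b (registers only)
            _, op, d, a, b = ins
            regs[d] = OPS[op](regs[a], regs[b])
        else:
            raise ValueError(f"unknown instruction {ins!r}")


def expand(ins: Instr) -> List[Instr]:
    """Rewrite one memory-operand instruction into load/store form."""
    r16, r17 = SCRATCH
    kind = ins[0]
    if kind == "ldop":          # dest doubles as the accumulator: one scratch
        _, op, dest, src = ins
        return [("ld", r16, src), ("op", op, dest, dest, r16)]
    if kind == "ldopst":        # result staged in r16 before the store: one scratch
        _, op, dest, src = ins
        return [("ld", r16, dest), ("op", op, r16, r16, src), ("st", dest, r16)]
    if kind == "mem3":          # both sources live at once: two scratches
        _, op, m1, m2, m3 = ins
        return [("ld", r16, m2), ("ld", r17, m3),
                ("op", op, r16, r16, r17), ("st", m1, r16)]
    return [ins]                # already a load/store instruction


def scratch_used(program: List[Instr]) -> int:
    """Count the distinct scratch registers named anywhere in a program."""
    return len({field for ins in program for field in ins if field in SCRATCH})


if __name__ == "__main__":
    cisc = [("ldop", "add", "r0", "a"),
            ("ldopst", "sub", "b", "r1"),
            ("mem3", "add", "c", "a", "b")]
    for ins in cisc:
        print(ins[0], "needs", scratch_used(expand(ins)), "scratch register(s)")

    # Run both forms from the same initial state and compare the results.
    regs_c, mem_c = {"r0": 5, "r1": 2}, {"a": 10, "b": 7, "c": 0}
    regs_ls, mem_ls = dict(regs_c), dict(mem_c)
    run(cisc, regs_c, mem_c)
    run([x for ins in cisc for x in expand(ins)], regs_ls, mem_ls)
    assert mem_c == mem_ls
    assert all(regs_c[r] == regs_ls[r] for r in regs_c)  # architectural regs agree
```

Run as a script, it reports one scratch register for the LD-OP and LD-OP-ST cases and two for the three-memory case. Because r16 is reused by every one-operand expansion, the whole program still needs only r16 unless a three-memory instruction appears.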