Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: number of registers
Date: Wed, 21 Aug 2024 17:54:44 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 48
Message-ID: <2024Aug21.195444@mips.complang.tuwien.ac.at>
References: <v98asi$rulo$1@dont-email.me> <38055f09c5d32ab77b9e3f1c7b979fb4@www.novabbs.org>
 <v991kh$vu8g$1@dont-email.me> <e4352bad7240a6276e453226136ea0b3@www.novabbs.org>
 <va049n$2vnr7$1@dont-email.me> <a566ca0c8b5c41f402b60e8bac445e24@www.novabbs.org>
 <2024Aug20.090149@mips.complang.tuwien.ac.at> <a3a57791722f7c21c4218f5be6226e97@www.novabbs.org>
 <20240820204050.00003d56@yahoo.com> <48438024ccdbcc373e4cfa51d18066f5@www.novabbs.org>
 <2024Aug21.121312@mips.complang.tuwien.ac.at>
Injection-Date: Wed, 21 Aug 2024 20:03:01 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="610b2bfe0b10fb60c5ef8f925c413124";
 logging-data="4158745"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX19+DX4zLxiPAiFXdfYIV7w4"
Cancel-Lock: sha1:QkCWT12BVWItd7DyB198tN3MVkc=
X-newsreader: xrn 10.11
Bytes: 3318

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>mitchalsup@aol.com (MitchAlsup1) writes:
>>The point is that the cost of not getting allocated into a register
>>is vastly lower--the count of instructions remains 1 while the
>>latency increases. That increase in latency does not hurt those
>>use once/seldom variables.
>
>Latency is not the issue in modern high-performance AMD64 cores, which
>have zero-cycle store-to-load forwarding
><http://www.complang.tuwien.ac.at/anton/memdep/>.
>
>And yet, putting variables in registers gives a significant speedup:
>On a Rocket Lake, numbers are times in seconds:
>
>  sieve bubble matrix   fib   fft
>  0.075  0.070  0.036 0.049 0.017 TOS in reg, RP in reg, IP in reg
>  0.100  0.149  0.054 0.106 0.037 TOS in mem, RP in mem, IP write-through to mem
>
>In the first line, I used gforth-fast and tried to disable all
>optimizations except those that keep certain variables in registers:
>
>gforth-fast --ss-states=1 --ss-number=31 --opt-ip-updates=0 onebench.fs
>
>I could not reduce the static superinstructions below 31 and still get
>a result; I will have to investigate why, but that probably does not
>make that much of a difference for several of these benchmarks.

Fixed that, so now with

gforth-fast --ss-states=1 --ss-number=0 --opt-ip-updates=0 onebench.fs

  sieve bubble matrix   fib   fft
  0.069  0.074  0.036 0.052 0.017 TOS in reg, RP in reg, IP in reg
  0.100  0.149  0.054 0.106 0.037 TOS in mem, RP in mem, IP write-through to mem

Or on a Golden Cove:

  sieve bubble matrix   fib   fft
  0.059  0.059  0.024 0.047 0.020 TOS in reg, RP in reg, IP in reg
  0.108  0.156  0.065 0.098 0.037 TOS in mem, RP in mem, IP write-through to mem

So even on these advanced cores with zero-cycle store-to-load forwarding
it hurts quite a bit to keep variables in memory.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
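
To make "hurts quite a bit" concrete, the two tables above can be reduced
to per-benchmark slowdown factors (time with variables in memory divided by
time with variables in registers). The following is only a small sketch in
Python; the names BENCHMARKS, TIMES and slowdown are made up for this
illustration and are not part of Gforth or onebench.fs. The Rocket Lake
numbers are those of the --ss-number=0 run.

```python
# Times in seconds, copied from the tables in the post.
# "reg": TOS, RP and IP kept in registers.
# "mem": TOS and RP in memory, IP write-through to memory.
BENCHMARKS = ("sieve", "bubble", "matrix", "fib", "fft")

TIMES = {
    "Rocket Lake": {
        "reg": (0.069, 0.074, 0.036, 0.052, 0.017),
        "mem": (0.100, 0.149, 0.054, 0.106, 0.037),
    },
    "Golden Cove": {
        "reg": (0.059, 0.059, 0.024, 0.047, 0.020),
        "mem": (0.108, 0.156, 0.065, 0.098, 0.037),
    },
}


def slowdown(times):
    """Return {benchmark: mem_time / reg_time} for one machine."""
    return {
        name: mem / reg
        for name, reg, mem in zip(BENCHMARKS, times["reg"], times["mem"])
    }


if __name__ == "__main__":
    for machine, times in TIMES.items():
        ratios = slowdown(times)
        cells = "  ".join(f"{name} {ratio:.2f}x" for name, ratio in ratios.items())
        print(f"{machine}: {cells}")
```

With these numbers the factors lie between about 1.4 and 2.7, so keeping
these variables in memory costs roughly a factor of 1.5 to 2.7 in run time
on both cores, depending on the benchmark.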