Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: My 66000 and High word facility
Date: Mon, 12 Aug 2024 20:12:59 +0000
Organization: Rocksolid Light
Message-ID: <acb76cee233f19672f2ad0380c9cd06e@www.novabbs.org>
References: <v98asi$rulo$1@dont-email.me> <38055f09c5d32ab77b9e3f1c7b979fb4@www.novabbs.org> <v991kh$vu8g$1@dont-email.me> <2024Aug11.163333@mips.complang.tuwien.ac.at> <v9ath5$2qgnb$1@dont-email.me> <2024Aug12.082936@mips.complang.tuwien.ac.at> <130df049c4c97984986767736b5b037a@www.novabbs.org> <v9dnmv$3efnj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org; logging-data="2358227"; mail-complaints-to="usenet@i2pn2.org"; posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$OoLPv08BkAv3yzPnvGMkle8InZwJsmDZHHhjvrPbd3KOtm.JeJ6Ya
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 7887
Lines: 193

On Mon, 12 Aug 2024 19:27:22 +0000, BGB wrote:

> On 8/12/2024 12:36 PM, MitchAlsup1 wrote:
>> On Mon, 12 Aug 2024 6:29:36 +0000, Anton Ertl wrote:
>>
>>> Brett <ggtgp@yahoo.com> writes:
>>>> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>>>> Brett <ggtgp@yahoo.com> writes:
>>>>>> The lack of CPU’s with 64 registers is what makes for a market, that 4%
>>>>>> that could benefit have no options to pick from.
>>>>>
>>>>> They had:
>>>>>
>>>>> SPARC: Ok, only 32 GPRs available at a time, but more in hardware
>>>>> through the Window mechanism.
>>>>>
>>>>> AMD29K: IIRC a 128-register stack and 64 additional registers
>>>>>
>>>>> IA-64: 128 GPRs and 128 FPRs with register stack and rotating register
>>>>> files to make good use of them.
>>>>
>>>> All antiques no longer available.
>>>
>>> SPARC is still available: <https://en.wikipedia.org/wiki/SPARC> says:
>>>
>>> |Fujitsu will also discontinue their SPARC production [...] end-of-sale
>>> |in 2029, of UNIX servers and a year later for their mainframe.
>>>
>>> No word of when Oracle will discontinue (or has discontinued) sales,
>>> but both companies introduced their last SPARC CPUs in 2017.
>>>
>>> In any case, my point still stands: these architectures were
>>> available, and the large number of registers failed to give them a
>>> decisive advantage. Maybe it even gave them a decisive disadvantage:
>>> AMD29K and IA-64 never had OoO implementations, and SPARC got them
>>> only with the Fujitsu SPARC64 V in 2002 and the Oracle SPARC T4 in
>>> 2011, years after Intel, MIPS, HP switched to OoO in 1995/1996 and
>>> Power and Alpha switched in 1998 (POWER3, 21264).
>>>
>>>>> Where is your 4% number coming from?
>>>>
>>>> The 4% number is poor memory and a guess.
>>>> Here is an antique paper on the issue:
>>>>
>>>> https://www.eecs.umich.edu/techreports/cse/00/CSE-TR-434-00.pdf
>>>
>>> Interesting. I only skimmed the paper, but I read a lot about
>>> inlining and interprocedural register allocation. SPARC's register
>>> windows and AMD29K's and IA-64's register stacks were intended to be
>>> useful for that, but somehow the other architectures did not suffer a
>>> big-enough disadvantage to make them adopt one of these concepts, and
>>> that's despite register windows/stacks working even for indirect calls
>>> (e.g., method calls in the general case), where interprocedural
>>> register allocation or inlining don't help.
>>>
>>> It seems to me that with OoO the cycle cost of spilling and refilling
>>> on call boundaries was lowered: the spills can be delayed until the
>>> computation is complete, and the refills can start early because the
>>> stack pointer tends to be available early.
>>>
>>> And recent OoO CPUs even have zero-cycle store-to-load forwarding, so
>>> even if the called function is short, the spilling and refilling
>>> around it (if any) does not increase the latency of the value that's
>>> spilled and refilled. But that consideration is only relevant for
>>> Intel APX; ARM A64 and RISC-V went for 32 registers several years
>>> before zero-cycle store-to-load-forwarding was implemented.
>>>
>>> One other optimization that they use the additional registers for is
>>> "register promotion", i.e., putting values from memory into registers
>>> for a while (if absence of aliasing can be proven). One interesting
>>> aspect here is that register promotion with 64 or 256 registers (RP-64
>>> and RP-256) is usually not much better (if better at all) than
>>> register promotion with 32 registers (RP-32); see Figure 1. So
>>> register promotion does not make a strong case for more registers,
>>> either, at least in this paper.
>>
>> With full access to constants, there is even less need to promote
>> addresses or immediates into registers as you can simply poof them
>> up any time you want one.
>
>
> There are tradeoffs still, if constants need space to encode...
>
> Inline is still better than a memory load, granted.
>
> May make sense to consolidate multiple uses of a value into a register
> rather than try encoding them as an immediate each time.
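
The size side of that tradeoff can be sketched with a toy model. The byte counts below are illustrative assumptions (a 4-byte instruction word, and a 64-bit constant carried as 8 extra bytes on the instruction that uses it), not figures taken from this thread, and the function names are made up for the example.

```python
# Toy code-size model: inline 64-bit immediates vs. one mov into a register.
# All sizes are assumptions for illustration only.
INSN_BYTES = 4    # assumed size of an instruction word
IMM64_BYTES = 8   # assumed extra bytes for an attached 64-bit constant


def inline_bytes(uses):
    """Every use carries its own copy of the constant."""
    return uses * (INSN_BYTES + IMM64_BYTES)


def hoisted_bytes(uses):
    """One mov materializes the constant; each use is then a plain instruction."""
    return (INSN_BYTES + IMM64_BYTES) + uses * INSN_BYTES


def break_even_uses(limit=64):
    """Smallest use count at which hoisting is smaller, or None within limit."""
    for uses in range(1, limit + 1):
        if hoisted_bytes(uses) < inline_bytes(uses):
            return uses
    return None


for n in (1, 2, 3, 4):
    print(n, inline_bytes(n), hoisted_bytes(n))
print("break-even at", break_even_uses(), "uses")
```

Under these assumptions the register copy only wins on bytes once the same constant is used at least twice, and it costs a live register plus one extra instruction to execute.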
See polpak:: r8_erf()

r8_erf:                                 ; @r8_erf
; %bb.0:
        fabs    r2,r1
        fcmp    r3,r2,#0x3EF00000
        bngt    r3,.LBB141_5
; %bb.1:
        fcmp    r3,r2,#4
        bngt    r3,.LBB141_6
; %bb.2:
        fcmp    r3,r2,#0x403A8B020C49BA5E
        bnlt    r3,.LBB141_7
; %bb.3:
        fmul    r3,r1,r1
        fdiv    r3,#1,r3
        mov     r4,#0x3F90B4FB18B485C7
        fmac    r4,r3,r4,#0x3FD38A78B9F065F6
        fadd    r5,r3,#0x40048C54508800DB
        fmac    r4,r3,r4,#0x3FD70FE40E2425B8
        fmac    r5,r3,r5,#0x3FFDF79D6855F0AD
        fmac    r4,r3,r4,#0x3FC0199D980A842F
        fmac    r5,r3,r5,#0x3FE0E4993E122C39
        fmac    r4,r3,r4,#0x3F9078448CD6C5B5
        fmac    r5,r3,r5,#0x3FAEFC42917D7DE7
        fmac    r4,r3,r4,#0x3F4595FD0D71E33C
        fmul    r4,r3,r4
        fmac    r3,r3,r5,#0x3F632147A014BAD1
        fdiv    r3,r4,r3
        fadd    r3,#0x3FE20DD750429B6D,-r3
        fdiv    r3,r3,r2
        br      .LBB141_4
.LBB141_5:
        fmul    r3,r1,r1
        fcmp    r2,r2,#0x3C9FFE5AB7E8AD5E
        sra     r2,r2,#8,#1
        cvtsd   r4,#0
        mux     r2,r2,r3,r4
        mov     r3,#0x3FC7C7905A31C322
        fmac    r3,r2,r3,#0x400949FB3ED443E9
        fadd    r4,r2,#0x403799EE342FB2DE
        fmac    r3,r2,r3,#0x405C774E4D365DA3
        fmac    r4,r2,r4,#0x406E80C9D57E55B8
        fmac    r3,r2,r3,#0x407797C38897528B
        fmac    r4,r2,r4,#0x40940A77529CADC8
        fmac    r3,r2,r3,#0x40A912C1535D121A
        fmul    r1,r3,r1
        fmac    r2,r2,r4,#0x40A63879423B87AD
        fdiv    r2,r1,r2
        mov     r1,r2
        ret
.LBB141_6:
        mov     r3,#0x3E571E703C5F5815
        fmac    r3,r2,r3,#0x3FE20DD508EB103E
        fadd    r4,r2,#0x402F7D66F486DED5
        fmac    r3,r2,r3,#0x4021C42C35B8BC02
        fmac    r4,r2,r4,#0x405D6C69B0FFCDE7
        fmac    r3,r2,r3,#0x405087A0D1C420D0
        fmac    r4,r2,r4,#0x4080C972E588749E
        fmac    r3,r2,r3,#0x4072AA2986ABA462
        fmac    r4,r2,r4,#0x4099558EECA29D27
        fmac    r3,r2,r3,#0x408B8F9E262B9FA3
        fmac    r4,r2,r4,#0x40A9B599356D1202
        fmac    r3,r2,r3,#0x409AC030C15DC8D7
        fmac    r4,r2,r4,#0x40B10A9E7CB10E86
        fmac    r3,r2,r3,#0x40A0062821236F6B
        fmac    r4,r2,r4,#0x40AADEBC3FC90DBD
        fmac    r3,r2,r3,#0x4093395B7FD2FC8E
        fmac    r4,r2,r4,#0x4093395B7FD35F61
        fdiv    r3,r3,r4
.LBB141_4:
        fmul    r4,r2,#16
        fmul    r4,r4,#0x3D800000
        rnd     r4,r4,#5
        fadd    r5,r2,-r4
        fadd    r2,r2,r4
        fmul    r4,r4,-r4
        fexp    r4,r4
        fmul    r2,r2,-r5
        fexp    r2,r2
        fmul    r2,r4,r2
        fadd    r2,#0,-r2
        fmac    r2,r2,r3,#0x3F000000
        fadd    r2,r2,#0x3F000000
        pdlt    r1,T
        fadd    r2,#0,-r2
        mov     r1,r2
        ret
.LBB141_7:

========== REMAINDER OF ARTICLE TRUNCATED ==========
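
For readers who want to see which values the hex immediates in the r8_erf listing stand for, here is a small decoding sketch. It assumes that 16-digit immediates are IEEE 754 binary64 bit patterns and that 8-digit immediates are binary32 patterns widened to double by the instruction; that reading is an assumption made for this example. The helper names `decode_imm` and `horner` are made up here. The `horner` function mirrors the shape of the fmac chains, assuming `fmac rd,ra,rb,#c` computes `ra*rb + c`; Python rounds the multiply and the add separately, so the result can differ from a fused operation in the last bit.

```python
import struct


def decode_imm(hex_text):
    """Decode a hex immediate such as '0x3EF00000' into a Python float.

    Assumption: 8 hex digits are a binary32 pattern, 16 hex digits are
    a binary64 pattern.
    """
    digits = hex_text[2:] if hex_text.lower().startswith("0x") else hex_text
    bits = int(digits, 16)
    if len(digits) == 8:
        return struct.unpack(">f", struct.pack(">I", bits))[0]
    if len(digits) == 16:
        return struct.unpack(">d", struct.pack(">Q", bits))[0]
    raise ValueError("expected 8 or 16 hex digits: " + hex_text)


def horner(coeffs, x):
    """Evaluate a polynomial, highest-order coefficient first.

    Each loop step is one multiply-add, the same shape as one fmac
    with a constant as the addend.
    """
    acc = coeffs[0]
    for c in coeffs[1:]:
        acc = acc * x + c
    return acc


# Short (32-bit) immediates used in the compares and scaling steps.
print(decode_imm("0x3EF00000"))          # 0.46875
print(decode_imm("0x3D800000"))          # 0.0625
print(decode_imm("0x3F000000"))          # 0.5

# A 64-bit immediate from the fadd near the end of %bb.3,
# approximately 0.5642, i.e. close to 1/sqrt(pi).
print(decode_imm("0x3FE20DD750429B6D"))

# The mov/fmac chain on r3 in the .LBB141_5 block, read as a polynomial
# in r2 and evaluated at an arbitrary sample point.
chain = [decode_imm(h) for h in (
    "0x3FC7C7905A31C322",
    "0x400949FB3ED443E9",
    "0x405C774E4D365DA3",
    "0x407797C38897528B",
    "0x40A912C1535D121A",
)]
print(horner(chain, 0.01))
```

Every coefficient in that chain is used exactly once, so on this path no constant would be shared even if it were first moved into a register.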