Article <acb76cee233f19672f2ad0380c9cd06e@www.novabbs.org>

Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: My 66000 and High word facility
Date: Mon, 12 Aug 2024 20:12:59 +0000
Organization: Rocksolid Light
Message-ID: <acb76cee233f19672f2ad0380c9cd06e@www.novabbs.org>
References: <v98asi$rulo$1@dont-email.me> <38055f09c5d32ab77b9e3f1c7b979fb4@www.novabbs.org> <v991kh$vu8g$1@dont-email.me> <2024Aug11.163333@mips.complang.tuwien.ac.at> <v9ath5$2qgnb$1@dont-email.me> <2024Aug12.082936@mips.complang.tuwien.ac.at> <130df049c4c97984986767736b5b037a@www.novabbs.org> <v9dnmv$3efnj$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="2358227"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$OoLPv08BkAv3yzPnvGMkle8InZwJsmDZHHhjvrPbd3KOtm.JeJ6Ya
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 7887
Lines: 193

On Mon, 12 Aug 2024 19:27:22 +0000, BGB wrote:

> On 8/12/2024 12:36 PM, MitchAlsup1 wrote:
>> On Mon, 12 Aug 2024 6:29:36 +0000, Anton Ertl wrote:
>>
>>> Brett <ggtgp@yahoo.com> writes:
>>>> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>>>> Brett <ggtgp@yahoo.com> writes:
>>>>>> The lack of CPUs with 64 registers is what makes for a market; that 4%
>>>>>> that could benefit have no options to pick from.
>>>>>
>>>>> They had:
>>>>>
>>>>> SPARC: Ok, only 32 GPRs available at a time, but more in hardware
>>>>> through the Window mechanism.
>>>>>
>>>>> AMD29K: IIRC a 128-register stack and 64 additional registers
>>>>>
>>>>> IA-64: 128 GPRs and 128 FPRs with register stack and rotating register
>>>>> files to make good use of them.
>>>>
>>>> All antiques no longer available.
>>>
>>> SPARC is still available: <https://en.wikipedia.org/wiki/SPARC> says:
>>>
>>> |Fujitsu will also discontinue their SPARC production [...] end-of-sale
>>> |in 2029, of UNIX servers and a year later for their mainframe.
>>>
>>> No word of when Oracle will discontinue (or has discontinued) sales,
>>> but both companies introduced their last SPARC CPUs in 2017.
>>>
>>> In any case, my point still stands: these architectures were
>>> available, and the large number of registers failed to give them a
>>> decisive advantage.  Maybe it even gave them a decisive disadvantage:
>>> AMD29K and IA-64 never had OoO implementations, and SPARC got them
>>> only with the Fujitsu SPARC64 V in 2002 and the Oracle SPARC T4 in
>>> 2011, years after Intel, MIPS, and HP switched to OoO in 1995/1996 and
>>> Power and Alpha switched in 1998 (POWER3, 21264).
>>>
>>>>> Where is your 4% number coming from?
>>>>
>>>> The 4% number is poor memory and a guess.
>>>> Here is an antique paper on the issue:
>>>>
>>>> https://www.eecs.umich.edu/techreports/cse/00/CSE-TR-434-00.pdf
>>>
>>> Interesting.  I only skimmed the paper, but I read a lot about
>>> inlining and interprocedural register allocation.  SPARC's register
>>> windows and AMD29K's and IA-64's register stacks were intended to be
>>> useful for that, but somehow the other architectures did not suffer a
>>> big-enough disadvantage to make them adopt one of these concepts, and
>>> that's despite register windows/stacks working even for indirect calls
>>> (e.g., method calls in the general case), where interprocedural
>>> register allocation or inlining don't help.
>>>
>>> It seems to me that with OoO the cycle cost of spilling and refilling
>>> on call boundaries was lowered: the spills can be delayed until the
>>> computation is complete, and the refills can start early because the
>>> stack pointer tends to be available early.
>>>
>>> And recent OoO CPUs even have zero-cycle store-to-load forwarding, so
>>> even if the called function is short, the spilling and refilling
>>> around it (if any) does not increase the latency of the value that's
>>> spilled and refilled.  But that consideration is only relevant for
>>> Intel APX; ARM A64 and RISC-V went for 32 registers several years
>>> before zero-cycle store-to-load forwarding was implemented.
>>>
>>> One other optimization that they use the additional registers for is
>>> "register promotion", i.e., putting values from memory into registers
>>> for a while (if absence of aliasing can be proven).  One interesting
>>> aspect here is that register promotion with 64 or 256 registers (RP-64
>>> and RP-256) is usually not much better (if better at all) than
>>> register promotion with 32 registers (RP-32); see Figure 1.  So
>>> register promotion does not make a strong case for more registers,
>>> either, at least in this paper.
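
As a rough illustration of what register promotion means here (a sketch only; the function names are hypothetical and not taken from the paper), consider a running sum kept behind a pointer. Because `total` may alias `a` in the first version, the compiler generally has to load and store it on every iteration. In the second, `restrict` asserts the absence of aliasing, so keeping the sum in a register for the whole loop is legal:

```c
#include <stddef.h>

/* Hypothetical example: total and a have compatible types and may
   alias, so *total is read and written through memory each iteration. */
void sum_in_memory(const int *a, size_t n, int *total)
{
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += a[i];
}

/* Same computation with the running sum promoted by hand.  restrict
   is the programmer's promise that a and total do not overlap. */
void sum_promoted(const int *restrict a, size_t n, int *restrict total)
{
    int acc = 0;                 /* lives in a register inside the loop */
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    *total = acc;                /* single store after the loop */
}
```

Both versions compute the same sum; only the memory traffic inside the loop differs. Each promoted location occupies one register for as long as it stays promoted, which is where the demand for more registers would come from.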
>>
>> With full access to constants, there is even less need to promote
>> addresses or immediates into registers, as you can simply poof them
>> up anytime you want one.
>
>
> There are tradeoffs still, if constants need space to encode...
>
> Inline is still better than a memory load, granted.
>
> May make sense to consolidate multiple uses of a value into a register
> rather than try encoding them as an immediate each time.
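
A back-of-the-envelope sketch of that tradeoff, counting only the extra bytes the constant itself contributes. The sizes are assumptions made for illustration (a 4-byte instruction word and an 8-byte appended immediate for a 64-bit constant), and the function names are hypothetical:

```c
/* Assumed encoding sizes, for illustration only. */
enum { INSN_BYTES = 4, IMM64_BYTES = 8 };

/* Extra bytes when every use carries the 64-bit constant inline. */
unsigned inline_extra_bytes(unsigned uses)
{
    return uses * IMM64_BYTES;
}

/* Extra bytes when one mov materializes the constant into a register
   and every use then names that register instead. */
unsigned hoisted_extra_bytes(unsigned uses)
{
    (void)uses;                       /* cost does not grow with uses */
    return INSN_BYTES + IMM64_BYTES;
}
```

Under these assumed sizes a single use is cheaper inline (8 bytes against 12), and consolidating into a register wins on bytes from the second use on, at the price of holding a register live across all the uses.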

See polpak::r8_erf()


r8_erf:                                 ; @r8_erf
; %bb.0:
	fabs	r2,r1
	fcmp	r3,r2,#0x3EF00000
	bngt	r3,.LBB141_5
; %bb.1:
	fcmp	r3,r2,#4
	bngt	r3,.LBB141_6
; %bb.2:
	fcmp	r3,r2,#0x403A8B020C49BA5E
	bnlt	r3,.LBB141_7
; %bb.3:
	fmul	r3,r1,r1
	fdiv	r3,#1,r3
	mov	r4,#0x3F90B4FB18B485C7
	fmac	r4,r3,r4,#0x3FD38A78B9F065F6
	fadd	r5,r3,#0x40048C54508800DB
	fmac	r4,r3,r4,#0x3FD70FE40E2425B8
	fmac	r5,r3,r5,#0x3FFDF79D6855F0AD
	fmac	r4,r3,r4,#0x3FC0199D980A842F
	fmac	r5,r3,r5,#0x3FE0E4993E122C39
	fmac	r4,r3,r4,#0x3F9078448CD6C5B5
	fmac	r5,r3,r5,#0x3FAEFC42917D7DE7
	fmac	r4,r3,r4,#0x3F4595FD0D71E33C
	fmul	r4,r3,r4
	fmac	r3,r3,r5,#0x3F632147A014BAD1
	fdiv	r3,r4,r3
	fadd	r3,#0x3FE20DD750429B6D,-r3
	fdiv	r3,r3,r2
	br	.LBB141_4
.LBB141_5:
	fmul	r3,r1,r1
	fcmp	r2,r2,#0x3C9FFE5AB7E8AD5E
	sra	r2,r2,#8,#1
	cvtsd	r4,#0
	mux	r2,r2,r3,r4
	mov	r3,#0x3FC7C7905A31C322
	fmac	r3,r2,r3,#0x400949FB3ED443E9
	fadd	r4,r2,#0x403799EE342FB2DE
	fmac	r3,r2,r3,#0x405C774E4D365DA3
	fmac	r4,r2,r4,#0x406E80C9D57E55B8
	fmac	r3,r2,r3,#0x407797C38897528B
	fmac	r4,r2,r4,#0x40940A77529CADC8
	fmac	r3,r2,r3,#0x40A912C1535D121A
	fmul	r1,r3,r1
	fmac	r2,r2,r4,#0x40A63879423B87AD
	fdiv	r2,r1,r2
	mov	r1,r2
	ret
.LBB141_6:
	mov	r3,#0x3E571E703C5F5815
	fmac	r3,r2,r3,#0x3FE20DD508EB103E
	fadd	r4,r2,#0x402F7D66F486DED5
	fmac	r3,r2,r3,#0x4021C42C35B8BC02
	fmac	r4,r2,r4,#0x405D6C69B0FFCDE7
	fmac	r3,r2,r3,#0x405087A0D1C420D0
	fmac	r4,r2,r4,#0x4080C972E588749E
	fmac	r3,r2,r3,#0x4072AA2986ABA462
	fmac	r4,r2,r4,#0x4099558EECA29D27
	fmac	r3,r2,r3,#0x408B8F9E262B9FA3
	fmac	r4,r2,r4,#0x40A9B599356D1202
	fmac	r3,r2,r3,#0x409AC030C15DC8D7
	fmac	r4,r2,r4,#0x40B10A9E7CB10E86
	fmac	r3,r2,r3,#0x40A0062821236F6B
	fmac	r4,r2,r4,#0x40AADEBC3FC90DBD
	fmac	r3,r2,r3,#0x4093395B7FD2FC8E
	fmac	r4,r2,r4,#0x4093395B7FD35F61
	fdiv	r3,r3,r4
.LBB141_4:
	fmul	r4,r2,#16
	fmul	r4,r4,#0x3D800000
	rnd	r4,r4,#5
	fadd	r5,r2,-r4
	fadd	r2,r2,r4
	fmul	r4,r4,-r4
	fexp	r4,r4
	fmul	r2,r2,-r5
	fexp	r2,r2
	fmul	r2,r4,r2
	fadd	r2,#0,-r2
	fmac	r2,r2,r3,#0x3F000000
	fadd	r2,r2,#0x3F000000
	pdlt	r1,T
	fadd	r2,#0,-r2
	mov	r1,r2
	ret
.LBB141_7:
========== REMAINDER OF ARTICLE TRUNCATED ==========
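
The fmac chains in the listing above have the shape of Horner's rule applied to a numerator polynomial and a monic denominator polynomial, followed by a divide. A minimal C sketch of that shape follows. The function names are hypothetical and the coefficients are supplied by the caller rather than copied from the listing. A compiler is free to contract each `p * x + c` into a fused multiply-add where the target and language rules allow it:

```c
#include <stddef.h>

/* Horner's rule; c[0] is the leading coefficient and n must be >= 1.
   Each step has the form p = p*x + c[i], matching one fmac. */
double horner(const double *c, size_t n, double x)
{
    double p = c[0];                  /* corresponds to the initial mov */
    for (size_t i = 1; i < n; i++)
        p = p * x + c[i];
    return p;
}

/* Monic variant: the leading coefficient is an implicit 1, so the
   chain starts with x + c[0], matching the initial fadd. */
double horner_monic(const double *c, size_t n, double x)
{
    double q = x + c[0];
    for (size_t i = 1; i < n; i++)
        q = q * x + c[i];
    return q;
}

/* Rational approximation: numerator over monic denominator. */
double rational(const double *num, size_t nn,
                const double *den, size_t nd, double x)
{
    return horner(num, nn, x) / horner_monic(den, nd, x);
}
```

Every coefficient is consumed exactly once per evaluation in this scheme, which is the case where an inline immediate costs nothing extra in registers.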