Article <uv56ec$ooj6$1@dont-email.me>


Path: ...!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 9 Apr 2024 22:01:00 -0700
Organization: A noiseless patient Spider
Lines: 146
Message-ID: <uv56ec$ooj6$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
 <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
 <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
 <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
 <d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
 <uv02dn$3b6ik$1@dont-email.me> <uv415n$ck2j$1@dont-email.me>
 <uv46rg$e4nb$1@dont-email.me>
 <a81256dbd4f121a9345b151b1280162f@www.novabbs.org>
 <uv4ghh$gfsv$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 10 Apr 2024 05:01:01 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1e0154287d270c974cd6798ddf950547";
	logging-data="811622"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/CyFgGugYkm242dRVEyXCRk9xl9KrwAQs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:LgSgyQw7ccsy/Cp+bxtc2xFNgWY=
Content-Language: en-US
In-Reply-To: <uv4ghh$gfsv$1@dont-email.me>
Bytes: 7236

On 4/9/2024 3:47 PM, BGB-Alt wrote:
> On 4/9/2024 4:05 PM, MitchAlsup1 wrote:
>> BGB wrote:
>>
>>> On 4/9/2024 1:24 PM, Thomas Koenig wrote:
>>>> I wrote:
>>>>
>>>>> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>>>>>> Thomas Koenig wrote:
>>>>>>
>>>> Maybe one more thing: In order to justify the more complex encoding,
>>>> I was going for 64 registers, and that didn't work out too well
>>>> (missing bits).
>>>>
>>>> Having learned about M-Core in the meantime, pure 32-register,
>>>> 21-bit instruction ISA might actually work better.
>>
>>
>>> For 32-bit instructions at least, 64 GPRs can work out OK.
>>
>>> Though, the gain of 64 over 32 seems to be fairly small for most 
>>> "typical" code, mostly bringing a benefit if one is spending a lot of 
>>> CPU time in functions that have large numbers of local variables all 
>>> being used at the same time.
>>
>>
>>> Seemingly:
>>> 16/32/48 bit instructions, with 32 GPRs, seems likely optimal for 
>>> code density;
>>> 32/64/96 bit instructions, with 64 GPRs, seems likely optimal for 
>>> performance.
>>
>>> Where, 16 GPRs isn't really enough (lots of register spills), and 128 
>>> GPRs is wasteful (would likely need lots of monster functions with 
>>> 250+ local variables to make effective use of this, *, which probably 
>>> isn't going to happen).
>>
>> 16 GPRs would be "almost" enough if IP, SP, FP, TLS, GOT were not part 
>> of GPRs AND you have good access to constants.
>>
> 
> On the main ISAs I had tried to generate code for, 16 GPRs was kind of 
> a pain as it resulted in fairly high spill rates.
> 
> Though, it would probably be less bad if the compiler was able to use 
> all of the registers at the same time without stepping on itself (such 
> as dealing with register allocation involving scratch registers while 
> also not conflicting with the use of function arguments, ...).
> 
> 
> My code generators had typically only used callee save registers for 
> variables in basic blocks which ended in a function call (in my compiler 
> design, both function calls and branches terminate the current 
> basic-block).
> 
> On SH, the main way of getting constants (larger than 8 bits) was via 
> PC-relative memory loads, which kinda sucked.
> 
> 
> This is slightly less bad on x86-64, since one can use memory operands 
> with most instructions, and the CPU tends to deal fairly well with code 
> that has lots of spill-and-fill. It also helps that instructions have 
> access to 32-bit immediate values.
> 
> 
>>> *: Where, it appears it is most efficient (for non-leaf functions) if 
>>> the number of local variables is roughly twice that of the number of 
>>> CPU registers. If more local variables than this, then spill/fill 
>>> rate goes up significantly, and if less, then the registers aren't 
>>> utilized as effectively.
>>
>>> Well, except in "tiny leaf" functions, where the criterion is instead 
>>> that the number of local variables be less than the number of scratch 
>>> registers. However, for many/most small leaf functions, the total 
>>> number of variables isn't all that large either.
>>
>> The vast majority of leaf functions use fewer than 16 GPRs, given one has
>> a SP not part of GPRs {including arguments and return values}. Once
>> one starts placing things like memmove(), memset(), sin(), cos(),
>> exp(), log() in the ISA, it goes up even more.
>>
> 
> Yeah.
> 
> Things like memcpy/memmove/memset/etc, are function calls in cases when 
> not directly transformed into register load/store sequences.
> 
> Did end up with an intermediate "memcpy slide", which can handle medium 
> size memcpy and memset style operations by branching into a slide.
> 
> 
> 
> As noted, on a 32 GPR machine, most leaf functions can fit entirely in 
> scratch registers. On a 64 GPR machine, this percentage is slightly 
> higher (but, not significantly, since there are few leaf functions 
> remaining at this point).
> 
> 
> If one had a 16 GPR machine with 6 usable scratch registers, it is a 
> little harder though (as typically these need to cover both any 
> variables used by the function, and any temporaries used, ...). There 
> are a whole lot more leaf functions that exceed a limit of 6 than of 14.
> 
> But, say, a 32 GPR machine could still do well here.
> 
> 
> Note that there are reasons why I don't claim 64 GPRs as a large 
> performance advantage:
> On programs like Doom, the difference is small at best.
> 
> 
> It mostly affects things like GLQuake in my case, mostly because TKRA-GL 
> has a lot of functions with a large number of local variables (some 
> exceeding 100 local variables).
> 
> Partly, though, this is because code that is highly inlined and unrolled 
> and uses lots of variables tends to perform better in my case (and 
> tightly looping code, with lots of small functions, not so much...).
> 
> 
>>
>>> Where, function categories:
>>>    Tiny Leaf:
>>>      Everything fits in scratch registers, no stack frame, no calls.
>>>    Leaf:
>>>      No function calls (either explicit or implicit);
>>>      Will have a stack frame.
>>>    Non-Leaf:
>>>      May call functions, has a stack frame.
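
The three categories quoted above can be sketched as a small classifier. This is only an illustration, not BGBCC's actual logic: the names FuncInfo, classify and classify_examples are made up here, and treating "fits" as "live values <= scratch registers" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TINY_LEAF, LEAF, NON_LEAF } FuncCategory;

/* Hypothetical per-function summary a compiler might collect. */
typedef struct {
    int  live_values;  /* locals plus temporaries that want a register */
    bool makes_calls;  /* any call, explicit or implicit (e.g. a helper) */
} FuncInfo;

/* Classify a function given how many scratch registers the ABI offers. */
static FuncCategory classify(FuncInfo f, int scratch_regs)
{
    if (f.makes_calls)
        return NON_LEAF;              /* may call functions, has a frame */
    if (f.live_values <= scratch_regs)
        return TINY_LEAF;             /* all in scratch regs, no frame */
    return LEAF;                      /* no calls, but needs a frame */
}

/* The same 10-value leaf function under two scratch-register budgets. */
static void classify_examples(void)
{
    assert(classify((FuncInfo){10, false}, 6)  == LEAF);
    assert(classify((FuncInfo){10, false}, 14) == TINY_LEAF);
    assert(classify((FuncInfo){2,  true},  14) == NON_LEAF);
}
```

The 6 and 14 budgets are the ones mentioned earlier in the quoted text; a function that needs a frame with 6 scratch registers can become a tiny leaf with 14.
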
>>
>> You are forgetting about FP, GOT, TLS, and whatever resources are
>> required to do try-throw-catch stuff as demanded by the source language.
>>
> 
> Yeah, possibly true.
> 
> In my case:
>    There is no frame pointer, as BGBCC doesn't use one;
>      All stack-frames are fixed size, VLAs and alloca use the heap;
>    GOT, N/A in my ABI (stuff is GBR relative, but GBR is not a GPR);
>    TLS, accessed via TBR.[...]

alloca using the heap? Strange to me...
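
For what it's worth, here is a rough guess at what a heap-backed alloca could look like when every stack frame is fixed size. This is not taken from BGBCC; FrameBlock, frame_alloc, frame_release and copy_and_measure are hypothetical names, and a real compiler would emit the release in the function epilogue instead of relying on a manual call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header linking the heap blocks owned by one call. */
typedef struct FrameBlock {
    struct FrameBlock *next;
} FrameBlock;

/* Allocate n bytes tied to the current call; returns NULL on failure.
   The payload is only pointer-aligned, which is fine for char data. */
static void *frame_alloc(FrameBlock **frame, size_t n)
{
    FrameBlock *b = malloc(sizeof(FrameBlock) + n);
    if (!b)
        return NULL;
    b->next = *frame;   /* push onto this call's list */
    *frame = b;
    return b + 1;       /* payload starts right after the header */
}

/* Free everything this call allocated; stands in for the epilogue. */
static void frame_release(FrameBlock **frame)
{
    while (*frame) {
        FrameBlock *next = (*frame)->next;
        free(*frame);
        *frame = next;
    }
}

/* A function that would have used alloca(n) or a VLA for tmp. */
static size_t copy_and_measure(const char *s)
{
    FrameBlock *frame = NULL;
    size_t n = strlen(s) + 1;
    size_t len = 0;
    char *tmp = frame_alloc(&frame, n);
    if (tmp) {
        memcpy(tmp, s, n);
        len = strlen(tmp);
    }
    frame_release(&frame);  /* every exit path must reach this */
    return len;
}
```

The cost compared to a real alloca is a malloc/free pair per allocation plus the bookkeeping to make sure every exit path releases the list, which is why it looks strange next to a simple stack pointer adjustment.
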