Path: ...!news.roellig-ltd.de!open-news-network.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Tonights Tradeoff - Background Execution Buffers
Date: Sat, 12 Oct 2024 14:10:01 -0500
Organization: A noiseless patient Spider
Lines: 113
Message-ID: <veehid$9gnd$1@dont-email.me>
References: <vbgdms$152jq$1@dont-email.me> <vbog6d$2p2rc$1@dont-email.me>
 <f2d99c60ba76af28c8b63b9628fb56fa@www.novabbs.org>
 <vc61e6$21skv$1@dont-email.me> <vc8gl4$2m5tp$1@dont-email.me>
 <vcv5uj$3arh6$1@dont-email.me>
 <37067f65c5982e4d03825b997b23c128@www.novabbs.org>
 <vd352q$3s1e$1@dont-email.me>
 <5f8ee3d3b2321ffa7e6c570882686b57@www.novabbs.org>
 <vd6a5e$o0aj$2@dont-email.me> <vdnpg4$3c9e$2@dont-email.me>
 <2024Oct4.081931@mips.complang.tuwien.ac.at> <vdp343$9d38$1@dont-email.me>
 <2024Oct5.114309@mips.complang.tuwien.ac.at> <ve5mpq$2jt5k$1@dont-email.me>
 <b7191e6ab8492ad36abb76cc966d3b0b@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 12 Oct 2024 21:10:05 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="49ba8ae5553edb0f18c63300f704f6f1";
	logging-data="312045"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+d2IuHlqH7nX+2GbG04In51h9I7ytTaQk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:5NJeX9sFx9QoxFBtAbGmxO4BGTg=
In-Reply-To: <b7191e6ab8492ad36abb76cc966d3b0b@www.novabbs.org>
Content-Language: en-US
Bytes: 6117

On 10/9/2024 11:19 AM, MitchAlsup1 wrote:
> On Wed, 9 Oct 2024 10:44:08 +0000, Robert Finch wrote:
> 
>>
>> Been thinking some about the carry and overflow and what to do about
>> register spills and reloads during expression processing. My thought was
>> that on the machine with 256 registers, simply allocate a ridiculous
>> number of registers for expression processing, for example 25 or even
>> 50. Then if the expression is too complex, have the compiler spit out an
>> error message to the programmer to simplify the expression. Remnants of
>> the ‘expression too complex’ error in BASIC.
> 
> Both completely unacceptable, and in your case completely unnecessary.
> In 967 subroutines I read out of My 66000 LLVM compile, I only have
> 3 cases of spill-fill, and that is with only 32 registers with
> universal constants.
> 

Tends to be a bit higher IME, but granted my compiler is a bit more naive:
   Either it can static-assign everything;
   Or, it needs to use spill-and-fill.

In RISC-V mode:
   Static-assign everything, Leaf: 13%
   Partial assign, Leaf: 7.1%
   Static-assign everything, Non-Leaf: 1.8%
   Partial assign, Non-Leaf: 85%
   Average, ~ 4.6 variables static-assigned
     Out of 16.6 variables in a function.

In XG2 mode:
   Static-assign everything, Leaf: 16%
   Partial assign, Leaf: 0.7%
   Static-assign everything, Non-Leaf: 1.9%
   Partial assign, Non-Leaf: 82%
   Average, ~ 4.8 variables static-assigned
     Out of 16.8 variables in a function.

Theoretically, the number of static-assigned variables and fully 
static-assigned functions could be higher, but it looks like the 
compiler is excluding a lot of them for some reason (may need to look 
into it).



> Of the RISC-V code I read alongside with 32+32 registers, I counted 8.
> 

With 64 GPRs, there can be less spill/fill, and without any increase in 
the number of hardware registers vs RV64G's 32+32 scheme.

Register pressure is rarely balanced evenly between integer and FP; more 
often it is one of:
   High integer register pressure, little or no FP pressure (most code);
   Very high FP register pressure, low integer pressure (say, unrolled 
   matrix multiply).

So an even-split X/F scheme serves neither case well, whereas a bigger 
unified register space serves both.
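
As a rough illustration of the split-vs-unified point, here is a minimal 
sketch that counts how many live values fail to fit in registers. The 
function name and the live-value counts are made up for the example (not 
measurements), and it ignores reserved registers (zero, SP, etc.), so 
treat it as a crude proxy for spill/fill rather than a model of any real 
allocator.

```python
def spilled_values(int_live, fp_live, int_regs, fp_regs, unified=False):
    """Count live values that do not fit in registers (crude spill proxy)."""
    if unified:
        # One shared pool: integer and FP values compete for the same registers.
        return max(0, int_live + fp_live - (int_regs + fp_regs))
    # Split files: each class can only use its own registers.
    return max(0, int_live - int_regs) + max(0, fp_live - fp_regs)


# Hypothetical peak live-value counts (integer, FP); assumptions, not data.
cases = {
    "integer-heavy code": (40, 2),
    "unrolled matrix multiply": (6, 48),
}

for name, (i, f) in cases.items():
    split = spilled_values(i, f, 32, 32)
    uni = spilled_values(i, f, 32, 32, unified=True)
    print(f"{name}: split 32+32 -> {split} spilled, unified 64 -> {uni} spilled")
```

With these made-up numbers, both lopsided cases overflow one half of the 
32+32 split while fitting in a unified 64, with the same total number of 
hardware registers.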



Though, I guess the usual argument for split GPR/FPR spaces is that with 
unified register spaces, both ALU and FPU need to use the same pipeline.


But, if it is a shared register pipeline, one can also leverage ALU for 
a lot of edge cases, like FPU compare.

If one uses a longer pipeline for FPU ops than for ALU ops, it seems one 
will still need to pay the cost of the longer FPU pipeline regardless of 
whether there is a single register file or separate ones.



Apparently, similar reasoning is behind the V extension using separate 
vector registers (vs just aliasing with the F registers), but I don't 
really want to implement the V extension.


Almost more tempting to do a cut-down non-conforming "V in F" style 
implementation:
* Aliases V to F register pairs;
** TBD if better to use V0..V15 or even-only numbering.
** Or, V0..V31 exist (if aliased) for 64b vectors,
** but only even for 128b.
* Will drop mask bits and other more advanced features.
* Trying to set up V properly would result in the instructions faulting.
** Could allow the possibility of adding proper V later.
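
The two numbering options in the list above can be sketched as a small 
mapping from vector register number to the F registers it would alias. 
This is only an illustration of the idea: the function names and the 
"compact"/"even" labels are hypothetical, the choice is still TBD, and 
the low/high ordering within a pair is an assumption.

```python
def v128_to_f_pair(v, numbering="compact"):
    """Map a 128b vector register number to the (low, high) F register pair.

    compact: V0..V15 map to (F0,F1), (F2,F3), ...
    even:    only even V numbers are valid; Vn maps to (Fn, Fn+1).
    """
    if numbering == "compact":
        if not 0 <= v <= 15:
            raise ValueError(f"V{v} is out of range for compact numbering")
        return (2 * v, 2 * v + 1)
    if numbering == "even":
        if not 0 <= v <= 30 or v % 2 != 0:
            raise ValueError(f"V{v} is not a valid even-numbered 128b register")
        return (v, v + 1)
    raise ValueError(f"unknown numbering scheme: {numbering}")


def v64_to_f(v):
    """For 64b vectors, V0..V31 alias F0..F31 one-to-one."""
    if not 0 <= v <= 31:
        raise ValueError(f"V{v} is out of range")
    return v


print(v128_to_f_pair(3, "compact"))  # (6, 7)
print(v128_to_f_pair(6, "even"))     # (6, 7)
```

Even-only numbering keeps Vn and Fn referring to the same storage for 
both 64b and 128b vectors, at the cost of wasting the odd encodings for 
128b operations.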


> With those statistics and 256 registers, if you can't get to essentially
> 0 spill-fill the problem is not with your architecture but with your
> compiler.

With 256 registers, probably 99% of functions could use a "statically 
assign every variable to a register" strategy (assuming registers can be 
reused for temporary values).

Where, most temporary values are created and used within a single basic 
block, and if no references to that specific temporary exist outside of 
the basic block (and if not marked with a phi operator), the value of 
the temporary can simply be assumed to disappear at the end of a basic 
block. This can also allow temporaries to be allocated into scratch 
registers.
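
A minimal sketch of the block-local check described above, assuming a 
toy IR where each basic block is a list of (dest, sources) tuples. The 
IR shape and all names here are hypothetical, not how any particular 
compiler stores things; it just shows the "referenced in only one block 
and not phi-marked" test.

```python
def block_local_temps(blocks, temps, phi_marked=frozenset()):
    """Return temporaries referenced in exactly one block and not phi-marked.

    blocks: dict mapping block name -> list of (dest, [sources]) tuples.
    temps:  set of names that are compiler temporaries (not user variables).
    """
    seen_in = {}  # temporary -> set of block names that reference it
    for name, instrs in blocks.items():
        for dest, srcs in instrs:
            for v in (dest, *srcs):
                if v in temps:
                    seen_in.setdefault(v, set()).add(name)
    return {t for t, bs in seen_in.items()
            if len(bs) == 1 and t not in phi_marked}


# Toy example: t1 is defined in "entry" but also read in "loop".
blocks = {
    "entry": [("t0", ["a", "b"]), ("x", ["t0", "c"]), ("t1", ["x"])],
    "loop":  [("t2", ["t1", "x"]), ("y", ["t2"])],
}
temps = {"t0", "t1", "t2"}

print(sorted(block_local_temps(blocks, temps)))  # ['t0', 't2']
```

Temporaries that pass this test can be treated as dead at the end of 
their block and placed in scratch registers; anything else (here t1) 
needs a home that survives across blocks.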


My own thought though is that going much bigger in terms of the main 
register file likely isn't worth it.

The only really compelling use for a bigger register file (much over 64) 
at the moment would be optimizing interrupts and context switches.