Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Robert Finch <robfi680@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Fri, 27 Sep 2024 08:50:37 -0400
Organization: A noiseless patient Spider
Lines: 131
Message-ID: <vd69n0$o0aj$1@dont-email.me>
References: <vd5uvd$mdgn$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 27 Sep 2024 14:50:41 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7c5098c7b1f41ab4d55ddf5c27ceca77";
	logging-data="786771"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19r41Gax/XIQOz5O/X0EsGM5oLwwTQ0LMI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:HFuzXfuEVHpGHB0ilaNBIp4Ofjc=
In-Reply-To: <vd5uvd$mdgn$1@dont-email.me>
Content-Language: en-US
Bytes: 5021

On 2024-09-27 5:46 a.m., BGB wrote:
> Had recently been working on getting BGBCC to target RV64G.
> 
> 
> So, for Doom, ".text" sizes at the moment:
>    BGBCC+XG2 : 292K (seems to have shrunk in all this)
>    BGBCC+RV64: 438K
>    GCC  +RV64: 445K (PIE)
> 
> Doom Framerates:
>    BGBCC+XG2 : ~ 25-30
>    BGBCC+RV64: ~  8-14
>    GCC  +RV64: ~ 15-20
> 
> Start of E1M1 (framerate):
>    BGBCC+XG2 : ~ 25
>    BGBCC+RV64: ~ 12
>    GCC  +RV64: ~ 16
> 
> 
How does RV64 compare to BGBCC+XG2? Is it trying to execute more than 
one op at a time? I assume XG2 is.


> Comparably, it appears BGBCC leans more heavily into ADD and SLLI than 
> GCC does, with a fair chunk of the total instructions executed being 
> these two (more cycles are spent adding and shifting than doing memory 
> load or store...).

That seems to be a bit off. Mem ops are usually around 1/4 of 
instructions. Spending more than 25% on adds and shifts seems like a 
lot. Is it address calcs? Register loads of immediates?

> 
> Array Load/Store:
>    XG2: 1 instruction
>    RV64: 3 instructions
> 
> Global Variable:
>    XG2: 1 instruction (if within 2K of GBR)
>    RV64: 1 or 4 instructions
> 
> Constant Load into register (not R5):
>    XG2: 1 instruction
>    RV64: ~ 1-6
> 
> Operator with 32-bit immediate:
>    BJX2: 1 instruction;
>    RV64: 3 instructions.
> 
> Operator with 64-bit immediate:
>    BJX2: 1 instruction;
>    RV64: 4-9 instructions.
> 
> 
> Observations (RV64):
>    LUI+ADD can't actually represent all possible 32-bit constants.
>      Those near the signed-overflow point can't be expressed directly.
>    LUI+XOR can get a lot of these cases.
>      0x80000000ULL .. 0xFFFFFFFFULL can be partly covered by LUI+XOR.
> 
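
To make the LUI+ADD gap concrete, here is a minimal Python sketch that
models RV64 LUI, ADDI and XORI on 64-bit values and checks which
two-instruction pair can build a given constant. The helper names
(sext, lui_addi, lui_xori) are made up for illustration and are not
from BGBCC. It only looks at the values just below the signed-overflow
point, 0x7FFFF800 .. 0x7FFFFFFF, where LUI+ADDI fails because LUI
sign-extends from bit 31.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def lui(imm20):
    # RV64 LUI: imm20 goes to bits 31..12, result is sign-extended from bit 31.
    return sext((imm20 & 0xFFFFF) << 12, 32) & MASK64

def addi(reg, imm12):
    # ADDI adds a sign-extended 12-bit immediate.
    return (reg + sext(imm12, 12)) & MASK64

def xori(reg, imm12):
    # XORI uses a sign-extended 12-bit immediate, so a negative
    # immediate also flips bits 63..12.
    return (reg ^ sext(imm12, 12)) & MASK64

def lui_addi(target):
    """Return (imm20, imm12) if LUI+ADDI yields target, else None."""
    lo = target & 0xFFF
    hi = ((target - sext(lo, 12)) >> 12) & 0xFFFFF
    return (hi, lo) if addi(lui(hi), lo) == target & MASK64 else None

def lui_xori(target):
    """Return (imm20, imm12) if LUI+XORI yields target, else None."""
    lo = target & 0xFFF
    hi = (target >> 12) & 0xFFFFF
    if lo & 0x800:
        hi ^= 0xFFFFF  # pre-invert: XORI will flip these bits back
    return (hi, lo) if xori(lui(hi), lo) == target & MASK64 else None

if __name__ == "__main__":
    for value in (0x12345678, 0x7FFFF7FF, 0x7FFFF800, 0x7FFFFFFF):
        print(hex(value), lui_addi(value), lui_xori(value))
```

In this model the whole 0x7FFFF800 .. 0x7FFFFFFF window is unreachable
with LUI+ADDI but reachable with LUI+XORI, since the negative XORI
immediate undoes the sign extension that LUI introduced.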
> For full 64-bit constants, generally need:
>    LUI+ADD+LUI+ADD+SLLI+ADD
> And, two registers.
> 
> There is currently an ugly edge case where BGBCC has to fall back to:
>    LUI X5, ImmHi
>    ADDI X5, X5, ImmMi
>    ( SLLI X5, X5, 12; ADDI X5, X5, ImmFrag )+
> 
> Namely when needing to load a 64-bit constant and X5 is the only register available.
> 
> So, if the compiler tries to emit, say:
>    AND R18, 0x7F7F7F7F7F7F7F7F, R10
> One may end up with, say:
>    LUI X5, 0x7F7F
>    ADDI X5, X5, 0x7F8
>    SLLI X5, X5, 12
>    ADDI X5, X5, 0xF7F
>    SLLI X5, X5, 12
>    ADDI X5, X5, 0x7F8
>    SLLI X5, X5, 12
>    ADDI X5, X5, 0xF7F
>    AND X10, X18, X5
> 
> Which, granted, kinda sucks...
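
The single-register fallback above can be sketched in Python as
follows. This is only an illustration of the splitting arithmetic,
assuming the LUI / ADDI / (SLLI 12; ADDI)+ shape shown in the quoted
text; the function names materialize and run are hypothetical and this
is not BGBCC's actual code. The point is that every 12-bit fragment is
sign-extended by ADDI, so a fragment with bit 11 set forces a +1 carry
into the fragment above it (hence 0x7F8 rather than 0x7F7).

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def materialize(value):
    """Split a 64-bit constant into LUI/ADDI/SLLI steps for one register."""
    rest = sext(value, 64)
    chunks = []
    # Peel four sign-extended 12-bit fragments off the bottom,
    # carrying into the remainder whenever a fragment is negative.
    for _ in range(4):
        lo = sext(rest, 12)
        chunks.append(lo & 0xFFF)
        rest = (rest - lo) >> 12
    # Whatever remains is the LUI immediate (fits in 20 bits).
    steps = [("LUI", rest & 0xFFFFF), ("ADDI", chunks.pop())]
    while chunks:
        steps.append(("SLLI", 12))
        steps.append(("ADDI", chunks.pop()))
    return steps

def run(steps):
    """Simulate the steps on a single 64-bit register."""
    reg = 0
    for op, imm in steps:
        if op == "LUI":
            reg = sext((imm & 0xFFFFF) << 12, 32)
        elif op == "ADDI":
            reg += sext(imm, 12)
        elif op == "SLLI":
            reg <<= imm
        reg &= MASK64
    return reg

if __name__ == "__main__":
    for op, imm in materialize(0x7F7F7F7F7F7F7F7F):
        print(op, hex(imm))
```

For 0x7F7F7F7F7F7F7F7F this yields the same eight immediates as the
listing above, and run() on the result gives the constant back.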

> 
> This is partly because BGBCC's code generation currently assumes it can 
> just emit whatever here and the assembler will sort it out.
> 
> But, this case comes up rarely.
> In BJX2, 33 bit cases would be handled by Jumbo prefixes, and generally 
> 64-bit cases by loading the value into R0.
> 
> In RV64, this is needed for anything that doesn't fit in 12-bits; with 
> X5 taking on the role for scratch constants and similar.
> 
> ...
> 
> Floating point is still a bit of a hack, as it is currently implemented 
> by shuffling values between GPRs and FPRs, but sorta works.
> 
> 
> RV's selection of 3R compare ops is more limited:
>    RV: SLT, SLTU
>    BJX2: CMPEQ, CMPNE, CMPGT, CMPGE, CMPHI, CMPHS, TST, NTST
> A lot of these cases require a multi-op sequence to implement with just 
> SLT and SLTU.
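
As a rough illustration of those multi-op sequences, the Python sketch
below models SLT and SLTU on 64-bit values and builds the other
compares from them plus XOR. The comments name the usual RV idioms
(XOR then SLTIU ..., 1 for equality, SLT then XORI 1 for >=). The
meanings assumed for the BJX2 mnemonics (CMPHI as unsigned >, CMPHS as
unsigned >=) are my reading of the names, not taken from the BJX2
spec, and TST/NTST are left out.

```python
MASK64 = (1 << 64) - 1

def signed(x):
    """Interpret a 64-bit pattern as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def slt(a, b):
    # SLT rd, a, b: signed less-than, result 0 or 1.
    return int(signed(a) < signed(b))

def sltu(a, b):
    # SLTU rd, a, b: unsigned less-than, result 0 or 1.
    return int((a & MASK64) < (b & MASK64))

def xor(a, b):
    return (a ^ b) & MASK64

def cmpeq(a, b):
    return sltu(xor(a, b), 1)   # XOR t,a,b ; SLTIU t,t,1   (2 ops)

def cmpne(a, b):
    return sltu(0, xor(a, b))   # XOR t,a,b ; SLTU t,x0,t   (2 ops)

def cmpgt(a, b):
    return slt(b, a)            # SLT with operands swapped (1 op)

def cmpge(a, b):
    return slt(a, b) ^ 1        # SLT t,a,b ; XORI t,t,1    (2 ops)

def cmphi(a, b):
    return sltu(b, a)           # SLTU with operands swapped (1 op)

def cmphs(a, b):
    return sltu(a, b) ^ 1       # SLTU t,a,b ; XORI t,t,1   (2 ops)
```

So in this model only the strict greater-than forms stay at one
instruction; equality and the "or equal" forms each cost two.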
> 
> 
> Doom isn't quite working correctly yet with BGBCC+RV64 (still has some 
> significant bugs), but in general game logic and rendering now seems to 
> be working.
> 
> 
> But, yeah, generating code for RV is more of a pain as the compiler has 
> to work harder to try to express what it wants to do in the instructions 
> that are available.
> 
> 
> But, yeah, it is what it is...
> 
> I sort of needed RV64 support for some possible later experiments (the 
> hybrid XG3-CoEx ISA idea would depend on having working RV64 
> support as a prerequisite).
> 
> ...
>