Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Misc: BGBCC targeting RV64G, initial results...
Date: Fri, 27 Sep 2024 04:46:01 -0500
Organization: A noiseless patient Spider
Lines: 118
Message-ID: <vd5uvd$mdgn$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 27 Sep 2024 11:47:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bb109af784cbe3a996c54fcfa36067e1";
	logging-data="734743"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/ErKB+x8PqxKlJjsdNAxUF5srfuzaOh9A="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:FLkc/Z7KqRNQycF05ZMJq3GSwSs=
Content-Language: en-US
Bytes: 4288

Had recently been working on getting BGBCC to target RV64G.


So, for Doom, ".text" sizes at the moment:
   BGBCC+XG2 : 292K (seems to have shrunk in all this)
   BGBCC+RV64: 438K
   GCC  +RV64: 445K (PIE)

Doom Framerates:
   BGBCC+XG2 : ~ 25-30
   BGBCC+RV64: ~  8-14
   GCC  +RV64: ~ 15-20

Start of E1M1 (framerate):
   BGBCC+XG2 : ~ 25
   BGBCC+RV64: ~ 12
   GCC  +RV64: ~ 16


By comparison, it appears BGBCC leans more heavily on ADD and SLLI than 
GCC does, with a fair chunk of the total instructions executed being 
these two (more cycles are spent adding and shifting than doing memory 
loads or stores...).

Array Load/Store:
   XG2: 1 instruction
   RV64: 3 instructions

Global Variable:
   XG2: 1 instruction (if within 2K of GBR)
   RV64: 1 or 4 instructions

Constant Load into register (not R5):
   XG2: 1 instruction
   RV64: ~ 1-6

Operator with 32-bit immediate:
   BJX2: 1 instruction;
   RV64: 3 instructions.

Operator with 64-bit immediate:
   BJX2: 1 instruction;
   RV64: 4-9 instructions.
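
As a rough illustration of the "Array Load/Store" entry above: base RV64G has no scaled-index addressing mode, so an indexed load presumably breaks down into a shift, an add, and the load itself, which would also explain why ADD and SLLI show up so heavily. The sketch below models those three steps in C; the function name is made up for illustration and is not taken from BGBCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of loading arr[idx] from an int32_t array on RV64G:
 *   SLLI t, idx, 2 ; ADD t, base, t ; LW rd, 0(t)
 * Each C statement below corresponds to one of those instructions. */
static int32_t load_i32_rv64_style(const int32_t *base, uint64_t idx)
{
    assert(base != NULL);
    uint64_t  off  = idx << 2;                  /* SLLI: scale index by sizeof(int32_t) */
    uintptr_t addr = (uintptr_t)base + off;     /* ADD: base plus scaled index */
    int32_t value;
    memcpy(&value, (const void *)addr, sizeof value);  /* LW: the load itself */
    return value;
}
```

An ISA with a scaled-index load form can fold all three steps into a single instruction, which matches the 1-vs-3 count listed above.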


Observations (RV64):
   LUI+ADDI can't actually represent all possible 32-bit constants.
     Those near the signed-overflow point can't be expressed directly.
   LUI+XORI can get a lot of these cases.
     0x80000000ULL .. 0xFFFFFFFFULL can be partly covered by LUI+XORI.
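
A minimal sketch of why this happens, assuming the standard RV64 semantics (LUI sign-extends its 32-bit result to 64 bits, and ADDI/XORI sign-extend their 12-bit immediate). The helper names are hypothetical and only model the instructions in C; they are not BGBCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RV64 LUI: imm20 goes into bits 31:12, result is sign-extended from bit 31. */
static int64_t rv64_lui(uint32_t imm20)
{
    return (int64_t)(int32_t)((imm20 & 0xFFFFFu) << 12);
}

/* RV64 ADDI / XORI with a sign-extended 12-bit immediate. */
static int64_t rv64_addi(int64_t rs, int32_t imm12)
{
    assert(imm12 >= -2048 && imm12 <= 2047);
    return (int64_t)((uint64_t)rs + (uint64_t)(int64_t)imm12);
}

static int64_t rv64_xori(int64_t rs, int32_t imm12)
{
    assert(imm12 >= -2048 && imm12 <= 2047);
    return rs ^ (int64_t)imm12;
}

/* The ADDI immediate is forced by the low 12 bits of the target, so there is
 * only one candidate LUI+ADDI pair; build it and check whether it matches. */
static bool lui_addi_reaches(int64_t target)
{
    uint64_t t  = (uint64_t)target;
    int32_t  lo = (int32_t)((int64_t)((t & 0xFFF) ^ 0x800) - 0x800);
    uint32_t hi = (uint32_t)((t - (uint64_t)(int64_t)lo) >> 12) & 0xFFFFFu;
    return rv64_addi(rv64_lui(hi), lo) == target;
}
```

With this model, 0x7FFFF7FF is reachable via LUI+ADDI but 0x7FFFF800 is not: the low part needs a negative ADDI immediate, which pushes the LUI part to 0x80000, and that sign-extends to a negative 64-bit value. `rv64_xori(rv64_lui(0x80000), -2048)` does produce 0x7FFFF800, since the negative XORI immediate flips the sign-extended upper bits back to zero.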

For full 64-bit constants, generally need:
   LUI+ADDI+LUI+ADDI+SLLI+ADD
And, two registers.

There is currently an ugly edge case where BGBCC has to fall back to:
   LUI X5, ImmHi
   ADDI X5, X5, ImmMi
   ( SLLI X5, X5, 12; ADDI X5, X5, ImmFrag )+

Namely, when it needs to load a 64-bit constant and R5 is the only register available.

So, if the compiler tries to emit, say:
   AND R18, 0x7F7F7F7F7F7F7F7F, R10
One may end up with, say:
   LUI X5, 0x7F7F
   ADDI X5, X5, 0x7F8
   SLLI X5, X5, 12
   ADDI X5, X5, 0xF7F
   SLLI X5, X5, 12
   ADDI X5, X5, 0x7F8
   SLLI X5, X5, 12
   ADDI X5, X5, 0xF7F
   AND X10, X18, X5

Which, granted, kinda sucks...
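
The immediates in the sequence above can be reproduced with a small sketch like the following. It assumes a fixed shape of LUI + ADDI followed by three SLLI/ADDI steps, and bumps each higher fragment by one whenever the fragment below it has bit 11 set (to cancel ADDI's sign extension). Function names are hypothetical; this is not BGBCC's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 12 bits of x (the range of an ADDI immediate). */
static int64_t sext12(uint64_t x)
{
    return (int64_t)((x & 0xFFF) ^ 0x800) - 0x800;
}

/* Split value into imm[0] (LUI, 20 bits), imm[1] (first ADDI), and
 * imm[2..4] (one ADDI per SLLI-by-12 step), working from the low end. */
static void split_const64(uint64_t value, uint32_t imm[5])
{
    uint64_t rest = value;
    for (int i = 4; i >= 1; i--) {
        int64_t frag = sext12(rest);
        imm[i] = (uint32_t)frag & 0xFFFu;
        rest = (rest - (uint64_t)frag) >> 12;   /* carries into the next fragment */
    }
    imm[0] = (uint32_t)rest & 0xFFFFFu;
}

/* Replay LUI; ADDI; (SLLI 12; ADDI) x3 with RV64 semantics. */
static uint64_t eval_const64(const uint32_t imm[5])
{
    uint64_t x = (uint64_t)(int64_t)(int32_t)(imm[0] << 12);   /* LUI */
    x += (uint64_t)sext12(imm[1]);                             /* ADDI */
    for (int i = 2; i < 5; i++) {
        assert(imm[i] <= 0xFFFu);
        x <<= 12;                                              /* SLLI */
        x += (uint64_t)sext12(imm[i]);                         /* ADDI */
    }
    return x;
}
```

For 0x7F7F7F7F7F7F7F7F this yields 0x7F7F, 0x7F8, 0xF7F, 0x7F8, 0xF7F, matching the listing: the 0x7F8 fragments are 0x7F7 plus the carry that compensates for the negative 0xF7F below them.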

This is partly because BGBCC's code generation currently assumes it can 
just emit whatever here and the assembler will sort it out.

But, this case comes up rarely.
In BJX2, 33-bit cases would be handled by Jumbo prefixes, and generally 
64-bit cases by loading the value into R0.

In RV64, this is needed for anything that doesn't fit in 12 bits; with 
X5 taking on the role for scratch constants and similar.

....

Floating point is still a bit of a hack, as it is currently implemented 
by shuffling values between GPRs and FPRs, but sorta works.


RV's selection of 3R compare ops is more limited:
   RV: SLT, SLTU
   BJX2: CMPEQ, CMPNE, CMPGT, CMPGE, CMPHI, CMPHS, TST, NTST
A lot of these cases require a multi-op sequence to implement with just 
SLT and SLTU.
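
As a sketch of what those multi-op sequences can look like, the C model below builds several of the compares out of SLT/SLTU plus XOR/XORI. The semantics assumed here (result is 1 when "a OP b" holds, with HI/HS meaning unsigned higher / higher-or-same) are a guess from the mnemonics, not taken from the BJX2 spec; TST and NTST are left out since their exact semantics aren't stated here.

```c
#include <assert.h>
#include <stdint.h>

/* The two 3R compares RV64 provides. */
static uint64_t slt (int64_t a, int64_t b)   { return a < b; }
static uint64_t sltu(uint64_t a, uint64_t b) { return a < b; }

/* XOR t,a,b ; SLTIU rd,t,1        (2 ops) */
static uint64_t cmpeq(uint64_t a, uint64_t b) { return sltu(a ^ b, 1); }
/* XOR t,a,b ; SLTU rd,x0,t        (2 ops) */
static uint64_t cmpne(uint64_t a, uint64_t b) { return sltu(0, a ^ b); }
/* SLT rd,b,a                      (1 op, operands swapped) */
static uint64_t cmpgt(int64_t a, int64_t b)   { return slt(b, a); }
/* SLT t,a,b ; XORI rd,t,1         (2 ops) */
static uint64_t cmpge(int64_t a, int64_t b)   { return slt(a, b) ^ 1; }
/* SLTU rd,b,a                     (1 op, operands swapped) */
static uint64_t cmphi(uint64_t a, uint64_t b) { return sltu(b, a); }
/* SLTU t,a,b ; XORI rd,t,1        (2 ops) */
static uint64_t cmphs(uint64_t a, uint64_t b) { return sltu(a, b) ^ 1; }

/* Small self-check of the models above. */
static void check_compare_models(void)
{
    assert(cmpeq(5, 5) == 1 && cmpne(5, 5) == 0);
    assert(cmpgt(-1, -2) == 1 && cmpge(3, 3) == 1);
    assert(cmphi(UINT64_MAX, 1) == 1 && cmphs(0, 1) == 0);
}
```

Under these assumptions, GT and HI only need swapped operands, while EQ, NE, GE, and HS each take two instructions.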


Doom isn't quite working correctly yet with BGBCC+RV64 (still has some 
significant bugs), but in general game logic and rendering now seem to 
be working.


But, yeah, generating code for RV is more of a pain as the compiler has 
to work harder to try to express what it wants to do in the instructions 
that are available.


But, yeah, it is what it is...

I sort of needed RV64 support for some possible later experiments (the 
hybrid XG3-CoEx ISA idea would depend on having working RV64 support as 
a prerequisite).

....