From: Robert Finch
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Fri, 27 Sep 2024 08:50:37 -0400

On 2024-09-27 5:46 a.m., BGB wrote:
> Had recently been working on getting BGBCC to target RV64G.
>
>
> So, for Doom, ".text" sizes at the moment:
>   BGBCC+XG2 : 292K (seems to have shrank in all this)
>   BGBCC+RV64: 438K
>   GCC  +RV64: 445K (PIE)
>
> Doom Framerates:
>   BGBCC+XG2 : ~ 25-30
>   BGBCC+RV64: ~  8-14
>   GCC  +RV64: ~ 15-20
>
> Start of E1M1 (framerate):
>   BGBCC+XG2 : ~ 25
>   BGBCC+RV64: ~ 12
>   GCC  +RV64: ~ 16
>
>

How does RV64 compare to BGBCC+XG2? Is it trying to execute more than
one op at a time? I assume XG2 is.

> Comparably, it appears BGBCC leans more heavily into ADD and SLLI than
> GCC does, with a fair chunk of the total instructions executed being
> these two (more cycles are spent adding and shifting than doing memory
> load or store...).

That seems to be a bit off. Mem ops are usually around 1/4 of
instructions. Spending more than 25% on adds and shifts seems like a
lot. Is it address calcs? Register loads of immediates?
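
One possible source of the extra ADD and SLLI is array indexing. Base RV64G has no scaled-index addressing mode, so an indexed 32-bit load is typically a shift, an add, and then the load itself. The Python sketch below models that three-instruction sequence and tallies the ops. The function name, the `counts` dictionary, and the flat `mem` buffer are made up for illustration and are not taken from BGBCC.

```python
import struct

MASK64 = (1 << 64) - 1

def load_i32_indexed(mem, base, index, counts):
    """Model rd = base[index] for 4-byte elements on plain RV64G."""
    t = (index << 2) & MASK64                    # SLLI t, index, 2
    counts["SLLI"] += 1
    t = (base + t) & MASK64                      # ADD  t, base, t
    counts["ADD"] += 1
    counts["LW"] += 1
    return struct.unpack_from("<i", mem, t)[0]   # LW   rd, 0(t)

# Eight little-endian 32-bit values, 10..17, at address 0 (illustrative data).
mem = struct.pack("<8i", *range(10, 18))
counts = {"SLLI": 0, "ADD": 0, "LW": 0}
total = sum(load_i32_indexed(mem, 0, i, counts) for i in range(8))
```

Under that assumption every indexed access costs two ALU ops per memory op, so code dominated by array indexing could plausibly execute more ADD and SLLI than loads and stores. Whether this is what happens in the Doom build is not something the sketch can show.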
>
> Array Load/Store:
>   XG2: 1 instruction
>   RV64: 3 instructions
>
> Global Variable:
>   XG2: 1 instruction (if within 2K of GBR)
>   RV64: 1 or 4 instructions
>
> Constant Load into register (not R5):
>   XG2: 1 instruction
>   RV64: ~ 1-6
>
> Operator with 32-bit immediate:
>   BJX2: 1 instruction;
>   RV64: 3 instructions.
>
> Operator with 64-bit immediate:
>   BJX2: 1 instruction;
>   RV64: 4-9 instructions.
>
>
> Observations (RV64):
>   LUI+ADD can't actually represent all possible 32-bit constants.
>     Those near the signed-overflow point can't be expressed directly.
>   LUI+XOR can get a lot of these cases.
>     0x80000000ULL .. 0xFFFFFFFFULL can be partly covered by LUI+XOR.
>
> For full 64-bit constants, generally need:
>   LUI+ADD+LUI+ADD+SLLI+ADD
> And, two registers.
>
> There is currently an ugly edge case where BGBCC has to fall back to:
>   LUI X5, ImmHi
>   ADDI X5, X5, ImmMi
>   ( SLLI X5, X5, 12; ADDI X5, X5, ImmFrag )+
>
> Namely when needing to load a 64-bit constant and R5 is the only register.
>
> So, if the compiler tries to emit, say:
>   AND R18, 0x7F7F7F7F7F7F7F7F, R10
> One may end up with, say:
>   LUI X5, 0x7F7F
>   ADDI X5, X5, 0x7F8
>   SLLI X5, X5, 12
>   ADDI X5, X5, 0xF7F
>   SLLI X5, X5, 12
>   ADDI X5, X5, 0x7F8
>   SLLI X5, X5, 12
>   ADDI X5, X5, 0xF7F
>   AND X10, X18, X5
>
> Which, granted, kinda sucks...
>
> This is partly because BGBCC's code generation currently assumes it can
> just emit whatever here and the assembler will sort it out.
>
> But, this case comes up rarely.
> In BJX2, 33 bit cases would be handled by Jumbo prefixes, and generally
> 64-bit cases by loading the value into R0.
>
> In RV64, this is needed for anything that doesn't fit in 12-bits; with
> X5 taking on the role for scratch constants and similar.
>
> ...
>
> Floating point is still a bit of a hack, as it is currently implemented
> by shuffling values between GPRs and FPRs, but sorta works.
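
The constant-loading observations quoted above can be checked with a small register-level model. The Python sketch below assumes standard RV64 semantics: LUI sign-extends from bit 31, and ADDI and XORI take a sign-extended 12-bit immediate and operate on the full 64-bit register. The helper names (`lui_addi`, `lui_xori`, `single_reg_chain`, `run`) are hypothetical and say nothing about how BGBCC actually splits constants. `lui_xori` is only exercised here on values in 0 .. 0x7FFFFFFF.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

# Register-level models of the RV64 instructions used below.
def lui(imm20):
    return sext((imm20 & 0xFFFFF) << 12, 32) & MASK64   # sign-extends bit 31

def addi(rs, imm12):
    return (rs + sext(imm12, 12)) & MASK64

def xori(rs, imm12):
    return (rs ^ sext(imm12, 12)) & MASK64

def slli(rs, shamt):
    return (rs << shamt) & MASK64

def lui_addi(c):
    """LUI+ADDI attempt at a non-negative 32-bit constant c."""
    lo = c & 0xFFF
    hi = ((c - sext(lo, 12)) >> 12) & 0xFFFFF
    return addi(lui(hi), lo)

def lui_xori(c):
    """LUI+XORI attempt: a negative XORI immediate flips bits 63..12."""
    lo = c & 0xFFF
    hi = (c >> 12) & 0xFFFFF
    if lo & 0x800:
        hi ^= 0xFFFFF          # pre-invert so the flip restores the bits
    return xori(lui(hi), lo)

def single_reg_chain(c):
    """LUI, ADDI, then (SLLI 12, ADDI) three times, for a 64-bit constant."""
    c = sext(c, 64)
    frags = []
    for _ in range(3):
        lo = sext(c, 12)               # signed low 12 bits
        frags.append(lo & 0xFFF)
        c = (c - lo) >> 12             # carry the sign adjustment upward
    lo = sext(c, 12)
    ops = [("LUI", ((c - lo) >> 12) & 0xFFFFF), ("ADDI", lo & 0xFFF)]
    for frag in reversed(frags):
        ops += [("SLLI", 12), ("ADDI", frag)]
    return ops

def run(ops):
    """Execute an op list on one register and return its final value."""
    reg = 0
    for op, imm in ops:
        if op == "LUI":
            reg = lui(imm)
        elif op == "ADDI":
            reg = addi(reg, imm)
        else:
            reg = slli(reg, imm)
    return reg
```

In this model `lui_addi(0x7FFFF800)` yields 0xFFFFFFFF7FFFF800 rather than 0x7FFFF800, because the LUI half has to be 0x80000 and that sign-extends. The same happens for every value from 0x7FFFF800 through 0x7FFFFFFF, which is the signed-overflow edge described above, while `lui_xori` produces those values correctly. `single_reg_chain(0x7F7F7F7F7F7F7F7F)` reproduces the eight-instruction X5 load in the quoted listing.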
>
>
> RV's selection of 3R compare ops is more limited:
>   RV: SLT, SLTU
>   BJX2: CMPEQ, CMPNE, CMPGT, CMPGE, CMPHI, CMPHS, TST, NTST
> A lot of these cases require a multi-op sequence to implement with just
> SLT and SLTU.
>
>
> Doom isn't quite working correctly yet with BGBCC+RV64 (still has some
> significant bugs), but in general game logic and rendering now seems to
> be working.
>
>
> But, yeah, generating code for RV is more of a pain as the compiler has
> to work harder to try to express what it wants to do in the instructions
> that are available.
>
>
> But, yeah, it is what it is...
>
> I sort of needed RV64 support for some possible later experiments (the
> idea for the hybrid XG3-CoEx ISA idea would depend on having working RV64
> support as a prerequisite).
>
> ...
>
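
On the compare-op point quoted above, the usual way to get the other predicates out of SLT and SLTU is to swap operands, XOR the inputs first, or invert the result with XORI. The Python sketch below models those sequences on 64-bit values. It assumes the BJX2 mnemonics carry SuperH-style meanings (CMPHI is unsigned greater-than, CMPHS is unsigned greater-or-equal, TST is true when `a & b` is zero, NTST is its inverse). That reading is an assumption, and the helper names are illustrative only.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit register value as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

# The two compare primitives RV64 provides.
def slt(a, b):
    return int(to_signed(a) < to_signed(b))

def sltu(a, b):
    return int((a & MASK64) < (b & MASK64))

# Each helper mirrors a short RV64 sequence, noted in the comment.
def cmpeq(a, b):
    return sltu((a ^ b) & MASK64, 1)    # XOR t,a,b ; SLTIU t,t,1

def cmpne(a, b):
    return sltu(0, (a ^ b) & MASK64)    # XOR t,a,b ; SLTU t,x0,t

def cmpgt(a, b):
    return slt(b, a)                    # SLT t,b,a (operands swapped)

def cmpge(a, b):
    return slt(a, b) ^ 1                # SLT t,a,b ; XORI t,t,1

def cmphi(a, b):
    return sltu(b, a)                   # SLTU t,b,a (operands swapped)

def cmphs(a, b):
    return sltu(a, b) ^ 1               # SLTU t,a,b ; XORI t,t,1

def tst(a, b):
    return sltu(a & b & MASK64, 1)      # AND t,a,b ; SLTIU t,t,1

def ntst(a, b):
    return sltu(0, a & b & MASK64)      # AND t,a,b ; SLTU t,x0,t

# Sample register values covering the signed and unsigned boundaries.
SAMPLES = [0, 1, 2, 0x7F7F, (1 << 63) - 1, 1 << 63, MASK64]
```

Under these assumptions only CMPGT and CMPHI stay single instructions. The other six take two, which matches the multi-op sequences mentioned above.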