Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Robert Finch <robfi680@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Fri, 27 Sep 2024 08:50:37 -0400
Organization: A noiseless patient Spider
Lines: 131
Message-ID: <vd69n0$o0aj$1@dont-email.me>
References: <vd5uvd$mdgn$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 27 Sep 2024 14:50:41 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7c5098c7b1f41ab4d55ddf5c27ceca77"; logging-data="786771"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19r41Gax/XIQOz5O/X0EsGM5oLwwTQ0LMI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:HFuzXfuEVHpGHB0ilaNBIp4Ofjc=
In-Reply-To: <vd5uvd$mdgn$1@dont-email.me>
Content-Language: en-US
Bytes: 5021

On 2024-09-27 5:46 a.m., BGB wrote:
> Had recently been working on getting BGBCC to target RV64G.
>
>
> So, for Doom, ".text" sizes at the moment:
>    BGBCC+XG2 : 292K (seems to have shrunk in all this)
>    BGBCC+RV64: 438K
>    GCC  +RV64: 445K (PIE)
>
> Doom Framerates:
>    BGBCC+XG2 : ~ 25-30
>    BGBCC+RV64: ~  8-14
>    GCC  +RV64: ~ 15-20
>
> Start of E1M1 (framerate):
>    BGBCC+XG2 : ~ 25
>    BGBCC+RV64: ~ 12
>    GCC  +RV64: ~ 16
>
>
How does RV64 compare to BGBCC+XG2? Is it trying to execute more than
one op at a time? I assume XG2 is.

> Comparably, it appears BGBCC leans more heavily into ADD and SLLI than
> GCC does, with a fair chunk of the total instructions executed being
> these two (more cycles are spent adding and shifting than doing memory
> load or store...).

That seems to be a bit off. Mem ops are usually around 1/4 of
instructions. Spending more than 25% on adds and shifts seems like a
lot. Is it address calcs? Register loads of immediates?

>
> Array Load/Store:
>    XG2: 1 instruction
>    RV64: 3 instructions
>
> Global Variable:
>    XG2: 1 instruction (if within 2K of GBR)
>    RV64: 1 or 4 instructions
>
> Constant Load into register (not R5):
>    XG2: 1 instruction
>    RV64: ~ 1-6
>
> Operator with 32-bit immediate:
>    BJX2: 1 instruction;
>    RV64: 3 instructions.
>
> Operator with 64-bit immediate:
>    BJX2: 1 instruction;
>    RV64: 4-9 instructions.
>
>
> Observations (RV64):
>    LUI+ADD can't actually represent all possible 32-bit constants.
>      Those near the signed-overflow point can't be expressed directly.
>    LUI+XOR can get a lot of these cases.
>    0x80000000ULL .. 0xFFFFFFFFULL can be partly covered by LUI+XOR.
>
> For full 64-bit constants, generally need:
>    LUI+ADD+LUI+ADD+SLLI+ADD
> And, two registers.
>
> There is currently an ugly edge case where BGBCC has to fall back to:
>    LUI X5, ImmHi
>    ADDI X5, X5, ImmMi
>    ( SLLI X5, X5, 12; ADDI X5, X5, ImmFrag )+
>
> Namely when needing to load a 64-bit constant and R5 is the only register.
>
> So, if the compiler tries to emit, say:
>    AND R18, 0x7F7F7F7F7F7F7F7F, R10
> One may end up with, say:
>    LUI X5, 0x7F7F
>    ADDI X5, X5, 0x7F8
>    SLLI X5, X5, 12
>    ADDI X5, X5, 0xF7F
>    SLLI X5, X5, 12
>    ADDI X5, X5, 0x7F8
>    SLLI X5, X5, 12
>    ADDI X5, X5, 0xF7F
>    AND X10, X18, X5
>
> Which, granted, kinda sucks...
>
> This is partly because BGBCC's code generation currently assumes it can
> just emit whatever here and the assembler will sort it out.
>
> But, this case comes up rarely.
> In BJX2, 33 bit cases would be handled by Jumbo prefixes, and generally
> 64-bit cases by loading the value into R0.
>
> In RV64, this is needed for anything that doesn't fit in 12-bits; with
> X5 taking on the role for scratch constants and similar.
>
> ...
>
> Floating point is still a bit of a hack, as it is currently implemented
> by shuffling values between GPRs and FPRs, but sorta works.
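
On the single-register fallback quoted above (LUI/ADDI followed by repeated SLLI/ADDI pairs): here is a minimal Python sketch of how such a chain can be derived and checked. It is not BGBCC's actual code. The names `sext`, `materialize` and `simulate` are made up for illustration, and the model assumes standard RV64I semantics (LUI result sign-extended from 32 bits, ADDI with a sign-extended 12-bit immediate). It tracks only one register, and an ADDI with no preceding LUI is treated as reading x0.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def materialize(value):
    """Return (op, operand) steps that build `value` in a single register.

    Hypothetical helper: peels off sign-extended 12-bit fragments from the
    bottom until the remainder fits a plain LUI+ADDI pair.
    """
    value = sext(value, 64)
    lo = sext(value, 12)           # ADDI immediate, range -2048..2047
    upper = value - lo             # multiple of 4096 left for LUI or recursion
    if -(1 << 31) <= upper < (1 << 31):
        steps = []
        hi = (upper >> 12) & 0xFFFFF
        if hi:
            steps.append(("LUI", hi))
        if lo or not hi:
            steps.append(("ADDI", lo))
        return steps
    # Does not fit: build the upper part first, then shift and add the fragment.
    steps = materialize(upper >> 12)
    steps.append(("SLLI", 12))
    if lo:
        steps.append(("ADDI", lo))
    return steps

def simulate(steps):
    """Execute the steps on one 64-bit register that starts at zero."""
    reg = 0
    for op, arg in steps:
        if op == "LUI":
            reg = sext(arg << 12, 32) & MASK64   # RV64 sign-extends the LUI result
        elif op == "ADDI":
            reg = (reg + arg) & MASK64
        else:                                     # SLLI
            reg = (reg << arg) & MASK64
    return reg

if __name__ == "__main__":
    for op, arg in materialize(0x7F7F7F7F7F7F7F7F):
        # Immediates are printed in their raw encoded width.
        width = 0xFFFFF if op == "LUI" else 0xFFF
        print(op, hex(arg & width) if op != "SLLI" else arg)
```

For 0x7F7F7F7F7F7F7F7F this yields the same eight-instruction chain as in the quoted example. It also shows the 32-bit edge case from the observations: for 0x7FFFF800 through 0x7FFFFFFF the upper part would need LUI to produce +0x80000000, which sign-extends to a negative value on RV64, so the sketch falls through to the shifted form instead.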
>
>
> RV's selection of 3R compare ops is more limited:
>    RV: SLT, SLTU
>    BJX2: CMPEQ, CMPNE, CMPGT, CMPGE, CMPHI, CMPHS, TST, NTST
> A lot of these cases require a multi-op sequence to implement with just
> SLT and SLTU.
>
>
> Doom isn't quite working correctly yet with BGBCC+RV64 (still has some
> significant bugs), but in general game logic and rendering now seems to
> be working.
>
>
> But, yeah, generating code for RV is more of a pain as the compiler has
> to work harder to try to express what it wants to do in the instructions
> that are available.
>
>
> But, yeah, it is what it is...
>
> I sort of needed RV64 support for some possible later experiments (the
> idea for the hybrid XG3-CoEx ISA idea would depend on having working RV64
> support as a prerequisite).
>
> ...
>
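
On the 3R compare ops quoted above: a small Python model of the usual RISC-V idioms for synthesizing the other comparisons from SLT/SLTU plus XOR/XORI. This is a sketch under assumptions. The function names simply mirror the BJX2 mnemonics, with CMPHI/CMPHS read as unsigned greater-than and greater-or-equal, which is my assumption from SuperH-style naming. TST/NTST are left out because I am not sure of their exact 3R semantics.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit register value as two's-complement signed."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

# The two compare primitives RV64I provides in 3R form.
def slt(a, b):
    return int(to_signed(a) < to_signed(b))

def sltu(a, b):
    return int((a & MASK64) < (b & MASK64))

def xor(a, b):
    return (a ^ b) & MASK64

# Synthesized compares; the comment gives the instruction sequence modeled.
def cmpgt(a, b):   # SLT rd, b, a                      (1 op, operands swapped)
    return slt(b, a)

def cmpge(a, b):   # SLT rd, a, b ; XORI rd, rd, 1     (2 ops)
    return slt(a, b) ^ 1

def cmphi(a, b):   # SLTU rd, b, a                     (1 op, operands swapped)
    return sltu(b, a)

def cmphs(a, b):   # SLTU rd, a, b ; XORI rd, rd, 1    (2 ops)
    return sltu(a, b) ^ 1

def cmpeq(a, b):   # XOR t, a, b ; SLTIU rd, t, 1      (2 ops, the SEQZ idiom)
    return sltu(xor(a, b), 1)

def cmpne(a, b):   # XOR t, a, b ; SLTU rd, x0, t      (2 ops, the SNEZ idiom)
    return sltu(0, xor(a, b))

if __name__ == "__main__":
    a, b = MASK64, 1   # a is -1 when read as signed
    print(cmpgt(a, b), cmphi(a, b), cmpeq(a, b), cmpne(a, b))
```

So under this reading only GT and HI stay single-instruction; GE, HS, EQ and NE each cost two, which is consistent with the multi-op remark in the quoted text.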