<vd784r$sffs$1@dont-email.me>


Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Fri, 27 Sep 2024 16:28:41 -0500
Organization: A noiseless patient Spider
Lines: 241
Message-ID: <vd784r$sffs$1@dont-email.me>
References: <vd5uvd$mdgn$1@dont-email.me>
 <b17e4a241a5bc300250aab8c1c5b9348@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 27 Sep 2024 23:30:04 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bb109af784cbe3a996c54fcfa36067e1";
	logging-data="933372"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19p+DQvvylbBUzHU58TdkSznOAFZsvN4AA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:vNTndenvU8kdMnFKC09zIcWme+U=
In-Reply-To: <b17e4a241a5bc300250aab8c1c5b9348@www.novabbs.org>
Content-Language: en-US
Bytes: 8657

On 9/27/2024 10:52 AM, MitchAlsup1 wrote:
> On Fri, 27 Sep 2024 9:46:01 +0000, BGB wrote:
> 
>> Had recently been working on getting BGBCC to target RV64G.
>>
>> Array Load/Store:
>      M66: 1 instruction
>>    XG2: 1 instruction
>>    RV64: 3 instructions
>>

Yeah.

That this takes more than 1 instruction on RV64 is one of my major annoyances with it.

The mess of dealing with constants is another big annoyance.
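
To give a feel for why constants are awkward on RV64, here is a rough sketch that counts the instructions a generic "peel off the low 12 bits" strategy (LUI/ADDI/ADDIW/SLLI only) would need for a given 64-bit value. This is an assumption for illustration, not BGBCC's actual algorithm, so its counts need not match the figures quoted in this thread; rv64_li_count and sra64 are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift that avoids shifting a negative value. */
static int64_t sra64(int64_t x, unsigned n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* Count instructions to materialize v in a register on RV64I,
 * using only LUI, ADDI/ADDIW and SLLI (hypothetical helper). */
int rv64_li_count(int64_t v)
{
    int64_t lo12, hi;

    if (v >= -2048 && v <= 2047)
        return 1;                       /* ADDI rd, x0, imm */

    /* Sign-extended low 12 bits: what a trailing ADDI/ADDIW adds. */
    lo12 = v & 0xFFF;
    if (lo12 >= 2048)
        lo12 -= 4096;

    if (v >= INT32_MIN && v <= INT32_MAX)
        return lo12 ? 2 : 1;            /* LUI [+ ADDIW] */

    /* v == hi * 4096 + lo12: build hi first, then shift and add. */
    hi = sra64(v, 12) + (lo12 < 0 ? 1 : 0);
    assert(hi != 0);
    while ((hi & 1) == 0)               /* fold trailing zeros into SLLI */
        hi = sra64(hi, 1);

    /* (sequence for hi) + SLLI [+ ADDI] */
    return rv64_li_count(hi) + 1 + (lo12 ? 1 : 0);
}
```

Each recursion level handles 12 more bits, so the count grows with how many non-zero 12-bit groups the constant has.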


>> Global Variable:
>      M66: 1 instruction (anywhere in 64-bit memory)
>>    XG2: 1 instruction (if within 2K of GBR)
>>    RV64: 1 or 4 instructions
>>

Screwed this up slightly:
   1-instruction, 2K of GP, is for RV64 (not XG2).

For BJX2 (with a simple 32-bit encoding):
   Baseline is 4K or 8K (depending on operand size).
   XG2 is 16K or 32K (depending on operand size).



With a jumbo prefix, it is still 1 instruction...

Technically, there is a ~ 2GB limit for the size of ".data"+".bss", but 
this is also a limit with PE/COFF; and isn't likely to be a big issue in 
practice.

Would need either to modify PE further or jump over to an ELF variant to 
support 64-bit RVAs.

But, if one wants to support large global uninitialized arrays (the main 
use case that is likely to exceed such a limit), could have the compiler 
silently turn them into statically-initialized "calloc()" calls.
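
A minimal sketch of what that transformation could amount to, written out by hand in C. The names (big_table, big_table_init, BIG_COUNT) are hypothetical, and a real compiler would presumably hook the allocation into program startup rather than rely on an explicit call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* What the programmer wrote (conceptually):
 *     static double big_table[BIG_COUNT];    // would land in .bss
 * What could be emitted instead: a pointer plus a startup hook that
 * grabs zeroed storage from the heap. */
#define BIG_COUNT ((size_t)1 << 20)    /* hypothetical element count */

static double *big_table;              /* only a pointer stays in .bss */

/* Hypothetical startup hook; returns 0 on success, -1 on failure. */
int big_table_init(void)
{
    if (big_table != NULL)
        return 0;                      /* already initialized */
    big_table = calloc(BIG_COUNT, sizeof *big_table);
    return big_table ? 0 : -1;
}

/* Accesses now go through the pointer instead of a fixed address. */
double big_table_get(size_t i)
{
    assert(big_table != NULL && i < BIG_COUNT);
    return big_table[i];
}
```

The array no longer counts against the ".data"+".bss" size, at the cost of one extra indirection per access.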


Well, never mind that BGBCC will currently break if the size of a section 
exceeds 8MB (mostly due to how it internally represents its base 
relocs). Fixing this is on the eventual TODO list.

Mostly it is due to the internal base reloc being packed into 32 bits as, say:
   (31:28): Base Reloc Type
   (27:23): Section Number
   (22: 0): Section Offset

Which is then converted to the PE/COFF format:
   Txxx:
     T = Base Reloc Type
     xxx = Offset within logical 4K page.
With an extension:
   0000: NOP
   0001..07FF: Advance current position by 1..2047 pages (8MB).
   0800..0FFF: Reverse current position by -1..-2048 pages.

Though, the negative case isn't generally used, as the relocs are sorted 
by address. Doing it this way (vs individual sub-blocks for each page) 
can further compact the base reloc table.

Either way, already significantly more compact than ELF symbol and reloc 
tables.
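
A rough sketch of the internal packing and the conversion to the 16-bit entries described above. The field layout and the advance encoding are taken from the description; everything else is assumed for illustration: the function names, the sec_rva[] table of section base RVAs, the start_page parameter, and the choice to only emit forward advances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RELOC_OFS_BITS 23u
#define RELOC_OFS_MASK ((1u << RELOC_OFS_BITS) - 1u)  /* 8MB - 1 */
#define RELOC_SEC_MASK 0x1Fu
#define MAX_ADVANCE    0x7FFu  /* 0001..07FF: advance 1..2047 pages */

/* Internal form: (31:28) type, (27:23) section, (22:0) offset. */
uint32_t reloc_pack(uint32_t type, uint32_t sec, uint32_t ofs)
{
    assert(type <= 0xFu && sec <= RELOC_SEC_MASK && ofs <= RELOC_OFS_MASK);
    return (type << 28) | (sec << RELOC_OFS_BITS) | ofs;
}

/* Convert internal relocs into 16-bit Txxx entries.
 * Assumes rel[] is already sorted by ascending final RVA.
 * Returns the entry count, or SIZE_MAX if out[] is too small. */
size_t encode_base_relocs(const uint32_t *rel, size_t n,
                          const uint32_t sec_rva[32], uint32_t start_page,
                          uint16_t *out, size_t out_cap)
{
    size_t k = 0;
    uint32_t cur_page = start_page;

    for (size_t i = 0; i < n; i++) {
        uint32_t type = rel[i] >> 28;
        uint32_t sec  = (rel[i] >> RELOC_OFS_BITS) & RELOC_SEC_MASK;
        uint32_t rva  = sec_rva[sec] + (rel[i] & RELOC_OFS_MASK);
        uint32_t page = rva >> 12;

        assert(type != 0);         /* T=0 is taken by NOP/advance/reverse */
        assert(page >= cur_page);  /* sorted input: never go backwards */

        while (page > cur_page) {  /* may need several advance entries */
            uint32_t step = page - cur_page;
            if (step > MAX_ADVANCE)
                step = MAX_ADVANCE;
            if (k >= out_cap)
                return SIZE_MAX;
            out[k++] = (uint16_t)step;   /* T = 0, xxx = page count */
            cur_page += step;
        }
        if (k >= out_cap)
            return SIZE_MAX;
        out[k++] = (uint16_t)((type << 12) | (rva & 0xFFFu));
    }
    return k;
}
```

The 23-bit offset field is what caps a section at 8MB here; the 16-bit output stream itself has no such limit, since advances can be chained.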


>> Constant Load into register (not R5):
>      M66: 0 instructions
>>    XG2: 1 instruction
>>    RV64: ~ 1-6
>>
>> Operator with 32-bit immediate:
>      M66:  1 instruction
>>    BJX2: 1 instruction;
>>    RV64: 3 instructions.
>>
>> Operator with 64-bit immediate:
>      M66:  1 instruction
>>    BJX2: 1 instruction;
>>    RV64: 4-9 instructions.
>>
>>
>>
>> Floating point is still a bit of a hack, as it is currently implemented
>> by shuffling values between GPRs and FPRs, but sorta works.
> 
> My 66000 has a common register file.
> 

Same with BJX2 and XG2.
Not true with RISC-V though.


BGBCC currently assumes a common register file, and the original FPU 
code (from SH-4) has atrophied (and, more so, the RISC-V FPU is somewhat 
different from the SH-4 FPU; so not like stale SH-4 code would work 
effectively on RV64 anyways).

So, for now, BGBCC is assuming that the FPU works like the one in 
BJX2-Baseline (with 32 GPRs, and all of the FPU values in GPRs).

But, this is crappy on RV64:
   FMV.D.X  F0, Xs
   FMV.D.X  F1, Xt
   FADD.D   F3, F0, F1
   FMV.X.D  Xn, F3

Though, it works for the time-being, and is mostly N/A to Doom, which is 
nearly entirely integer code.

Current thinking is to possibly have it as a funky sub-mode where 
logical R32..R63 is allowed but only if the value is a floating-point type.

For now, BGBCC also assumes that (like for BJX2) all scalar 
floating-point values are represented in registers in Binary64 form.

Meanwhile, the assembler (in RV64 mode) assumes that:
   R0..R31 means X0..X31
   R32..R63 means F0..F31

Or, basically the same idea as XG2RV Mode.
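
The mapping itself is trivial; as a sketch (the struct and function names are made up here, not taken from the assembler):

```c
#include <assert.h>

typedef struct {
    int is_fpr;   /* 0: integer register Xn, 1: FP register Fn */
    int index;    /* 0..31 within that register file */
} Rv64Reg;

/* Map a logical register number R0..R63 onto X0..X31 / F0..F31. */
Rv64Reg map_logical_reg(int r)
{
    Rv64Reg out;
    assert(r >= 0 && r <= 63);
    out.is_fpr = (r >= 32);
    out.index  = r & 31;
    return out;
}
```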


For now, this means loading a Binary32 from memory looks kinda like:
   LW        Xn, ...
   // fake "FLDCF Rn, Rn"
   FMV.D.X   F0, Xn
   FCVT.D.S  F1, F0
   FMV.X.D   Xn, F1

But, could fake "FMOV.S" as:
   FLW       F0, ...
   FCVT.D.S  F1, F0
   FMV.X.D   Xn, F1

But, yeah...
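
For reference, the net effect the FLW-based sequence is after (load a Binary32, end up with the equivalent Binary64 bit pattern in an integer register) can be modeled in C roughly as below. This is only a model of the intended result, assuming float and double are IEEE Binary32 and Binary64; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a Binary32 from memory and return the bits of the equivalent
 * Binary64, as they would sit in a 64-bit GPR. */
uint64_t load_f32_as_f64_bits(const void *p)
{
    float f;
    double d;
    uint64_t bits;

    assert(p != NULL);
    memcpy(&f, p, sizeof f);         /* the load        (cf. FLW)      */
    d = (double)f;                   /* widening convert (cf. FCVT.D.S) */
    memcpy(&bits, &d, sizeof bits);  /* raw bit move    (cf. FMV.X.D)  */
    return bits;
}
```

The two memcpy calls are pure bit moves; only the cast in the middle does any arithmetic, which is why the extra FMV shuffling is pure overhead.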


Trying to target RV64 by having BGBCC pretend it is a crappier version 
of BJX2 probably isn't ideal, granted...

Did already run into a few cases where stuff was breaking because BGBCC 
dealt with BJX2 by pretending it was still BJX1 or SH-4; in ways that 
entirely broke for RISC-V (there being a few fundamental differences 
between them).


However, a few cases were more efficient with the "common case of BJX2 
and RV" than the "stale code pretending it was still SH-4" cases...

Because, say, both SH-4 and RISC-V are bad in their own ways...


The most common issue here was that RV64G lacks any direct equivalent 
of the SuperH SR.T bit, and currently BGBCC makes no effort to fake it.

The idea for XG3 was to demote it to optional, but in CoEx mode 
(likely the primary use-case to make it worth the hassle), there would 
be no SR.T bit, ...


Though, an alternative would be to either figure out a scheme to shove 
all the predicated instructions into half the encoding space, or to only 
allow '?T' predication.

Well, or try to partly rework the encoding scheme to free up 1 bit of 
entropy.

One possibility could be that CoEx would deal with predication like:
========== REMAINDER OF ARTICLE TRUNCATED ==========