Path: ...!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Mon, 21 Oct 2024 19:38:41 -0500
Organization: A noiseless patient Spider
Lines: 166
Message-ID: <vf6s6l$15gfr$1@dont-email.me>
References: <vd5uvd$mdgn$1@dont-email.me>
 <b17e4a241a5bc300250aab8c1c5b9348@www.novabbs.org>
 <vdcbe5$1s6so$1@dont-email.me>
 <852a1995ec32b2e03628885f9b5da124@www.novabbs.org>
 <veonu1$2ae17$1@dont-email.me> <veovcc$2b1fi$1@dont-email.me>
 <vep7be$2cs59$1@dont-email.me>
 <802b8c55ab0ba69a7fc324618f2c63df@www.novabbs.org>
 <vepk8h$2f0m6$1@dont-email.me>
 <a0e4aadcae0c6952b47b04a370c2da70@www.novabbs.org>
 <vf6cl2$12m6u$1@dont-email.me>
 <33cca57c6b6aefac5d59b2c3d0654b01@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 22 Oct 2024 02:38:45 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d61d12ed2f3cdc6f6346fdd8dea3e678";
	logging-data="1229307"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+tVvWLqfnujS37GlEGX6rWAxgglqHABE8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:QeYPRyPt4Aybsua67PczKqMlAqc=
Content-Language: en-US
In-Reply-To: <33cca57c6b6aefac5d59b2c3d0654b01@www.novabbs.org>
Bytes: 6555

On 10/21/2024 4:10 PM, MitchAlsup1 wrote:
> On Mon, 21 Oct 2024 20:13:20 +0000, BGB wrote:
> 
>> On 10/17/2024 9:28 PM, MitchAlsup1 wrote:
> 
>>>> Granted, I guess it could be decoded as if it were a normal 3RI op or
>>>> similar, but then split up the immediate into multiple parts in EX1.
>>>
>>> Why would you want to make it 3×11-bit displacements when you can
>>> make it 3×16-bit displacements?
>>>
>>>      +------+-----+-----+----------------+
>>>      | Bc   |  3W |  Rt |   .lb_lo       |
>>>      +------+-----+-----+----------------+
>>>      |   .lb_zero       |  .lb_hi        |
>>>      +------------------+----------------+
>>
>> Neither BJX2 nor RISC-V have the encoding space to pull this off...
>>    Even in a clean-slate ISA, it would be a big ask.
> 
> If you remove compressed instructions from RISC-V, you have enough
> room left over to put the entire My 66000 ISA. ... ... ...


Likewise, one could also fit more or less all of the XG2 encoding space 
into that space, if the bits were shuffled around to fit alongside the 
RISC-V encodings...

I could have considered this, vs my previous BSR4I idea...
   Pro:
     Could potentially leverage my existing BJX2 decoders;
     BSR4I would have needed new decoders.
   Con:
     Possibly a bigger dog-chewed mess than my existing encoding.
     The BJX2 ISA is still a bit more complicated than RV;
     Would still need the resource cost of more decoders.


Say:
   NMOP-YwYY-nnnn-mmmm ZZZZ-Qnmo-oooo-XXXX (F0)
   NMOP-YwYY-nnnn-mmmm ZZZZ-Qnmo-oooo-oooo (F1/F2)
   NZZP-YwYY-nnnn-ZZZn iiii-iiii-iiii-iiii (F8)

Possible Repack:
   XXXX-oooo-oomm-mmmm-ZZZZ-nnnn-nnQY-YYPw (F0)
   oooo-oooo-oomm-mmmm-ZZZZ-nnnn-nnQY-YYPw (F1/F2)
   iiii-iiii-iiii-iiii-ZZZZ-nnnn-nnZY-YYPw (F8)
   00: OP?T
   01: OP?F
   10: OP
   11: RV OP32
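
A rough sketch of how the repacked F0 form could be split into fields, mostly to sanity-check that the layout adds up to 32 bits. This assumes the diagram is read MSB-first (XXXX in bits 31:28, Pw in bits 1:0) and that the 00/01/10/11 list describes the low two bits; the names `F0_FIELDS`, `unpack_f0`, and `pack_f0` are made up for illustration and are not from any existing decoder.

```python
# Hypothetical field split for the repacked F0 form, read MSB-first:
#   XXXX-oooo-oomm-mmmm-ZZZZ-nnnn-nnQY-YYPw
# Each entry is (name, shift, width); the bit positions are an assumption.
F0_FIELDS = [
    ("X", 28, 4), ("o", 22, 6), ("m", 16, 6), ("Z", 12, 4),
    ("n", 6, 6), ("Q", 5, 1), ("Y", 2, 3), ("Pw", 0, 2),
]

# Assumed meaning of the low two bits, taken from the 00/01/10/11 list.
LOW2 = {0b00: "OP?T", 0b01: "OP?F", 0b10: "OP", 0b11: "RV OP32"}

def unpack_f0(word):
    """Split a 32-bit word into the repacked F0 fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in F0_FIELDS}

def pack_f0(fields):
    """Inverse of unpack_f0: build a 32-bit word from a field dict."""
    word = 0
    for name, shift, width in F0_FIELDS:
        word |= (fields[name] & ((1 << width) - 1)) << shift
    return word
```

With this reading, the three 6-bit register fields each stay contiguous, which is the main thing the repack buys on the decoder side.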

If I did so though, would likely:
   Drop FA and FB blocks, and rework the F8 block
   Implicitly, WEX and PrWEX are dropped;
     Would need to use superscalar.
   The FA and FB blocks would take over the Jumbo-Prefix role.

Likely:
   Special case F8 so that it makes sense;
   Special case F1 and F2 so that immediate bits are contiguous;
   May make sense to relocate BRA and BSR from F0 to F8.
     Likely reduced from 23 to 22 bits.

Where, YYY:
   000: F0 (3R ops)
   001: F1 (LD/ST Disp10)
   010: F2 (3RI Imm10 Ops)
   011: F3 (Reserved / User)
   100: F8 (Imm16 ops)
   101: F9 (Reserved)
   110: FE (Jumbo Prefix)
   111: FF (Jumbo Prefix)
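
As a sketch, the first decode step under this scheme could look something like the following. It assumes the repacked layout puts YYY in bits 4:2 and that low bits of 11 mean an unmodified RV 32-bit op; `classify` is a hypothetical helper name.

```python
# YYY -> block, copied from the table above.
YYY_BLOCKS = {
    0b000: "F0 (3R ops)",
    0b001: "F1 (LD/ST Disp10)",
    0b010: "F2 (3RI Imm10 Ops)",
    0b011: "F3 (Reserved / User)",
    0b100: "F8 (Imm16 ops)",
    0b101: "F9 (Reserved)",
    0b110: "FE (Jumbo Prefix)",
    0b111: "FF (Jumbo Prefix)",
}

def classify(word):
    """Pick the encoding block for a 32-bit instruction word.

    Assumption: low two bits == 11 selects plain RV OP32,
    anything else is a repacked XG2 op with YYY in bits 4:2.
    """
    if (word & 0b11) == 0b11:
        return "RV OP32"
    return YYY_BLOCKS[(word >> 2) & 0b111]
```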

Probably using a variation of XG2RV rules (IOW: Uses same register space 
and ABI as RISC-V).



Ironically, repacking XG2 to fit into the RV encoding space might 
actually be easier than trying to expand RISC-V register fields to 6 
bits and fit it into the same space.

If doing so, it would likely make sense to only carry over certain 
encoding blocks, say:
   0z-000 -> 000: LD / ST (O select)
   11-000 -> 001: BEQ
   11-000 -> 010: -
   11-000 -> 011: -
   01-100 -> 100: ALU
   01-110 -> 101: ALUW
   10-100 -> 110: FPU
   00-1z0 -> 111: ALUI / ALUIW (O select)

ZZZZZZZ-ooooo-mmmmm-ZZZ-nnnnn-nm-YYY0o

Where, say:
   0z: RV, Expanded 6b
   10: -
   11: Original RV OP32
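
For reference, the left-hand column above is the standard RV major opcode split as bits [6:5]-[4:2] (LOAD, STORE, BRANCH, OP, OP-32, OP-FP, OP-IMM, OP-IMM-32). A small sketch of the block mapping, where `rv_opcode_to_yyy` is a hypothetical helper and the returned select bit is only my reading of "(O select)":

```python
def rv_opcode_to_yyy(opcode7):
    """Map a standard 7-bit RV major opcode to (YYY, select) or None.

    'select' is the 'z' bit from the table (load vs store,
    ALUI vs ALUIW); how it is carried in the new encoding is assumed.
    """
    if (opcode7 & 0b11) != 0b11:
        return None                      # not a 32-bit RV op
    hi = (opcode7 >> 5) & 0b11           # opcode bits 6:5
    lo = (opcode7 >> 2) & 0b111          # opcode bits 4:2
    if lo == 0b000 and hi in (0b00, 0b01):
        return (0b000, hi & 1)           # 0z-000: LD (z=0) / ST (z=1)
    if (hi, lo) == (0b11, 0b000):
        return (0b001, 0)                # BEQ and friends
    if (hi, lo) == (0b01, 0b100):
        return (0b100, 0)                # ALU
    if (hi, lo) == (0b01, 0b110):
        return (0b101, 0)                # ALUW
    if (hi, lo) == (0b10, 0b100):
        return (0b110, 0)                # FPU
    if hi == 0b00 and lo in (0b100, 0b110):
        return (0b111, (lo >> 1) & 1)    # 00-1z0: ALUI (z=0) / ALUIW (z=1)
    return None                          # block not carried over
```

Anything that returns None here (LUI, AUIPC, JAL, JALR, AMO, ...) would only exist in the original 5-bit-register form.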


Or, more aggressive:
   0z-000 -> 00: LD / ST (O select)
   00-1z0 -> 01: ALUI / ALUIW (O select)
   01-100 -> 10: ALU
   01-110 -> 11: ALUW

ZZZZZZZ-ooooo-mmmmm-ZZZ-nnnnn-YY-nmo00

Where:
   00: RV, Expanded 6b
   01: -
   10: -
   11: Original RV OP32


Though, the top 4 blocks of RV is probably less useful than nearly the 
entire XG2 ISA...



Though, not sure how well "Repacked XG2RV hot glued onto RISC-V" would 
go over.

Would still have the downside of needing a special/separate operating 
mode. Well, and the wonk that it would still be essentially two ISAs 
awkwardly glued together.



But, then again, there still seems to be a roughly 19% performance delta 
between my current extended RISC-V and XG2 when it comes to running 
Doom. Sadly, Jumbo Prefixes and Indexed Load/Store were still not 
enough to entirely close the gap.

Eg:
   XG2        : 25 fps
   RV+J       : 21 fps
   RV64G (GCC): 17 fps
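
The percentages fall out of the frame rates directly; a quick check (only the fps numbers are from the measurements above, `speedup_pct` is just a throwaway helper):

```python
# Doom frame rates from the list above.
fps = {"XG2": 25, "RV+J": 21, "RV64G (GCC)": 17}

def speedup_pct(faster, slower):
    """How much faster 'faster' is than 'slower', in percent."""
    return (faster / slower - 1.0) * 100.0

xg2_vs_rvj = speedup_pct(fps["XG2"], fps["RV+J"])           # about 19%
rvj_vs_rv64g = speedup_pct(fps["RV+J"], fps["RV64G (GCC)"])  # about 23.5%
xg2_vs_rv64g = speedup_pct(fps["XG2"], fps["RV64G (GCC)"])   # about 47%
```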


Implementation would be easier, in that it would be mostly "take 
existing ISA and shuffle the bits around" on the encoder and decoder sides.


Some people really like the C extension though, but granted, it makes 
more sense for microcontrollers.

IME, performance-oriented code isn't really limited by I$ miss rate. I$ 
misses are a bigger issue with a 4K or 8K I$, but much less of an issue 
with a 32K I$.


Well, and XG2 is also currently managing to be smaller than RV64GC, as 
needing fewer instructions saves more than "common instructions 
using less space" (like, 'C' saves 35%, but avoiding most of the cases 
that need multi-instruction sequences saves 60%, ...).

Jumbo prefixes and similar help, but would still need to shave off 
another 20% here.


....