Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Robert Finch <robfi680@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Tue, 1 Oct 2024 06:00:49 -0400
Organization: A noiseless patient Spider
Lines: 240
Message-ID: <vdgh8i$2m458$1@dont-email.me>
References: <vd5uvd$mdgn$1@dont-email.me> <vd69n0$o0aj$1@dont-email.me>
 <vd6tf8$r27h$1@dont-email.me>
 <1b8c005f36fd5a86532103a8fb6a9ad6@www.novabbs.org>
 <vd7gk6$tquh$1@dont-email.me>
 <abf735f7cab1885028cc85bf34130fe9@www.novabbs.org>
 <vd80r8$148fc$1@dont-email.me>
 <58bd95eee31b53933be111d0d941203a@www.novabbs.org>
 <vdd1s0$22tpk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 01 Oct 2024 12:00:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7a15341faf32647df2f482ed942ec811";
	logging-data="2822312"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX190lLfWt208b9i85/CvFJ0WnyY6ceE+zDY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:RiCufqn/KHu+t7sig2q62iIiUuc=
Content-Language: en-US
In-Reply-To: <vdd1s0$22tpk$1@dont-email.me>
Bytes: 9604

On 2024-09-29 10:19 p.m., BGB wrote:
> On 9/29/2024 2:11 PM, MitchAlsup1 wrote:
>> On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:
>>
>>> On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
>>>> On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>>>>
>>>> One of the reasons reservation stations came into vogue.
>>>>
>>>
>>> Possibly, but is a CPU feature rather than a compiler feature...
>>
>> A good compiler should be able to make use of 98% of the instruction
>> set.
> 
> Yes, but a reservation station is not part of the ISA proper...
> 
> 
>>>
>> ------------
>>>
>>> Saw a video not too long ago where he was making code faster by undoing
>>> a lot of loop unrolling, as the code was apparently losing more to I$
>>> misses than it was gaining by being unrolled.
>>
>> I noticed this in 1991 when we got Mc88120 simulator up and running.
>> GBOoO chips are <nearly> best served when there is the smallest number
>> of instructions.
> 
> 
> Looking it up, seems the CPU in question (MIPS R4300) was:
>    16K L1 I$ cache;
>    8K L1 D$ cache;
>    No L2 cache (but could be supported off-die);
>    1-wide scalar, 32 or 64 bit
>    Non pipelined FPU and multiplier;
>    ...
> 
> 
> Oddly, a number of these older CPUs seem to have a larger I$ than D$, 
> whereas IME the D$ seems to have a higher miss rate (so it is easier to 
> justify making it bigger).
> 
> 
>>> ------------
>>>
>>> In contrast, a jumbo prefix by itself does not make sense; its meaning
>>> depends on the thing that is being prefixed. Also the decoder will
>>> decode a jumbo prefix and suffix instruction at the same time.
>>
>> How many bits does one of these jumbo prefixes consume ?
> 
> The prefix itself is 32 bits.
>    In the context of XG3, it supplies 23 or 27 bits.
> 
> 
> For RISC-V ops, they could supply 21 or 26 bits.
> 
>       23+10 = 33 (XG3)
>       21+12 = 33 (RV op)
>    27+27+10 = 64 (XG3)
>    26+26+12 = 64 (RV op)
> 
> J27 could synthesize an immediate for non-immediate ops:
>    27+6 = 33 (XG3)
>    27+5 = 32 (RV)
> 
> 
> For BJX2, the prefixes supply 24 bits (can be stretched to 27 bits in XG2).
>    24+ 9/10=33 (Base)
>    24+24+16=64 (Base)
>    27+27+10=64 (XG2)
> 
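
The bit budgets quoted above can be checked with a small sketch. It
assumes the prefix payload simply forms the high bits of the immediate,
the base instruction's own field forms the low bits, and the result is
sign-extended. The real XG3, BJX2 and RISC-V field layouts are not given
in this thread, so compose_imm and its parameters are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def compose_imm(payloads, payload_bits, base_imm, base_bits):
    """Concatenate prefix payloads (most significant first) above the
    base instruction's immediate field.

    Hypothetical layout, for illustration only: real encodings scatter
    these bits differently.
    Returns (signed_immediate, total_width_in_bits).
    """
    total_bits = base_bits
    imm = base_imm & ((1 << base_bits) - 1)
    # The last prefix in program order sits just above the base field.
    for payload in reversed(payloads):
        imm |= (payload & ((1 << payload_bits) - 1)) << total_bits
        total_bits += payload_bits
    return sign_extend(imm, total_bits), total_bits


# Widths from the quoted figures: one 23-bit payload plus a 10-bit field,
# and two 27-bit payloads plus a 10-bit field.
print(compose_imm([0], 23, 0, 10)[1])
print(compose_imm([0, 0], 27, 0, 10)[1])
```

Only the widths come from the quoted text; the ordering of the pieces is
an assumption made to keep the sketch short.
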
> 
> 
> But, yeah, perhaps unsurprisingly, the RISC-V people are not so 
> optimistic about the idea of jumbo prefixes...
> 
> 
> Also apparently it seems "yeah, here is a prefix whose primary purpose 
> is just to make the immediate field bigger for the following 
> instruction" is not such an obvious or intuitive idea as I had thought.
> 
> 
> Well, and people obsessing on what happens if an interrupt somehow 
> occurs "between" the prefix and prefixed instruction.
> 
That is one reason I prefer postfix immediates. They are much easier to 
work with, and interrupts do not cause issues. The instruction plus 
postfix can be treated as one giant instruction, and the bits following 
the instruction are often already present on the cache line, so it is 
just a matter of checking for a postfix when decoding the immediate 
constants. Q+ had postfixes that could override a register spec as well 
as supply additional constant bits. If an interrupt occurs between the 
instruction and the postfix, the postfix can be treated as a NOP at the 
return point.

Note: Q+ has since been switched away from postfixes. Almost everything 
is a single 64-bit instruction now, and a 64-bit constant can be encoded 
with just two instructions.
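
A rough sketch of the postfix check described above, for illustration
only. The opcode value, field positions and widths are made up (this is
not the actual Q+ encoding), and the immediate is left unsigned to keep
it short.

```python
POSTFIX_OPCODE = 0x7F   # hypothetical marker in the low 7 bits
OPCODE_MASK = 0x7F
BASE_IMM_SHIFT = 20     # hypothetical position of the base immediate
BASE_IMM_BITS = 12      # hypothetical width of the base immediate
POSTFIX_BITS = 25       # remaining bits of a 32-bit postfix word


def is_postfix(word: int) -> bool:
    return (word & OPCODE_MASK) == POSTFIX_OPCODE


def decode_at(words, pc):
    """Return (kind, immediate, length_in_words) for the word at index pc."""
    word = words[pc]
    if is_postfix(word):
        # Reached on its own, e.g. when execution resumes here after an
        # interrupt: treat the postfix as a NOP.
        return ("nop", 0, 1)
    imm = (word >> BASE_IMM_SHIFT) & ((1 << BASE_IMM_BITS) - 1)
    bits = BASE_IMM_BITS
    length = 1
    # Peek at the following words; they are often on the same cache line.
    while pc + length < len(words) and is_postfix(words[pc + length]):
        payload = words[pc + length] >> 7
        imm |= payload << bits
        bits += POSTFIX_BITS
        length += 1
    return ("op", imm, length)


base = (0x123 << 20) | 0x13      # ordinary instruction, immediate 0x123
post = (0x5 << 7) | POSTFIX_OPCODE
print(decode_at([base, post], 0))   # instruction and postfix as one unit
print(decode_at([base, post], 1))   # landing on the postfix alone
```

The point of the sketch is that the decoder only ever looks forward from
an instruction, so landing on a postfix by itself is harmless.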

> Which, as I have tended to implement them, is simply not possible, since 
> everything is fetched and decoded at the same time.
> 
> 
> Granted, yes, it does add the drawback of needing to have tag-bits to 
> remember the mode, and maybe the CPU hiding mode bits in the high order 
> bits of the link register and similar is not such an elegant idea.
> 
> 
> But, as I see it, still preferable to:
> Hey, why not just define a bunch of 48-bit encodings for ALU operations 
> with 32-bit immediate fields?...
> 
> 
> But, like, blarg, this is what I did originally.
> And, I dropped all this in favor of jumbo prefixes, because jumbo 
> prefixes did this job better.
> 
> 
> 
> Might still experiment with an "Extended RISC-V" and see if, in fact, 
> adding things like jumbo prefixes will make as much of a difference as I 
> expect.
> 
> Well, probably along with indexed load/store and Zba instructions and 
> similar.
> 
> I guess, an open question would be if a modified RISC-V variant could be 
> made more performance-competitive with BJX2 without making too much of a 
> mess of things.
> 
> I could maybe do so, but probably no one would be interested.
> 
> 
> 
> Though, looking online, seems I am really the only one calling them 
> "jumbo prefixes". Not sure if there is another more common term used for 
> these things.
> 
> 
>> -----
>>>
>>>
>>> For the jumbo prefix:
>>>    Recognize that is a jumbo prefix;
>>>    Inform the decoder for the following instruction of this fact
>>>      (via internal flag bits);
>>>    Provide the prefix's data bits to the corresponding decoder.
>>>
>>> Unlike a "real" instruction, a jumbo prefix does not need to provide
>>> behavior of its own, merely be able to be identified as such and to
>>> provide payload data bits.
>>>
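
The three steps listed above might be sketched as follows. The tag
value, the payload position and the two-prefix limit (taken from the
96-bit maximum mentioned just below) are assumptions for illustration,
not the actual BJX2 or XG3 decoder.

```python
JUMBO_OPCODE = 0x7B     # hypothetical tag in the low 7 bits of a word
PAYLOAD_SHIFT = 7       # hypothetical: payload is everything above the tag
MAX_PREFIXES = 2        # 96 bits = two 32-bit prefixes + one instruction


def is_jumbo(word: int) -> bool:
    # Step 1: recognize that the word is a jumbo prefix.
    return (word & 0x7F) == JUMBO_OPCODE


def decode_one(words, pc):
    """Decode the instruction starting at index pc, prefixes included."""
    payloads = []
    i = pc
    while i < len(words) and is_jumbo(words[i]):
        if len(payloads) == MAX_PREFIXES:
            raise ValueError("too many jumbo prefixes")
        # Step 3: hand the prefix's data bits to the following decoder.
        payloads.append(words[i] >> PAYLOAD_SHIFT)
        i += 1
    if i == len(words):
        raise ValueError("jumbo prefix with nothing to prefix")
    return {
        "base": words[i],
        # Step 2: flag telling the base decoder that it is prefixed.
        "prefixed": bool(payloads),
        "payloads": payloads,
        "length": i - pc + 1,
    }


prefix = (0xABCDE << PAYLOAD_SHIFT) | JUMBO_OPCODE
op = 0x00000033
print(decode_one([prefix, op], 0))
```

The prefix never produces a result of its own here; it only contributes
a flag and payload bits to the instruction that follows it.
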
>>>
>>> For now, there are not any encodings larger than 96 bits.
>>> Partly this is because 128 bit fetch would likely add more cost and
>>> complexity than it is worth at the moment.
>>
>> For your implementation, yes. For all others:: <at best> maybe.
> 
> Maybe.
> 
> I could maybe consider widening fetch/decode to 128-bits if there were a 
> compelling use case.
> 
> 
>>>
>>>
>>>>>
>>>>>>>
>>>>>>> Likewise, no one seems to be bothering with 64-bit ELF FDPIC for 
>>>>>>> RV64
========== REMAINDER OF ARTICLE TRUNCATED ==========