Article <58bd95eee31b53933be111d0d941203a@www.novabbs.org>

Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Sun, 29 Sep 2024 19:11:47 +0000
Organization: Rocksolid Light
Message-ID: <58bd95eee31b53933be111d0d941203a@www.novabbs.org>
References: <vd5uvd$mdgn$1@dont-email.me> <vd69n0$o0aj$1@dont-email.me> <vd6tf8$r27h$1@dont-email.me> <1b8c005f36fd5a86532103a8fb6a9ad6@www.novabbs.org> <vd7gk6$tquh$1@dont-email.me> <abf735f7cab1885028cc85bf34130fe9@www.novabbs.org> <vd80r8$148fc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="4044135"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$uvRILcKYhHR/xVy0X5WFuOAq1l7Ov2xe52P7pYmn0zii7y1lGkSTS
Bytes: 4656
Lines: 100

On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:

> On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
>> On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>>
>> One of the reasons reservation stations came into vogue.
>>
>
> Possibly, but it is a CPU feature rather than a compiler feature...

A good compiler should be able to make use of 98% of the instruction
set.
>
------------
>
> Saw a video not too long ago where he was making code faster by undoing
> a lot of loop unrolling, as the code was apparently losing more to I$
> misses than it was gaining by being unrolled.

I noticed this in 1991 when we got the Mc88120 simulator up and running.
GBOoO (great big out-of-order) chips are <nearly> best served by the
smallest number of instructions.
>------------
>
> In contrast, a jumbo prefix by itself does not make sense; its meaning
> depends on the thing that is being prefixed. Also, the decoder decodes
> a jumbo prefix and its suffix instruction at the same time.

How many bits does one of these jumbo prefixes consume?
-----
>
>
> For the jumbo prefix:
>    Recognize that it is a jumbo prefix;
>    Inform the decoder for the following instruction of this fact
>      (via internal flag bits);
>    Provide the prefix's data bits to the corresponding decoder.
>
> Unlike a "real" instruction, a jumbo prefix does not need to provide
> behavior of its own, merely be able to be identified as such and to
> provide payload data bits.
>
>
> For now, there are not any encodings larger than 96 bits.
> Partly this is because 128-bit fetch would likely add more cost and
> complexity than it is worth at the moment.

For your implementation, yes. For all others:: <at best> maybe.
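The three decoder steps BGB lists (recognize the prefix, flag the following decoder, hand over the payload bits) can be modeled in C. This is only a sketch under a made-up encoding, since the post does not give the real one: a word whose top byte equals the hypothetical tag `JUMBO_TAG` (0xFE) is a prefix carrying 24 payload bits, and an ordinary instruction carries a 12-bit signed immediate in bits 31:20, RISC-V I-type style. `decode_bundle`, `Decoded`, and the constants are illustrative names. At most two prefixes are accepted, matching the 96-bit limit mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL encoding, for illustration only:
 *  - a 32-bit word whose top 8 bits equal JUMBO_TAG is a jumbo prefix,
 *    and its low 24 bits are payload;
 *  - any other word is an ordinary instruction with a 12-bit signed
 *    immediate in bits 31:20 (RISC-V I-type style).
 * At most two prefixes are allowed, so a bundle is at most 96 bits. */
#define JUMBO_TAG    0xFEu
#define PAYLOAD_BITS 24u
#define MAX_PREFIXES 2

typedef struct {
    uint32_t insn;   /* the suffix ("real") instruction word */
    int64_t  imm;    /* immediate after merging any prefix payload */
    int      nwords; /* words consumed by the bundle: 1..3 */
} Decoded;

static int is_jumbo(uint32_t w) { return (w >> 24) == JUMBO_TAG; }

/* Sign-extend the low `bits` bits of v (1 <= bits <= 63). */
static int64_t sext(uint64_t v, unsigned bits) {
    uint64_t sign = 1ull << (bits - 1);
    return (int64_t)((v ^ sign) - sign);
}

/* Decode one bundle starting at w[0]; `avail` is how many words are
 * present. Returns 0 on success, -1 if the bundle is malformed. */
int decode_bundle(const uint32_t *w, size_t avail, Decoded *out) {
    assert(w != NULL && out != NULL);
    uint64_t payload = 0;
    size_t npre = 0;

    /* Step 1: recognize prefixes. They have no behavior of their own;
     * each one only contributes payload bits. */
    while (npre < avail && is_jumbo(w[npre])) {
        if (npre == MAX_PREFIXES)
            return -1;                  /* would exceed 96 bits */
        payload = (payload << PAYLOAD_BITS) | (w[npre] & 0xFFFFFFu);
        npre++;
    }
    if (npre == avail)
        return -1;                      /* empty input, or prefix with nothing after it */

    /* Steps 2-3: npre acts as the internal flag telling the suffix
     * decoder to widen its immediate with the prefix payload. */
    uint32_t insn = w[npre];
    unsigned bits = 12u + PAYLOAD_BITS * (unsigned)npre;  /* 12, 36 or 60 */
    uint64_t raw = (payload << 12) | (insn >> 20);
    out->insn = insn;
    out->imm = sext(raw, bits);
    out->nwords = (int)npre + 1;
    return 0;
}
```

In this model the prefix never executes on its own; it only changes how wide the suffix's immediate is, which is the property BGB describes.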
>
>
>>>
>>>>>
>>>>> Likewise, no one seems to be bothering with 64-bit ELF FDPIC for RV64
>>>>> (there does seem to be some interest in ELF FDPIC, but limited to
>>>>> 32-bit RISC-V...). Ironically, ideas for doing FDPIC in RV aren't too
>>>>> far off from PBO (namely, using GP for a global section and then
>>>>> chaining the sections for each binary).
>>>>
>>>> How are you going to do dense PIC switch() {...} in RISC-V ??
>>>
>>> Already implemented...
>>> With pseudo-instructions:
>>>     SUB Rs, $(MIN), R10
>>>     MOV $(MAX-MIN), R11
>>>     BGTU R11, R10, Lbl_Dfl
>>>
>>>     MOV   .L0, R6      //AUIPC+ADD
>>>     SHAD  R10, 2, R10  //SLLI
>>>     ADD   R6, R10, R6
>>>     JMP   R6           //JALR X0, X6, 0
>>>
>>>     .L0:
>>>     BRA  Lbl_Case0     //JAL X0, Lbl_Case0
>>>     BRA  Lbl_Case1
>>>     ...
>>
>> Compared to::
>> //      ADD        Rt,Rswitch,#-min
>>         JTT        Rt,#max
>>         .jttable   min, ... , max, default
>> adder:
>>
>> The ADD is not necessary if min == 0
>>
>> The JTT instruction compares Rt with 0 on the low side and max
>> on the high side. If Rt is out of bounds, default is selected.
>>
>> The table displacements come in {B,H,W,D} sizes, selected in the JTT
>> (jump through table) instruction. Rt indexes the table; the selected
>> entry's signed value is shifted left by 2 and added to an address which
>> happens to be the address of the JTT instruction + #(max+1)<<entry.
>> {{The table is fetched through the ICache with execute permission}}
>>
>> Thus, the table is PIC, and generally 1/4 the size of typical
>> switch tables.
>> -----
>
> Potentially it could be more compact.

Both more compact and at least as fast; often many times faster.
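To make the comparison concrete, both dispatch schemes can be modeled in C as functions that compute the final jump target. These are sketches, not either author's code: `rv_switch_target` follows BGB's quoted RISC-V sequence, and `jtt_target` follows Mitch's prose description of JTT. Assumptions, all ours: table entries are stored little-endian, the default displacement sits in the slot just after `max`, and the address the displacement is added to is passed in by the caller as `base`, because the post describes it only loosely as the JTT address + #(max+1)<<entry. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* --- BGB's RISC-V sequence (SUB / MOV / BGTU / AUIPC+ADD / SLLI / ADD / JALR) ---
 * Returns the address the JALR jumps to: a slot in the .L0 table, each slot
 * being a 4-byte JAL that then branches to the case body. */
uint64_t rv_switch_target(int64_t rs, int64_t min, int64_t max,
                          uint64_t l0, uint64_t lbl_default) {
    uint64_t idx  = (uint64_t)rs - (uint64_t)min;   /* SUB */
    uint64_t span = (uint64_t)max - (uint64_t)min;  /* MOV $(MAX-MIN) */
    /* One unsigned compare rejects both rs < min (idx wraps to a huge
     * value) and rs > max. */
    if (idx > span)
        return lbl_default;
    return l0 + (idx << 2);                         /* SLLI 2, ADD */
}

/* Table bytes for the RISC-V scheme: one 4-byte JAL per case. */
uint64_t rv_table_bytes(int64_t min, int64_t max) {
    return ((uint64_t)(max - min) + 1) * 4;
}

/* --- Mitch's JTT, modeled from the prose description --- */

/* Load an nbytes-wide little-endian signed value (nbytes = 1, 2, 4, 8). */
static int64_t load_signed(const uint8_t *p, unsigned nbytes) {
    uint64_t v = 0;
    for (unsigned k = 0; k < nbytes; k++)
        v |= (uint64_t)p[k] << (8 * k);
    if (nbytes < 8) {
        uint64_t sign = 1ull << (8 * nbytes - 1);
        v = (v ^ sign) - sign;                      /* sign-extend */
    }
    return (int64_t)v;
}

/* rt has already had min subtracted (the optional ADD). log2size selects
 * the entry size: 0, 1, 2, 3 for B, H, W, D. Assumes the default entry is
 * stored in slot max+1. */
uint64_t jtt_target(int64_t rt, int64_t max, const uint8_t *table,
                    unsigned log2size, uint64_t base) {
    assert(table != NULL && log2size <= 3 && max >= 0);
    int64_t slot = (rt < 0 || rt > max) ? max + 1 : rt;  /* bounds check */
    unsigned nbytes = 1u << log2size;
    int64_t disp = load_signed(table + (size_t)slot * nbytes, nbytes);
    return base + ((uint64_t)disp << 2);            /* signed displacement << 2 */
}

/* Table bytes for JTT: cases 0..max plus the default entry. */
uint64_t jtt_table_bytes(int64_t max, unsigned log2size) {
    return ((uint64_t)max + 2) << log2size;
}
```

In the RISC-V model a taken case costs two control transfers (the JALR into `.L0`, then the JAL to the case body), and the table costs 4 bytes per case. With byte-sized JTT entries, `rv_table_bytes(0, 99)` is 400 while `jtt_table_bytes(99, 0)` is 101, which is where the roughly 1/4 figure comes from.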