Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Sun, 29 Sep 2024 19:11:47 +0000
Organization: Rocksolid Light
Message-ID: <58bd95eee31b53933be111d0d941203a@www.novabbs.org>
References: <vd5uvd$mdgn$1@dont-email.me> <vd69n0$o0aj$1@dont-email.me> <vd6tf8$r27h$1@dont-email.me> <1b8c005f36fd5a86532103a8fb6a9ad6@www.novabbs.org> <vd7gk6$tquh$1@dont-email.me> <abf735f7cab1885028cc85bf34130fe9@www.novabbs.org> <vd80r8$148fc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="4044135"; mail-complaints-to="usenet@i2pn2.org";
posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$uvRILcKYhHR/xVy0X5WFuOAq1l7Ov2xe52P7pYmn0zii7y1lGkSTS
Bytes: 4656
Lines: 100
On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:
> On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
>> On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>>
>> One of the reasons reservation stations came into vogue.
>>
>
> Possibly, but is a CPU feature rather than a compiler feature...
A good compiler should be able to make use of 98% of the instruction
set.
>
------------
>
> Saw a video not too long ago where he was making code faster by undoing
> a lot of loop unrolling, as the code was apparently spending more in I$
> misses than it was gaining by being unrolled.
I noticed this in 1991 when we got Mc88120 simulator up and running.
GBOoO chips are <nearly> best served when there is the smallest number
of instructions.
>------------
>
> In contrast, a jumbo prefix by itself does not make sense; its meaning
> depends on the thing that is being prefixed. Also the decoder will
> decode a jumbo prefix and suffix instruction at the same time.
How many bits does one of these jumbo prefixes consume ?
-----
>
>
> For the jumbo prefix:
>   Recognize that it is a jumbo prefix;
> Inform the decoder for the following instruction of this fact
> (via internal flag bits);
> Provide the prefix's data bits to the corresponding decoder.
>
> Unlike a "real" instruction, a jumbo prefix does not need to provide
> behavior of its own, merely be able to be identified as such and to
> provide payload data bits.
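
The three prefix steps quoted above can be sketched in a few lines. This
is only an illustration under made-up assumptions: 32-bit instruction
words, a hypothetical tag value in the top 8 bits that marks a jumbo
prefix, and a 24-bit payload. None of those numbers come from the
thread; the names PREFIX_TAG, PAYLOAD_BITS and decode_bundle are
invented here.

```python
# Hypothetical encoding, for illustration only (not the real ISA):
# 32-bit words, top 8 bits == 0xFE marks a jumbo prefix,
# low 24 bits are the payload handed to the next instruction's decoder.
PREFIX_TAG = 0xFE
PAYLOAD_BITS = 24
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def is_jumbo_prefix(word):
    """Step 1: recognize that the word is a jumbo prefix."""
    return ((word >> PAYLOAD_BITS) & 0xFF) == PREFIX_TAG


def decode_bundle(words):
    """Decode zero or more prefixes followed by one base instruction.

    Returns (base_word, payloads, words_consumed). The prefixes have no
    behavior of their own; they only contribute payload bits (steps 2
    and 3) to the decoder of the instruction that follows them.
    """
    payloads = []
    for i, word in enumerate(words):
        if is_jumbo_prefix(word):
            payloads.append(word & PAYLOAD_MASK)
        else:
            return word, payloads, i + 1
    raise ValueError("jumbo prefix with no following instruction")
```

With 32-bit words, the 96-bit ceiling mentioned below would correspond
to at most two prefixes in front of one base instruction.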
>
>
> For now, there are not any encodings larger than 96 bits.
> Partly this is because 128 bit fetch would likely add more cost and
> complexity than it is worth at the moment.
For your implementation, yes. For all others:: <at best> maybe.
>
>
>>>
>>>>>
>>>>> Likewise, no one seems to be bothering with 64-bit ELF FDPIC for RV64
>>>>> (there does seem to be some interest for ELF FDPIC but limited to
>>>>> 32-bit RISC-V ...). Ironically, ideas for doing FDPIC in RV aren't
>>>>> too far off from PBO (namely, using GP for a global section and
>>>>> then chaining the sections for each binary).
>>>>
>>>> How are you going to do dense PIC switch() {...} in RISC-V ??
>>>
>>> Already implemented...
>>> With pseudo-instructions:
>>> SUB Rs, $(MIN), R10
>>> MOV $(MAX-MIN), R11
>>> BGTU R11, R10, Lbl_Dfl
>>>
>>> MOV .L0, R6 //AUIPC+ADD
>>> SHAD R10, 2, R10 //SLLI
>>> ADD R6, R10, R6
>>> JMP R6 //JALR X0, X6, 0
>>>
>>> .L0:
>>> BRA Lbl_Case0 //JAL X0, Lbl_Case0
>>> BRA Lbl_Case1
>>> ...
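
For readers following along, the address arithmetic of the quoted
sequence can be modeled as below. This is a sketch, assuming 64-bit
registers and 4-byte table slots (one branch instruction per case);
switch_target and its parameter names are invented for the
illustration. The point it shows is that subtracting MIN first lets a
single unsigned compare cover both the low and the high bound.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def switch_target(value, lo, hi, table_base, default_target):
    """Model of the SUB / BGTU / SHAD / ADD / JMP sequence.

    table_base stands for the address of label .L0; each slot there is
    assumed to hold one 4-byte branch to the case body.
    """
    idx = (value - lo) & MASK64      # SUB: wraps to a huge value if value < lo
    if idx > (hi - lo):              # unsigned compare rejects both sides
        return default_target
    return table_base + (idx << 2)   # scale by 4 bytes per slot, then jump
```

The jump lands on a branch inside the table, so each taken case costs
two control transfers and each table entry costs a full instruction.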
>>
>> Compared to::
>> // ADD Rt,Rswitch,#-min
>> JTT Rt,#max
>> .jttable min, ... , max, default
>> adder:
>>
>> The ADD is not necessary if min == 0
>>
>> The JTT instruction compares Rt with 0 on the low side and max
>> on the high side. If Rt is out of bounds, default is selected.
>>
>> The table displacements come in {B,H,W,D} selected in the JTT
>> (jump through table) instruction. Rt indexes the table, its
>> signed value is <<2 and added to address which happens to be
>> address of JTT instruction + #(max+1)<<entry. {{The table is
>> fetched through the ICache with execute permission}}
>>
>> Thus, the table is PIC; and generally 1/4 the size of typical
>> switch tables.
>> -----
>
> Potentially it could be more compact.
Both more compact and at least as fast; in many cases faster.
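
A rough model of the JTT selection described in the quoted text, as a
sketch only. It assumes the table holds max+1 signed case displacements
followed by one default displacement (my reading of the .jttable line),
that displacements are in units of 4 bytes, and that base is the
address the displacement is added to. The function and parameter names
are invented for the illustration.

```python
def jtt_target(rt, max_index, displacements, base):
    """Pick a table entry and form the branch target.

    displacements: max_index + 1 case entries, then one default entry
                   (assumed layout).
    base:          address the scaled displacement is added to (assumed
                   to be the address just past the table).
    """
    if 0 <= rt <= max_index:
        disp = displacements[rt]             # in bounds: indexed entry
    else:
        disp = displacements[max_index + 1]  # out of bounds: default entry
    return base + (disp << 2)                # signed displacement, scaled by 4


def table_bytes(max_index, entry_bytes):
    """Table size for a given entry width (1, 2, 4 or 8 for B, H, W, D)."""
    return (max_index + 2) * entry_bytes
```

With byte-sized entries a 10-case switch needs table_bytes(9, 1) bytes
of table, against 4 bytes per case for a table of branch instructions,
which is where the 1/4 figure comes from.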