From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Sun, 29 Sep 2024 19:11:47 +0000
Organization: Rocksolid Light
Message-ID: <58bd95eee31b53933be111d0d941203a@www.novabbs.org>
References: <1b8c005f36fd5a86532103a8fb6a9ad6@www.novabbs.org>

On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:

> On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
>> On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>>
>> One of the reasons reservation stations came into vogue.
>>
>
> Possibly, but that is a CPU feature rather than a compiler feature...

A good compiler should be able to make use of 98% of the instruction
set.

> ------------
>
> Saw a video not too long ago where he was making code faster by undoing
> a lot of loop unrolling, as the code was apparently spending more in I$
> misses than it was gaining by being unrolled.

I noticed this in 1991 when we got the Mc88120 simulator up and running.
GBOoO chips are best served when there is the smallest number of
instructions.

>------------
>
> In contrast, a jumbo prefix by itself does not make sense; its meaning
> depends on the thing that is being prefixed. Also the decoder will
> decode a jumbo prefix and suffix instruction at the same time.

How many bits does one of these jumbo prefixes consume?

-----
>
>
> For the jumbo prefix:
>   Recognize that it is a jumbo prefix;
>   Inform the decoder for the following instruction of this fact
>     (via internal flag bits);
>   Provide the prefix's data bits to the corresponding decoder.
>
> Unlike a "real" instruction, a jumbo prefix does not need to provide
> behavior of its own, merely be able to be identified as such and to
> provide payload data bits.
>
>
> For now, there are not any encodings larger than 96 bits.
> Partly this is because 128 bit fetch would likely add more cost and
> complexity than it is worth at the moment.

For your implementation, yes. For all others:: maybe.

>
>
>>>
>>>>>
>>>>> Likewise, no one seems to be bothering with 64-bit ELF FDPIC for RV64
>>>>> (there does seem to be some interest for ELF FDPIC but limited to
>>>>> 32-bit RISC-V ...). Ironically, ideas for doing FDPIC in RV aren't
>>>>> too far off from PBO (namely, using GP for a global section and
>>>>> then chaining the sections for each binary).
>>>>
>>>> How are you going to do dense PIC switch() {...} in RISC-V ??
>>>
>>> Already implemented...
>>>
>>> With pseudo-instructions:
>>>   SUB  Rs, $(MIN), R10
>>>   MOV  $(MAX-MIN), R11
>>>   BGTU R11, R10, Lbl_Dfl
>>>
>>>   MOV  .L0, R6        //AUIPC+ADD
>>>   SHAD R10, 2, R10    //SLLI
>>>   ADD  R6, R10, R6
>>>   JMP  R6             //JALR X0, X6, 0
>>>
>>>   .L0:
>>>   BRA  Lbl_Case0      //JAL X0, Lbl_Case0
>>>   BRA  Lbl_Case1
>>>   ...
>>
>> Compared to::
>>   // ADD Rt,Rswitch,#-min
>>   JTT Rt,#max
>>   .jttable min, ... , max, default
>> adder:
>>
>> The ADD is not necessary if min == 0
>>
>> The JTT instruction compares Rt with 0 on the low side and max
>> on the high side. If Rt is out of bounds, default is selected.
>>
>> The table displacements come in {B,H,W,D} selected in the JTT
>> (jump through table) instruction. Rt indexes the table, its
>> signed value is <<2 and added to address which happens to be
>> address of JTT instruction + #(max+1)<> fetched through the ICache with execute permission}}
>>
>> Thus, the table is PIC; and generally 1/4 the size of typical
>> switch tables.
>> -----
>
> Potentially it could be more compact.

Both more compact and just as fast, and often faster.
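
For anyone following along, here is a rough Python model of the table-of-displacements dispatch described in the quoted JTT text. It is only a sketch under my own assumptions: the function names (build_table, dispatch, table_bytes), the example offsets, and the base address are all made up for illustration, and it does not model any real instruction encoding. It assumes the default displacement is stored as one extra entry after the last case, and that each entry counts 4-byte units (the "<<2").

```python
# Sketch only: models dense switch dispatch through a table of small signed
# displacements. Names, widths, and offsets are illustrative assumptions,
# not the My 66000 or BJX2 encodings.

def build_table(case_disps, default_disp, entry_bytes):
    """Build a displacement table: one entry per case, then the default.

    entry_bytes is 1, 2, 4, or 8 (the B/H/W/D widths); every displacement
    must fit in a signed field of that width.
    """
    lo = -(1 << (8 * entry_bytes - 1))
    hi = (1 << (8 * entry_bytes - 1)) - 1
    table = list(case_disps) + [default_disp]
    for disp in table:
        if not lo <= disp <= hi:
            raise ValueError(f"displacement {disp} does not fit in {entry_bytes} byte(s)")
    return table


def dispatch(value, minimum, table, base):
    """Return the target address for a switch on `value`.

    Subtracting `minimum` plays the role of the optional ADD; an index
    outside 0..max selects the trailing default entry. The chosen signed
    displacement is shifted left by 2 and added to `base`.
    """
    n_cases = len(table) - 1            # last entry is the default
    index = value - minimum
    if index < 0 or index >= n_cases:   # hardware would do one unsigned compare
        index = n_cases
    return base + (table[index] << 2)


def table_bytes(n_entries, entry_bytes):
    """Size of a table with n_entries entries of entry_bytes each."""
    return n_entries * entry_bytes


if __name__ == "__main__":
    table = build_table([3, 10, -5], 20, entry_bytes=1)
    for v in (4, 5, 6, 7, 8):
        print(v, hex(dispatch(v, 5, table, 0x1000)))
    # Byte displacements versus one 4-byte branch instruction per entry.
    print(table_bytes(len(table), 1), "vs", table_bytes(len(table), 4))
```

Because the table holds displacements rather than absolute addresses, nothing in it needs relocating, which is what makes it position independent. With byte entries it takes one quarter of the space of a table built from 4-byte branch instructions, which matches the "1/4 the size" figure above.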