From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 11 Apr 2024 23:06:05 +0000
Organization: Rocksolid Light
Message-ID:
References: <8e61b7c856aff15374ab3cc55956be9d@www.novabbs.org> <8b6bcc78355b8706235b193ad2243ad0@www.novabbs.org> <20240411141324.0000090d@yahoo.com> <0b785ebc54c76e3a10316904c3febba5@www.novabbs.org>

BGB-Alt wrote:

> On 4/11/2024 1:46 PM, MitchAlsup1 wrote:
>> BGB wrote:
>>
>>>> Win-win under constraints of Load-Store Arch. Otherwise, it depends.
>>
>> Never seen a LD-OP architecture where the inbound memory can be in the
>> Rs1 position of the instruction.
>>
>>> FWIW:
>>> The LDSH / SHORI mechanism does provide a way to get 64-bit constants,
>>> and needs less encoding space than the LUI route.
>>
>>>    MOV   Imm16, Rn
>>>    SHORI Imm16, Rn
>>>    SHORI Imm16, Rn
>>>    SHORI Imm16, Rn
>>
>>> Granted, if each is a 1-cycle instruction, this still takes 4 clock
>>> cycles.
>>
>> As compared to::
>>
>>     CALK   Rd,Rs1,#imm64
>>
>> Which takes 3 words (12 bytes) and executes in CALK cycles, the loading
>> of the constant is free !! (0 cycles) !! {{The above example uses at least
>> 5 cycles to use the loaded/built constant.}}
>>
> The main reason one might want SHORI is that it can fit into a
> fixed-length 32-bit encoding.

While 32-bit encoding is RISC mantra, it has NOT been shown to be best,
just simplest. Then, once you start widening the microarchitecture, it
is better to fetch wider than you decode-issue so that you suffer least
from boundary conditions. Once you fetch wide OR have wide decode-issue,
you have ALL the infrastructure needed to do variable-length
instructions. Thus, the complaint that VLE is hard has already been
eradicated.

> Also technically could be retrofitted onto
> RISC-V without any significant change, unlike some other options (as
> noted, I don't argue for adding Jumbo prefixes to RV on the basis
> that there is no real viable way to add them to RV, *).

The issue is that once you do VLE, RISC-V's ISA is no longer helping
you get the job done, especially when you have to execute 40% more
instructions.

> Sadly, the closest option to viable for RV would be to add the SHORI
> instruction and optionally pattern match it in the fetch/decode.
>
> Or, say:
>    LUI   Xn, Imm20
>    ADD   Xn, Xn, Imm12
>    SHORI Xn, Imm16
>    SHORI Xn, Imm16
>
> Then, combine LUI+ADD into a 32-bit load in the decoder (though probably
> only if the Imm12 is positive), and 2x SHORI into a combined
> "Xn=(Xn<<32)|Imm32" operation.
>
> This could potentially get it down to 2 clock cycles.

Universal constants get this down to 0 cycles......

> *: To add a jumbo prefix, one needs an encoding that:
>    Uses up a really big chunk of encoding space;
>    Is otherwise illegal and unused.
> RISC-V doesn't have anything here.

Which is WHY you should not jump ship from SH to RV, but jump to an ISA
without these problems.
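As a side note on the SHORI sequence quoted above: a rough C model of
what it (and the proposed decoder fusion) computes. Purely illustrative;
the helper names are made up here and nothing below is from either ISA's
actual definition.

    #include <stdint.h>

    /* SHORI: Rn = (Rn << 16) | Imm16 */
    static uint64_t shori(uint64_t rn, uint16_t imm16)
    {
        return (rn << 16) | imm16;
    }

    /* Build a 64-bit constant the way the quoted sequence does:
       LUI+ADD put the high 32 bits in Xn, then two SHORIs shift them up
       and OR in the low halfwords.  With the fusion described above, the
       first pair becomes one "load 32-bit immediate" op and the second
       pair one "Xn = (Xn << 32) | Imm32" op, hence the 2-cycle figure. */
    static uint64_t build_const64(uint64_t value)
    {
        uint64_t xn = (uint32_t)(value >> 32);    /* LUI + ADD (Imm12 positive) */
        xn = shori(xn, (uint16_t)(value >> 16));  /* SHORI, upper half of low 32 */
        xn = shori(xn, (uint16_t)value);          /* SHORI, lower halfword */
        return xn;                                /* == value */
    }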
> Ironically, in XG2 mode, I still have 28x 24-bit chunks of encoding
> space that aren't yet used for anything, but aren't usable as normal
> encoding space mostly because if I put instructions in there (with the
> existing encoding schemes), I couldn't use all the registers (and they
> would not have predication or similar either). Annoyingly, the only
> types of encodings that would fit in there at present are 2RI Imm16 ops
> or similar (or maybe 3R 128-bit SIMD ops, where these ops only use
> encodings for R0..R31 anyways, interpreting the LSB of the register
> field as encoding R32..R63).

Just another reason not to stay with what you have developed.

In comparison, I reserve 6 major OpCodes so that a control transfer into
data is highly likely to get an Undefined OpCode exception rather than
try to execute what is in that data. Then, as it is, I still have 21
slots in the major OpCode group free (27 if you count the permanently
reserved). Much of this comes from side effects of Universal Constants.

>>> An encoding that can MOV a 64-bit constant in 96-bits (12 bytes) and
>>> 1-cycle, is preferable....
>>
>> A consuming instruction where you don't even use a register is better
>> still !!

> Can be done, but thus far 33-bit immediate values. Luckily, Imm33s seems
> to address around 99% of uses (for normal ALU ops and similar).

What do you do when accessing data that the linker knows is more than
4GB away from IP ?? or known to be outside of 0-4GB ?? externs, GOT,
PLT, ...

> Had considered allowing an Imm57s case for SIMD immediates (4x S.E5.F8
> or 2x S.E8.F19), which would have indirectly allowed the Imm57s case. By
> themselves though, the difference doesn't seem enough to justify the cost.

While I admit that anything bigger than 50 bits will be fine as a
displacement, it is not fine for constants, especially FP constants and
many bit-twiddling constants.

> Don't have enough bits in the encoding scheme to pull off a 3RI Imm64 in
> 12 bytes (and allowing a 16-byte encoding would have too steep of a cost
> increase to be worthwhile).

And yet I did.

> So, alas...

Yes, alas..........
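To put a quick number on the FP-constant point above, a throwaway check
(purely illustrative, not from any of the ISAs discussed) of whether a
double survives being truncated to the top N bits of its encoding:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Does this double survive being squeezed into the top 'bits' bits
       of its IEEE-754 encoding, with the low bits re-expanded as zero? */
    static int fits_in_imm(double d, int bits)
    {
        uint64_t u;
        memcpy(&u, &d, sizeof u);
        uint64_t dropped = u & (~0ULL >> bits);   /* the low (64 - bits) bits */
        return dropped == 0;
    }

    int main(void)
    {
        printf("1.0 in 50 bits: %d\n", fits_in_imm(1.0, 50));               /* yes */
        printf("0.1 in 50 bits: %d\n", fits_in_imm(0.1, 50));               /* no  */
        printf("pi  in 50 bits: %d\n", fits_in_imm(3.141592653589793, 50)); /* no  */
        return 0;
    }

Small integers and powers of two fit; most other FP constants and
bit-mask constants do not, which is the argument for carrying a full
64-bit constant in the instruction stream.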