Article <uv9i0i$1srig$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB-Alt <bohannonindustriesllc@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 11 Apr 2024 15:42:59 -0500
Organization: A noiseless patient Spider
Lines: 129
Message-ID: <uv9i0i$1srig$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
 <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
 <d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
 <uv02dn$3b6ik$1@dont-email.me> <uv415n$ck2j$1@dont-email.me>
 <uv46rg$e4nb$1@dont-email.me>
 <a81256dbd4f121a9345b151b1280162f@www.novabbs.org>
 <uv4ghh$gfsv$1@dont-email.me>
 <8e61b7c856aff15374ab3cc55956be9d@www.novabbs.org>
 <uv5err$ql29$1@dont-email.me>
 <e43623eb10619eb28a68b2bd7af93390@www.novabbs.org>
 <S%zRN.162255$_a1e.120745@fx16.iad>
 <8b6bcc78355b8706235b193ad2243ad0@www.novabbs.org>
 <20240411141324.0000090d@yahoo.com> <uv9ahu$1r74h$1@dont-email.me>
 <0b785ebc54c76e3a10316904c3febba5@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 11 Apr 2024 22:42:59 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="4e6cedd45fc4a12a57db9991b60fc324";
	logging-data="1994320"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18JyzN5Kxep3PJLAE4QVgTLkhqBNeEutPA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:QqHj4nEeeoq0FJFOjFLqAbkFbpQ=
In-Reply-To: <0b785ebc54c76e3a10316904c3febba5@www.novabbs.org>
Content-Language: en-US
Bytes: 6405

On 4/11/2024 1:46 PM, MitchAlsup1 wrote:
> BGB wrote:
> 
>> On 4/11/2024 6:13 AM, Michael S wrote:
>>> On Wed, 10 Apr 2024 23:30:02 +0000
>>> mitchalsup@aol.com (MitchAlsup1) wrote:
>>>
>>>>
>>>>> It does occupy some icache space, however; have you boosted the
>>>>> icache size to compensate?
>>>>
>>>> The space occupied in the ICache is freed up from being in the DCache
>>>> so the overall hit rate goes up !! At typical sizes, ICache miss rate
>>>> is about ¼ the miss rate of DCache.
>>>>
>>>> Besides:: if you had to LD the constant from memory, you use a LD
>>>> instruction and 1 or 2 words in DCache, while consuming a GPR. So,
>>>> overall, it takes fewer cycles, fewer GPRs, and fewer instructions.
>>>>
>>>> Alternatively:: if you paste constants together (LUI, AUIPC) you have
>>>> no direct route to either 64-bit constants or 64-bit address spaces.
>>>>
>>>> It looks to be a win-win !!
>>>
>>> Win-win under constraints of Load-Store Arch. Otherwise, it depends.
> 
> Never seen a LD-OP architecture where the inbound memory can be in the 
> Rs1 position of the instruction.
> 
>>>
> 
>> FWIW:
>> The LDSH / SHORI mechanism does provide a way to get 64-bit constants, 
>> and needs less encoding space than the LUI route.
> 
>>    MOV Imm16, Rn
>>    SHORI Imm16, Rn
>>    SHORI Imm16, Rn
>>    SHORI Imm16, Rn
> 
>> Granted, if each is a 1-cycle instruction, this still takes 4 clock 
>> cycles.
> 
> As compared to::
> 
>      CALK   Rd,Rs1,#imm64
> 
> Which takes 3 words (12 bytes) and executes in CALK cycles, the loading
> of the constant is free !! (0 cycles) !! {{The above example uses at least
> 5 cycles to use the loaded/built constant.}}
> 

The main reason one might want SHORI is that it fits into a 
fixed-length 32-bit encoding. Technically, it could also be retrofitted 
onto RISC-V without any significant change, unlike some other options 
(as noted, I don't argue for adding Jumbo prefixes to RV, on the basis 
that there is no real viable way to add them to RV, *).

Sadly, the closest option to viable for RV would be to add the SHORI 
instruction and optionally pattern match it in the fetch/decode.

Or, say:
   LUI Xn, Imm20
   ADD Xn, Xn, Imm12
   SHORI Xn, Imm16
   SHORI Xn, Imm16

Then, combine LUI+ADD into a 32-bit load in the decoder (though probably 
only if the Imm12 is positive), and 2x SHORI into a combined 
"Xn=(Xn<<32)|Imm32" operation.

This could potentially get it down to 2 clock cycles.
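
As a rough illustration of the two sequences above, here is a minimal
Python model. It assumes SHORI computes Rn=(Rn<<16)|Imm16 (implied by the
fused "Xn=(Xn<<32)|Imm32" form), that MOV Imm16 sign-extends, and that
LUI and ADD behave like RV64 LUI/ADDI with sign-extended immediates. The
function names are made up for the sketch and are not taken from any
real toolchain.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mov_imm16(imm16):
    # MOV Imm16, Rn (assumption: the immediate is sign-extended)
    return sext(imm16, 16) & MASK64

def shori(rn, imm16):
    # SHORI Imm16, Rn: Rn = (Rn << 16) | Imm16
    return ((rn << 16) | (imm16 & 0xFFFF)) & MASK64

def shori32(rn, imm32):
    # Two SHORIs fused in the decoder: Xn = (Xn << 32) | Imm32
    return ((rn << 32) | (imm32 & 0xFFFFFFFF)) & MASK64

def lui_add(imm20, imm12):
    # LUI Xn, Imm20 ; ADD Xn, Xn, Imm12 (assumption: RV64-style sign extension)
    return (sext(imm20 << 12, 32) + sext(imm12, 12)) & MASK64

def build_4step(c):
    # MOV + 3x SHORI: four 1-cycle instructions
    r = mov_imm16(c >> 48)
    for shift in (32, 16, 0):
        r = shori(r, c >> shift)
    return r

def build_fused(c):
    # (LUI+ADD) fused, then (SHORI+SHORI) fused: two decoder-level ops
    hi = (c >> 32) & 0xFFFFFFFF
    imm20, imm12 = hi >> 12, hi & 0xFFF
    if imm12 >= 0x800:
        raise ValueError("sketch only covers a positive Imm12")
    return shori32(lui_add(imm20, imm12), c)
```

With a negative Imm12 the ADD borrows from the upper 20 bits, so
lui_add(0x12345, 0xFFF) gives 0x12344FFF rather than the plain
concatenation 0x12345FFF. That is presumably why the fused 32-bit load
would be limited to a positive Imm12.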



*: To add a jumbo prefix, one needs an encoding that:
   Uses up a really big chunk of encoding space;
   Is otherwise illegal and unused.
RISC-V doesn't have anything here.


Ironically, in XG2 mode, I still have 28x 24-bit chunks of encoding 
space that aren't yet used for anything. They aren't usable as normal 
encoding space, mostly because instructions put there (with the 
existing encoding schemes) couldn't use all the registers, and would 
not have predication or similar either. Annoyingly, the only types of 
encodings that would fit in there at present are 2RI Imm16 ops or 
similar (or maybe 3R 128-bit SIMD ops, where these ops only use 
encodings for R0..R31 anyways, interpreting the LSB of the register 
field as encoding R32..R63).

Though, 14x of these spaces would likely be alternate forms of Jumbo 
prefix (with another 14 in unconditional-scalar-op land). No immediate 
need to re-add an equivalent of the 40x2 encoding (from Baseline mode), 
as most of what 40x2 addressed can be encoded natively in XG2 Mode.


Technically, I also have 2 unused bits in the Imm16 ops in XG2 Mode. 
I "could" in theory, if I wanted, use them to extend the:
   MOV Imm17s, Rn
Case, to:
   MOV Imm19s, Rn
Though, the other option is to leave them reserved if I later want more 
Imm16 ops.

For now, current plan is to leave this stuff as reserved.


>> An encoding that can MOV a 64-bit constant in 96-bits (12 bytes) and 
>> 1-cycle, is preferable....
> 
> A consuming instruction where you don't even use a register is better
> still !!


Can be done, but thus far only with 33-bit immediate values. Luckily, 
Imm33s seems to address around 99% of uses (for normal ALU ops and similar).
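
For a sense of what these immediate widths cover, a small Python sketch,
assuming the "s" suffix in Imm17s / Imm19s / Imm33s means a sign-extended
immediate of that many bits (the helper names are hypothetical):

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def fits_signed(value, bits):
    """True if value is representable as a sign-extended `bits`-bit immediate."""
    lo, hi = signed_range(bits)
    return lo <= value <= hi

# Imm33s holds every signed 32-bit and every unsigned 32-bit value.
print(signed_range(33))
print(fits_signed(0xFFFFFFFF, 33), fits_signed(-(1 << 31), 33))
# Widening MOV from Imm17s to Imm19s quadruples the reachable range.
print(signed_range(17), signed_range(19))
```

Under that reading, Imm33s is the smallest signed width that covers both
int32 and uint32 constants, which fits with it handling most ALU
immediates.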

Had considered allowing an Imm57s case for SIMD immediates (4x S.E5.F8 
or 2x S.E8.F19), which would have indirectly allowed a general Imm57s 
case. By itself though, the difference doesn't seem enough to justify 
the cost.
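
A quick bit count for the two SIMD immediate layouts, as a Python sketch.
It assumes "S.E5.F8" means 1 sign bit, 5 exponent bits and 8 fraction
bits per lane (and likewise for S.E8.F19). That is only a reading of the
notation, not something taken from a spec.

```python
def packed_bits(lanes, sign, exponent, fraction):
    """Total immediate bits needed for `lanes` packed mini-floats."""
    return lanes * (sign + exponent + fraction)

four_lane = packed_bits(4, 1, 5, 8)    # 4x S.E5.F8
two_lane = packed_bits(2, 1, 8, 19)    # 2x S.E8.F19
print(four_lane, two_lane)
```

Under that assumption both layouts come to 56 bits, so either one would
fit inside a 57-bit immediate field.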

Don't have enough bits in the encoding scheme to pull off a 3RI Imm64 in 
12 bytes (and allowing a 16-byte encoding would have too steep of a cost 
increase to be worthwhile).

So, alas...