From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 11 Apr 2024 20:07:08 -0500
Organization: A noiseless patient Spider
Message-ID: <uva1fu$2010o$1@dont-email.me>
In-Reply-To: <f4d64e33b721ff6c5bd37f01f2705316@www.novabbs.org>

On 4/11/2024 6:06 PM, MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>> On 4/11/2024 1:46 PM, MitchAlsup1 wrote:
>>> BGB wrote:
>>>
>>>>> Win-win under constraints of Load-Store Arch. Otherwise, it depends.
>>> Never seen a LD-OP architecture where the inbound memory can be in
>>> the Rs1 position of the instruction.
>>>
>>>> FWIW:
>>>> The LDSH / SHORI mechanism does provide a way to get 64-bit
>>>> constants, and needs less encoding space than the LUI route.
>>>>
>>>>   MOV   Imm16, Rn
>>>>   SHORI Imm16, Rn
>>>>   SHORI Imm16, Rn
>>>>   SHORI Imm16, Rn
>>>>
>>>> Granted, if each is a 1-cycle instruction, this still takes 4 clock
>>>> cycles.
>>>
>>> As compared to:
>>>
>>>   CALK Rd,Rs1,#imm64
>>>
>>> Which takes 3 words (12 bytes) and executes in CALK cycles; the
>>> loading of the constant is free !! (0 cycles) !! {{The above example
>>> uses at least 5 cycles to use the loaded/built constant.}}
>>>
>> The main reason one might want SHORI is that it can fit into a
>> fixed-length 32-bit encoding.
>
> While 32-bit encoding is RISC mantra, it has NOT been shown to be best,
> just simplest. Then, once you start widening the microarchitecture, it
> is better to fetch wider than decode-issue, so that you suffer least
> from boundary conditions. Once you start fetching wide OR have wide
> decode-issue, you have ALL the infrastructure to do variable-length
> instructions. Thus, the complaint that VLE is hard has already been
> eradicated.

As noted, BJX2 is effectively VLE, just now split into two sub-variants.

So, as for lengths:
  Baseline: 16/32/64/96
  XG2: 32/64/96

The original version was 16/32/48, but the original 48-bit encodings
were dropped, mostly to make the rest of the encoding more orthogonal;
they were replaced with jumbo prefixes.

An encoding space exists where 48-bit ops could in theory be re-added
to Baseline, but I have not done so, as it does not seem to be
justifiable in a cost/benefit sense (and they would still have some of
the same drawbacks as the original 48-bit ops).
Had also briefly experimented with 24-bit ops, but these were quickly
dropped due to "general suckage" (though an alternate 16/24/32/48
encoding scheme could, in theory, have given better code density).

However, RISC-V is either 32-bit only, or 16/32.

For now, I am not bothering with the 16-bit 'C' extension, not so much
because of the difficulty of dealing with VLE (the core can already
deal with VLE), but more because the 'C' encodings are such a
dog-chewed mess that I don't feel terribly inclined to bother with
them.

But, like, I can't really compare BJX2 Baseline with RV64G in terms of
code density, because this wouldn't be a fair comparison. I would need
to compare code density between Baseline and RV64GC, which would imply
actually supporting the 'C' extension. I could already claim a "win"
here if I wanted, but as I see it, doing so would not be valid.

Theoretically, encoding space exists for bigger ops in RISC-V, but as
far as I know, no one has defined ops there yet. Also, the way RISC-V
represents larger ops is very different.

However, comparing fixed-length against VLE when the VLE side only has
larger instructions is still acceptable as I see it (even if larger
instructions can still allow a more compact encoding in some cases).
Say, for example, SuperH vs Thumb2 would still be a fair comparison, as
would Thumb2 vs RV32GC; but Thumb2 vs RV32G would not, unless one only
cares about "absolute code density" irrespective of keeping parity in
terms of feature set.

>> Also technically could be retrofitted
>> onto RISC-V without any significant change, unlike some other options
>> (as noted, I don't argue for adding jumbo prefixes to RV on the
>> basis that there is no real viable way to add them to RV, *).
>
> The issue is that once you do VLE, RISC-V's ISA is no longer helping
> you get the job done, especially when you have to execute 40% more
> instructions.

Yeah.
As noted, I had already been beating RISC-V in terms of performance;
the shortfall was in ".text" size (for the XG2 variant). Initially this
was around a 16% delta, now down to around 5%. Nearly all of the size
reduction thus far has been due to fiddling with stuff in my compiler.

In theory, BJX2 (XG2) should be able to win in terms of code density,
as the only cases where RISC-V has an advantage do not appear to be
statistically significant.

As also noted, I am using "-ffunction-sections" and similar (to allow
GCC to prune unreachable functions); otherwise there is "no contest"
(easier to win against 540K than 290K...).

>> Sadly, the closest option to viable for RV would be to add the SHORI
>> instruction and optionally pattern-match it in the fetch/decode.
>>
>> Or, say:
>>   LUI   Xn, Imm20
>>   ADD   Xn, Xn, Imm12
>>   SHORI Xn, Imm16
>>   SHORI Xn, Imm16
>>
>> Then, combine LUI+ADD into a 32-bit load in the decoder (though
>> probably only if the Imm12 is positive), and 2x SHORI into a combined
>> "Xn=(Xn<<32)|Imm32" operation.
>>
>> This could potentially get it down to 2 clock cycles.
>
> Universal constants get this down to 0 cycles...

Possibly.

>> *: To add a jumbo prefix, one needs an encoding that:
>>   Uses up a really big chunk of encoding space;
>>   Is otherwise illegal and unused.
>> RISC-V doesn't have anything here.
>
> Which is WHY you should not jump ship from SH to RV, but jump to an
> ISA without these problems.

Of the options that were available at the time:
  SuperH: simple encoding and decent code density;
  RISC-V: seemed like it would have had worse code density.

========== REMAINDER OF ARTICLE TRUNCATED ==========