Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Why I've Dropped In
Date: Sat, 14 Jun 2025 14:22:50 -0500
Organization: A noiseless patient Spider
Lines: 111
Message-ID: <102ki6a$bm2d$1@dont-email.me>
References: <0c857b8347f07f3a0ca61c403d0a8711@www.novabbs.com>
 <dd6e28b90190e249289add75780b204a@www.novabbs.com>
 <ec821d1d64555055271e3b72f241d39b@www.novabbs.com>
 <8addb3f96901904511fc9350c43917ef@www.novabbs.com>
 <102b5qh$1q55a$2@dont-email.me>
 <48c03284118d9d68d6ecf3c11b64a76b@www.novabbs.com>
 <577246053d33788ee71e2e04e8466450@www.novabbs.org>
 <bc81734a4df49aeb8c7e11c2ca5e99e4@www.novabbs.com>
 <a144d940c460364f80c5dbe8ca7ce22f@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 14 Jun 2025 21:22:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e0118f8ab1f557a2375dfc38f759b599";
	logging-data="383053"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18PNshncngryVJwi2+aTeLaoVLmWVFMFxQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ROXpstY1JNwitG5asX8xyKRjly4=
In-Reply-To: <a144d940c460364f80c5dbe8ca7ce22f@www.novabbs.org>
Content-Language: en-US

On 6/11/2025 2:16 PM, MitchAlsup1 wrote:
> On Wed, 11 Jun 2025 17:34:54 +0000, quadibloc wrote:
> 
>> On Wed, 11 Jun 2025 16:49:06 +0000, MitchAlsup1 wrote:
>>> On Wed, 11 Jun 2025 14:12:04 +0000, quadibloc wrote:
>>
>>>> Therefore, I reduced the index register and base register fields to
>>>> three bits each, using only some of the 32 integer registers for those
>>>> purposes.
>>
>>> This is going to hurt register allocation.
>>
>> Yes. It will. Unfortunately.
>>
>> Basically, as should be apparent by now, my overriding goal in defining
>> the Concertina II architecture - and its predecessor as well - was to
>> make it "just as good", or at least "just _about_ as good", as both the
>> 68020 and the IBM System/360.
>>
>> This meant that I had to be able to fit base plus index plus
>> displacement into 32 bits, since the System/360 did that, and I had to
>> have 16-bit displacements because the 68020, and indeed x86 and most
>> microprocessors did that.
> 
> There is enough evidence that 12-bit positive displacement (/360 model)
> is insufficient for modern applications, that I was surprised the RISC-V
> went in that direction. EMBench has many subroutines with more than 4K
> of stack variables that cause RISC-V to emit a LUI just to set the 12th
> or 13th bit and access. SPARC had enough problems with 13-bits that any-
> one with their ear to the rail should have heard the consternation.
> 

Wouldn't be as bad, except that standard RISC-V requires using 3 
instructions to deal with this.

But, yeah:
   RISC-V: 12-bit signed, unscaled: -2048..2047 bytes.
   XG2/XG3: 10-bit signed, scaled: -4096..4088 bytes for 64b ld/st.
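
As a quick check of these ranges: the reach of a displacement field
follows from its width, signedness, and scale. A minimal sketch (the
helper name disp_range is made up for illustration, not taken from any
toolchain):

```python
def disp_range(bits, scale=1, signed=True):
    """Return (lowest, highest) byte offset reachable by a displacement field.

    bits   -- width of the immediate field
    scale  -- bytes per displacement step (1 = unscaled)
    signed -- whether the field is two's complement
    """
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo * scale, hi * scale


print(disp_range(12))           # 12-bit signed, unscaled (RISC-V Disp12)
print(disp_range(10, scale=8))  # 10-bit signed, scaled by 8 (64b ld/st)
```

The scaled 10-bit field tops out at +4088, one 8-byte slot short of
4096, and it only reaches offsets that are multiples of 8.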

Fallback case:
   RISC-V: LUI+ADD+Ld/St.
   XG3: Jumbo+Ld/St
     One less instruction word, several cycles less latency.
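
For reference, the plain RISC-V fallback can be sketched as below. This
only models the standard LUI+ADD+Ld sequence; the register names and
helper names are illustrative assumptions, and the Jumbo form is not
modeled since its encoding isn't described here.

```python
def split_hi_lo(offset):
    """Split offset into (hi20, lo12) with (hi20 << 12) + lo12 == offset.

    lo12 is kept in -2048..2047 because the load/store immediate is
    sign-extended; the +0x800 rounding compensates for that.
    """
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    if not -(1 << 19) <= hi20 < (1 << 19):
        raise ValueError("offset out of reach for a single LUI")
    return hi20, lo12


def sp_rel_load(offset, rd="a0", tmp="t0"):
    """Return the instruction list for a 64-bit load from offset(sp)."""
    if -2048 <= offset <= 2047:
        return [f"ld {rd}, {offset}(sp)"]          # fits in Disp12
    hi20, lo12 = split_hi_lo(offset)
    return [
        f"lui {tmp}, {hi20 & 0xFFFFF:#x}",         # upper 20 bits
        f"add {tmp}, {tmp}, sp",                   # add the base register
        f"ld {rd}, {lo12}({tmp})",                 # low 12 bits in the load
    ]


for insn in sp_rel_load(4096):
    print(insn)
```

Anything past the 2047-byte mark costs three instructions and a scratch
register, with each instruction depending on the one before it.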

A small displacement is fine so long as it covers most accesses and the 
fallback case is sufficiently cheap. The downside of RV, I think, is 
that its fallback case isn't cheap enough given how often Disp12 is not 
sufficient.


Could have considered a larger-displacement SP-rel ld/st, except that 
stack accesses falling outside the displacement range aren't common 
enough to justify it.

Did end up with 16-bit for GP-rel, as global variables were a more 
common issue here (12b unscaled doesn't count for much vs the size of 
the ".data" section or similar).

Disp16u with a 4- or 8-byte scale covers 256K or 512K, which does a 
fair bit better.
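
The 256K/512K figures are just 2^16 slots times the scale. A small
sketch of the arithmetic, plus the encodability check a code generator
might apply to a GP-relative offset (function names are hypothetical,
and the rule that the offset must be a non-negative multiple of the
scale is an assumption about how a scaled unsigned field behaves):

```python
def reach_bytes(bits, scale):
    """Total span in bytes covered by a `bits`-wide field at this scale."""
    return (1 << bits) * scale


def fits_disp16u(offset, scale):
    """True if a byte offset from GP fits an unsigned 16-bit scaled field."""
    return offset >= 0 and offset % scale == 0 and offset // scale < (1 << 16)


print(reach_bytes(16, 4) // 1024, "K")  # 4-byte scale
print(reach_bytes(16, 8) // 1024, "K")  # 8-byte scale
print(reach_bytes(12, 1) // 1024, "K")  # total span of a 12-bit unscaled field
```

Offsets that fail the check (too far, or misaligned for the scale)
would need the longer encoding.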

Failing this, the jumbo-encoded form has a reach of 32GB.



Situation with RV+Jumbo is passable.
Though, standard RISC-V kinda performs like dog crap in some cases 
(where neither GCC nor BGBCC manages to generate good code for it).


I have been partly working on BGBCC to improve code generation with 
plain RISC-V (such as getting it to use FPU registers for holding FPU 
values, reducing register thrashing, ...).

Some bugs resulted (partly as a result of prologs/epilogs not saving 
some of the registers, ...), but have mostly got things working again.


Though, fixing issues in basic RV64G codegen doesn't make it competitive 
with RV+Jumbo, as the same fixes also make RV+Jumbo faster...


Still don't get why a lot of the people defining extensions keep 
focusing on obscure niche cases (that don't really help general 
performance), while ignoring a lot of issues that would lead to "across 
the board" performance improvements.



>> And I had to have register-to-register operate instructions that fit
>> into only 16 bits. Because the System/360 had them, and indeed so do
>> many microprocessors.
>>
>> Otherwise, my ISA would be clearly and demonstrably inferior. Where I
>> couldn't attain a full match, I tried to be at least "almost" as good.
>> So either my 16-bit operate instructions have to come in pairs, and have
>> a very restricted set of operations, or they require the overhead of a
>> block header. I couldn't attain the goal of matching the S/360
>> completely, but at least I stayed close.
>>
>> So while having 32 registers like a RISC, I ended up having some
>> purposes for which I could only use a set of eight registers. Not great,
>> but it was the tradeoff that was left to me given the choice I made.
>>
>> So here it is - an ISA that offers RISC-like simplicity of decoding, but
>> an instruction set that approaches CISC in code compactness - and which
>> offers a choice of RISC, CISC, or VLIW programming styles. Which may
>> lead to VLIW speed and efficiency on suitable implementations.
>>
>> John Savard