From: BGB
Newsgroups: comp.arch
Subject: Re: Why I've Dropped In
Date: Sat, 14 Jun 2025 14:22:50 -0500
Organization: A noiseless patient Spider
Message-ID: <102ki6a$bm2d$1@dont-email.me>
References: <0c857b8347f07f3a0ca61c403d0a8711@www.novabbs.com>
 <8addb3f96901904511fc9350c43917ef@www.novabbs.com>
 <102b5qh$1q55a$2@dont-email.me>
 <48c03284118d9d68d6ecf3c11b64a76b@www.novabbs.com>
 <577246053d33788ee71e2e04e8466450@www.novabbs.org>

On 6/11/2025 2:16 PM, MitchAlsup1 wrote:
> On Wed, 11 Jun 2025 17:34:54 +0000, quadibloc wrote:
>
>> On Wed, 11 Jun 2025 16:49:06 +0000, MitchAlsup1 wrote:
>>> On Wed, 11 Jun 2025 14:12:04 +0000, quadibloc wrote:
>>
>>>> Therefore, I reduced the index register and base register fields to
>>>> three bits each, using only some of the 32 integer registers for those
>>>> purposes.
>>
>>> This is going to hurt register allocation.
>>
>> Yes. It will. Unfortunately.
>>
>> Basically, as should be apparent by now, my overriding goal in defining
>> the Concertina II architecture - and its predecessor as well - was to
>> make it "just as good", or at least "just _about_ as good", as both the
>> 68020 and the IBM System/360.
>>
>> This meant that I had to be able to fit base plus index plus
>> displacement into 32 bits, since the System/360 did that, and I had to
>> have 16-bit displacements because the 68020, and indeed x86 and most
>> microprocessors did that.
>
> There is enough evidence that 12-bit positive displacement (/360 model)
> is insufficient for modern applications, that I was surprised the RISC-V
> went in that direction. EMBench has many subroutines with more than 4K
> of stack variables that cause RISC-V to emit a LUI just to set the 12th
> or 13th bit and access. SPARC had enough problems with 13-bits that
> anyone with their ear to the rail should have heard the consternation.
>

Wouldn't be as bad, except that standard RISC-V requires using 3
instructions to deal with this.

But, yeah:
   RISC-V: 12-bit signed, unscaled: -2048..2047 bytes
   XG2/XG3: 10-bit signed, scaled: -4096..4088 bytes for 64b ld/st.

Fallback case:
   RISC-V: LUI+ADD+Ld/St.
   XG3: Jumbo+Ld/St
One less instruction word, several cycles less latency.

Mostly works, so long as the short form covers most cases and the
fallback case is sufficiently cheap.

Downside of RV, I think, is that the fallback case isn't cheap enough
given how often Disp12 is not sufficient.

Could have considered a larger displacement SP-rel ld/st, except that
stack accesses missing the displacement range aren't common enough to
justify it.

Did end up with 16-bit for GP-rel, as global variables were a more
common issue here (12b unscaled doesn't count for much vs the size of
the ".data" section or similar). Disp16u with a 4 or 8 byte scale
covers 256K or 512K. Does a fair bit better.
Failing this, the jumbo-encoded form has a reach of 32GB.

Situation with RV+Jumbo is passable.

Though, standard RISC-V kinda performs like dog crap in some cases
(where neither GCC nor BGBCC manages to pull this off well).

I have been partly working on BGBCC to improve code generation with
plain RISC-V (such as getting it to use FPU registers for holding FPU
values, reducing register thrashing, ...). Some bugs resulted (partly as
a result of prologs/epilogs not saving some of the registers, ...), but
have mostly got things working again.

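As a rough sanity check on the displacement ranges and the RISC-V
fallback discussed above, here is a minimal Python sketch. The function
names (disp_range, rv_split, rv_insn_count) are made up for
illustration, and the scale of 8 for 64-bit ld/st is an assumption read
off the post, not taken from any XG2/XG3 documentation. The split shown
is the usual upper-20/lower-12 decomposition for offsets that fit in 32
bits; note that a 10-bit signed field scaled by 8 tops out at 4088.

```python
def disp_range(bits, scale=1, signed=True):
    """Return (lowest, highest) byte offset reachable by an immediate field."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo * scale, hi * scale


def rv_split(offset):
    """Split an offset into a LUI part (upper 20 bits) and a Disp12 part.

    The 12-bit load/store immediate is sign-extended, so the upper part
    is rounded up whenever bit 11 of the offset is set.
    Assumes the offset fits in 32 bits.
    """
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


def rv_insn_count(offset):
    """Instruction count for a base+offset access on plain RV64:
    one Ld/St if Disp12 reaches, otherwise LUI+ADD+Ld/St."""
    lo, hi = disp_range(12)
    return 1 if lo <= offset <= hi else 3


if __name__ == "__main__":
    print("RV Disp12:        ", disp_range(12))
    print("Disp10s, scale 8: ", disp_range(10, scale=8))
    print("Disp16u, scale 4: ", disp_range(16, scale=4, signed=False))
    print("Disp16u, scale 8: ", disp_range(16, scale=8, signed=False))
    print("split of 5000:    ", rv_split(5000))
```

A stack frame slightly over 2K is enough to push rv_insn_count from 1
to 3, which is the cost being complained about here.
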
Though, fixing issues in basic RV64G codegen doesn't make it competitive
with RV+Jumbo, as fixing issues in basic RV code generation also makes
RV+Jumbo faster...

Still don't get why a lot of the people defining extensions keep
focusing on obscure niche cases (that don't really help general
performance), while ignoring a lot of issues that would lead to "across
the board" performance improvements.

>> And I had to have register-to-register operate instructions that fit
>> into only 16 bits. Because the System/360 had them, and indeed so do
>> many microprocessors.
>>
>> Otherwise, my ISA would be clearly and demonstrably inferior. Where I
>> couldn't attain a full match, I tried to be at least "almost" as good.
>> So either my 16-bit operate instructions have to come in pairs, and have
>> a very restricted set of operations, or they require the overhead of a
>> block header. I couldn't attain the goal of matching the S/360
>> completely, but at least I stayed close.
>>
>> So while having 32 registers like a RISC, I ended up having some
>> purposes for which I could only use a set of eight registers. Not great,
>> but it was the tradeoff that was left to me given the choice I made.
>>
>> So here it is - an ISA that offers RISC-like simplicity of decoding, but
>> an instruction set that approaches CISC in code compactness - and which
>> offers a choice of RISC, CISC, or VLIW programming styles. Which may
>> lead to VLIW speed and efficiency on suitable implementations.
>>
>> John Savard