From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Thu, 11 Apr 2024 20:07:08 -0500
Organization: A noiseless patient Spider
Message-ID: <uva1fu$2010o$1@dont-email.me>
In-Reply-To: <f4d64e33b721ff6c5bd37f01f2705316@www.novabbs.org>

On 4/11/2024 6:06 PM, MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>> On 4/11/2024 1:46 PM, MitchAlsup1 wrote:
>>> BGB wrote:
>>>
>>>>> Win-win under constraints of Load-Store Arch. Otherwise, it depends.
>>> Never seen a LD-OP architecture where the inbound memory can be in
>>> the Rs1 position of the instruction.
>>>
>>>> FWIW:
>>>> The LDSH / SHORI mechanism does provide a way to get 64-bit
>>>> constants, and needs less encoding space than the LUI route.
>>>>
>>>>   MOV   Imm16, Rn
>>>>   SHORI Imm16, Rn
>>>>   SHORI Imm16, Rn
>>>>   SHORI Imm16, Rn
>>>>
>>>> Granted, if each is a 1-cycle instruction, this still takes 4 clock
>>>> cycles.
>>>
>>> As compared to:
>>>
>>>   CALK Rd,Rs1,#imm64
>>>
>>> Which takes 3 words (12 bytes) and executes in CALK cycles; the
>>> loading of the constant is free !! (0 cycles) !! {{The above example
>>> uses at least 5 cycles to use the loaded/built constant.}}
>>>
>> The main reason one might want SHORI is that it can fit into a
>> fixed-length 32-bit encoding.
>
> While 32-bit encoding is RISC mantra, it has NOT been shown to be best,
> just simplest. Then, once you start widening the microarchitecture, it
> is better to fetch wider than decode-issue, so that you suffer least
> from boundary conditions. Once you start fetching wide OR have wide
> decode-issue, you have ALL the infrastructure to do variable-length
> instructions. Thus, the complaint that VLE is hard has already been
> eradicated.

As noted, BJX2 is effectively VLE, just now split into two sub-variants.

So, as for lengths:
  Baseline: 16/32/64/96
  XG2: 32/64/96

The original version was 16/32/48, but the original 48-bit encodings
were dropped, mostly to make the rest of the encoding more orthogonal;
they were replaced with jumbo prefixes.

An encoding space exists where 48-bit ops could in theory be re-added
to Baseline, but I have not done so, as it does not seem to be
justifiable in a cost/benefit sense (and they would still have some of
the same drawbacks as the original 48-bit ops).
Had also briefly experimented with 24-bit ops, but these were quickly
dropped due to "general suckage" (though an alternate 16/24/32/48
encoding scheme could, in theory, have given better code density).

However, RISC-V is either 32-bit only, or 16/32.

For now, I am not bothering with the 16-bit 'C' extension, not so much
because of the difficulty of dealing with VLE (the core can already
deal with VLE), but more because the 'C' encodings are such a
dog-chewed mess that I don't feel terribly inclined to bother with
them.

But, like, I can't really compare BJX2 Baseline with RV64G in terms of
code density, because this wouldn't be a fair comparison. I would need
to compare code density between Baseline and RV64GC, which would imply
actually supporting the 'C' extension. I could already claim a "win"
here if I wanted, but as I see it, doing so would not be valid.

Theoretically, encoding space exists for bigger ops in RISC-V, but as
far as I know, no one has defined ops there yet. Also, the way RISC-V
represents larger ops is very different.

However, comparing fixed-length against VLE when the VLE side only has
larger instructions is still acceptable as I see it (even if larger
instructions can still allow a more compact encoding in some cases).
Say, for example, SuperH vs Thumb2 would still be a fair comparison, as
would Thumb2 vs RV32GC; but Thumb2 vs RV32G would not, unless one only
cares about "absolute code density" irrespective of keeping parity in
terms of feature set.

>> Also technically could be retrofitted
>> onto RISC-V without any significant change, unlike some other options
>> (as noted, I don't argue for adding jumbo prefixes to RV on the
>> basis that there is no real viable way to add them to RV, *).
>
> The issue is that once you do VLE, RISC-V's ISA is no longer helping
> you get the job done, especially when you have to execute 40% more
> instructions.

Yeah.
As noted, I had already been beating RISC-V in terms of performance;
the shortfall was in ".text" size (for the XG2 variant). Initially this
was around a 16% delta, now down to around 5%. Nearly all of the size
reduction thus far has been due to fiddling with stuff in my compiler.

In theory, BJX2 (XG2) should be able to win in terms of code density,
as the only cases where RISC-V has an advantage do not appear to be
statistically significant.

As also noted, I am using "-ffunction-sections" and similar (to allow
GCC to prune unreachable functions); otherwise there is "no contest"
(easier to win against 540K than 290K...).

>> Sadly, the closest option to viable for RV would be to add the SHORI
>> instruction and optionally pattern-match it in the fetch/decode.
>>
>> Or, say:
>>   LUI   Xn, Imm20
>>   ADD   Xn, Xn, Imm12
>>   SHORI Xn, Imm16
>>   SHORI Xn, Imm16
>>
>> Then, combine LUI+ADD into a 32-bit load in the decoder (though
>> probably only if the Imm12 is positive), and 2x SHORI into a combined
>> "Xn=(Xn<<32)|Imm32" operation.
>>
>> This could potentially get it down to 2 clock cycles.
>
> Universal constants get this down to 0 cycles...

Possibly.

>> *: To add a jumbo prefix, one needs an encoding that:
>>   Uses up a really big chunk of encoding space;
>>   Is otherwise illegal and unused.
>> RISC-V doesn't have anything here.
>
> Which is WHY you should not jump ship from SH to RV, but jump to an
> ISA without these problems.

Of the options that were available at the time:
  SuperH: simple encoding and decent code density;
  RISC-V: seemed like it would have had worse code density.

========== REMAINDER OF ARTICLE TRUNCATED ==========