From: BGB
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Wed, 18 Sep 2024 12:55:34 -0500

On 9/18/2024 9:27 AM, MitchAlsup1 wrote:
> On Wed, 18 Sep 2024 4:00:43 +0000, BGB wrote:
>
>> On 9/17/2024 6:04 PM, MitchAlsup1 wrote:
>
>>> Still limited to 32-bit displacement from IP.
>>>
>>> How would you perform the following call::
>>>   current IP = 0x0000000000001234
>>>   target  IP = 0x7FFFFFFF00001234
>>>
>>> This is a single (2-word) instruction in my ISA, assuming GOT is
>>> 32-bit displaceable and 64-bit entries.
>>>
>>
>> Granted, but in plain RISC-V, there is no real better option.
>>
>> If one wants to generate 64-bit displacement, and doesn't want to load a
>> constant from memory:
>>    LUI X6, Disp20Hi       //20 bits
>>    ADDI X6, X6, Disp12Hi  //12 bits
>>    AUIPC X7, Disp20Lo
>>    ADD X7, Disp12Lo
>>    SLLI X6, X6, 32
>>    ADD X7, X7, X6
>
> How very much simpler is::
>
>     MEM    Rd,[IP,Ri<
> 1 instruction, 3 words, 1 decode cycle, no forwarding, shorter latency.

It is simpler, but N/E in RV64G...

This is the whole issue of the idea:
Remain backwards compatible with RV64G / RV64GC (in a binary sense),
*and* try to allow extending it in a way such that performance can be less poor...

Though, the new encodings and 'C' could not be used at the same time. Most likely, it would be a per-binary option, with some wonk in the function prologs to deal with uncertain cases.

Granted, it is less obvious how to approach this without adding extra overhead for ELF SO's, where pretty much every function may be potentially exported.

Even so, GCC output targeting RV64GC is still going to produce the same sort of code as before (typically generating constants via memory loads).

>>
>> Which is sort of the whole reason I am considering hacking around it
>> with an alternate encoding scheme.
>
> Just put in real constants.

>>
>> New encoding scheme can in theory do:
>>    LEA X7, PC, Disp64
>> In a single 96-bit instruction.
>
> Where is the indexing register?

Generally, the displacement and the index register are mutually exclusive (and cases that can make use of Disp AND Index are much less common than Disp OR Index).

I may still consider defining an encoding for this, but not yet.

It is in a similar boat as auto-increment. Both add resource cost with relatively little benefit in terms of overall performance.

Auto-increment because if one has superscalar, the increment can usually be co-executed.
And, full [Rb+Ri*Sc+Disp], because it is just too infrequent to really justify the extra cost of a 3-way adder even if limited mostly to the low-order bits...

>>
> ------------
>>>
>>> AUPIC is (and remains) a crutch (like LUI from MIPS)
>>> a) it consumes an instruction (space and time)
>>> b) it consumes a register unnecessarily
>>> c) it consumes power that direct delivery of the constant would not
>>
>> Yeah, pretty much.
>>    LUI + AUIPC + JAL, eat nearly 27 bits of encoding space.
>>
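For reference, a rough sketch of how the four immediates in the six-instruction RV64G sequence quoted above could be computed, with a small model of the sequence to check the result. This is only an illustration under some assumptions: the helper names (`encode_disp64`, `simulate`, and so on) are made up here, the quoted `ADD X7, Disp12Lo` is read as `ADDI X7, X7, Disp12Lo`, and the "current IP" is taken to be the address of the AUIPC itself. The fiddly part is that both ADDI and the LUI/AUIPC results are sign-extended on RV64, so the high half has to absorb whatever the low pair over- or under-shoots.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def split32(word):
    """Split a 32-bit value into (imm20, imm12) for a LUI/AUIPC + ADDI pair."""
    imm12 = sext(word, 12)                   # ADDI immediate is signed
    imm20 = ((word - imm12) >> 12) & 0xFFFFF # compensate for a negative imm12
    return imm20, imm12

def pair_value(imm20, imm12):
    """Signed value that an RV64 LUI + ADDI pair leaves in the register."""
    return sext(imm20 << 12, 32) + imm12

def encode_disp64(auipc_pc, target):
    """Return (Disp20Hi, Disp12Hi, Disp20Lo, Disp12Lo) for the sequence."""
    disp = (target - auipc_pc) & MASK64
    lo20, lo12 = split32(disp & 0xFFFFFFFF)
    low = pair_value(lo20, lo12)             # what the low pair really adds
    upper = ((disp - low) & MASK64) >> 32    # remainder the high pair supplies
    hi20, hi12 = split32(upper)
    return hi20, hi12, lo20, lo12

def simulate(auipc_pc, hi20, hi12, lo20, lo12):
    """Model the six instructions with 64-bit wraparound arithmetic."""
    x6 = sext(hi20 << 12, 32) & MASK64                # LUI   X6, Disp20Hi
    x6 = (x6 + hi12) & MASK64                         # ADDI  X6, X6, Disp12Hi
    x7 = (auipc_pc + sext(lo20 << 12, 32)) & MASK64   # AUIPC X7, Disp20Lo
    x7 = (x7 + lo12) & MASK64                         # ADDI  X7, X7, Disp12Lo
    x6 = (x6 << 32) & MASK64                          # SLLI  X6, X6, 32
    return (x7 + x6) & MASK64                         # ADD   X7, X7, X6

if __name__ == "__main__":
    pc, target = 0x0000000000001234, 0x7FFFFFFF00001234
    fields = encode_disp64(pc, target)
    print([hex(f) for f in fields], hex(simulate(pc, *fields)))
```

For the example addresses from earlier in the thread, this gives Disp20Hi=0x80000 and Disp12Hi=-1 with both low fields zero, which shows how the ADDI sign extension leaks into the upper immediate even in a simple case.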