Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Fri, 20 Sep 2024 13:42:34 -0500
Organization: A noiseless patient Spider
Lines: 258
Message-ID:
References: <2024Sep10.101932@mips.complang.tuwien.ac.at> <2024Sep11.123824@mips.complang.tuwien.ac.at>
 <867cbhgozo.fsf@linuxsc.com> <20240912142948.00002757@yahoo.com>
 <20240915001153.000029bf@yahoo.com> <20240915154038.0000016e@yahoo.com>
 <2024Sep15.194612@mips.complang.tuwien.ac.at> <45fb24ca46af5c388b0a44af2f72ddf6@www.novabbs.org>
 <77a593b0e8dcb7e4f38c006d3a148cdc@www.novabbs.org> <7a8a967098cb2558c1bbdda5cb3ce99f@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 20 Sep 2024 20:43:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a023699a4ef7a97393a5f9b5d1924251"; logging-data="1277056"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/OVvO2E++9Fh8tsCO3WCCn89arl7WorcM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:4WMfUFNUAbnJyJDrJeWrrMP/uX0=
Content-Language: en-US
In-Reply-To:
Bytes: 10938

On 9/19/2024 9:09 PM, BGB wrote:
> On 9/18/2024 1:42 PM, MitchAlsup1 wrote:
>> On Wed, 18 Sep 2024 17:55:34 +0000, BGB wrote:
>>
>>> On 9/18/2024 9:27 AM, MitchAlsup1 wrote:
>>>> On Wed, 18 Sep 2024 4:00:43 +0000, BGB wrote:
>>>>
>>>>> On 9/17/2024 6:04 PM, MitchAlsup1 wrote:
>>>>
>>>>>> Still limited to 32-bit displacement from IP.
>>>>>>
>>>>>> How would you perform the following call::
>>>>>> current IP = 0x0000000000001234
>>>>>> target  IP = 0x7FFFFFFF00001234
>>>>>>
>>>>>> This is a single (2-word) instruction in my ISA, assuming GOT is
>>>>>> 32-bit displaceable and 64-bit entries.
>>>>>>
>>>>>
>>>>> Granted, but in plain RISC-V, there is no real better option.
>>>>>
>>>>> If one wants to generate 64-bit displacement, and doesn't want to
>>>>> load a constant from memory:
>>>>>    LUI X6, Disp20Hi          //20 bits
>>>>>    ADDI X6, X6, Disp12Hi     //12 bits
>>>>>    AUIPC X7, Disp20Lo
>>>>>    ADDI X7, X7, Disp12Lo
>>>>>    SLLI X6, X6, 32
>>>>>    ADD X7, X7, X6
>>>>
>>>> How very much simpler is::
>>>>
>>>>      MEM    Rd,[IP,Ri<
>>>>
>>>> 1 instruction, 3 words, 1 decode cycle, no forwarding, shorter latency.
>>>
>>>
>>> It is simpler, but N/E in RV64G...
>>>
>>> This is the whole issue of the idea:
>>>    Remain backwards compatible with RV64G / RV64GC (in a binary sense).
>>
>> So, you like sailing with an albatross tied around your neck:: Check.
>>
>
> Likely for a custom CPU to be taken all that seriously at this point,
> one is going to need binary compatibility with at least one semi-popular
> ISA.
>
> And, main options at this point are:
>   RISC-V, which is just kinda meh;
>   ARMv7 / ARMv8, which are not free;
>     And, v7/v8 are nowhere near patents expiring.
>   x86-64, just no.
>     Doable at least as far as x86-64 and SSE2 should be in the clear.
>     But, making it not perform well seems harder.
>
> Well, or MIPS64 or SPARC64 or similar, but these are arguably worse
> options than RISC-V.
>
>
>>> *and* try to allow extending it in a way such that performance can be
>>> less poor...
>>
>> I should remind you that if you eliminate the compressed parts of
>> RISC-V you can fit the entire My 66000 ISA in the space remaining.
>> All the constants, all transcendentals, all the far-control transfers,
>> the efficient context switching, overhead free world switching,...
>> ---------
>
> The idea is that the mode switching can allow swapping out the
> Compressed instructions to make room for other stuff, while also leaving
> the compressed instructions in existence for compatibility with binaries
> built assuming them.
>
> And, is less drastic than gluing together two unrelated ISAs using
> inter-ISA branches (say, the current situation of trying to mix RISC-V
> code with XG2 via magic function pointers).
>
>
> But, yeah, if you want to design a version of your ISA that can also
> co-execute with RISC-V, not like I have any reason to complain.
>
>
>>>>>
>>>>> Which is sort of the whole reason I am considering hacking around it
>>>>> with an alternate encoding scheme.
>>>>
>>>> Just put in real constants.
>>>>>
>>>>> New encoding scheme can in theory do:
>>>>>    LEA X7, PC, Disp64
>>>>> In a single 96-bit instruction.
>>>>
>>>> Where is the indexing register?
>>>
>>> Generally the use of a displacement and index register are mutually
>>> exclusive (and, cases that can make use of Disp AND Index are much less
>>> common than Disp OR Index).
>>
>>        COMMON /alpha/ a(100,100), b(300,300),
>>
>> ..
>>
>>        x = a(i,j)*b(j,i);
>>
>> I see large displacements with indexing all the time from ASM out
>> of Brian's compiler.
>>
>
> I tried adding this stuff experimentally with BGBCC in the past, in both
> of my ISA efforts, but seemingly my attempts didn't use them all that
> often (as opposed to [Rb+Disp] and [Rb+Ri*FixSc] which are used
> extensively).
>
> Arguably, the main relevant cases would have been for stack-arrays or
> arrays inside structs.
>
> But, if such an array is referenced multiple times in a given basic
> block, it would likely still be more efficient to load the address of
> the array into a register.
>
>
> Though, if one were to go simply on usage frequency, likely
> auto-increment would be slightly ahead.
>
> Say (roughly from memory):
>   [Rb+Disp]        // ~ 60% (includes PC and GBR)
>   [Rb+Ri*FixSc]    // ~ 30% (eg: "ptr[i]")
>   [Rb]+            // ~ 6% (eg: "*ptr++")
>   [Rb+Ri*Sc+Disp]  // ~ 4% (eg: "obj->arr[i]")
>
> Well, unless someone can find a table that shows significantly different
> stats.
> Off hand, not easily finding such a table to compare with
> (preferably from a relatively mature target which has the relevant modes).
>
> Can note that "*ptr++" seems most common for auto-increment, whereas
> "*ptr--", "*--ptr", and "*++ptr" are rarer.
>
>
> Seems like no one has made tables online for the relative usage
> frequencies of the various x86-64 and ARM64 addressing modes...
>
> Might be useful to have this data for "relatively mature" architectures.
> Would be a pain to write an x86-64 disassembler mostly to use it just to
> stat up the ModR/M+SIB sequences. Does raise the question of whether
> there is a semi-reliable way to stat this without needing to write a full
> disassembler.
>
>
> One simple option would be to assume an instruction looks like:
>   [Prefix Bytes]
>   [REX byte]
>   OP_Byte | 0F+OP_Byte
>   ModR/M + SIB + ...
>
> And then use a heuristic to try to guess how to interpret the
> instruction stream based on "looks better" (more likely to be aligned
> with the instruction stream vs random unaligned garbage).
>
> Though, such a "looks good" heuristic could itself risk skewing the
> results.

========== REMAINDER OF ARTICLE TRUNCATED ==========
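
As a rough illustration of the "stat up the ModR/M+SIB sequences" idea, here is a minimal Python sketch of just the classification step. It assumes the ModR/M byte (and the SIB byte, if any) has already been located, and it bins the operand into categories loosely matching the table above. The names classify_modrm and tally, and the category strings, are made up for this example. Finding instruction boundaries (prefixes, REX, opcode maps, immediates, VEX/EVEX) is the hard part discussed above and is not attempted here. RIP-relative is kept as its own bin rather than folded into [Rb+Disp].

```python
from collections import Counter


def classify_modrm(modrm, sib=None, rex_x=False):
    """Classify one x86-64 ModR/M byte (64-bit mode) by addressing form.

    Returns (category, extra_bytes), where extra_bytes counts the SIB byte
    and displacement bytes that follow the ModR/M byte.
    """
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return "reg", 0                      # register operand, no memory access
    disp = (0, 1, 4)[mod]                    # displacement size implied by mod
    if rm != 4:                              # no SIB byte
        if mod == 0 and rm == 5:
            return "[RIP+Disp32]", 4         # RIP-relative in 64-bit mode
        return ("[Rb+Disp]" if disp else "[Rb]"), disp
    if sib is None:
        raise ValueError("rm == 4 requires a SIB byte")
    index, base = (sib >> 3) & 7, sib & 7
    has_index = rex_x or index != 4          # index 100b without REX.X: none
    has_base = not (mod == 0 and base == 5)  # mod 00, base 101b: disp32, no base
    if not has_base:
        disp = 4
    if has_base and has_index:
        cat = "[Rb+Ri*Sc+Disp]" if disp else "[Rb+Ri*Sc]"
    elif has_base:
        cat = "[Rb+Disp]" if disp else "[Rb]"
    elif has_index:
        cat = "[Ri*Sc+Disp32]"
    else:
        cat = "[Disp32]"
    return cat, 1 + disp


def tally(operands):
    """Count categories over (modrm, sib_or_None) pairs, skipping 'reg'."""
    counts = Counter(classify_modrm(m, s)[0] for m, s in operands)
    counts.pop("reg", None)
    return counts


# Hand-assembled operand bytes (REX and opcode bytes already stripped):
sample = [
    (0x43, None),  # mov rax, [rbx+8]
    (0x04, 0xCB),  # mov rax, [rbx+rcx*8]
    (0x44, 0x24),  # mov rax, [rsp+16]
    (0x84, 0xCB),  # mov rax, [rbx+rcx*8+0x100]
    (0x05, None),  # mov rax, [rip+disp32]
    (0xC3, None),  # mov rax, rbx (register form)
]
print(tally(sample))
```

The extra_bytes value is what a boundary-guessing pass would need in order to skip to the next candidate instruction. Note that x86-64 has no auto-increment mode in ModR/M, so the "[Rb]+" row of the table has no counterpart here; PUSH/POP and the string instructions would have to be counted separately if one wanted something comparable.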