Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Fri, 20 Sep 2024 13:42:34 -0500
Organization: A noiseless patient Spider
Lines: 258
Message-ID: <vckfp6$16v40$1@dont-email.me>
References: <vaqgtl$3526$1@dont-email.me>
 <2024Sep10.101932@mips.complang.tuwien.ac.at> <ygn8qvztf16.fsf@y.z>
 <2024Sep11.123824@mips.complang.tuwien.ac.at> <vbsoro$3ol1a$1@dont-email.me>
 <867cbhgozo.fsf@linuxsc.com> <20240912142948.00002757@yahoo.com>
 <vbuu5n$9tue$1@dont-email.me> <20240915001153.000029bf@yahoo.com>
 <vc6jbk$5v9f$1@paganini.bofh.team> <20240915154038.0000016e@yahoo.com>
 <2024Sep15.194612@mips.complang.tuwien.ac.at> <vc8m5k$2nf2l$1@dont-email.me>
 <vc8tlj$2od19$3@dont-email.me> <vca209$319ci$1@dont-email.me>
 <vcbiov$3ecji$3@dont-email.me> <vccmm3$3m42h$1@dont-email.me>
 <e060fe2e0ee375efff2a9ab1223652f5@www.novabbs.org>
 <vccv3r$3nfqv$1@dont-email.me>
 <45fb24ca46af5c388b0a44af2f72ddf6@www.novabbs.org>
 <vcdjbn$3u259$1@dont-email.me>
 <77a593b0e8dcb7e4f38c006d3a148cdc@www.novabbs.org>
 <vcf491$57pi$1@dont-email.me>
 <7a8a967098cb2558c1bbdda5cb3ce99f@www.novabbs.org>
 <vciljc$q5fh$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 20 Sep 2024 20:43:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a023699a4ef7a97393a5f9b5d1924251";
	logging-data="1277056"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/OVvO2E++9Fh8tsCO3WCCn89arl7WorcM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:4WMfUFNUAbnJyJDrJeWrrMP/uX0=
Content-Language: en-US
In-Reply-To: <vciljc$q5fh$1@dont-email.me>
Bytes: 10938

On 9/19/2024 9:09 PM, BGB wrote:
> On 9/18/2024 1:42 PM, MitchAlsup1 wrote:
>> On Wed, 18 Sep 2024 17:55:34 +0000, BGB wrote:
>>
>>> On 9/18/2024 9:27 AM, MitchAlsup1 wrote:
>>>> On Wed, 18 Sep 2024 4:00:43 +0000, BGB wrote:
>>>>
>>>>> On 9/17/2024 6:04 PM, MitchAlsup1 wrote:
>>>>
>>>>>> Still limited to 32-bit displacement from IP.
>>>>>>
>>>>>> How would you perform the following call::
>>>>>> current IP = 0x0000000000001234
>>>>>> target  IP = 0x7FFFFFFF00001234
>>>>>>
>>>>>> This is a single (2-word) instruction in my ISA, assuming GOT is
>>>>>> 32-bit displaceable and 64-bit entries.
>>>>>>
>>>>>
>>>>> Granted, but in plain RISC-V, there is no real better option.
>>>>>
>>>>> If one wants to generate a 64-bit displacement, and doesn't want to
>>>>> load a constant from memory:
>>>>>    LUI X6, Disp20Hi       //20 bits
>>>>>    ADDI X6, X6, Disp12Hi  //12 bits
>>>>>    AUIPC X7, Disp20Lo
>>>>>    ADDI X7, X7, Disp12Lo
>>>>>    SLLI X6, X6, 32
>>>>>    ADD X7, X7, X6
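
The four immediates in the quoted sequence cannot simply be bit-slices of the displacement, since LUI/AUIPC results and the 12-bit ADDI immediates are all sign-extended on RV64. A minimal sketch of how the fields could be derived and checked follows; the helper names (split_disp64, emulate) are made up for illustration, and it assumes the displacement is taken relative to the address of the AUIPC itself (8 bytes past the start of the sequence).

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def split32(x):
    """Split x (mod 2**32) into (hi20, lo12) for an LUI/AUIPC + ADDI pair.
    The +0x800 pre-compensates for ADDI sign-extending its immediate."""
    lo12 = x & 0xFFF
    hi20 = ((x + 0x800) >> 12) & 0xFFFFF
    return hi20, lo12

def split_disp64(disp):
    """Return (Disp20Hi, Disp12Hi, Disp20Lo, Disp12Lo) for a 64-bit displacement."""
    disp &= MASK64
    lo20, lo12 = split32(disp)
    # Value the AUIPC+ADDI pair really contributes (can be negative).
    low = sext(lo20 << 12, 32) + sext(lo12, 12)
    # Whatever the low pair missed must come from the upper half.
    high = ((disp - low) & MASK64) >> 32
    hi20, hi12 = split32(high)
    return hi20, hi12, lo20, lo12

def emulate(auipc_pc, hi20, hi12, lo20, lo12):
    """Model the six-instruction sequence with RV64 wraparound semantics."""
    x6 = sext(hi20 << 12, 32) & MASK64                # LUI   X6, Disp20Hi
    x6 = (x6 + sext(hi12, 12)) & MASK64               # ADDI  X6, X6, Disp12Hi
    x7 = (auipc_pc + sext(lo20 << 12, 32)) & MASK64   # AUIPC X7, Disp20Lo
    x7 = (x7 + sext(lo12, 12)) & MASK64               # ADDI  X7, X7, Disp12Lo
    x6 = (x6 << 32) & MASK64                          # SLLI  X6, X6, 32
    return (x7 + x6) & MASK64                         # ADD   X7, X7, X6

if __name__ == "__main__":
    pc, target = 0x0000000000001234, 0x7FFFFFFF00001234
    fields = split_disp64(target - pc)
    print([hex(f) for f in fields], hex(emulate(pc, *fields)))
```

Deriving the upper half from (disp - low) rather than from the raw upper 32 bits is what absorbs the borrow when the low pair ends up negative.
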
>>>>
>>>> How very much simpler is::
>>>>
>>>>      MEM    Rd,[IP,Ri<<s,DISP64]
>>>>
>>>> 1 instruction, 3 words, 1 decode cycle, no forwarding, shorter latency.
>>>
>>>
>>> It is simpler, but it does not exist in RV64G...
>>>
>>> This is the whole issue of the idea:
>>>    Remain backwards compatible with RV64G / RV64GC (in a binary sense).
>>
>> So, you like sailing with an albatross tied around your neck:: Check.
>>
> 
> Likely, for a custom CPU to be taken at all seriously at this point,
> it is going to need binary compatibility with at least one semi-popular
> ISA.
> 
> And, main options at this point are:
>    RISC-V, which is just kinda meh;
>    ARMv7 / ARMv8, which are not free;
>      And, v7/v8 are nowhere near patents expiring.
>    x86-64, just no.
>      Doable, at least in that x86-64 and SSE2 should be in the clear.
>      But, making it perform well seems harder.
> 
> Well, or MIPS64 or SPARC64 or similar, but these are arguably worse 
> options than RISC-V.
> 
> 
>>> *and* try to allow extending it in a way such that performance can be
>>> less poor...
>>
>> I should remind you that if you eliminate the compressed parts of
>> RISC-V you can fit the entire My 66000 ISA in the space remaining.
>> All the constants, all transcendentals, all the far-control transfers,
>> the efficient context switching, overhead free world switching,...
>> ---------
> 
> The idea is that the mode switching can allow swapping out the 
> Compressed instructions to make room for other stuff, while also leaving 
> the compressed instructions in existence for compatibility with binaries 
> built assuming them.
> 
> And, it is less drastic than gluing together two unrelated ISAs using
> inter-ISA branches (say, the current situation of trying to mix RISC-V
> code with XG2 via magic function pointers).
> 
> 
> But, yeah, if you want to design a version of your ISA that can also
> co-execute with RISC-V, it's not like I have any reason to complain.
> 
> 
>>>>>
>>>>> Which is sort of the whole reason I am considering hacking around it
>>>>> with an alternate encoding scheme.
>>>>
>>>> Just put in real constants.
>>>>>
>>>>> New encoding scheme can in theory do:
>>>>>    LEA X7, PC, Disp64
>>>>> In a single 96-bit instruction.
>>>>
>>>> Where is the indexing register?
>>>
>>> Generally the use of a displacement and index register are mutually
>>> exclusive (and, cases that can make use of Disp AND Index are much less
>>> common than Disp OR Index).
>>
>>        COMMON /alpha/ a(100,100), b(300,300),
>>
>> ..
>>
>>        x = a(i,j)*b(j,i);
>>
>> I see large displacements with indexing all the time from ASM out
>> of Brian's compiler.
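
To make the quoted COMMON example concrete: the address of b(j,i) splits into a constant part (where b sits inside the block, minus the 1-based bias) and a runtime index, which is the [Rb+Ri*Sc+Disp] shape. A rough sketch of the arithmetic, assuming 4-byte default REAL elements, column-major layout, and arrays packed back to back in declaration order; the helper names are hypothetical.

```python
ELEM = 4  # assumed size in bytes of a default REAL element

def common_layout(arrays, elem=ELEM):
    """Map each array name to (byte offset within the COMMON block, dims)."""
    layout, offset = {}, 0
    for name, dims in arrays:
        layout[name] = (offset, dims)
        count = 1
        for d in dims:
            count *= d
        offset += count * elem
    return layout

def disp_and_index(layout, name, i, j, elem=ELEM):
    """Split the address of name(i,j) into (constant disp, index to scale by elem).

    Column-major, 1-based: offset = ((i-1) + (j-1)*rows) * elem, rewritten as
    (i + rows*j)*elem - (1 + rows)*elem so the bias folds into the constant.
    """
    base, (rows, _cols) = layout[name]
    disp = base - (1 + rows) * elem
    index = i + rows * j
    return disp, index

if __name__ == "__main__":
    layout = common_layout([("a", (100, 100)), ("b", (300, 300))])
    i, j = 7, 42
    # x = a(i,j)*b(j,i): both loads are block base + index*4 + a large constant.
    print("a(i,j):", disp_and_index(layout, "a", i, j))
    print("b(j,i):", disp_and_index(layout, "b", j, i))
```

With these assumptions b starts 40000 bytes into the block, so the constant for b is well outside a 12-bit immediate, while the index part still has to be computed at runtime.
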
>>
> 
> I tried adding this stuff experimentally with BGBCC in the past, in both 
> of my ISA efforts, but seemingly my attempts didn't use them all that 
> often (as opposed to [Rb+Disp] and [Rb+Ri*FixSc] which are used 
> extensively).
> 
> Arguably, the main relevant cases would have been for stack-arrays or 
> arrays inside structs.
> 
> But, if such an array is referenced multiple times in a given basic 
> block, it would likely still be more efficient to load the address of 
> the array into a register.
> 
> 
> Though, if one were to go simply on usage frequency, likely
> auto-increment would be slightly ahead.
> 
> Say (roughly from memory):
>    [Rb+Disp]        // ~ 60% (includes PC and GBR)
>    [Rb+Ri*FixSc]    // ~ 30% (eg: "ptr[i]")
>    [Rb]+            // ~ 6% (eg: "*ptr++")
>    [Rb+Ri*Sc+Disp]  // ~ 4% (eg: "obj->arr[i]")
> 
> Well, unless someone can find a table that shows significantly different 
> stats. Off hand, not easily finding such a table to compare with 
> (preferably from a relatively mature target which has the relevant modes).
> 
> Can note that "*ptr++" seems most common for auto-increment, whereas 
> "*ptr--", "*--ptr", and "*++ptr" are rarer.
> 
> 
> Seems like no one has made tables online for the relative usage
> frequencies of the various x86-64 and ARM64 addressing modes...
> 
> Might be useful to have this data for "relatively mature" architectures.
> It would be a pain to write an x86-64 disassembler just to tally up
> the ModR/M+SIB sequences. This does raise the question of whether there
> is a semi-reliable way to gather these stats without needing to write a
> full disassembler.
> 
> 
> One simple option would be to assume an instruction looks like:
>    [Prefix Bytes]
>    [REX byte]
>    OP_Byte | 0F+OP_Byte
>    ModR/M + SIB + ...
> 
> And then use a heuristic to try to guess how to interpret the 
> instruction stream based on "looks better" (more likely to be aligned 
> with the instruction stream vs random unaligned garbage).
> 
> Though, such a "looks good" heuristic could itself risk skewing the 
> results.
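
Whatever is used to find the instruction boundaries, the classification step itself is small. Below is a minimal sketch that maps a ModR/M byte (plus SIB, if present) onto the addressing-mode buckets from the table above, for 64-bit mode only. It assumes the caller has already located the ModR/M byte and knows the REX prefix (0 if none); the function name and bucket labels are made up for illustration, and it does nothing about prefixes, opcode maps, or VEX/EVEX.

```python
from collections import Counter

def classify_modrm(code, offset=0, rex=0):
    """Classify the operand encoded at code[offset] (the ModR/M byte).

    Returns (bucket, bytes consumed by ModR/M + SIB + displacement).
    """
    modrm = code[offset]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return "reg", 1                       # register operand, no memory access
    if mod == 0 and rm == 5:
        return "[PC+Disp]", 5                 # RIP-relative: ModR/M + disp32
    disp_bytes = (0, 1, 4)[mod]               # mod 00/01/10 -> none/disp8/disp32
    length = 1
    has_base, has_index = True, False
    if rm == 4:                               # rm=100b means a SIB byte follows
        sib = code[offset + 1]
        length += 1
        # REX.X (bit 1 of the REX byte) supplies bit 3 of the index register.
        index = ((sib >> 3) & 7) | ((rex & 0x02) << 2)
        has_index = index != 4                # index 0100b encodes "no index"
        if mod == 0 and (sib & 7) == 5:       # base=101b with mod=00: no base
            has_base, disp_bytes = False, 4
    length += disp_bytes
    if has_index:
        if not has_base:
            name = "[Ri*Sc+Disp]"
        elif disp_bytes:
            name = "[Rb+Ri*Sc+Disp]"
        else:
            name = "[Rb+Ri*Sc]"
    elif not has_base:
        name = "[Disp]"
    elif disp_bytes:
        name = "[Rb+Disp]"
    else:
        name = "[Rb]"
    return name, length

if __name__ == "__main__":
    # Hand-assembled operands (bytes after the REX prefix and opcode).
    samples = [
        bytes([0x45, 0x08]),                  # [rbp+8]
        bytes([0x04, 0xCB]),                  # [rbx+rcx*8]
        bytes([0x44, 0xCB, 0x10]),            # [rbx+rcx*8+0x10]
        bytes([0x44, 0x24, 0x08]),            # [rsp+8]
        bytes([0x05, 0, 0, 0, 0]),            # [rip+disp32]
    ]
    print(Counter(classify_modrm(s)[0] for s in samples))
```

Note that a disp8 of zero (as some encodings of [rbp] and [r13] require) lands in the [Rb+Disp] bucket here, which is one more way such a tally could be slightly skewed.
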
========== REMAINDER OF ARTICLE TRUNCATED ==========