Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 19 Sep 2024 21:09:35 -0500
Organization: A noiseless patient Spider
Lines: 195
Message-ID: <vciljc$q5fh$1@dont-email.me>
References: <vaqgtl$3526$1@dont-email.me>
 <2024Sep10.101932@mips.complang.tuwien.ac.at> <ygn8qvztf16.fsf@y.z>
 <2024Sep11.123824@mips.complang.tuwien.ac.at> <vbsoro$3ol1a$1@dont-email.me>
 <867cbhgozo.fsf@linuxsc.com> <20240912142948.00002757@yahoo.com>
 <vbuu5n$9tue$1@dont-email.me> <20240915001153.000029bf@yahoo.com>
 <vc6jbk$5v9f$1@paganini.bofh.team> <20240915154038.0000016e@yahoo.com>
 <2024Sep15.194612@mips.complang.tuwien.ac.at> <vc8m5k$2nf2l$1@dont-email.me>
 <vc8tlj$2od19$3@dont-email.me> <vca209$319ci$1@dont-email.me>
 <vcbiov$3ecji$3@dont-email.me> <vccmm3$3m42h$1@dont-email.me>
 <e060fe2e0ee375efff2a9ab1223652f5@www.novabbs.org>
 <vccv3r$3nfqv$1@dont-email.me>
 <45fb24ca46af5c388b0a44af2f72ddf6@www.novabbs.org>
 <vcdjbn$3u259$1@dont-email.me>
 <77a593b0e8dcb7e4f38c006d3a148cdc@www.novabbs.org>
 <vcf491$57pi$1@dont-email.me>
 <7a8a967098cb2558c1bbdda5cb3ce99f@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 20 Sep 2024 04:10:52 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a023699a4ef7a97393a5f9b5d1924251";
	logging-data="857585"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19XYIv8O0D4WsAcDs1QqXX6Wmk4qqVUo/s="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:L/5FBzhGzlWo04Dg8IOn5QewHkY=
In-Reply-To: <7a8a967098cb2558c1bbdda5cb3ce99f@www.novabbs.org>
Content-Language: en-US
Bytes: 8817

On 9/18/2024 1:42 PM, MitchAlsup1 wrote:
> On Wed, 18 Sep 2024 17:55:34 +0000, BGB wrote:
> 
>> On 9/18/2024 9:27 AM, MitchAlsup1 wrote:
>>> On Wed, 18 Sep 2024 4:00:43 +0000, BGB wrote:
>>>
>>>> On 9/17/2024 6:04 PM, MitchAlsup1 wrote:
>>>
>>>>> Still limited to 32-bit displacement from IP.
>>>>>
>>>>> How would you perform the following call::
>>>>> current IP = 0x0000000000001234
>>>>> target  IP = 0x7FFFFFFF00001234
>>>>>
>>>>> This is a single (2-word) instruction in my ISA, assuming GOT is
>>>>> 32-bit displaceable and 64-bit entries.
>>>>>
>>>>
>>>> Granted, but in plain RISC-V, there is no real better option.
>>>>
>>>> If one wants to generate 64-bit displacement, and doesn't want to
>>>> load a constant from memory:
>>>>    LUI X6, Disp20Hi       //20 bits
>>>>    ADDI X6, X6, Disp12Hi  //12 bits
>>>>    AUIPC X7, Disp20Lo
>>>>    ADDI X7, X7, Disp12Lo
>>>>    SLLI X6, X6, 32
>>>>    ADD X7, X7, X6
>>>
>>> How very much simpler is::
>>>
>>>      MEM    Rd,[IP,Ri<<s,DISP64]
>>>
>>> 1 instruction, 3 words, 1 decode cycle, no forwarding, shorter latency.
>>
>>
>> It is simpler, but N/E in RV64G...
>>
>> This is the whole issue of the idea:
>>    Remain backwards compatible with RV64G / RV64GC (in a binary sense).
> 
> So, you like sailing with an albatross tied around your neck:: Check.
> 
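
For reference, the four immediates in the six-instruction RV64G sequence quoted above can't just be cut out of the displacement bit-for-bit, since LUI, AUIPC, and ADDI all sign-extend. A rough sketch in Python of how they could be chosen, plus a small model of the sequence to check the result. The names split_disp64 and simulate are made up for illustration, and "pc" is assumed to be the address of the AUIPC itself:

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def split_disp64(disp):
    """Return (hi20, hi12, lo20, lo12) as unsigned immediate fields."""
    disp = sext(disp, 64)
    lo12 = sext(disp, 12)                      # ADDI sign-extends 12 bits
    lo20 = ((disp - lo12) >> 12) & 0xFFFFF
    # What AUIPC + ADDI actually add to PC (may differ from disp's low 32 bits)
    low = sext(lo20 << 12, 32) + lo12
    # The remainder is an exact multiple of 2^32; only its low 32 bits matter
    hi32 = ((disp - low) >> 32) & 0xFFFFFFFF
    hi12 = sext(hi32, 12)
    hi20 = ((hi32 - hi12) >> 12) & 0xFFFFF
    return hi20, hi12 & 0xFFF, lo20, lo12 & 0xFFF

def simulate(pc, hi20, hi12, lo20, lo12):
    """Model the quoted sequence with RV64 sign-extension rules."""
    x6 = sext(hi20 << 12, 32) & MASK64         # LUI   X6, hi20
    x6 = (x6 + sext(hi12, 12)) & MASK64        # ADDI  X6, X6, hi12
    x7 = (pc + sext(lo20 << 12, 32)) & MASK64  # AUIPC X7, lo20
    x7 = (x7 + sext(lo12, 12)) & MASK64        # ADDI  X7, X7, lo12
    x6 = (x6 << 32) & MASK64                   # SLLI  X6, X6, 32
    return (x7 + x6) & MASK64                  # ADD   X7, X7, X6
```

For the example above (current IP 0x1234, target 0x7FFFFFFF00001234), the displacement is 0x7FFFFFFF00000000, and the high half ends up as LUI 0x80000 plus ADDI -1 because of the sign extension.
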

Likely, for a custom CPU to be taken at all seriously at this point, 
it is going to need binary compatibility with at least one semi-popular 
ISA.

And, main options at this point are:
   RISC-V, which is just kinda meh;
   ARMv7 / ARMv8, which are not free;
     And, v7/v8 are nowhere near patents expiring.
   x86-64, just no.
     Doable, at least in that baseline x86-64 and SSE2 should be in the
     clear patent-wise.
     But, making it perform well seems harder.

Well, or MIPS64 or SPARC64 or similar, but these are arguably worse 
options than RISC-V.


>> *and* try to allow extending it in a way such that performance can be
>> less poor...
> 
> I should remind you that if you eliminate the compressed parts of
> RISC-V you can fit the entire My 66000 ISA in the space remaining.
> All the constants, all transcendentals, all the far-control transfers,
> the efficient context switching, overhead free world switching,...
> ---------

The idea is that the mode switching can allow swapping out the 
Compressed instructions to make room for other stuff, while also leaving 
the compressed instructions in existence for compatibility with binaries 
built assuming them.

And, it is less drastic than gluing together two unrelated ISAs using 
inter-ISA branches (say, the current situation of trying to mix RISC-V 
code with XG2 via magic function pointers).


But, yeah, if you want to design a version of your ISA that can also 
co-execute with RISC-V, not like I have any reason to complain.


>>>>
>>>> Which is sort of the whole reason I am considering hacking around it
>>>> with an alternate encoding scheme.
>>>
>>> Just put in real constants.
>>>>
>>>> New encoding scheme can in theory do:
>>>>    LEA X7, PC, Disp64
>>>> In a single 96-bit instruction.
>>>
>>> Where is the indexing register?
>>
>> Generally the use of a displacement and index register are mutually
>> exclusive (and, cases that can make use of Disp AND Index are much less
>> common than Disp OR Index).
> 
>        COMMON /alpha/ a(100,100), b(300,300),
> 
> ..
> 
>        x = a(i,j)*b(j,i);
> 
> I see large displacements with indexing all the time from ASM out
> of Brian's compiler.
> 

I tried adding this stuff experimentally with BGBCC in the past, in both 
of my ISA efforts, but seemingly my attempts didn't use them all that 
often (as opposed to [Rb+Disp] and [Rb+Ri*FixSc] which are used 
extensively).

Arguably, the main relevant cases would have been for stack-arrays or 
arrays inside structs.

But, if such an array is referenced multiple times in a given basic 
block, it would likely still be more efficient to load the address of 
the array into a register.


Though, if one were to go simply on usage frequency, likely 
auto-increment would be slightly ahead.

Say (roughly from memory):
   [Rb+Disp]        // ~ 60% (includes PC and GBR)
   [Rb+Ri*FixSc]    // ~ 30% (eg: "ptr[i]")
   [Rb]+            // ~ 6% (eg: "*ptr++")
   [Rb+Ri*Sc+Disp]  // ~ 4% (eg: "obj->arr[i]")

Well, unless someone can find a table that shows significantly different 
stats. Off hand, not easily finding such a table to compare with 
(preferably from a relatively mature target which has the relevant modes).
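
As a rough sketch of how such a table could be tallied from a list of memory operands (say, pulled out of a compiler's ASM output), something like the following Python might do. The operand syntax is assumed to be the bracket notation used above, the function names are hypothetical, and a bare [Rb] is counted as [Rb+Disp] with a zero displacement:

```python
import re
from collections import Counter

_NUM = re.compile(r"-?(0x[0-9a-fA-F]+|\d+)")

def classify_operand(op):
    """Map one operand string onto the four mode classes listed above."""
    s = op.replace(" ", "")
    if re.fullmatch(r"\[\w+\]\+", s):
        return "[Rb]+"                      # post-increment, eg "*ptr++"
    m = re.fullmatch(r"\[(.+)\]", s)
    if m is None:
        return "other"
    # Treat "a-8" as "a+-8" so negative displacements split cleanly
    terms = m.group(1).replace("-", "+-").split("+")
    rest = terms[1:]                        # everything after the base
    has_disp = any(_NUM.fullmatch(t) for t in rest)
    has_index = any(not _NUM.fullmatch(t) for t in rest)
    if has_index and has_disp:
        return "[Rb+Ri*Sc+Disp]"
    if has_index:
        return "[Rb+Ri*FixSc]"
    return "[Rb+Disp]"                      # assumption: [Rb] is Disp=0

def tally(operands):
    """Return {mode: percentage} over a list of operand strings."""
    counts = Counter(classify_operand(op) for op in operands)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mode: 100.0 * n / total for mode, n in counts.items()}
```

This only counts static occurrences; weighting by execution count would need profile data and could shift the numbers.
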

Can note that "*ptr++" seems most common for auto-increment, whereas 
"*ptr--", "*--ptr", and "*++ptr" are rarer.


Seems like no one has made tables online for the relative usage 
frequencies of the various x86-64 and ARM64 addressing modes...

Might be useful to have this data for "relatively mature" architectures.
Would be a pain to write an x86-64 disassembler just to stat up the 
ModR/M+SIB sequences. It does raise the question of whether there is 
a semi-reliable way to gather these stats without needing to write a 
full disassembler.


One simple option would be to assume an instruction looks like:
   [Prefix Bytes]
   [REX byte]
   OP_Byte | 0F+OP_Byte
   ModR/M + SIB + ...

And then use a heuristic to try to guess how to interpret the 
instruction stream based on "looks better" (more likely to be aligned 
with the instruction stream vs random unaligned garbage).

Though, such a "looks good" heuristic could itself risk skewing the results.
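
The ModR/M + SIB part, at least, is mechanical once the position of the ModR/M byte is known. A minimal sketch in Python of classifying it into roughly the mode classes above, assuming 64-bit mode with 64-bit addressing. The function name and the rex_x parameter are made up for illustration, and finding the ModR/M byte in the first place (prefixes, opcode maps, opcodes without ModR/M) is exactly the part the heuristic would still have to guess:

```python
def classify_modrm(code, pos, rex_x=False):
    """Classify the operand whose ModR/M byte is code[pos].

    Returns (mode_name, length), where length counts the ModR/M byte,
    the SIB byte if present, and any displacement bytes.
    """
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return ("reg", 1)                   # register-direct, no memory
    disp_len = {0: 0, 1: 1, 2: 4}[mod]
    if rm != 4:                             # no SIB byte
        if mod == 0 and rm == 5:
            return ("[RIP+Disp32]", 5)      # RIP-relative in 64-bit mode
        return ("[Rb]" if mod == 0 else "[Rb+Disp]", 1 + disp_len)
    sib = code[pos + 1]
    index, base = (sib >> 3) & 7, sib & 7
    # index=100b means "no index", unless REX.X selects R12
    has_index = index != 4 or rex_x
    has_base = not (mod == 0 and base == 5) # mod=00,base=101: Disp32 only
    if not has_base:
        disp_len = 4
    length = 2 + disp_len
    if has_index and has_base:
        name = "[Rb+Ri*Sc+Disp]" if disp_len else "[Rb+Ri*Sc]"
    elif has_index:
        name = "[Ri*Sc+Disp]"
    elif has_base:
        name = "[Rb+Disp]" if disp_len else "[Rb]"
    else:
        name = "[Disp32]"
    return (name, length)
```

For example, the tail of "8B 44 88 10" (mov eax, [rax+rcx*4+0x10]) would be classified by classify_modrm(bytes([0x44, 0x88, 0x10]), 0). Note that RSP-based accesses always need a SIB byte, so "has a SIB" alone doesn't imply an index register.
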


>> I may still consider defining an encoding for this, but not yet. It is
========== REMAINDER OF ARTICLE TRUNCATED ==========