Article <vff6vd$31kl9$1@dont-email.me>

Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: x86S Specification
Date: Thu, 24 Oct 2024 23:31:35 -0500
Organization: A noiseless patient Spider
Lines: 265
Message-ID: <vff6vd$31kl9$1@dont-email.me>
References: <dqfQO.411015$WOde.295848@fx09.iad> <vf6j1l$144cr$1@dont-email.me>
 <3c6510cc947a1b59b62753de4cf98293@www.novabbs.org>
 <vf6ucr$g6j$1@gal.iecc.com> <2024Oct22.172620@mips.complang.tuwien.ac.at>
 <vf8rov$1jsqv$1@dont-email.me>
 <5d79e4ceda7bf46346a80da098645adc@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 25 Oct 2024 06:31:42 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="73a54a87420ae18ff9fa03b74f73f488";
	logging-data="3199657"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+aBNlv7Ix+OKAsxP8/Ihodxg2wBDmdrWs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:bZ97h0j9IqFuB6yZExGx78cQExg=
In-Reply-To: <5d79e4ceda7bf46346a80da098645adc@www.novabbs.org>
Content-Language: en-US
Bytes: 11059

On 10/22/2024 4:13 PM, MitchAlsup1 wrote:
> On Tue, 22 Oct 2024 18:43:40 +0000, BGB wrote:
> 
>> On 10/22/2024 10:26 AM, Anton Ertl wrote:
>>>
>>> Several things in this paragraph makes no sense.
>>>
>>> In particular, x86S is a proposal for a reduced version of the stuff
>>> that current Intel and AMD CPUs support: There is full 64-bit support,
>>> and 32-bit user-level support.  x86S eliminates a part of the
>>> compatibility path from systems of yesteryear, but not that many
>>> people use these parts nowadays anyway.  It's unclear to me what
>>> benefits these changes are supposed to buy (unlike the elimination of
>>> A32/T32 from some ARM chips, which obviously eliminates the whole
>>> A32/T32 decoding path).  It seems to me that most of the complexity of
>>> current CPUs would still be there.
>>>
>>> And I certainly prefer a CPU that has more capabilities to one that
>>> has less capabilities.  Sometimes I want to run old binaries.
>>>
>>> So what would be my incentive as a user to buy an x86S CPU?  Will they
>>> sell them for less?  I doubt it.
>>>
>>
>> Yeah, basically my thoughts as well.
>>    Business as usual...
>>
>> Main effect it achieves is breaking legacy boot, doesn't seem like it
>> would either save all that much nor "solve" x86's longstanding issues.
> 
> Intel needs a better way to exit reset--and that means the MMU/TLBs
> are already up and working at the time reset is exited. This cannot
> be made backwards compatible.
> -------------------------------

I am not sure how this would have much effect on cost either way.
A physical address mode could just be some edge case logic in the MMU 
(say, whenever there is a TLB miss with MMU disabled, it merely loads an 
identity mapped address into the TLB).
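
As a rough sketch of that edge case (a Python model, not real hardware; the 4K page size, the `Tlb` class and the `walk_page_table` callback are all made up for illustration), the miss path either walks the page table or installs an identity entry:

```python
PAGE_SHIFT = 12  # assumption: 4K pages


class Tlb:
    """Toy TLB model: a miss with the MMU disabled installs an identity mapping."""

    def __init__(self, walk_page_table):
        self.entries = {}                        # virtual page number -> physical page number
        self.mmu_enabled = False                 # state at reset in this sketch
        self.walk_page_table = walk_page_table   # hypothetical miss handler (vpn -> ppn)

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.entries:              # TLB miss
            if self.mmu_enabled:
                self.entries[vpn] = self.walk_page_table(vpn)
            else:
                self.entries[vpn] = vpn          # physical mode: identity map
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

In this model, turning the MMU on would also need to flush the identity entries, otherwise stale physical-mode mappings would keep hitting.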


>>
>> *1: Probably, say (if I were designing the encoding):
>>    [Rb+Disp10s]        //32-bit encoding
>>    [Rb+Ri*FixSc]       //32-bit encoding
>>    [Rb+Ri*Sc]          //64-bit encoding
>>    [Rb+Disp33s]        //64-bit encoding
>>    [Rb+Ri*Sc+Disp11s]  //64-bit encoding
>>    [Rb+Ri*Sc+Disp33s]  //96-bit encoding
> 
>      [Rb+DISP16]         // 32-bit   16 > 10
>      [Rb+Ri<<sc]         // 32-bit
>      [Rb+Ri<<sc+DISP32]  // 64-bit   32 > 11
>      [Rb+Ri<<sc+DISP64]  // 96-bit   64 > 33


One doesn't want to burn too much encoding space...

If the goal is to redesign x86 as a RISC-like ISA, one is likely going 
to need a lot of space for opcode bits.

This is partly why I was thinking 32 registers rather than 64, along 
with the smaller immediate fields.

Say, one possible encoding scheme would be to use a similar base format 
to RISC-V:
   ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY1  //32-bit op
   ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY0  //64/96-bit op
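
For reference, a small Python sketch of pulling the fields out of such a word, reading the pattern MSB-first (so the 7-bit Z field is bits 31..25 and the length bit is bit 0). The field names (`op_hi`, `rt`, and so on) are just labels picked for this sketch:

```python
def decode_fields(word):
    """Split a 32-bit word laid out as ZZZZZZZ-ttttt-mmmmm-ZZZ-nnnnn-YY-YYYY-L."""
    return {
        "op_hi":  (word >> 25) & 0x7F,  # ZZZZZZZ
        "rt":     (word >> 20) & 0x1F,  # ttttt
        "rm":     (word >> 15) & 0x1F,  # mmmmm
        "op_mid": (word >> 12) & 0x07,  # ZZZ
        "rn":     (word >> 7) & 0x1F,   # nnnnn
        "op_lo":  (word >> 1) & 0x3F,   # YY-YYYY
        "is_32":  word & 1,             # 1: 32-bit op, 0: part of a 64/96-bit op
    }


def encode_fields(f):
    """Inverse of decode_fields, handy for checking the bit positions."""
    return ((f["op_hi"] << 25) | (f["rt"] << 20) | (f["rm"] << 15) |
            (f["op_mid"] << 12) | (f["rn"] << 7) | (f["op_lo"] << 1) | f["is_32"])
```

Three 5-bit register fields fit with 32 registers; 64 registers would cost 3 more bits out of the 16 opcode bits available here.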

Then, say:
   1/2 of the 32-bit encoding space is 3R ops;
   1/4 of the 32-bit encoding space is 3RI ops;
   Remaining 1/4 for Imm16 and JMP/JCC and similar.

Say, could burn a 24/25-bit chunk of encoding space on JMP/CALL/JCC
   iiiiiii-iiiii-iiiii-iii-Zcccc-YY-YYYY1
Where:
   cccc is like x86 Jcc condition code,
     but maybe reuse P and NP for JMP and CALL.
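
A sketch of how that branch form might decode (Python). The condition-code table is the usual x86 Jcc numbering; which of P/NP becomes JMP and which becomes CALL is an arbitrary choice here, and the displacement is returned unscaled since the scaling is not pinned down above:

```python
# Standard x86 Jcc condition-code numbering (low 4 bits of the opcode).
X86_CC = ["O", "NO", "B", "AE", "E", "NE", "BE", "A",
          "S", "NS", "P", "NP", "L", "GE", "LE", "G"]


def sign_extend(value, bits):
    """Sign-extend a 'bits'-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_branch(word):
    """Decode iiiiiii-iiiii-iiiii-iii-Zcccc-YY-YYYY1 into (mnemonic, disp20)."""
    cc = (word >> 7) & 0xF
    disp = sign_extend((word >> 12) & 0xFFFFF, 20)   # the 20 'i' bits
    if cc == 0xA:            # P slot, assumed here to mean JMP
        mnemonic = "JMP"
    elif cc == 0xB:          # NP slot, assumed here to mean CALL
        mnemonic = "CALL"
    else:
        mnemonic = "J" + X86_CC[cc]
    return mnemonic, disp
```

The 20 displacement bits plus 4 condition bits give the 24-bit chunk; counting the Z bit as well gives 25.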

Though, might make sense to do CALL/RET using a link-register rather 
than the stack, even if x86 traditionally used the stack.



For 64-bit:
   LD/ST/OPLD/OPST:  [Rb+Disp10] expands to [Rb+Disp33s]
   LD/ST/OPLD/OPST:  [Rb+Ri*Sc] expands to [Rb+Ri*Sc+Disp11s] or Disp17s.
     Remaining bits go to opcode.

Say:
   ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1  //MEM [Rm+Rt*Sc]
And:
   iiiiiii-iiiii-iiiii-xxx-xxxxx-xx-xxxx0 -
   ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1  //MEM [Rm+Rt*Sc+Disp17s]
And:
   iiiiiii-iiiii-iiiii-iii-iiiii-ii-iiii0 -
   kkkkkkk-kkkkk-kkkkk-xxx-xxxxx-ii-xxxx0 -
   ZZZZZZZ-ttttt-mmmmm-dss-nnnnn-YY-YYYY1  //MEM [Rm+Rt*Sc+Disp33s]
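
To make the layout concrete, a Python sketch of finding the instruction length and decoding the 64-bit [Rm+Rt*Sc+Disp17s] form. It assumes the prefix word(s) come first in memory, that a word with bit 0 clear is always a prefix, and that 'ss' is a log2 scale; none of that is settled above:

```python
def sign_extend(value, bits):
    """Sign-extend a 'bits'-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def instruction_words(words):
    """Count 32-bit words up to and including the first one with bit 0 set."""
    for count, word in enumerate(words, start=1):
        if word & 1:
            return count
    raise ValueError("no terminating word found")


def decode_mem_disp17(prefix, base):
    """Decode a prefix word (17 'i' bits on top) plus a MEM base word."""
    return {
        "disp":  sign_extend((prefix >> 15) & 0x1FFFF, 17),
        "rt":    (base >> 20) & 0x1F,        # index register
        "rm":    (base >> 15) & 0x1F,        # base register
        "d":     (base >> 14) & 1,           # direction flag
        "scale": 1 << ((base >> 12) & 3),    # assumed: ss is log2 of the scale
        "rn":    (base >> 7) & 0x1F,
    }
```

With this scheme the fetch logic only needs bit 0 of each word to find the instruction boundary, which is about as cheap as variable length gets.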


Could maybe use some of the extra bits encoding things like:
   ADD.Q [Rb+Ri*Sc+Disp33s], Imm17s.
Or:
   ADD.Q [Rb+Ri*Sc+Disp17s], Imm33s.
Say, by having a Rn/Imm bit, plus a bit to specify which of the two 
immediates is used as the constant and which as the displacement.


But, with Disp10 base-forms, might expand to Disp33:
   iiiiiii-iiiii-iiiii-xxx-iiiii-xx-xxxx0 -
   iiiiiZZ-iiiii-mmmmm-dZi-nnnnn-YY-YYYY1  //MEM [Rm+Disp33s]

Where the 'd' flag could select between, say:
   "ADD Rn, [Rm+Disp]" or "ADD [Rm+Disp], Rn"

32-bit encodings only allowing a register, whereas 64-bit encodings 
could allow an immediate.


But, not really sure...





In other news, went and wrote up a spec and threw together Verilog code 
for a reworked BSR4K/XG3 ISA design:
   https://pastebin.com/yfrh50bk

There are still some holes (the spec is missing pretty much all the 2R 
ops for now), but alas. A few parts I have decided would not necessarily 
be carried over, as some newer instructions and the addition of a Zero 
Register made some amount of the former 2R and 2RI instructions no 
longer necessary (though, some could still be useful for efficiency; or 
have other useful roles like format conversion).


To make implementation cheaper/easier for me, it is essentially XG2RV 
with the bits shuffled around, a few inverted, and some special case 
changes (changes branch mechanics and some edge cases involving decoding 
immediate values).

Initially I tried putting the repacking logic at the front end of the ID 
stage, but (unsurprisingly), synthesis and timing weren't too happy about 
this...

Ended up instead putting the repack logic at the end of the IF stage.


There was another possible idea that I could call BSR4J:
   Would have done a simpler repacking scheme:
     First 16 bits are repacked:
       NMOP-YwYY-nnnn-mmmm => NMOY-mmmm-nnnn-YYPw
     High 16 bits copied unmodified.

   So, overall instruction format, seen as 32-bits, could have been:
       ZZZZ-qnmo-oooo-XXXX-NMOY-mmmm-nnnn-YYPw
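
The BSR4J shuffle is small enough to write out. A Python sketch, reading both patterns MSB-first and assuming the three Y bits keep their relative order (the pattern itself does not say which Y lands where):

```python
def repack16(src):
    """NMOP-YwYY-nnnn-mmmm => NMOY-mmmm-nnnn-YYPw (bits listed MSB-first)."""
    nmo  = src & 0xE000            # N, M, O stay in bits 15..13
    p    = (src >> 12) & 1
    y_hi = (src >> 11) & 1         # first Y
    w    = (src >> 10) & 1
    y_lo = (src >> 8) & 3          # last two Y bits
    nnnn = (src >> 4) & 0xF
    mmmm = src & 0xF
    return nmo | (y_hi << 12) | (mmmm << 8) | (nnnn << 4) | (y_lo << 2) | (p << 1) | w


def unpack16(dst):
    """Inverse shuffle: NMOY-mmmm-nnnn-YYPw => NMOP-YwYY-nnnn-mmmm."""
    nmo  = dst & 0xE000
    y_hi = (dst >> 12) & 1
    mmmm = (dst >> 8) & 0xF
    nnnn = (dst >> 4) & 0xF
    y_lo = (dst >> 2) & 3
    p    = (dst >> 1) & 1
    w    = dst & 1
    return nmo | (p << 12) | (y_hi << 11) | (w << 10) | (y_lo << 8) | (nnnn << 4) | mmmm


def repack32(insn):
    """Repack the first (low) 16 bits; the high 16 bits are copied unmodified."""
    return (insn & 0xFFFF0000) | repack16(insn & 0xFFFF)
```

The nnnn and mmmm nibbles just swap places and stay 4-bit aligned, so only the second nibble actually gets split up.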


But, it was admittedly more tempting, if I am going to be repacking 
anyways, to make an attempt to "un-dog-chew" the instruction format (in 
an attempt to make it look nicer).

It is not fully settled yet; I could jump over to the BSR4J strategy 
instead if the more aggressive repacking scheme is in fact a bad idea. 
One arguable merit it does have is that all of the original 4-bit fields 
remain 4-bit aligned (and converting between XG2 and BSR4J would be 
significantly less bit-twiddling vs BSR4K; while still achieving the 
goal of being able to fit it into the same encoding space as RISC-V).



I have yet to decide on some specifics for the mapping of 2R instructions:
Simpler/cheaper: Use the same repacking as 3R ops for 2R ops;
========== REMAINDER OF ARTICLE TRUNCATED ==========