From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Split instruction and immediate stream
Date: Sat, 8 Mar 2025 14:15:16 -0600
Organization: A noiseless patient Spider
Lines: 139
Message-ID: <vqi8iu$9tsb$1@dont-email.me>
References: <vqhjpv$65am$1@dont-email.me>
 <c89e07f625546e34b17cbeb9fe3d7a0c@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 08 Mar 2025 21:16:30 +0100 (CET)
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <c89e07f625546e34b17cbeb9fe3d7a0c@www.novabbs.org>

On 3/8/2025 11:53 AM, MitchAlsup1 wrote:
> On Sat, 8 Mar 2025 14:21:51 +0000, Thomas Koenig wrote:
> 
>> There was a recent post to the gcc mailing list which showed
>> an interesting concept for dealing with large constants in an ISA:
>> splitting the instruction and constant streams.  It can be found
>> at https://github.com/michaeljclark/glyph/ , and is named "glyph".
> 
> I knew a guy with that name at AMD--he did microcode--and did it well.
> 

This was also posted to the RISC-V mailing list...


>> I think the problem the author is trying to solve is better addressed by
>> My 66000 (and I would absolutely _hate_ to write an assembler for it).
>> Still, I thought it worth mentioning.
> 
> I took a quick look, and it seems that
> a) too few registers
> b) too many OpCode bits
> although it does look easy to parse.


Yeah, a bit of a rebalance is needed...

The design goes to 12-bit register fields for 64-bit ops, which is just 
absurd, and doesn't really leave enough bits for immediate encodings in 
the instruction formats.




If I were to do a vaguely similar design, probably:
   Bit 0 of each 16-bit word indicates a following word is present;
   16-bit ops have 2R with 16 registers;
   32-bit ops have 3R with 64 registers.
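To make the continuation-bit rule concrete, here is a rough C sketch of finding an op's length. The function name `insn_length_words` and the 4-word cap (based on the 64-bit form being the longest one sketched here) are my own assumptions, not part of any existing spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap: the longest form sketched is 64 bits = 4 words. */
#define MAX_OP_WORDS 4

/* Length in 16-bit words of the op starting at words[0].
 * Bit 0 set means "another word follows", so the op ends at the first
 * word with bit 0 clear. Returns 0 if no terminating word is found
 * within MAX_OP_WORDS or within the avail words supplied. */
static size_t insn_length_words(const uint16_t *words, size_t avail)
{
    assert(words != NULL || avail == 0);
    for (size_t i = 0; i < avail && i < MAX_OP_WORDS; i++) {
        if ((words[i] & 1u) == 0)   /* bit 0 clear: last word of this op */
            return i + 1;
    }
    return 0;                       /* truncated, or longer than allowed */
}
```

Note the decoder only ever looks at bit 0 of each word to find the end, independent of opcode.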


Say, 16b:
   zzzz-mmmm-nnnn-zzz0  //2R
   zzzz-iiii-nnnn-zzz0  //2RI, Imm4
   zzzz-iiii-iiii-zzz0  //Imm8      (Branch, AddSP)
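A rough sketch of pulling the fields out of a 16-bit op, reading each word MSB-first as written (bit 15 on the left, bit 0 on the right). The struct and function names are hypothetical; this only splits the bits and does not decode opcodes.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of a 16-bit op (hypothetical names):
 *   zzzz-mmmm-nnnn-zzz0   2R
 *   zzzz-iiii-nnnn-zzz0   2RI, Imm4
 *   zzzz-iiii-iiii-zzz0   Imm8
 */
struct op16_fields {
    unsigned major;  /* bits 15:12, upper opcode bits      */
    unsigned minor;  /* bits 3:1, lower opcode bits        */
    unsigned m;      /* bits 11:8, Rm or Imm4              */
    unsigned n;      /* bits 7:4, Rn                       */
    unsigned imm8;   /* bits 11:4, only meaningful for Imm8 */
};

static struct op16_fields split_op16(uint16_t w)
{
    assert((w & 1u) == 0);  /* a 16-bit op is its own last word */
    struct op16_fields f;
    f.major = (w >> 12) & 0xFu;
    f.minor = (w >> 1) & 0x7u;
    f.m     = (w >> 8) & 0xFu;
    f.n     = (w >> 4) & 0xFu;
    f.imm8  = (w >> 4) & 0xFFu;
    return f;
}
```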

Then, 32b:
   mmnn-mmmm-nnnn-zzz1  zzzz-tttt-ttzz-zzz0  //3R
   mmnn-mmmm-nnnn-zzz1  zzzz-iiii-iizz-zzz0  //3RI, Imm6
   mmnn-mmmm-nnnn-zzz1  iiii-iiii-iizz-zzz0  //3RI, Imm10
   iinn-iiii-nnnn-zzz1  iiii-iiii-iizz-zzz0  //2RI, Imm16
   iiii-iiii-iiii-zzz1  iiii-iiii-iizz-zzz0  //Imm22 (Branch)
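For the 32-bit forms, the 6-bit register numbers are split across the first word. A hedged sketch of reassembling them, assuming the 2-bit pieces in bits 15:12 are the high bits of each register number, and that in the Imm16 form the first-word immediate bits sit above the second word's Imm10. Names like `op32_rm` are made up for illustration; immediates are returned raw, since signedness isn't specified.

```c
#include <assert.h>
#include <stdint.h>

/* First word: mmnn-mmmm-nnnn-zzz1 (assumed: 2-bit pieces are high bits) */
static unsigned op32_rm(uint16_t w0)
{
    assert(w0 & 1u);  /* first word of a multi-word op has bit 0 set */
    return (((w0 >> 14) & 0x3u) << 4) | ((w0 >> 8) & 0xFu);
}

static unsigned op32_rn(uint16_t w0)
{
    assert(w0 & 1u);
    return (((w0 >> 12) & 0x3u) << 4) | ((w0 >> 4) & 0xFu);
}

/* Second word, 3R: zzzz-tttt-ttzz-zzz0, Rt in bits 11:6 */
static unsigned op32_rt(uint16_t w1)
{
    return (w1 >> 6) & 0x3Fu;
}

/* Second word, Imm10: iiii-iiii-iizz-zzz0, bits 15:6 */
static unsigned op32_imm10(uint16_t w1)
{
    return (w1 >> 6) & 0x3FFu;
}

/* 2RI, Imm16: first word iinn-iiii-nnnn-zzz1 puts 6 immediate bits
 * where Rm would be. Assumption: those are the high 6 bits. */
static unsigned op32_imm16(uint16_t w0, uint16_t w1)
{
    return (op32_rm(w0) << 10) | op32_imm10(w1);
}
```

Since the Imm16 bits occupy the same positions as Rm, the same extraction logic can be reused for both.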

Could have 48- and 64-bit encodings, which keep the same base layout as 
the 32-bit ops, but maybe extend the immediate and opcode bits.

Say, 48-bit:
   mmnn-mmmm-nnnn-zzz1  iiii-iiii-iizz-zzz1
   iiii-iiii-iiii-iiz0  //3RI, Imm24

And, 64-bit:
   mmnn-mmmm-nnnn-zzz1  iiii-iiii-iizz-zzz1
   iiii-iiii-iiii-iiz1  zzzz-iiii-iiii-izz0  //3RI, Imm33
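Likewise for the 48- and 64-bit forms, a sketch assuming each extension word supplies the next-higher immediate bits above the second word's Imm10 field. The bit order is my guess and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 48-bit: second word iiii-iiii-iizz-zzz1 (low 10 bits),
 *         third word  iiii-iiii-iiii-iiz0 (next 14 bits).
 * Assumption: later words hold higher-order bits. */
static uint32_t imm24_48(uint16_t w1, uint16_t w2)
{
    assert(w1 & 1u);                       /* more words follow w1 */
    uint32_t lo10  = (w1 >> 6) & 0x3FFu;   /* bits 15:6 of w1 */
    uint32_t mid14 = (w2 >> 2) & 0x3FFFu;  /* bits 15:2 of w2 */
    return (mid14 << 10) | lo10;
}

/* 64-bit: adds fourth word zzzz-iiii-iiii-izz0 (top 9 bits). */
static uint64_t imm33_64(uint16_t w1, uint16_t w2, uint16_t w3)
{
    assert((w2 & 1u) && (w3 & 1u) == 0);   /* w3 is the last word */
    uint64_t hi9 = (w3 >> 3) & 0x1FFu;     /* bits 11:3 of w3 */
    return (hi9 << 24) | imm24_48(w1, w2);
}
```

Adding it up: 10 + 14 = 24 bits for the 48-bit form, and 10 + 14 + 9 = 33 bits for the 64-bit form.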


For register space, it might make sense to map the 16-bit ops to 
R16..R31, but then organize the registers such that they have access to 
both callee-save and argument registers.

Say:
   R0 ..R3   ZR, LR, SP, GP
   R4 ..R15  Callee Save (12)
   R16..R23  Callee Save ( 8)
   R24..R27  Scratch     ( 4)
   R28..R31  Args 0..3   ( 4)
   R32..R43  Args 4..15  (12)
   R44..R51  Scratch     ( 8)
   R52..R63  Callee Save (12)
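A small sketch of the register map as data, with hypothetical names. One side effect of the layout: Args 0..3 (R28..R31) and Args 4..15 (R32..R43) are contiguous, so argument k ends up in R(28+k).

```c
#include <assert.h>

enum reg_role { ROLE_SPECIAL, ROLE_CALLEE_SAVE, ROLE_SCRATCH, ROLE_ARG };

/* 16-bit ops: a 4-bit register field selects R16..R31. */
static unsigned reg_from_op16_field(unsigned field)
{
    assert(field < 16);
    return 16 + field;
}

/* Role of each of the 64 GPRs, following the ranges in the table. */
static enum reg_role role_of_reg(unsigned r)
{
    assert(r < 64);
    if (r < 4)  return ROLE_SPECIAL;      /* ZR, LR, SP, GP          */
    if (r < 24) return ROLE_CALLEE_SAVE;  /* R4..R15 and R16..R23    */
    if (r < 28) return ROLE_SCRATCH;      /* R24..R27                */
    if (r < 44) return ROLE_ARG;          /* R28..R31, then R32..R43 */
    if (r < 52) return ROLE_SCRATCH;      /* R44..R51                */
    return ROLE_CALLEE_SAVE;              /* R52..R63                */
}

/* Argument k (0..15) lives in R(28 + k). */
static unsigned arg_reg(unsigned k)
{
    assert(k < 16);
    return 28 + k;
}
```

Counting roles over the 16 registers visible to the 16-bit ops gives 8 callee-save, 4 scratch, and 4 argument registers.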


16b opcode map, possible:
   00tt-mmmm-nnnn-0000  //Store (B/W/L/Q),  "MOV.x Rn, (Rm)"
   0100-iiii-nnnn-0000  MOV.Q  Rn, (SP, Imm4*8)
   0101-iiii-nnnn-0000  MOV.X  Xn, (SP, Imm4*8)  //Pair
   0110-iiii-nnnn-0000  MOV.Q  (SP, Imm4*8), Rn
   0111-iiii-nnnn-0000  MOV.X  (SP, Imm4*8), Xn  //Pair
   1ttt-mmmm-nnnn-0000  //Load  (SB/SW/SL/Q, UB/UW/UL/X)
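A rough decoder for this load/store group (low nibble 0000), with hypothetical names. It assumes the size codes map in the order listed (so load ttt=011 is Q and 111 is X, taken here as a 16-byte pair), and that for the SP forms (bits 15:12 = 01xx) bit 13 selects load vs store and bit 12 selects pair vs single, which is what the four rows above show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ls_kind { LS_STORE, LS_LOAD, LS_STORE_SP, LS_LOAD_SP };

struct ls_op {
    enum ls_kind kind;
    unsigned size;      /* access size in bytes (16 = register pair)  */
    bool sign_extend;   /* loads: SB/SW/SL sign-extend, others do not */
    unsigned rm, rn;    /* 4-bit register fields (rm unused for SP)   */
    unsigned offset;    /* SP forms only: Imm4*8                      */
};

static struct ls_op decode_ls16(uint16_t w)
{
    static const unsigned load_size[8]  = {1, 2, 4, 8, 1, 2, 4, 16};
    static const unsigned store_size[4] = {1, 2, 4, 8};
    struct ls_op op = {0};
    unsigned hi = (w >> 12) & 0xFu;   /* bits 15:12 */
    unsigned f8 = (w >> 8) & 0xFu;    /* Rm, or Imm4 in the SP forms */

    assert((w & 0xFu) == 0);          /* this group only */
    op.rn = (w >> 4) & 0xFu;

    if (hi & 0x8u) {                  /* 1ttt: load */
        op.kind = LS_LOAD;
        op.rm = f8;
        op.size = load_size[hi & 0x7u];
        op.sign_extend = (hi & 0x7u) < 3;  /* SB, SW, SL */
    } else if ((hi & 0xCu) == 0) {    /* 00tt: store */
        op.kind = LS_STORE;
        op.rm = f8;
        op.size = store_size[hi & 0x3u];
    } else {                          /* 01xx: SP-relative */
        op.kind = (hi & 0x2u) ? LS_LOAD_SP : LS_STORE_SP;
        op.size = (hi & 0x1u) ? 16 : 8;    /* MOV.X pair vs MOV.Q */
        op.offset = f8 * 8;                /* Imm4*8 */
    }
    return op;
}
```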

   0000-mmmm-nnnn-0010  ADD   Rm, Rn
   0001-mmmm-nnnn-0010  SUB   Rm, Rn
   0010-mmmm-nnnn-0010  ADDSL Rm, Rn
   0011-mmmm-nnnn-0010  SUBSL Rm, Rn
   0100-mmmm-nnnn-0010  -
   0101-mmmm-nnnn-0010  AND   Rm, Rn
   0110-mmmm-nnnn-0010  OR    Rm, Rn
   0111-mmmm-nnnn-0010  XOR   Rm, Rn
   ...

   0000-iiii-nnnn-0100  ADD   Imm4u, Rn
   0001-iiii-nnnn-0100  SUB   Imm4u, Rn
   0010-iiii-nnnn-0100  ADDSL Imm4u, Rn
   0011-iiii-nnnn-0100  SUBSL Imm4u, Rn
   0100-iiii-iiii-0100  ADD   Imm8u*8, SP
   0101-iiii-iiii-0100  SUB   Imm8u*8, SP
   0110-iiii-iiii-0100  BRA   Imm8u (+512B)
   0111-iiii-iiii-0100  BRA   Imm8n (-512B)
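A sketch of the immediate scaling for the SP-adjust and short-branch forms. The +/-512B reach only works out if the 8-bit branch displacement counts 16-bit words, so this assumes that, and further assumes Imm8n means the same field one-extended (imm8 - 256). It computes only a byte displacement; what the displacement is relative to (the branch itself or the following op) isn't specified, so that is left out. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ADD/SUB Imm8u*8, SP (0100/0101-iiii-iiii-0100): 0..2040 bytes. */
static uint32_t sp_adjust_bytes(uint16_t w)
{
    unsigned major = (w >> 12) & 0xFu;
    assert((w & 0xFu) == 0x4u && (major == 0x4u || major == 0x5u));
    return ((w >> 4) & 0xFFu) * 8u;
}

/* BRA Imm8u / Imm8n (0110/0111-iiii-iiii-0100), byte displacement. */
static int32_t bra16_disp_bytes(uint16_t w)
{
    unsigned major = (w >> 12) & 0xFu;
    int32_t imm8 = (int32_t)((w >> 4) & 0xFFu);
    assert((w & 0xFu) == 0x4u && (major == 0x6u || major == 0x7u));
    if (major == 0x7u)   /* Imm8n: assumed one-extended */
        imm8 -= 256;
    return imm8 * 2;     /* assumed unit: one 16-bit word */
}
```

Under these assumptions the forward form reaches 0..+510 bytes and the backward form -512..-2 bytes, a 256-word span in each direction.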

   ...

   00nn-iiii-nnnn-1010 ? MOV   Imm4u, Yn
   01nn-iiii-nnnn-1010 ? ADD   Imm4u, Yn
   10nn-iiii-nnnn-1010 ? MOV   Imm4n, Yn
   11nn-iiii-nnnn-1010 ? ADD   Imm4n, Yn

   mmnn-mmmm-nnnn-1100 ? MOV   Ym, Yn   //2R MOV
   mmnn-mmmm-nnnn-1110 ? ADD   Ym, Yn   //2R ADD
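These full-GPR 16-bit ops use the same split 6-bit register layout as the first word of the 32-bit ops, so a decoder could presumably share the field extraction. A sketch assuming the 2-bit pieces in bits 15:12 are the high bits of Ym/Yn, and that Imm4n is the 4-bit field one-extended (-16..-1); names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 6-bit register fields: high 2 bits at 15:14 (Ym) / 13:12 (Yn). */
static unsigned y_m(uint16_t w)
{
    return (((w >> 14) & 0x3u) << 4) | ((w >> 8) & 0xFu);
}

static unsigned y_n(uint16_t w)
{
    return (((w >> 12) & 0x3u) << 4) | ((w >> 4) & 0xFu);
}

/* xxnn-iiii-nnnn-1010: bit 14 picks ADD over MOV. */
static unsigned imm4_is_add(uint16_t w)
{
    assert((w & 0xFu) == 0xAu);
    return (w >> 14) & 1u;
}

/* Bit 15 picks Imm4n over Imm4u (assumed one-extended: imm - 16). */
static int imm4_value(uint16_t w)
{
    assert((w & 0xFu) == 0xAu);
    int imm = (w >> 8) & 0xF;
    return (w & 0x8000u) ? imm - 16 : imm;
}
```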

Only a few ops get access to the full GPR space, since two 6-bit 
register fields eat 12 of the 16 bits, so it is best limited to only the 
most common cases.

...


The 32-bit opcode map, not laid out here, would likely be entirely 
disconnected from the 16-bit map.


The usual tradeoff applies, though: 16/32/48/64-bit encodings would make 
superscalar more difficult and more expensive than 32/64 alone.
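One part of the superscalar problem is finding op boundaries within a fetch block. With a continuation bit per word, each word's "starts an op" flag depends only on the word before it, so all the flags can be computed in parallel. A hedged sketch (hypothetical name; assumes a block of at most 16 words that begins on an op boundary). It covers boundary marking only and does not capture the cost of aligning and steering four different op sizes to parallel decoders, which is presumably where most of the extra expense would show up.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the result is set when word i begins an op.
 * Word i starts an op iff word i-1 has bit 0 clear, so each bit depends
 * on one neighbouring word rather than on a serial chain of lengths. */
static uint16_t start_mask(const uint16_t *words, unsigned count)
{
    assert(count <= 16);
    uint16_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        if (i == 0 || (words[i - 1] & 1u) == 0)
            mask |= (uint16_t)(1u << i);
    }
    return mask;
}
```

For example, a block holding a 16-bit, a 32-bit, a 48-bit, and another 16-bit op marks starts at words 0, 1, 3, and 6.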

But such a layout could potentially be good for code density, at least, 
I guess.

That's the best I could come up with in a quick-and-dirty pass, it seems...


Don't have much time right now, so will leave it at this.

...