From: BGB
Newsgroups: comp.arch
Subject: Re: Split instruction and immediate stream
Date: Sat, 8 Mar 2025 14:15:16 -0600

On 3/8/2025 11:53 AM, MitchAlsup1 wrote:
> On Sat, 8 Mar 2025 14:21:51 +0000, Thomas Koenig wrote:
>
>> There was a recent post to the gcc mailing list which showed
>> interesting concept of dealing with large constants in an ISA:
>> Splitting a the instruction and constant stream.  It can be found
>> at https://github.com/michaeljclark/glyph/ , and is named "glyph".
>
> I knew a guy with that name at AMD--he did microcode--and did it well.
>

This was also posted to the RISC-V mailing list...

>> I think the problem the author is trying to solve is better addressed by
>> My 66000 (and I would absolutely _hate_ to write an assembler for it).
>> Still, I thought it worth mentioning.
>
> I took a quick look, and it seems that
> a) too few registers
> b) too many OpCode bits
> although it does look easy to parse.

Yeah, a bit of a rebalance is needed...

The design goes to 12 bit register fields for 64-bit ops, which is just
absurd, and doesn't really leave enough bits for immediate encodings in
the instruction formats.

If I were to do a vaguely similar design, probably:
   Bit 0 of each 16-bit word indicates a following word is present;
   16-bit ops have 2R with 16 registers;
   32-bit ops have 3R with 64 registers.
Say, 16b:
   zzzz-mmmm-nnnn-zzz0  //2R
   zzzz-iiii-nnnn-zzz0  //2RI, Imm4
   zzzz-iiii-iiii-zzz0  //Imm8 (Branch, AddSP)

Then, 32b:
   mmnn-mmmm-nnnn-zzz1 zzzz-tttt-ttzz-zzz0  //3R
   mmnn-mmmm-nnnn-zzz1 zzzz-iiii-iizz-zzz0  //3RI, Imm6
   mmnn-mmmm-nnnn-zzz1 iiii-iiii-iizz-zzz0  //3RI, Imm10
   iinn-iiii-nnnn-zzz1 iiii-iiii-iizz-zzz0  //2RI, Imm16
   iiii-iiii-iiii-zzz1 iiii-iiii-iizz-zzz0  //Imm22 (Branch)

Could have 48 and 64 bit encodings, which keep the same base layout as
the 32-bit ops, but maybe extend immediate and opcode bits.

Say, 48-bit:
   mmnn-mmmm-nnnn-zzz1 iiii-iiii-iizz-zzz1
   iiii-iiii-iiii-iiz0                      //3RI, Imm24
And, 64-bit:
   mmnn-mmmm-nnnn-zzz1 iiii-iiii-iizz-zzz1
   iiii-iiii-iiii-iiz1 zzzz-iiii-iiii-izz0  //3RI, Imm33

For register space, might make sense to map the 16-bit ops to R16..R31,
but then organize the registers such that it has access to both
callee-save and argument registers.

Say:
   R0 ..R3   ZR, LR, SP, GP
   R4 ..R15  Callee Save (12)
   R16..R23  Callee Save ( 8)
   R24..R27  Scratch     ( 4)
   R28..R31  Args 0..3   ( 4)
   R32..R43  Args 4..15  (12)
   R44..R51  Scratch     ( 8)
   R52..R63  Callee Save (12)

16b opcode map, possible:
   00tt-mmmm-nnnn-0000  //Store (B/W/L/Q), "MOV.x Rn, (Rm)"
   0100-iiii-nnnn-0000  MOV.Q Rn, (SP, Imm4*8)
   0101-iiii-nnnn-0000  MOV.X Xn, (SP, Imm4*8)  //Pair
   0110-iiii-nnnn-0000  MOV.Q (SP, Imm4*8), Rn
   0111-iiii-nnnn-0000  MOV.X (SP, Imm4*8), Xn  //Pair
   1ttt-mmmm-nnnn-0000  //Load (SB/SW/SL/Q, UB/UW/UL/X)

   0000-mmmm-nnnn-0010  ADD   Rm, Rn
   0001-mmmm-nnnn-0010  SUB   Rm, Rn
   0010-mmmm-nnnn-0010  ADDSL Rm, Rn
   0011-mmmm-nnnn-0010  SUBSL Rm, Rn
   0100-mmmm-nnnn-0010  -
   0101-mmmm-nnnn-0010  AND   Rm, Rn
   0110-mmmm-nnnn-0010  OR    Rm, Rn
   0111-mmmm-nnnn-0010  XOR   Rm, Rn
   ...
   0000-iiii-nnnn-0100  ADD   Imm4u, Rn
   0001-iiii-nnnn-0100  SUB   Imm4u, Rn
   0010-iiii-nnnn-0100  ADDSL Imm4u, Rn
   0011-iiii-nnnn-0100  SUBSL Imm4u, Rn
   0100-iiii-iiii-0100  ADD   Imm8u*8, SP
   0101-iiii-iiii-0100  SUB   Imm8u*8, SP
   0110-iiii-iiii-0100  BRA   Imm8u  (+512B)
   0111-iiii-iiii-0100  BRA   Imm8n  (-512B)
   ...
   00nn-iiii-nnnn-1010  ? MOV Imm4u, Yn
   01nn-iiii-nnnn-1010  ? ADD Imm4u, Yn
   10nn-iiii-nnnn-1010  ? MOV Imm4n, Yn
   11nn-iiii-nnnn-1010  ? ADD Imm4n, Yn
   mmnn-mmmm-nnnn-1100  ? MOV Ym, Yn  //2R MOV
   mmnn-mmmm-nnnn-1110  ? ADD Ym, Yn  //2R ADD

There are only a few ops which have access to the full GPR space (the
Ym/Yn forms, with 6-bit register fields), as this is very expensive for
16-bit ops, so best limited to only the most common cases.

....

The 32-bit opcode map, not laid out here, would likely be entirely
disconnected from the 16-bit map.

Usual tradeoff though: 16/32/48/64 bit encodings would make superscalar
more difficult and more expensive than 32/64.

But, such a layout could potentially be good for code density at least,
I guess.

Best I could come up with as a quick/dirty pull, it seems...

Don't have much time right now, so will leave it at this.

....
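
As a rough sketch of how a decoder might walk this layout (nothing here has been built or tested against real hardware): the length comes from bit 0 of each 16-bit word, and the register fields are pulled out with shifts and masks. The function names are made up for illustration. It assumes the leftmost character of each pattern is bit 15, that the "mm"/"nn" pair in the top nibble supplies the high 2 bits of the 6-bit register numbers, and that the 4-bit fields of the 16-bit ops select R16..R31; the post only suggests the last of these, and the other two are guesses.

```python
def insn_length_words(words, pos=0):
    """Count the 16-bit words making up the instruction at words[pos].

    Bit 0 of each word says whether another word follows.
    """
    n = 1
    while words[pos + n - 1] & 1:
        if pos + n >= len(words):
            raise ValueError("instruction runs past end of stream")
        n += 1
    return n


def decode_2r_16(word):
    """Split a 16-bit 2R op (zzzz-mmmm-nnnn-zzz0) into (Rm, Rn).

    Assumption: the 4-bit fields are offset by 16, so they reach R16..R31.
    """
    rm = (word >> 8) & 0xF
    rn = (word >> 4) & 0xF
    return 16 + rm, 16 + rn


def decode_3r_32(w0, w1):
    """Split a 32-bit 3R op into (Rm, Rn, Rt), each 0..63.

    w0 = mmnn-mmmm-nnnn-zzz1, w1 = zzzz-tttt-ttzz-zzz0.
    Assumption: the 2-bit pieces in bits 15..12 are the high bits
    of Rm and Rn; the post does not say which way around they go.
    """
    rm = (((w0 >> 14) & 0x3) << 4) | ((w0 >> 8) & 0xF)
    rn = (((w0 >> 12) & 0x3) << 4) | ((w0 >> 4) & 0xF)
    rt = (w1 >> 6) & 0x3F
    return rm, rn, rt
```

For example, 0x0352 matches "0000-mmmm-nnnn-0010" (ADD Rm, Rn) with m=3 and n=5, so decode_2r_16(0x0352) gives (19, 21) under the R16..R31 mapping. The length loop touches one bit per word, which is cheap in software, but it is serial: word N's length bit has to be seen before word N+1 can be classified, and that is the part that gets awkward for a wide superscalar fetch.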