Article <vqek6h$3fro6$1@dont-email.me>

Deutsch English Français Italiano
<vqek6h$3fro6$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Why VAX Was the Ultimate CISC and Not RISC
Date: Fri, 7 Mar 2025 05:08:56 -0600
Organization: A noiseless patient Spider
Lines: 146
Message-ID: <vqek6h$3fro6$1@dont-email.me>
References: <vpufbv$4qc5$1@dont-email.me>
 <2025Mar1.125817@mips.complang.tuwien.ac.at> <vpvrn5$2hq0$1@gal.iecc.com>
 <2025Mar1.232526@mips.complang.tuwien.ac.at> <vq2dfr$2skk$1@gal.iecc.com>
 <2025Mar2.234011@mips.complang.tuwien.ac.at> <5pkg9l-kipt.ln1@msc27.me.uk>
 <2025Mar3.174417@mips.complang.tuwien.ac.at> <vq4qav$1dksd$1@dont-email.me>
 <vq5dm2$1h3mg$5@dont-email.me> <2025Mar4.110420@mips.complang.tuwien.ac.at>
 <vq829a$232tl$6@dont-email.me> <2025Mar5.083636@mips.complang.tuwien.ac.at>
 <vqdljd$29f8f$2@paganini.bofh.team> <vqdrh9$3cdrc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 07 Mar 2025 12:10:10 +0100 (CET)
Injection-Info: dont-email.me; posting-host="027fdba69f1db440195e9e85ee186d70";
	logging-data="3665670"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+LbMj3RFA/QGi0Sxhml/O3awrP2XxOpBc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:0z5HIys1HgpCRxXzTJ/il+U9BNU=
In-Reply-To: <vqdrh9$3cdrc$1@dont-email.me>
Content-Language: en-US
Bytes: 7247

On 3/6/2025 10:09 PM, Lawrence D'Oliveiro wrote:
> On Fri, 7 Mar 2025 02:27:59 -0000 (UTC), Waldek Hebisch wrote:
> 
>> VAX intstructions are very complex and much of that complexity is hard
>> to use in compilers.
> 
> A lot of them mapped directly to common high-level operations. E.g. MOVC3/
> MOVC5 for string copying, and of course POLYx for direct evaluation of
> polynomial functions.
> 
> In a way, one could say that, in many ways, VAX machine language was a
> higher-level language than Fortran.

In a lot of this, it is almost a mystery if one could get something 
similar with a simpler/cheaper ISA by having a pseudo-instruction that 
does, say:
   Move two register inputs to fixed special registers;
   Branch to a location within a table supplied in another SPR.

Say:
   RTTCALL 27, R18, R20

Which does the functional equivalent of, say, in RV terms:
   MV    X10, X18
   MV    X11, X20
   JALR  X1, 108(RTT)
Where RTT is a dedicated "Runtime Table" register (that is hard-coded in 
the decoder logic).

....





In other news, got around to implementing the BITMOV logic in BGBCC and 
similar, and implementing support for Verilog style notation.

So, writing things like:
   y[55:48]=x[19:12];
Is now technically possible in my C variant, and (if the BITMOV 
instruction is enabled) be encoded in a single instruction.

Where:
   y[55:48]=x[19:12];
Is 1 instruction, main requirement is that the source and destination 
bitfield are the same width (widening will require multiple ops).

If the instruction is not enabled, the fallback path is 4 to 6 
instructions (in BJX2), or around 12 to 16 instructions for RV64G. I 
decided to also add support for BITMOV to the RV decoder via my 
jumbo-extension encodings (though, with some limitations on the 128-bit 
case due to it needing to fit into a 21 bit immediate).

And:
   j=x[19:12];
Also a single instruction, or 2 or 3 in the fallback case (encoded as a 
shift and mask).

It is possible I could add something to infer a BITMOV from 
"(x>>SHL)&MASK", though inferring bitfield insert is likely to be too 
difficult for now.

At present, XG3 and RV will have a slight advantage over XG1 and XG2 for 
this instruction in that they have access to a zero register, which 
as-is doing a bitfield extract via BITMOV will require having access to 
0. I am half tempted to consider whether it may make sense to 
special-case the decoding to interpret R1 on the Ro/Rp input as meaning 
ZZR (well, or just live with the compiler needing to load 0 into a 
register to use it as a bitfield extract, albeit slightly diminishing 
its advantage over shift+mask in this case; though, statistically, 0 is 
likely to have already been loaded into a register either way).

For a simple test:
   lj[ 7: 0]=li[31:24];
   lj[15: 8]=li[23:16];
   lj[23:16]=li[15: 8];
   lj[31:24]=li[ 7: 0];
Does seem to compile down to 4 instructions.

Though, looking at the compiler code, it would be subject to the "side 
effects in lvalue may be applied twice" bug:
   (*ct++)[19:12]=(*cs++)[15:8];

Will result in the 'ct++' part happening twice (this is a pesky bug that 
is prone to reappear). Though, a lazy option could be, since this is a 
weird extension anyways, to simply claim it as undefined if the lvalue 
expression has side effects.



Maybe will add, say:
   (_BitInt(N)) { x, y, z };


Some uncertainties remain for how to approach adding VL support to BGBCC:
   How to best approach instantiating modules;
   How to best deal with "always" blocks;
     High-level control flow is very different in Verilog vs C;
   How to best deal with 'case' and 'casez'.
     They don't exactly match up 1:1 with a C switch block.

It may make sense to special-case some "case()" blocks into lookup 
tables. In small cases, "casez" could be turned into a lookup table or a 
generic switch, but other cases might require linear pattern matching, 
which would be slow...

I have some ideas for special cases, not no idea for a good "generic 
case" strategy that would be much faster than O(n) ...

Best I can come up with is:
   Walk all patterns, generate a mask of used bits;
   Generate a hash table of (bits&mask) pointing to a "guess" entry;
     Iterating over every possible bit-pattern.
     Only hitting for the first matching entry.
   If guessed match fails, it falls back to a linear search.

Landing on the first possible match and then doing a linear search for 
subsequent matches has usually worked in the past. Worst case is still a 
linear search though. Since casez patterns are formally not allowed to 
overlap, it should be possible to sort them into ascending order (which 
may reduce the probability of adverse cases) but is debatable.

Either way, it is a lot more complicated than a normal "switch()".


For module instantiation:
Probably the module itself is compiled with each "always @*" block or 
similar as a function;
Any "assign" statements can probably also be lumped together;
The module instance is itself turned into an object instance, with all 
of its registers as object fields;
The blocks are then essentially virtual methods.

The most naive way to do this:
Each module has an implicit "Run()" methods, which runs all of the 
always blocks, and selectively runs "posedge" or "negedge" blocks if the 
selected clock has changed (afterwards, the new clock is assigned to a 
variable holding the last-known value, so that the posedge/negedge only 
triggers when the value changes).


But, I am starting to have doubts as to the sanity of trying to add 
support for Verilog to BGBCC...