Article <v045in$hqoj$1@dont-email.me>

Path: ...!weretis.net!feeder6.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Stealing a Great Idea from the 6600
Date: Sun, 21 Apr 2024 17:56:21 -0500
Organization: A noiseless patient Spider
Lines: 502
Message-ID: <v045in$hqoj$1@dont-email.me>
References: <lge02j554ucc6h81n5q2ej0ue2icnnp7i5@4ax.com>
 <e2097beb24bf27eed0a92f14596bd59e@www.novabbs.org>
 <in312jlca131khq3vj0i24n6pb0hah2ur5@4ax.com>
 <71acfecad198c4e9a9b14ffab7fc1cb5@www.novabbs.org>
 <1s042jdli35gdo092v6uaupmrcmvo0i5vp@4ax.com>
 <oj742jdvpl21il2s5a1ndsp3oidsnfjmr6@4ax.com>
 <dd1866c4efb369b7b6cc499d718dc938@www.novabbs.org>
 <acq62j98dhmguil5ebce6lq4m9kkgt1fs2@4ax.com>
 <kkq62jppr53is4r70n151jl17bjd5kd6lv@4ax.com>
 <9d1fadaada2ec0683fc54688cce7cf27@www.novabbs.org>
 <v017mg$3rcg9$1@dont-email.me>
 <da6dc5fe28bb31b4c73d78ef1aac2ac5@www.novabbs.org>
 <v02eij$6d5b$1@dont-email.me>
 <152f8504112a37d8434c663e99cb36c5@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 22 Apr 2024 00:56:24 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7b1e3ac212388cea6886df46e04c8fee";
	logging-data="584467"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/dTiZJRtPbnAhpooR3G5aIarkjKEqCUcw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:+7J22inMRR2yeKhh/6pDKC46vxQ=
In-Reply-To: <152f8504112a37d8434c663e99cb36c5@www.novabbs.org>
Content-Language: en-US
Bytes: 20249

On 4/21/2024 1:57 PM, MitchAlsup1 wrote:
> BGB wrote:
> 
>> On 4/20/2024 5:03 PM, MitchAlsup1 wrote:
>>> BGB wrote:
>>>
>>> Compilers are notoriously unable to outguess a good branch predictor.
>>>
> 
>> Errm, assuming the compiler is capable of things like general-case 
>> inlining and loop-unrolling.
> 
>> I was thinking of simpler things, like shuffling operators between 
>> independent (sub)expressions to limit the number of register-register 
>> dependencies.
> 
>> Like, in-order superscalar isn't going to do crap if nearly every 
>> instruction depends on every preceding instruction. Even pipelining 
>> can't help much with this.
> 
> Pipelining CREATED this (back to back dependencies). No amount of
> pipelining can eradicate RAW data dependencies.
> 

Pretty much, this is the problem.

But, when one converts from expressions to instructions, either by 
directly walking the AST, or by going to RPN and then generating 
instructions from the RPN, the generated code has this problem 
pretty bad.
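
To make the problem concrete, here is a rough sketch in Python of a
naive stack-based RPN-to-3AC lowering. This is only an illustration and
not how BGBCC does it; the function names, the tuple layout, and the
temporaries (t0, t1, ...) are all made up for the example.

```python
# Hypothetical sketch: naive RPN -> 3AC lowering.
# Each 3AC instruction is a tuple (dest, op, src1, src2).

def rpn_to_3ac(tokens):
    """Lower an RPN token list to 3AC using a simple value stack."""
    stack, code, ntemp = [], [], 0
    for tok in tokens:
        if tok in ("+", "-", "*"):
            rhs = stack.pop()
            lhs = stack.pop()
            dest = f"t{ntemp}"        # fresh temporary for each result
            ntemp += 1
            code.append((dest, tok, lhs, rhs))
            stack.append(dest)
        else:
            stack.append(tok)         # operand: just push its name
    return code

def back_to_back_raw(code):
    """Count instructions that read the result of the one just before."""
    return sum(1 for prev, cur in zip(code, code[1:]) if prev[0] in cur[2:])

# ((a+b)*c) + ((d+e)*f) in RPN
code = rpn_to_3ac("a b + c * d e + f * +".split())
for ins in code:
    print(ins)
print("back-to-back RAW pairs:", back_to_back_raw(code))
```

For this expression, 3 of the 4 adjacent instruction pairs end up with
the second instruction reading the result of the first, even though the
two halves of the expression are independent of each other.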

Seemingly the only real fix is to try to shuffle things around, at the 
3AC or machine-instruction level, or both, to try to reduce the number 
of RAW dependencies.
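
A crude version of this shuffling at the 3AC level could look like the
Python sketch below. It is a greedy reorder under simplifying
assumptions that are mine, not BGBCC's: every temporary is written
exactly once, there are no memory ops or branches, and all ops have the
same latency. The names (shuffle, back_to_back_raw) and the
(dest, op, src1, src2) tuple layout are invented for the example.

```python
# Hypothetical sketch: greedy 3AC shuffling to avoid back-to-back RAW.
# Assumes each dest is written once (SSA-like) and no memory/branch ops.

def back_to_back_raw(code):
    """Count instructions that read the result of the one just before."""
    return sum(1 for prev, cur in zip(code, code[1:]) if prev[0] in cur[2:])

def shuffle(code):
    """Reorder so that, when possible, an op does not follow its producer."""
    defined_by = {ins[0]: i for i, ins in enumerate(code)}
    # deps[i] = indices of instructions whose results instruction i reads
    deps = [{defined_by[s] for s in ins[2:] if s in defined_by}
            for ins in code]
    done, order, last = set(), [], None
    while len(order) < len(code):
        ready = [i for i in range(len(code))
                 if i not in done and deps[i] <= done]
        # Prefer a ready op independent of the op just emitted;
        # otherwise fall back to the first ready op.
        pick = next((i for i in ready if last not in deps[i]), ready[0])
        order.append(pick)
        done.add(pick)
        last = pick
    return [code[i] for i in order]

# 3AC for ((a+b)*c) + ((d+e)*f), in naive expression order
naive = [
    ("t0", "+", "a", "b"),
    ("t1", "*", "t0", "c"),
    ("t2", "+", "d", "e"),
    ("t3", "*", "t2", "f"),
    ("t4", "+", "t1", "t3"),
]
shuffled = shuffle(naive)
print(back_to_back_raw(naive), "->", back_to_back_raw(shuffled))
```

On this input the two independent subexpressions get interleaved, and
the only remaining back-to-back dependency is the final add, which has
nothing left to be interleaved with.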


Though, this is an area where "things could have been done better" in 
BGBCC. Though, mostly it would be in the backend.

Ironically, the approach of first compiling everything into an RPN 
bytecode, then generating 3AC and machine code from the RPN, seems to 
work reasonably OK, even if the bytecode itself is kinda weird.

Though, one area that could be improved is the memory overhead of BGBCC: 
generally, BGBCC uses too much RAM for it to really be viable to have 
TestKern be self-hosting.


>> The compiler can shuffle the instructions into an order to limit the 
>> number of register dependencies and better fit the pipeline. But, 
>> then, most of the "hard parts" are already done (so it doesn't take 
>> much more for the compiler to flag which instructions can run in 
>> parallel).
> 
> Compiler scheduling works for exactly 1 pipeline implementation and
> is suboptimal for all others.
> 

Possibly true.

But, I can note that even crude shuffling is better than no shuffling 
in this case. And, the shuffling needed to make an in-order superscalar 
not perform like crap also happens to map over well to a LIW (and is 
the main hard part of the problem).
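
As a rough illustration of why the shuffled order maps over to a LIW,
the Python sketch below greedily packs an instruction list into bundles
of up to 3 ops. An op only joins the current bundle if it neither reads
nor overwrites something written earlier in that bundle, so running a
bundle in parallel gives the same result as running it sequentially.
This is a sketch under my own assumptions: the function name, the
(dest, op, src1, src2) tuples, and the width of 3 are illustrative, and
per-lane restrictions on which ops are allowed where are ignored.

```python
# Hypothetical sketch: greedy bundling of 3AC into LIW-style bundles.
# Lane restrictions are ignored; only register hazards are checked.

def bundle(code, width=3):
    """Pack ops in order; start a new bundle on a RAW/WAW hazard or when full."""
    bundles, cur = [], []
    for ins in code:
        dest, _, *srcs = ins
        written = {i[0] for i in cur}          # dests already in this bundle
        hazard = bool(written & set(srcs)) or dest in written
        if cur and (len(cur) >= width or hazard):
            bundles.append(cur)
            cur = []
        cur.append(ins)
    if cur:
        bundles.append(cur)
    return bundles

# Same computation, ((a+b)*c) + ((d+e)*f), in two instruction orders
naive = [
    ("t0", "+", "a", "b"),
    ("t1", "*", "t0", "c"),
    ("t2", "+", "d", "e"),
    ("t3", "*", "t2", "f"),
    ("t4", "+", "t1", "t3"),
]
shuffled = [naive[0], naive[2], naive[1], naive[3], naive[4]]

print(len(bundle(naive)), "bundles naive")
print(len(bundle(shuffled)), "bundles shuffled")
```

With these inputs the naive order needs 4 bundles and the shuffled order
needs 3, with the two adds paired in one bundle and the two multiplies
in the next.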


>> Meanwhile, a naive superscalar may miss cases that could be run in 
>> parallel, if it is evaluating the rules "coarsely" (say, evaluating 
>> what is safe or not safe to run things in parallel based on general 
>> groupings of opcodes rather than the rules of specific opcodes; or, 
>> say, false-positive register alias if, say, part of the Imm field of a 
>> 3RI instruction is interpreted as a register ID, ...).
> 
> 
>> Granted, seemingly even a naive approach is able to get around 20% ILP 
>> out of "GCC -O3" output for RV64G...
> 
>> But, the GCC output doesn't seem to be quite as weak as some people 
>> are claiming either.
> 
> 
>>>> ties the code to a specific pipeline structure, and becomes 
>>>> effectively moot with OoO CPU designs).
>>>
>>> OoO exists, in a practical sense, to abstract the pipeline out of the 
>>> compiler; or conversely, to allow multiple implementations to run the
>>> same compiled code optimally on each implementation.
>>>
> 
>> Granted, but OoO isn't cheap.
> 
> But it does get the job done.
> 

But... it also makes the CPU too big and expensive to fit into most 
consumer/hobbyist grade FPGAs.

They can do in-order designs pretty OK though.


People were doing some impressive looking things over on the Altera side 
of things, but it is harder to do a direct comparison between Cyclone V 
and Artix / Spartan.


Some stuff I was skimming, though, implied that the free version of 
Quartus is more limited than the free version of Vivado, and that one 
effectively needs to pay for the commercial version to make full use of 
the FPGA (whereas Vivado allows mostly full use of the FPGA, but does 
not support any FPGAs larger than a certain cutoff).

Well, and the non-free version of Vivado costs far more than I could 
justify spending on a hobby project.


>>>> So, a case could be made that a "general use" ISA be designed 
>>>> without the use of explicit bundling. In my case, using the bundle 
>>>> flags also requires the code to use an instruction to signal to the 
>>>> CPU what configuration of pipeline it expects to run on, with the 
>>>> CPU able to fall back to scalar (or superscalar) execution if it 
>>>> does not match.
>>>
>>> Sounds like a bridge too far for your 8-wide GBOoO machine.
>>>
> 
>> For sake of possible fancier OoO stuff, I upheld a basic requirement 
>> for the instruction stream:
>> The semantics of the instructions as executed in bundled order needs 
>> to be equivalent to that of the instructions as executed in sequential 
>> order.
> 
>> In this case, the OoO CPU can entirely ignore the bundle hints, and 
>> treat "WEXMD" as effectively a NOP.
> 
> 
>> This would have broken down for WEX-5W and WEX-6W (where enforcing a 
>> parallel==sequential constraint effectively becomes unworkable, and/or 
>> renders the wider pipeline effectively moot), but these designs are 
>> likely dead anyways.
> 
>> And, with 3-wide, the parallel==sequential order constraint remains in 
>> effect.
> 
> 
>>>> For the most part, thus far nearly everything has ended up as "Mode 
>>>> 2", namely:
>>>>    3 lanes;
>>>>      Lane 1 does everything;
>>>>      Lane 2 does Basic ALU ops, Shift, Convert (CONV), ...
>>>>      Lane 3 only does Basic ALU ops and a few CONV ops and similar.
>>>>        Lane 3 originally also did Shift, dropped to reduce cost.
>>>>      Mem ops may eat Lane 3, ...
>>>
>>> Try 6-lanes:
>>>     1,2,3 Memory ops + integer ADD and Shifts
>>>     4     FADD   ops + integer ADD and FMisc
>>>     5     FMAC   ops + integer ADD
>>>     6     CMP-BR ops + integer ADD
>>>
> 
>> As can be noted, my thing is more a "LIW" rather than a "true VLIW".
> 
> Mine is neither LIW or VLIW but it definitely is LBIO through GBOoO
> 

I aimed for Scalar and LIW.
========== REMAINDER OF ARTICLE TRUNCATED ==========