Article <v054gb$r679$1@dont-email.me>

Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Stealing a Great Idea from the 6600
Date: Mon, 22 Apr 2024 02:44:09 -0500
Organization: A noiseless patient Spider
Lines: 69
Message-ID: <v054gb$r679$1@dont-email.me>
References: <lge02j554ucc6h81n5q2ej0ue2icnnp7i5@4ax.com>
 <e2097beb24bf27eed0a92f14596bd59e@www.novabbs.org>
 <in312jlca131khq3vj0i24n6pb0hah2ur5@4ax.com>
 <71acfecad198c4e9a9b14ffab7fc1cb5@www.novabbs.org>
 <1s042jdli35gdo092v6uaupmrcmvo0i5vp@4ax.com>
 <oj742jdvpl21il2s5a1ndsp3oidsnfjmr6@4ax.com>
 <dd1866c4efb369b7b6cc499d718dc938@www.novabbs.org>
 <acq62j98dhmguil5ebce6lq4m9kkgt1fs2@4ax.com>
 <kkq62jppr53is4r70n151jl17bjd5kd6lv@4ax.com>
 <9d1fadaada2ec0683fc54688cce7cf27@www.novabbs.org>
 <v017mg$3rcg9$1@dont-email.me>
 <da6dc5fe28bb31b4c73d78ef1aac2ac5@www.novabbs.org>
 <v02eij$6d5b$1@dont-email.me>
 <152f8504112a37d8434c663e99cb36c5@www.novabbs.org>
 <v04tpb$pqus$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 22 Apr 2024 09:44:11 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7b1e3ac212388cea6886df46e04c8fee";
	logging-data="891113"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18qqVe6hmnslVneJ+RZBW9acL+8yWPd1mA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:3gTeugqaUUVegIUUX07vhnfkiAA=
In-Reply-To: <v04tpb$pqus$1@dont-email.me>
Content-Language: en-US
Bytes: 4480

On 4/22/2024 12:49 AM, Terje Mathisen wrote:
> MitchAlsup1 wrote:
>> BGB wrote:
>>
>>> On 4/20/2024 5:03 PM, MitchAlsup1 wrote:
>>> Like, in-order superscalar isn't going to do crap if nearly every 
>>> instruction depends on every preceding instruction. Even pipelining 
>>> can't help much with this.
>>
>> Pipelining CREATED this (back to back dependencies). No amount of
>> pipelining can eradicate RAW data dependencies.
>>
>>> The compiler can shuffle the instructions into an order to limit the 
>>> number of register dependencies and better fit the pipeline. But, 
>>> then, most of the "hard parts" are already done (so it doesn't take 
>>> much more for the compiler to flag which instructions can run in 
>>> parallel).
>>
>> Compiler scheduling works for exactly 1 pipeline implementation and
>> is suboptimal for all others.
> 
> Well, yeah.
> 
> OTOH, if your (definitely not my!) compiler can schedule a 4-wide static 
> ordering of operations, then it will be very nearly optimal on 2-wide 
> and 3-wide as well. (The difference is typically in a bit more loop 
> setup and cleanup code than needed.)
> 
> Hand-optimizing Pentium asm code did teach me to "think like a cpu", 
> which is probably the only part of the experience which is still kind of 
> relevant. :-)
> 

My compiler is hard-pressed to even make effective use of the current
pipeline, so going wider does not make sense at present.

As I had noted before, the main merit of 3-wide in my case is that it
makes it easier to justify a 6R register file. Unlike the 4R register
file, the 6R file doesn't choke when trying to run other instructions in
parallel with a memory store and similar. That is actually a fairly
serious restriction, given how much memory operations tend to clog up
Lane 1 (opportunities for "ALU|ST" being more common than "ALU|ALU").

Granted, one could argue that (Reg, Disp) memory addressing could be
supported entirely within a 2R1W pattern. While true in principle, this
does not match my implementation, which always uses indexed addressing
internally and treats the Disp as a virtual register, thus eating 3
register ports.
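
As a rough sketch of the port arithmetic here: the per-instruction read
counts below are illustrative assumptions (2 reads for an ALU op, 3 for a
store that goes through indexed addressing, 2 for a hypothetical store
that handles the Disp as an immediate), not values taken from the actual
core, and the names are made up for the example.

```python
# Hypothetical register-read costs per instruction class.
# These numbers are assumptions for illustration only.
READ_PORTS = {
    "ALU": 2,         # two source operands
    "ST_INDEXED": 3,  # base + index (Disp as a virtual register) + store value
    "ST_DISP": 2,     # base + store value, Disp handled as an immediate
}


def fits(bundle, read_ports):
    """Return True if the bundle's total register reads fit the port budget."""
    return sum(READ_PORTS[op] for op in bundle) <= read_ports


if __name__ == "__main__":
    bundles = [
        ("ALU", "ALU"),
        ("ALU", "ST_INDEXED"),
        ("ALU", "ST_DISP"),
        ("ALU", "ALU", "ALU"),
    ]
    for ports in (4, 6):
        for bundle in bundles:
            verdict = "ok" if fits(bundle, ports) else "too many reads"
            print(f"{ports}R  {'|'.join(bundle)}: {verdict}")
```

Under these assumptions, ALU|ALU fits in 4 read ports but ALU|ST does not
when the store eats 3 of them; with 6 read ports both fit. A store that
only needed 2 reads would fit alongside an ALU op in 4R, which is the
"2R1W pattern" argument.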

Well, and for the 4R2W configuration, the main priority is minimizing 
LUT cost (which favors leaving it as-is, with the current restrictions).

Granted, some similar issues apply to 128-bit MOV.X and SIMD ops, which
as-is can only exist as scalar ops. These could potentially also be
hacked around (say, to allow ALU|SIMD or ALU|MOV.X), but the "fix" would
cost a lot of LUTs, mostly because variability in input routing does not
come cheap.

Though, that said, the 3rd lane still gets used for a share of basic ALU
instructions, so it isn't entirely going to waste either.

> Terje
>