Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Stealing a Great Idea from the 6600
Date: Wed, 19 Jun 2024 16:11:20 +0000
Organization: Rocksolid Light
Message-ID: <96280554541a8a9b1a29a5cbd5b7c07b@www.novabbs.org>
References: <lge02j554ucc6h81n5q2ej0ue2icnnp7i5@4ax.com> <v02eij$6d5b$1@dont-email.me> <152f8504112a37d8434c663e99cb36c5@www.novabbs.org> <v04tpb$pqus$1@dont-email.me> <v4f5de$2bfca$1@dont-email.me> <jwvzfrobxll.fsf-monnier+comp.arch@gnu.org> <v4f97o$2bu2l$1@dont-email.me> <613b9cb1a19b6439266f520e94e2046b@www.novabbs.org> <v4hsjk$2vk6n$1@dont-email.me> <6b5691e5e41d28d6cb48ff6257555cd4@www.novabbs.org> <v4tfu3$1ostn$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="510610"; mail-complaints-to="usenet@i2pn2.org";
posting-account="7opjq6o0gOhusEORo6KGlWDqrGdcQlz3IQ8pYKMWkuY";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$Z7KFZN3eDp.aIXEKfhzbsuVIskjG1ydm823PTXkDTV4OOv.qTqYJC
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 4558
Lines: 99
BGB wrote:
> On 6/18/2024 4:09 PM, MitchAlsup1 wrote:
>> BGB wrote:
>>
>>> On 6/13/2024 3:40 PM, MitchAlsup1 wrote:
>>> In this case, scheduling as-if it were an in-order core was leading to
>>> better performance than a more naive ordering (such as directly using
>>> the results of previous instructions or memory loads, vs shuffling
>>> other instructions in between them).
>>
>>> Either way, seemed to be different behavior than seen on either the
>>> Ryzen or on Intel Core based CPUs (where, seemingly, the CPU does not
>>> care about the relative order).
>>
>> Because it had no requirement of code scheduling, unlike 1st-generation
>> RISCs, so the cores were designed to put up good performance scores
>> without any code scheduling.
>>
> Yeah, but why was Bulldozer/Piledriver seemingly much more sensitive to
> instruction scheduling issues than either its predecessors (such as the
> Phenom II) or its successors (Ryzen)?...
They "blew" the microarchitecture.
It was a 12-gate machine (down from 16 gates in Athlon). This puts
a "lot more stuff" on critical paths, and some forwarding was not done,
particularly when the size changed between the produced result and the
consumed operand.
> Though, apparently "low IPC" was a noted issue with this processor
> family (apparently trying to gain higher clock speeds at the expense of
> IPC; using a 20-stage pipeline, ...).
> Though, it is less obvious how having a longer pipeline than either its
> predecessors or successors would affect instruction scheduling.
>
>>
>> One of the things we found in Mc 88120 was that the compiler should
>> NEVER be allowed to put unnecessary instructions in decode-execute
>> slots that were unused--and that, almost invariably, the best code for
>> the GBOoO machine was the one with the fewest instructions; and if
>> several sequences had equally few instructions, it basically did not
>> matter which was chosen.
>>
>> For example::
>>
>> for( i = 0; i < max; i++ )
>>     a[i] = b[i];
>>
>> was invariably faster than::
>>
>> for( ap = &a[0], bp = &b[0], i = 0; i < max; i++ )
>>     *ap++ = *bp++;
>>
>> because the latter has 3 ADDs in the loop while the former has but 1.
>> Because of this, I altered my programming style and almost never end up
>> using ++ or -- anymore.
> In this case, it would often be something more like:
>   maxn4=max&(~3);
>   for(i=0; i<maxn4; i+=4)
>   {
>     ap=a+i; bp=b+i;
>     t0=bp[0]; t1=bp[1];
>     t2=bp[2]; t3=bp[3];
>     ap[0]=t0; ap[1]=t1;
>     ap[2]=t2; ap[3]=t3;
>   }
>   if(max!=maxn4)
>   {
>     for(; i < max; i++ )
>       a[i] = b[i];
>   }
That is what VVM does, without you having to lift a finger.
> If things are partially or fully unrolled, they often go faster.
And ALWAYS eat more code space.
> Using a
> large number of local variables seems to be effective (even in cases
> where the number of local variables exceeds the number of CPU
> registers).
> Generally also using as few branches as possible.
> Etc...