Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Stealing a Great Idea from the 6600
Date: Thu, 13 Jun 2024 12:10:04 -0500
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <v4f97o$2bu2l$1@dont-email.me>
References: <lge02j554ucc6h81n5q2ej0ue2icnnp7i5@4ax.com>
 <v02eij$6d5b$1@dont-email.me>
 <152f8504112a37d8434c663e99cb36c5@www.novabbs.org>
 <v04tpb$pqus$1@dont-email.me> <v4f5de$2bfca$1@dont-email.me>
 <jwvzfrobxll.fsf-monnier+comp.arch@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 13 Jun 2024 19:11:21 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f62054195b0d37eeb9e8775bc23deeaa";
	logging-data="2488405"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18KfpyyPYRI1sCX3Kc9nNxNKOM5RHL1q9k="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:oVfIL0x6BinRSIAzShDCrdQw3Pw=
In-Reply-To: <jwvzfrobxll.fsf-monnier+comp.arch@gnu.org>
Content-Language: en-US
Bytes: 2816

On 6/13/2024 11:52 AM, Stefan Monnier wrote:
>> This is a late reply, but optimal static ordering for N-wide may be
>> very non-optimal for N-1 (or N-2, etc.).  As an example, assume a perfectly
> 
> AFAICT Terje was talking about scheduling for OoO CPUs, and wasn't
> talking about the possible worst case situations, but about how things
> usually turn out in practice.
> 
> For statically-scheduled or in-order CPUs, it can be indeed more
> difficult to generate code that will run (almost) optimally on a variety
> of CPUs.
> 

Yeah, you need to know the specifics of the pipeline, either to get 
optimal machine code (in-order superscalar) or potentially for the code 
to be able to run at all (LIW / VLIW).


That said, on some OoO CPUs, such as the Piledriver based core I was 
running at the time, it did seem that if things were scheduled as if for 
an in-order CPU (such as putting other instructions between memory loads 
and the instructions using the results, etc), they did perform better 
(seemingly implying there are limits to the OoO magic).
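
As a rough illustration of that sort of scheduling (a sketch only, not 
code from the Piledriver experiments): the function names below are 
made up for the example, and a C compiler is free to reorder all of 
this, so in practice the effect would need to be checked in the 
generated assembly. The second version issues the load for the next 
element before consuming the previous one, so there is independent work 
between each load and its use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive order: each loaded value is consumed by the very next operation. */
uint64_t sum_naive(const uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* In-order-style schedule: the load for element i is issued before the
 * value loaded for element i-1 is added, so the add never depends on the
 * load immediately preceding it. */
uint64_t sum_scheduled(const uint32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;
    uint64_t s = 0;
    uint32_t cur = a[0];            /* first load, consumed one step later */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = a[i];       /* issue the next load early */
        s += cur;                   /* use the value loaded previously */
        cur = next;
    }
    return s + cur;                 /* consume the final pending value */
}
```

Both functions compute the same sum; only the distance between each 
load and its use differs.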


Though, OTOH, a lot of the sorts of optimization tricks I found for the 
Piledriver were ineffective on the Ryzen, albeit mostly because the more 
generic stuff caught up.

For example, I had an LZ compressor that was faster than LZ4 on the 
Piledriver (it was based around doing everything in terms of aligned 
32-bit dwords, gaining speed at the cost of worse compression), but then 
when going over to the Ryzen, LZ4 got faster...
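
To give an idea of what "everything in terms of aligned 32-bit dwords" 
can look like, here is a minimal decoder sketch. The token layout and 
the name dwlz_decode are invented for this example and are not the 
format of the compressor described above; the only point is that 
literals, match lengths, and match distances are all counted in whole 
dwords, so the decoder never touches an unaligned or byte-sized access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token layout (an assumption for illustration only):
 *   bits  0..7   number of literal dwords following the token
 *   bits  8..15  match length, in dwords
 *   bits 16..31  match distance, in dwords back from the output position
 * A token of 0 ends the stream.
 * Returns the number of dwords written, or 0 on malformed input
 * (an empty stream also yields 0). */
size_t dwlz_decode(const uint32_t *src, size_t src_words,
                   uint32_t *dst, size_t dst_words)
{
    assert(src != NULL && dst != NULL);
    size_t ip = 0, op = 0;
    while (ip < src_words) {
        uint32_t tok = src[ip++];
        if (tok == 0)
            break;
        size_t nlit = tok & 0xFF;
        size_t mlen = (tok >> 8) & 0xFF;
        size_t dist = tok >> 16;

        /* Copy literal dwords straight from the input. */
        if (nlit > src_words - ip || nlit > dst_words - op)
            return 0;
        for (size_t i = 0; i < nlit; i++)
            dst[op++] = src[ip++];

        /* Copy the match from earlier output; a forward dword-by-dword
         * copy lets overlapping matches repeat a short pattern. */
        if (mlen != 0) {
            if (dist == 0 || dist > op || mlen > dst_words - op)
                return 0;
            for (size_t i = 0; i < mlen; i++) {
                dst[op] = dst[op - dist];
                op++;
            }
        }
    }
    return op;
}
```

The cost mentioned above shows up directly in a layout like this: any 
match shorter than 4 bytes, or not on a dword boundary, cannot be 
encoded at all, so the ratio is worse than a byte-oriented format.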

Like, seemingly all my efforts in "aggressively optimizing" some things 
became moot simply by upgrading my PC.

...