Article <vbchiv$cde4$1@dont-email.me>

Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 5 Sep 2024 17:09:19 +0200
Organization: A noiseless patient Spider
Lines: 227
Message-ID: <vbchiv$cde4$1@dont-email.me>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
 <memo.20240830164247.19028y@jgd.cix.co.uk> <vasruo$id3b$1@dont-email.me>
 <2024Aug30.195831@mips.complang.tuwien.ac.at> <vat5ap$jthk$2@dont-email.me>
 <vaunhb$vckc$1@dont-email.me> <vautmu$vr5r$1@dont-email.me>
 <2024Aug31.170347@mips.complang.tuwien.ac.at> <vavpnh$13tj0$2@dont-email.me>
 <vb2hir$1ju7q$1@dont-email.me> <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com>
 <vb4amr$2rcbt$1@dont-email.me> <2024Sep5.133102@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 05 Sep 2024 17:09:20 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="622309b8b3e203ac6e38dece30d12fb2";
	logging-data="406980"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+WpPDzgBRjlXAwxkiPaWZRTKf3pzq2P7g="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:ZpGXfdVPAPtqVUn4F8zSaQUHdYI=
Content-Language: en-GB
In-Reply-To: <2024Sep5.133102@mips.complang.tuwien.ac.at>
Bytes: 12105

On 05/09/2024 13:31, Anton Ertl wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> Anton writes code that seriously pushes the boundary of what can be
>> achieved.  For at least some of the things he does (such as GForth) he
>> is trying to squeeze every last drop of speed out of the target.  And he
>> is /really/ good at it.  But that means he is forever relying on nuances
>> about code generation.  His code, at least for efficiency if not for
>> correctness, is dependent on details far beyond what is specified and
>> documented for C and for the gcc compiler.  He might spend a long time
>> working with his code and a version of gcc, fine-tuning the details of
>> his source code to get out exactly the assembly he wants from the
>> compiler.
> 
> No.  We distribute Gforth as source code.  It works for a wide variety
> of architectures and compilers.  So unlike what you suggest and what
> some people have suggested earlier to avoid problems with new
> "optimizations" in newer releases of gcc, we don't concentrate on a
> specific version of gcc.
> 

OK.

>> Of course it is frustrating for him when the next version of
>> gcc generates very different assembly from that same source, but he is
>> not really programming at the level of C, and he should not expect
>> consistency from C compilers like he does.
> 
> It's normal and no problem when the next version of gcc generates
> different assembly language.  There are some basic assumptions that
> our code relies on, and that mostly does not change between gcc
> versions.
> 

As long as you are sticking to defined behaviour (defined by the C 
standards, or by the gcc documentation), and use specified C standard 
versions in the build, then there should not be any incorrect behaviour 
in different versions.  Performance might regress, and of course there's 
always the risk of bugs.
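One way to make the "specified C standard version" part hard to forget is to guard it in the source as well as in the build flags. This is a minimal sketch. The choice of C11 as the minimum and of -std=gnu11 as the example flag are assumptions for illustration, not anything from this thread. The function name is hypothetical.

```c
#include <assert.h>

/* Fail the build if the code is compiled under a language version or a
   compiler it was not written for. C11 as the minimum is an assumption;
   the build would pass the version explicitly, e.g. gcc -std=gnu11. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#error "C11 or later required (build with e.g. -std=gnu11)"
#endif

/* Labels as values and similar extensions need a GNU-compatible compiler. */
#if !defined(__GNUC__)
#error "GNU C extensions required"
#endif

/* Hypothetical helper: report the language version this unit was built with. */
static long built_with_stdc_version(void)
{
    return (long)__STDC_VERSION__;
}
```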

> An essential assumption is that, when we have:
> 
> A:
>    C code
> B:
> 
> ... that when we do &&A and &&B (which is documented in the GNU C
> manual), we get the addresses pointing to the start and end of the
> machine code corresponding to the C code.

I don't see anything in the gcc reference manual suggesting that &&B is 
the end of the corresponding code.  What you get - all you get - is that 
"goto * &&A" gives the same effect as "goto A".

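For reference, the documented use of labels as values looks like the sketch below: a table of label addresses and one computed goto per instruction, in the style of a threaded-code interpreter. The opcodes and the function name run are hypothetical. The only property relied on is that goto *&&L behaves like goto L inside the same function. Note that the GCC manual also warns that &&L may have different values if the containing function is inlined or cloned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Threaded-code style dispatch using the GNU C "labels as values"
   extension. Nothing here assumes anything about where the machine
   code for each label ends; only goto *&&L == goto L is used. */
static long run(const int *prog)
{
    /* Label addresses are allowed in static initializers. */
    static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_halt };
    const int *ip = prog;
    long acc = 0;

    assert(prog != NULL);
    goto *dispatch[*ip++];      /* jump to the first instruction */

do_inc:
    acc += 1;
    goto *dispatch[*ip++];      /* each instruction ends in its own dispatch */
do_dbl:
    acc *= 2;
    goto *dispatch[*ip++];
do_halt:
    return acc;
}
```

For example, running { OP_INC, OP_DBL, OP_INC, OP_HALT } computes ((0 + 1) * 2) + 1.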
>  In the days starting with
> gcc-3.0, we found that gcc started reordering the basic blocks within
> loops, so replaced loops in the part of the code that needs such
> assumptions into separate functions.  Around gcc-7, gcc started to
> compile
> 
> A: C-code1
> B: C-code2
> C: goto *...
> 
> to the same code as
> 
> A: C-code1; C-code2; goto *...;
> B: C-code2; goto *...;
> C: goto *...;
> 
> I found a workaround that avoids this kind of code generation.

This is exactly the kind of thing you can expect when you make assumptions 
about code generation that are not supported by the documentation. 
Compilers can, and do, move code around in various ways, duplicate it, 
combine it, unroll it, compress it - whatever gives (or tries to give - 
optimisation is not an exact science) better results while giving the 
documented behaviour.

I too have written code that relies on being able to identify the start 
and end of certain bits of code - typically for microcontrollers where 
you want some bits of code (like flash programming routines or very 
timing-critical interrupt code) put in RAM rather than flash.  Sometimes 
that can be done with compiler extensions, sometimes it takes extra 
flags, linker file magic, or other messing around.  But it's not 
something I would expect to be portable, and it needs to be confirmed for 
every compiler version and selection of flags used.  (I realise that 
this is a vastly simpler task for the kind of work I do than for an open 
source project!)

> 
> Another problem from gcc-3.1 to at least gcc-4.4 (intermittently) is
> that gcc compiled
> 
> goto *ca;
> 
> into the equivalent of
> 
> goto gotoca;
> 
> /* and elsewhere */
> gotoca: goto *ca;
> 
> We reported that repeatedly.  At one point a gcc maintainer gave us
> some bullshit about a possible performance advantage from this
> transformation, of course without presenting any empirical support,
> while we saw a big slowdown on our code.  We developed workarounds for
> that, and they are in Gforth to this day, even though we have not
> encountered a new gcc version with this problem for over a decade, but
> new Gforth should also work on old gcc.
> 

Again, the compiler is not doing anything outside its specifications. 
What you want here is a guarantee of behaviour that is not defined 
anywhere.  You are not seeing a bug in the compiler, or an 
incompatibility with previous versions - you are seeing the need for a 
feature (and a controlling compiler flag) that gcc does not currently 
have.  It's a potential feature that might be useful to other people 
too, while being an anti-feature to others.

> Another assumption is that when we concatenate the code snippet
> between label A and B (which contains C-code1) and the code snippet
> between label X and Y (which contains C-code3), executing the result
> will behave like the concatenation of C-code1 and C-code3 in source
> code.  This assumption has two aspects:
> 
> 1) Do the register assignments at the labels fit together.  It turns
>     out that we never had a problem with that, and I think that the
>     reason for that is that the "goto *" can jump to any of those
>     labels (all their addresses are taken), and so the register
>     assignment must be the same right after each label.
>     
>     What guarantees that the assignments are the same right before each
>     label?  Probably that after the label, there is not much between
>     the label and the next goto*, and that makes all registers at
>     potential targets live.
> 
> 2) If we have two pieces of machine code produced in this fashion,
>     does the architecture guarantee that such a concatenation works?
>     It turns out that in general-purpose architectures, all-but-one do.
>     That includes IA-64.  The exception is MIPS with its architectural
>     load-delay slot (and there are also scheduling restrictions having
>     to do with the hilo register that may be problematic): the first
>     code snippet may end in a load, and the next code snippet may start
>     with an instruction that reads the result of the load.  So we just
>     disabled this concatenation on MIPS.
> 
> We do a number of things to achieve stability: We do sanity-checking
> on the resulting machine code snippets and fall back to plain threaded
> code if the snippets turn out not to be relocatable.
> 
> Also, we enable all the flags for defining behaviour in gcc that we
> find (unfortunately, in the documentation they are intermixed with
> other options).  For good measure, this includes
> -fno-delete-null-pointer-checks, although I doubt that it makes a
> difference for our code either way.
> 

(-fno-delete-null-pointer-checks will make no difference to code that 
doesn't accidentally use leap-before-you-look checking, i.e. dereferencing 
a pointer first and only then testing it for null.)
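To make "leap before you look" concrete, the first function below dereferences the pointer before testing it. Dereferencing a null pointer is undefined behaviour, so gcc (with -fdelete-null-pointer-checks, which is enabled by default on most targets) may treat the later test as always false and delete it. The second function tests first, so the flag has nothing to change. The struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; };

/* Leap before you look: p is dereferenced before it is checked, so the
   compiler may assume p != NULL and remove the test below.
   Never call this with a null pointer. */
static int value_or_minus_one_bad(const struct node *p)
{
    int v = p->value;           /* leap ... */
    if (p == NULL)              /* ... then look: this test may be deleted */
        return -1;
    return v;
}

/* Look before you leap: the test comes first and must be kept. */
static int value_or_minus_one(const struct node *p)
{
    if (p == NULL)
        return -1;
    return p->value;
}
```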

There are certainly a few cases (-fno-strict-aliasing is a prime 
example) where flags are documented as disabling optimisations, when 
they are better viewed as adding definitions to the language and would 
be better documented under "Options Controlling C Dialect" or "Options 
for Code Generation Conventions".
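-fno-strict-aliasing illustrates the dialect point well. The first function below reads a float's bits through a uint32_t pointer. Standard C leaves that undefined, and gcc only promises to treat such accesses as possibly aliasing when the flag is given. The second function is the portable memcpy version, which is defined in standard C. The function names are illustrative, and the test values assume IEEE 754 single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Type punning through a pointer cast: undefined under the standard's
   effective-type rules, meaningful in the -fno-strict-aliasing dialect. */
static uint32_t float_bits_punned(float f)
{
    return *(uint32_t *)&f;
}

/* Portable alternative: copy the object representation. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}
```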

> One thing that came up about a year ago was that gcc auto-vectorizes
> adjacent memory accesses on AMD64 (apparently the AMD64 port
> maintainers are unhappy because AMD64 does not have instructions like
> ARM A64's ldp and stp:-), which did not impact correctness, but led to
> worse performance (not just in Gforth; I have also seen it in the
> bubble benchmark from John Hennessy's Stanford small integer
========== REMAINDER OF ARTICLE TRUNCATED ==========