
Path: ...!3.eu.feeder.erje.net!feeder.erje.net!news.in-chemnitz.de!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 2 Sep 2024 14:22:51 +0200
Organization: A noiseless patient Spider
Lines: 170
Message-ID: <vb4amr$2rcbt$1@dont-email.me>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
 <memo.20240830164247.19028y@jgd.cix.co.uk> <vasruo$id3b$1@dont-email.me>
 <2024Aug30.195831@mips.complang.tuwien.ac.at> <vat5ap$jthk$2@dont-email.me>
 <vaunhb$vckc$1@dont-email.me> <vautmu$vr5r$1@dont-email.me>
 <2024Aug31.170347@mips.complang.tuwien.ac.at> <vavpnh$13tj0$2@dont-email.me>
 <vb2hir$1ju7q$1@dont-email.me> <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 02 Sep 2024 14:22:52 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="940a970a29053721110c83aed858b8d6";
	logging-data="2994557"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19E9RG+44ybyoXm25PIGZVhNcqzK04lWpg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:f9zFfISR5fdrhS5XABXhFJH7xFg=
Content-Language: en-GB
In-Reply-To: <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com>
Bytes: 11225

On 02/09/2024 06:08, George Neuner wrote:
> On Sun, 1 Sep 2024 22:07:53 +0200, David Brown

> I'm not going to argue about whether UB in code is wrong.  The
> question I have concerns what to do with something that explicitly is
> mentioned as UB in some standard N, but was not addressed in previous
> standards.
> 
> Was it always UB?  Or should it be considered ID until it became UB?

I can't answer for languages other than C and C++ (others might be able 
to compare usefully to, for example, Ada or Fortran).  But the C 
standards explicitly state that behaviours that are not defined in the 
standards are undefined behaviour in exactly the same way as cases that 
are labelled as undefined behaviour, and also cases where the program 
violates a "shall" or "shall not" requirement.

To be clear - the meaning of "undefined behaviour" is simply that no 
behaviour has been defined.  The C standards can say that something is 
"undefined behaviour" (or just fail to give a definition of the 
behaviour) and then the implementation can give a definition of it.  An 
example here would be that the C standards say that signed integer 
arithmetic overflow is undefined behaviour - if you have a signed 
integer operation and the mathematically correct results can't be 
represented in the type, then there is no possible way for the generated 
code to give the correct result.  The C standards therefore leave this 
as "undefined behaviour".  However, if you use "gcc -fwrapv" then the 
behaviour /is/ defined - it is defined as two's complement wrapping.

So if you write C code that overflows signed integer arithmetic and 
relies on given behaviour and results, the code is wrong because it has 
undefined behaviour - you are, at best, relying on luck.  But if you 
write C code with such demands and specify that it is only suitable for 
use with the gcc "-fwrapv" flag, then it is not wrong and there is no 
undefined behaviour because the compiler implementation has given a 
definition of the behaviour.  However, if you use the same code with, 
say, old versions of MSVC then you are back to luck and UB even if that 
compiler does not have optimisations based on knowing that signed 
integer arithmetic overflow is UB.  And it is /your/ fault when the code 
fails on newer versions of MSVC that /do/ have such optimisations.

This is all very different from what the C standards call 
"implementation-defined behaviour".  Such things as how out-of-range 
values are converted to signed integer types are explicitly IB in the C 
standards - implementations must define and document the behaviour.

> 
> 
> It does seem to me that as the C standard evolved, and as more things
> have *explicitly* become documented as UB, compiler developers have
> responded largely by dropping whatever the compiler did previously -
> sometimes breaking code that relied on it.
> 

I think that is perhaps partly true, partly a myth, and partly simply a 
side-effect of compilers gaining more optimisations as they are able to 
analyse more code at a time and do more advanced transforms.  The C 
standards have clarified some of the text over time (most people would 
agree there is still plenty of scope for improvement there!).  That can 
include changing some things that were previously undefined by omission 
to being explicitly labelled UB.  I can't think of any examples 
off-hand.  But note that this would not in any way change the meaning of 
the code - UB by omission is the same as explicit UB as far as the C 
language is concerned.  There are very few cases where code was correct 
for original standard C90 (i.e., independent of any IB and independent 
of particular compilers) and is not correct C23 with identical defined 
behaviour.  There were a few things changed between C90 and C99, but I 
don't know of any since then other than a few added keywords that could 
conflict with user identifiers.


It is an unfortunate truth that older C compilers did not do as good a 
job at optimisation as newer ones.  And this meant that many tricks were 
used in order to get efficient results, even though some of these relied 
on UB.  Such code can have different results on different compilers, or 
different sets of options, because there is no definition of what the 
"correct" result should be.  The programmer will have a clear idea of 
what they think is "correct", but it is not defined or specified 
anywhere.  Usually the programmer feels it is "obvious" what the 
intended behaviour is - but "obvious" to a programmer does not mean 
"obvious" to a compiler.  Thus you end up with code that works (as 
intended by the programmer) by testing and good luck with some compilers 
and options, and fails by bad luck on other compilers or options.  The 
compiler didn't "break" the code - the code was broken to start with. 
But it is entirely reasonable and understandable why the programmer 
wrote the "broken" code in the first place, and why it did a useful job 
despite having UB.


So I appreciate when people get frustrated that changes to a tool change 
the apparent behaviour of their code.  But it is important to understand 
that the compiler is not wrong here - it is doing the best job it can for 
people writing correct code.  A development tool should prioritise people 
using it /now/ - and while there is C code in use today that was written 
many decades ago, the majority of C code (and even more so for C++) is 
much more recent.  It would be wrong to limit modern programmers because 
of code written long ago - even more so when there is no clear 
specification of how that old code was supposed to work.


> I have moved on from C (mostly), and I learned long ago to archive
> toolchains and to expect that any new version of a tool might break
> something that worked previously.  I don't like it, but it generally
> doesn't annoy me that much.

This all depends on the kind of code you write, and the kind of system 
you target.  On my embedded targets, most of my code can be written in 
standard C.  But a lot of it also uses at least some gcc extensions to 
improve the code - enhancing static error checking, making it more 
efficient, or making it easier and clearer to write.  I am quite clear 
there that the code is dependent on gcc (it would probably also be fine 
for clang, but I have not checked that).  For all such code, I do my 
utmost to make sure it is correct and safe, with no UB and no IB beyond 
what is obvious and necessary.  Most programs will also contain code 
that is more specifically toolchain-dependent, perhaps with snippets of 
inline assembly, or target-specific features that are needed.  This was 
more of an issue before, when I was using a wider range of compilers.

But for any given project, I stick to a single compiler version and 
usually one set of compiler flags.  For my work, code without C-level UB 
is not enough - I sometimes also need to test for things like run-time 
speed and code size, or interaction with external tools of various 
sorts, or stack usage limits - all things that are outside the scope of C.

However, I don't remember when I last found that portable code that I 
wrote and was working on one compiler failed to have correct C-level 
functionality when compiled with a newer compiler (or flags) due to 
undefined behaviour, new optimisations, or changes in the C standard. 
I've had portability issues with older code due to IB such as writing 
code for a microcontroller with a different size of "int".  I've seen 
issues with third-party code - I've had to compile such code with 
"-fwrapv -fno-strict-aliasing" on occasion.  I've made other mistakes in 
my code.  And I got UB wrong in my early days, when I was new to C 
programming.  But truly, I am at a loss to understand why some people 
are so worried about UB in C - you simply need to know the rules and 
specifications for the language features you use, and follow those rules.

> 
> 
> MMV. Certainly Anton's does. ;-)

Anton writes code that seriously pushes the boundary of what can be 
achieved.  For at least some of the things he does (such as GForth) he 
is trying to squeeze every last drop of speed out of the target.  And he 
is /really/ good at it.  But that means he is forever relying on nuances 
about code generation.  His code, at least for efficiency if not for 
correctness, is dependent on details far beyond what is specified and 
documented for C and for the gcc compiler.  He might spend a long time 
working with his code and a version of gcc, fine-tuning the details of 
his source code to get out exactly the assembly he wants from the 
compiler.  Of course it is frustrating for him when the next version of 
gcc generates very different assembly from that same source, but he is 
not really programming at the level of C, and he should not expect 
consistency from C compilers like he does.

> 
> Similar to you (David), I came from a - not embedded per se - but
> kiosk background:  HRT indrustrial QA/QC systems.  I know well the
> attraction of a new compiler yielding better performing code.  I also
> know a large amount of my code was hardware and OS specific, that
> those are the things beyond the scope of the compiler, but they also
> are things that I don't want to have to revisit every time a new
> version of the compiler is released.
> 

Yes.  For this kind of work, you want to keep your build environment 
consistent - no matter how careful you are to write correct code without UB.

> 13 of one, baker's dozen of the other.