From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: "The provenance memory model for C", by Jens Gustedt
Date: Fri, 11 Jul 2025 10:48:23 +0200
Organization: A noiseless patient Spider
Message-ID: <104qj4o$1drn4$1@dont-email.me>
References: <87o6u343y3.fsf@gmail.com> <20250702025125.969@kylheku.com>
 <104kkp3$anl$1@dont-email.me> <104ldg8$5f8m$1@dont-email.me>
 <104n8he$lb42$1@dont-email.me> <104o1f2$pkvi$1@dont-email.me>
 <104proo$15qjv$1@dont-email.me>
In-Reply-To: <104proo$15qjv$1@dont-email.me>

On 11/07/2025 04:09, BGB wrote:
> On 7/10/2025 4:34 AM, David Brown wrote:
>> On 10/07/2025 04:28, BGB wrote:
>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>> On 09/07/2025 04:39, BGB wrote:
>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>
>>>> ...
>>
>> Please don't call this "traditional behaviour" of compilers - be
>> honest, and call it limited optimisation and dumb translation.  And
>> don't call it "code that assumes traditional behaviour" - call it
>> "code written by people who don't really understand the language".
>> Code which assumes you can do "extern float x; unsigned int * p =
>> (unsigned int *) &x;" is broken code.  It always has been, and always
>> will be - even if it does what the programmer wanted on old or
>> limited compilers.
>>
>> There were compilers in the 1990's that did type-based alias
>> analysis, and many other "modern" optimisations - I have used at
>> least one.
>>
>
> Either way, MSVC mostly accepts this sorta code.

I remember reading in an MSVC blog somewhere that they had no plans to
introduce type-based alias analysis in the compiler.  The same blog
article announced their advanced new optimisations that treat signed
integer overflow as undefined behaviour, and explained that they'd been
doing that for years in a few specific cases.  I think it is fair to
assume there is a strong overlap between the programmers who think
MSVC, or C and C++ in general, have two's complement wrapping of signed
integers when the hardware supports it, and those who think pointer
casts let you access any data.

And despite the blog, I don't believe MSVC will be restricted that way
indefinitely.  After all, they encourage the use of clang/llvm for C
programming, and that does do type-based alias analysis and
optimisation.

The C world is littered with code that "used to work" or "works when
optimisation is not used" because it relied on shite like this -
unwarranted assumptions about limitations in compiler technology.

>
> Also, I think a lot of this code was originally written for compilers
> like Watcom C and similar.
>
>
> I have noted that there are some behavioural inconsistencies, for
> example: some old code seems to assume that in x<<y, y always shifts
> left, but modulo the width of the type.  Except, when both x and y
> are constant, code seems to expect it as if it were calculated with a
> wider type, and where negative shifts go in the opposite direction,
> ... with the result then being converted to the final type.
>
> Meanwhile, IIRC, GCC and Clang raise an error if trying to do a large
> or negative shift.  MSVC will warn if the shift is large or negative.
>
> Though, in most cases, if the shift is larger than the width of the
> type, or negative, it is usually a programming error.
>
>
>> It's okay to be conservative in a compiler (especially when high
>> optimisation is really difficult!).  It's okay to have command-line
>> switches or pragmas to support additional language semantics such as
>> supporting access via any lvalue type, or giving signed integer
>> arithmetic two's complement wrapping behaviour.  It's okay to make
>> these the defaults.
>>
>> But it is not okay to encourage code to make these compiler-specific
>> assumptions without things like a pre-processor check for the
>> specific compiler and pragmas to explicitly set the required
>> compiler switches.  It is not okay to excuse bad code as
>> "traditional style" - that's an insult to people who have been
>> writing good C code for decades.
>>
>
> A lot of the code I have seen from the 90s was written this way.
>

Yes.  A lot of code from the 90's was written badly.  A lot of code
today is written badly.  Just because a lot of code was, and still is,
written that way does not stop it being bad code.

>
> Though, a lot of it comes from a few major sources:
>   id Software;
>     Can mostly be considered "standard" practice,
>     along with maybe the Linux kernel, ...
>   Apogee Software
>     Well, some of this code is kinda bad.
>     This code tends to be dominated by global variables.
>     Also treating array bounds as merely a suggestion.
>   Raven Software
>     Though, most of this was merely modified id Software code.
>
> Early on, I think I also looked a fair bit at the Linux kernel, and
> also some of the GNU shell utilities and similar (though, the "style"
> was very different vs either the Linux kernel or id code).
>

The Linux kernel is not a C style to aspire to.  But they do at least
try to make such assumptions explicit - the kernel build process makes
it very clear that it requires the "-fno-strict-aliasing" flag and can
only be correctly compiled by a specific range of gcc versions (and, I
think experimentally, icc and clang).
Low-level and systems programming is sometimes very dependent on the
details of the targets, or the details of particular compilers - that's
okay, as long as it is clear in the code and the build instructions.
Then the code (or part of it at least) is not written in standard C,
but in gcc-specific C or some other non-standard dialect.  It is not,
however, "traditional C".

>
> Early on, I had learned C partly by tinkering around with id's code
> and trying to understand what secrets it contained.
>
>
> But, alas, an example from Wikipedia shows a relevant aspect of id's
> style:
> https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
>
> Which is, at least to me, what I consider "traditional".

The declaration of all the variables at the top of the function is
"traditional".  The reliance on a specific format for floating point is
system-dependent code (albeit one that works on a great many systems).
The use of "long" for a 32-bit integer is both "traditional" /and/
system-dependent.  (Though it is possible that earlier in the code
there are pre-processor checks on the size of "long".)  The use of
signed integer types for bit manipulation is somewhere between
"traditional" and "wrong".  The use of pointer casts instead of a
type-punning union is wrong.

The lack of documentation and comments, the use of an unexplained magic
number, and the failure to document the range for which the algorithm
works and its accuracy limitations are also very traditional - a
programming tradition that remains strong today.

It is worth remembering that game code (especially commercial game
code) is seldom written with a view to portability, standards
correctness, or future maintainability.  It is written to be as fast as
possible using the compiler chosen at the time, and to be built and
released as a binary in the shortest possible time-to-market.

>>>
>>> So:
>>>   memcpy(&i, &f, 8);
>>> Will still use memory ops and wreck the performance of both the i
>>> and f variables.
>>
>> Well, there you have scope for some useful optimisations (more
>> useful than type-based alias analysis).  memcpy does not need to use
>> memory accesses unless real memory accesses are actually needed to
>> give the observable effects specified in the C standards.
>>
>
> Possibly, but by the stage where we know that it could be turned into
> a reg-reg move (in the final code generation), most of the damage has
> already been done.
>
> Basically, it would likely be necessary to detect and special-case
> this scenario at the AST level (probably by turning it into a cast or
> intrinsic).  But, usually one doesn't want to add too much of this
> sort of cruft to the AST walk.
>

One thing to remember is that functions like "memcpy" don't have to be
treated as normal functions.  You can handle it as a keyword in your
compiler if that's easiest.  You can declare it as a macro in your
<string.h>.  You can combine these, and have compiler-specific
extensions (keywords, attributes, whatever) and have the declaration as
a function with attributes.  Your key aim is to spot cases where the
size of the memcpy is a small compile-time constant.

>
> But then, apart from code written to assume GCC or similar, most of
> the code doesn't use memcpy in this way.
>

========== REMAINDER OF ARTICLE TRUNCATED ==========