Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: "The provenance memory model for C", by Jens Gustedt
Date: Thu, 10 Jul 2025 11:34:26 +0200
Organization: A noiseless patient Spider
Lines: 251
Message-ID: <104o1f2$pkvi$1@dont-email.me>
References: <87o6u343y3.fsf@gmail.com> <20250702025125.969@kylheku.com>
 <104kkp3$anl$1@dont-email.me> <104ldg8$5f8m$1@dont-email.me>
 <104n8he$lb42$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 10 Jul 2025 11:34:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a23cd98e8c9b44ef5e2a0812e485fa75";
	logging-data="840690"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18ac9oYWRuPvZIPk2C34xIBOf2GuvsZE80="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:zgjXW7VRZKjIX+g4xKB57maH/y8=
In-Reply-To: <104n8he$lb42$1@dont-email.me>
Content-Language: en-GB

On 10/07/2025 04:28, BGB wrote:
> On 7/9/2025 4:41 AM, David Brown wrote:
>> On 09/07/2025 04:39, BGB wrote:
>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>
>> ...
>>
>> There have been plenty of papers and blogs written about pointer 
>> provenance (several by Gustedt) and how it could work.  It's not a 
>> very easy thing to follow in any format.  A patch to current C 
>> standards is perhaps the least easy to follow, but it is important for 
>> how the concept could be added to C.
>>
> 
> Admittedly, as of yet, I haven't quite figured out what exactly 
> provenance is supposed to be, or how it is supposed to work in practice.
> 

I've read a bit, but I think it would take quite an effort to understand 
the details.

As a compiler user (albeit one with an interest in compilers and code 
generation), rather than a compiler developer, my attitude to writing C 
code will be the same if and when pointer provenance becomes part of the 
C model and C compiler optimisations - don't lie to your compiler.  If 
you want to do weird stuff behind the compiler's back (and that is 
certainly possible in embedded development), use "volatile" accesses in 
the right places.  So for me, in practical use, pointer provenance will 
simply mean that the compiler can do a bit more optimisation with less 
manual work - and that's a nice thing.  (I'll still be interested in how 
it works, but that's for fun, not for real work.)
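
As a rough illustration of "volatile accesses in the right places", here
is a minimal sketch.  The helper names and the fake register are
hypothetical, invented for this example; on real hardware the address
would come from the device's documentation.  It relies on compilers
treating an access through a volatile-qualified lvalue as a volatile
access, which mainstream compilers do.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not from a real API. */

/* Read a 32-bit location through a volatile lvalue, so the compiler
   must perform this load and may not cache or remove it. */
static inline uint32_t volatile_read32(const uint32_t *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Write a 32-bit location through a volatile lvalue, so the store
   is not merged with others or dropped. */
static inline void volatile_write32(uint32_t *addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Stand-in for a hardware register (assumption for this sketch). */
static uint32_t fake_status_reg;

/* Set bit 0 using one volatile read followed by one volatile write. */
static inline void set_ready_flag(void)
{
    volatile_write32(&fake_status_reg,
                     volatile_read32(&fake_status_reg) | 1u);
}
```

Only the accesses that need to be visible outside the compiler's model
are made volatile; everything else stays open to normal optimisation.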

>>>>
>>>> If you think that certain code could go faster because certain
>>>> suspected aliasing isn't actually taking place, then since C99 you
>>>> were able to spin the roulette wheel and use "restrict".
>>>>
>>
>> "restrict" can certainly be useful in some cases.  There are also 
>> dozens of compiler extensions (such as gcc attributes) for giving the 
>> compiler extra information about aliasing.
>>
> 
> And, the annoyance of them being compiler dependent...

Sure.  "restrict" is, of course, not compiler dependent - but the effect 
it has on optimisation is compiler dependent.
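
A small sketch of what "restrict" promises may help here.  The function
names are made up for the example.  Both versions compute the same
result for non-overlapping arguments; whether the second is actually
faster depends on the compiler and target.

```c
#include <stddef.h>

/* No promise about aliasing: a store to dst[i] might change *scale or
   a later src element, so the compiler may have to reload them. */
void scale_maybe_alias(float *dst, const float *src,
                       const float *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* With restrict the caller promises that, during the call, the objects
   reached through dst, src and scale do not overlap.  Breaking that
   promise is undefined behaviour. */
void scale_restrict(float * restrict dst, const float * restrict src,
                    const float * restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

The keyword itself is standard C99; only the amount of optimisation it
unlocks varies between compilers.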

Often you can also get improved results by manually "caching" data in 
local variables, instead of using pointer or array access directly, thus 
avoiding any extra memory accesses the compiler has to put in just in 
case pointers alias.  But code is neater if you don't have to do that 
kind of thing.
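
The manual "caching" described above might look something like this
sketch.  The struct and function names are hypothetical.  In the direct
version, data[i] and acc->total have the same type, so a compiler that
cannot prove they are separate objects has to store and reload the
total on each iteration.

```c
#include <stddef.h>

struct accumulator {
    long total;
};

/* Direct version: updates acc->total through the pointer every time. */
void sum_direct(struct accumulator *acc, const long *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc->total += data[i];
}

/* Cached version: the running total is a local variable whose address
   is never taken, so nothing can alias it and it can stay in a
   register.  The result is written back once at the end. */
void sum_cached(struct accumulator *acc, const long *data, size_t n)
{
    long total = acc->total;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    acc->total = total;
}
```

The two functions only behave differently if data overlaps acc->total,
which is exactly the case the compiler has to allow for in the first
version.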

> 
> 
>>>> So the aliasing analysis and its missed opportunities are the
>>>> programmer's responsibility.
>>>>
>>>> It's always better for the machine to miss opportunities than to miss
>>>> compile. :)
>>>>
>>>
>>> Agreed.
>>
>> It is always better for the toolchain to be able to optimise 
>> automatically than to require manual intervention by the programmer. 
>> (It should go without saying that optimisations are only valid if they 
>> do not affect the observable behaviour of correct code.)  Programmers 
>> are notoriously bad at figuring out what will affect their code 
>> efficiency, and will either under-use "restrict" where it could 
>> clearly be safely used to speed up code, or over-use it resulting in 
>> risky code.
>>
>> If the compiler can't be sure that accesses don't alias, then of 
>> course it should assume that aliasing is possible.
>>
>> The idea of pointer provenance is to let compilers (and programmers!) 
>> have a better understanding of when accesses are guaranteed to be 
>> alias-free, when they are guaranteed to be aliasing, and when there 
>> are no guarantees.  This is useful for optimisation and program 
>> analysis (including static error checking).  The more information the 
>> compiler has, the better.
>>
> 
> That is the idea at least.
> 
> Though, if one assumes the compiler has non-local visibility, this is a 
> problem.
> 
> Granted, as long as one can keep using more traditional semantics, 
> probably OK.

Of course compilers can (and must!) fall back to the "assume accesses 
might alias" approach when they don't have the extra information.  But 
at least for code in the same compilation, they can do better.
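
A minimal sketch of the two situations, with invented function names.
Seen in isolation, store_both() must allow for its two pointers
referring to the same int.  Where the caller is visible in the same
compilation, the compiler can see that x and y are distinct objects and
is free to inline the call and fold the result.

```c
/* With no extra information, a and b may point to the same int, so
   *a has to be read again after the store through b. */
int store_both(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;
}

/* Here the compiler can see that x and y are two separate local
   objects, so after inlining it may treat the result as the
   constant 1 with no memory accesses at all. */
int local_objects(void)
{
    int x;
    int y;
    return store_both(&x, &y);
}
```

Calling store_both() with the same address twice is perfectly valid and
returns 2, which is why the reload cannot be dropped in the general
case.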

And there is a trend amongst those wanting higher performance to use 
link-time optimisation, whole-program optimisation, or similarly named 
techniques to share information across units.  Traditional separate 
compilation to object files then linking by identifier name only is a 
nice clear model, but hugely limiting for both optimisation and static 
error checking.

> 
> 
>>>
>>> In my compiler, the default was to use a fairly conservative aliasing 
>>> strategy.
>>>
>> ...
>>> With pointer operations, all stores can be assumed potentially 
>>> aliasing unless restrict is used, regardless of type.
>>>
>>
>> C does not require that.  And it is rare in practice, IME, for code to 
>> actually need to access the same data through different lvalue types 
>> (other than unsigned char).  It is rarer still for it not to be 
>> handled better using type-punning unions or memcpy() - assuming the 
>> compiler handles memcpy() decently.
>>
> 
> I take a conservative approach because I want the compiler to be able to 
> run code that assumes traditional behavior (like that typical of 1990s 
> era compilers, or MSVC).

Please don't call this "traditional behaviour" of compilers - be honest, 
and call it limited optimisation and dumb translation.  And don't call 
it "code that assumes traditional behaviour" - call it "code written by 
people who don't really understand the language".  Code which assumes 
you can do "extern float x; unsigned int * p = (unsigned int *) &x;" is 
broken code.  It always has been, and always will be - even if it does 
what the programmer wanted on old or limited compilers.
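
For comparison, here is a sketch of two well-defined ways to get at the
bits of a float in C, using memcpy() or a type-punning union.  The
function names are made up for the example, and it assumes a 32-bit
float (checked at compile time); the bit patterns you see depend on the
floating point format, typically IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Copy the object representation into an integer.  A decent compiler
   turns this fixed-size memcpy into a register move. */
static inline uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reading a union member other than the one last written
   reinterprets the bytes; this is allowed in C (not in C++). */
static inline uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}
```

Neither version accesses a float object through an unsigned int lvalue,
so both work regardless of whether the compiler does type-based alias
analysis.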

There were compilers in the 1990s that did type-based alias analysis, 
and many other "modern" optimisations - I have used at least one.

It's okay to be conservative in a compiler (especially when high 
optimisation is really difficult!).  It's okay to have command-line 
switches or pragmas to support additional language semantics such as 
supporting access via any lvalue type, or giving signed integer 
arithmetic two's complement wrapping behaviour.  It's okay to make these 
the defaults.

But it is not okay to encourage code to make these compiler-specific 
assumptions without things like a pre-processor check for the specific 
compiler and pragmas to explicitly set the required compiler switches. 
It is not okay to excuse bad code as "traditional style" - that's an 
insult to people who have been writing good C code for decades.
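
Where wrapping arithmetic is genuinely wanted, one alternative to
relying on a switch such as "-fwrapv" is to write the wrapping
explicitly.  This is only a sketch, and the function name is
hypothetical.  The conversion back to a signed type is
implementation-defined in C; GCC documents it as reduction modulo 2^N,
and other mainstream compilers behave the same way in practice.

```c
#include <stdint.h>

/* Add two 32-bit signed values with two's complement wraparound,
   without signed overflow (which would be undefined behaviour). */
static inline int32_t wrap_add_i32(int32_t a, int32_t b)
{
    /* Unsigned arithmetic wraps modulo 2^32 by definition. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;

    /* Implementation-defined conversion for values above INT32_MAX;
       assumed here to reduce modulo 2^32. */
    return (int32_t)sum;
}
```

Code written this way states its assumption at the one place it
matters, rather than depending on how the whole translation unit
happens to be compiled.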


> 
> Granted, it is a tradeoff that a lot of this code needs to be modified 
> to work on GCC and Clang (absent the usual need for "-fwrapv 
> -fno-strict-aliasing" options).
> 
> Granted, there is a command-line option to enable TBAA semantics, just 
> it is not the default option in this case (so, in BGBCC, TBAA is opt-in; 
> rather than opt-out in GCC and Clang).
> 
> BGBCC's handling of memcpy is intermediate:
> It can turn it into loads and stores;
> But, it can't turn it into a plain register move;
> Taking the address of a variable will also cause the variable to be 
> loaded/stored every time it is accessed in this function (regardless of 
> where it is accessed in said function).
> 
> So:
>    memcpy(&i, &f, 8);
> Will still use memory ops and wreck the performance of both the i and f 
> variables.

Well, there you have scope for some useful optimisations (more useful 
than type-based alias analysis).  memcpy does not need to use memory 
accesses unless real memory accesses are actually needed to give the 
========== REMAINDER OF ARTICLE TRUNCATED ==========