Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: "The provenance memory model for C", by Jens Gustedt
Date: Thu, 10 Jul 2025 21:09:26 -0500
Organization: A noiseless patient Spider
Lines: 556
Message-ID: <104proo$15qjv$1@dont-email.me>
References: <87o6u343y3.fsf@gmail.com> <20250702025125.969@kylheku.com>
<104kkp3$anl$1@dont-email.me> <104ldg8$5f8m$1@dont-email.me>
<104n8he$lb42$1@dont-email.me> <104o1f2$pkvi$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 11 Jul 2025 04:09:29 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c29649ad5fabfbf91ab208d0a52f2944";
logging-data="1239679"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/icZSnIKzIao4RzXhbcpoewOppKudu9CA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:aq81lzJVzlQtAXT1L5/xAlgMK5E=
In-Reply-To: <104o1f2$pkvi$1@dont-email.me>
Content-Language: en-US
On 7/10/2025 4:34 AM, David Brown wrote:
> On 10/07/2025 04:28, BGB wrote:
>> On 7/9/2025 4:41 AM, David Brown wrote:
>>> On 09/07/2025 04:39, BGB wrote:
>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>
>>> ...
>>>
>>> There have been plenty of papers and blogs written about pointer
>>> provenance (several by Gustedt) and how it could work. It's not a
>>> very easy thing to follow in any format. A patch to current C
>>> standards is perhaps the least easy to follow, but it is important
>>> for how the concept could be added to C.
>>>
>>
>> Admittedly, as of yet, I haven't quite figured out what exactly
>> provenance is supposed to be, or how it is supposed to work in practice.
>>
>
> I've read a bit, but I think it would take quite an effort to understand
> the details.
>
> As a compiler user (albeit one with an interest in compilers and code
> generation), rather than a compiler developer, my attitude to writing C
> code will be the same if and when pointer provenance becomes part of the
> C model and C compiler optimisations - don't lie to your compiler. If
> you want to do weird stuff behind the compiler's back (and that is
> certainly possible in embedded development), use "volatile" accesses in
> the right places. So for me, in practical use, pointer provenance will
> simply mean that the compiler can do a bit more optimisation with less
> manual work - and that's a nice thing. (I'll still be interested in how
> it works, but that's for fun, not for real work.)
>
Probably true, but I am also thinking as a compiler developer.
Granted, arguably my compiler isn't great, but this is a different issue.
>>>>>
>>>>> If you think that certain code could go faster because certain
>>>>> suspected
>>>>> aliasing isn't actually taking place, then since C99 you were able to
>>>>> spin the roulette wheel and use "restrict".
>>>>>
>>>
>>> "restrict" can certainly be useful in some cases. There are also
>>> dozens of compiler extensions (such as gcc attributes) for giving the
>>> compiler extra information about aliasing.
>>>
>>
>> And, the annoyance of them being compiler dependent...
>
> Sure. "restrict" is, of course, not compiler dependent - but the effect
> it has on optimisation is compiler dependent.
>
I was unclear; I meant more that the use of GCC attributes and similar
is compiler dependent.
> Often you can also get improved results by manually "caching" data in
> local variables, instead of using pointer or array access directly, thus
> avoiding any extra memory accesses the compiler has to put in just in
> case pointers alias. But code is neater if you don't have to do that
> kind of thing.
>
This is often sort of what I end up doing anyway, because manually
caching stuff in local variables, and being selective about when things
are loaded or stored to external locations, is often better for
performance in MSVC.
Doesn't make so much difference with GCC though.
But, for native Windows, I am primarily using MSVC.
For BGBCC, yeah, best-case performance comes from manually caching
things, manually unrolling or modulo-scheduling loops, and trying to
organize expressions such that results are not reused too quickly.
Often this means breaking up complex expressions into multiple simpler
ones so as to limit dependencies, and avoiding expensive operations
whenever possible (divide, modulo, 64-bit multiply, or similar), ...
Such a style tends to look kinda like one is writing assembler in C, but
can often perform OK.
>>
>>
>>>>> So the aliasing analysis and its missed opportunities are the
>>>>> programmer's responsibility.
>>>>>
>>>>> It's always better for the machine to miss opportunities than to miss
>>>>> compile. :)
>>>>>
>>>>
>>>> Agreed.
>>>
>>> It is always better for the toolchain to be able to optimise
>>> automatically than to require manual intervention by the programmer.
>>> (It should go without saying that optimisations are only valid if
>>> they do not affect the observable behaviour of correct code.)
>>> Programmers are notoriously bad at figuring out what will affect
>>> their code efficiency, and will either under-use "restrict" where it
>>> could clearly be safely used to speed up code, or over-use it
>>> resulting in risky code.
>>>
>>> If the compiler can't be sure that accesses don't alias, then of
>>> course it should assume that aliasing is possible.
>>>
>>> The idea of pointer provenance is to let compilers (and programmers!)
>>> have a better understanding of when accesses are guaranteed to be
>>> alias- free, when they are guaranteed to be aliasing, and when there
>>> are no guarantees. This is useful for optimisation and program
>>> analysis (including static error checking). The more information the
>>> compiler has, the better.
>>>
>>
>> That is the idea at least.
>>
>> Though, if one assumes the compiler has non-local visibility, this is
>> a problem.
>>
>> Granted, as long as one can keep using more traditional semantics,
>> probably OK.
>
> Of course compilers can (and must!) fall back to the "assume accesses
> might alias" approach when they don't have the extra information. But
> at least for code in the same compilation, they can do better.
>
> And there is a trend amongst those wanting higher performance to use
> link-time optimisation, whole-program optimisation, or similarly named
> techniques to share information across units. Traditional separate
> compilation to object files then linking by identifier name only is a
> nice clear model, but hugely limiting for both optimisation and static
> error checking.
>
Ironically, my compiler doesn't do traditional separate compilation in
the first place.
The frontend's separate stages are basically:
  Preprocess;
  Parse to AST;
  Flatten the AST to a stack-based IL.
The handling of value caching and similar is basically in the 3AC stage,
which doesn't exist until after we are generating the final binary.
If you try to produce object files, or static libraries, in this case,
they are basically just blobs of this stack-oriented bytecode.
The behavior of the IL is sort of part-way between that of JVM and .NET
bytecode. It has some structural aspects in common with WASM (e.g., a
more monolithic bytecode blob rather than structures and tables), but
the bytecode semantics are more like those of .NET bytecode.
Where, say:
JVM:
  ".class" files, each representing a single class type;
  Has tables for literals, fields, and methods;
    Each table contains various structures for each item.
  Bytecode operations have explicit types;
  Member and method types, etc., are identified by ASCII strings;
  Handles internal control flow with byte-based branch offsets;
  Requires the stack depth to be the same on all paths to a given spot.
    However, it is not always obvious what the stack depth is.
.NET:
  Has various tables embedded in a COFF or PE/COFF image;
  Has tables organized similar to those in a relational database;
    Uses a dense packing scheme for table contents.
  Bytecode operation types are carried implicitly on the stack;
  Type signatures are encoded as binary blobs;
========== REMAINDER OF ARTICLE TRUNCATED ==========