Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: "The provenance memory model for C", by Jens Gustedt
Date: Thu, 10 Jul 2025 11:34:26 +0200
Organization: A noiseless patient Spider
Lines: 251
Message-ID: <104o1f2$pkvi$1@dont-email.me>
References: <87o6u343y3.fsf@gmail.com> <20250702025125.969@kylheku.com> <104kkp3$anl$1@dont-email.me> <104ldg8$5f8m$1@dont-email.me> <104n8he$lb42$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 10 Jul 2025 11:34:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a23cd98e8c9b44ef5e2a0812e485fa75"; logging-data="840690"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ac9oYWRuPvZIPk2C34xIBOf2GuvsZE80="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
Cancel-Lock: sha1:zgjXW7VRZKjIX+g4xKB57maH/y8=
In-Reply-To: <104n8he$lb42$1@dont-email.me>
Content-Language: en-GB

On 10/07/2025 04:28, BGB wrote:
> On 7/9/2025 4:41 AM, David Brown wrote:
>> On 09/07/2025 04:39, BGB wrote:
>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>
>> ...
>>
>> There have been plenty of papers and blogs written about pointer
>> provenance (several by Gustedt) and how it could work. It's not a
>> very easy thing to follow in any format. A patch to current C
>> standards is perhaps the least easy to follow, but it is important for
>> how the concept could be added to C.
>>
>
> Admittedly, as of yet, I haven't quite figured out what exactly
> provenance is supposed to be, or how it is supposed to work in practice.
>

I've read a bit, but I think it would take quite an effort to understand the details.
As a compiler user (albeit one with an interest in compilers and code generation), rather than a compiler developer, my attitude to writing C code will be the same if and when pointer provenance becomes part of the C model and C compiler optimisations - don't lie to your compiler. If you want to do weird stuff behind the compiler's back (and that is certainly possible in embedded development), use "volatile" accesses in the right places. So for me, in practical use, pointer provenance will simply mean that the compiler can do a bit more optimisation with less manual work - and that's a nice thing. (I'll still be interested in how it works, but that's for fun, not for real work.)

>>>>
>>>> If you think that certain code could go faster because certain
>>>> suspected
>>>> aliasing isn't actually taking place, then since C99 you were able to
>>>> spin the roulette wheel and use "restrict".
>>>>
>>
>> "restrict" can certainly be useful in some cases. There are also
>> dozens of compiler extensions (such as gcc attributes) for giving the
>> compiler extra information about aliasing.
>>
>
> And, the annoyance of them being compiler dependent...

Sure. "restrict" is, of course, not compiler dependent - but the effect it has on optimisation is compiler dependent.

Often you can also get improved results by manually "caching" data in local variables, instead of using pointer or array access directly, thus avoiding any extra memory accesses the compiler has to put in just in case pointers alias. But code is neater if you don't have to do that kind of thing.

>
>
>>>> So the aliasing analysis and its missed opportunities are the
>>>> programmer's responsibility.
>>>>
>>>> It's always better for the machine to miss opportunities than to miss
>>>> compile. :)
>>>>
>>>
>>> Agreed.
>>
>> It is always better for the toolchain to be able to optimise
>> automatically than to require manual intervention by the programmer.
>> (It should go without saying that optimisations are only valid if they
>> do not affect the observable behaviour of correct code.) Programmers
>> are notoriously bad at figuring out what will affect their code
>> efficiency, and will either under-use "restrict" where it could
>> clearly be safely used to speed up code, or over-use it resulting in
>> risky code.
>>
>> If the compiler can't be sure that accesses don't alias, then of
>> course it should assume that aliasing is possible.
>>
>> The idea of pointer provenance is to let compilers (and programmers!)
>> have a better understanding of when accesses are guaranteed to be
>> alias-free, when they are guaranteed to be aliasing, and when there
>> are no guarantees. This is useful for optimisation and program
>> analysis (including static error checking). The more information the
>> compiler has, the better.
>>
>
> That is the idea at least.
>
> Though, if one assumes the compiler has non-local visibility, this is a
> problem.
>
> Granted, as long as one can keep using more traditional semantics,
> probably OK.

Of course compilers can (and must!) fall back to the "assume accesses might alias" approach when they don't have the extra information. But at least for code in the same compilation, they can do better.

And there is a trend amongst those wanting higher performance to use link-time optimisation, whole-program optimisation, or similarly named techniques to share information across units. Traditional separate compilation to object files then linking by identifier name only is a nice clear model, but hugely limiting for both optimisation and static error checking.

>
>
>>>
>>> In my compiler, the default was to use a fairly conservative aliasing
>>> strategy.
>>>
>> ...
>>> With pointer operations, all stores can be assumed potentially
>>> aliasing unless restrict is used, regardless of type.
>>>
>>
>> C does not require that.
>> And it is rare in practice, IME, for code to
>> actually need to access the same data through different lvalue types
>> (other than unsigned char). It is rarer still for it not to be
>> handled better using type-punning unions or memcpy() - assuming the
>> compiler handles memcpy() decently.
>>
>
> I take a conservative approach because I want the compiler to be able to
> run code that assumes traditional behavior (like that typical of 1990s
> era compilers, or MSVC).

Please don't call this "traditional behaviour" of compilers - be honest, and call it limited optimisation and dumb translation. And don't call it "code that assumes traditional behaviour" - call it "code written by people who don't really understand the language".

Code which assumes you can do "extern float x; unsigned int * p = (unsigned int *) &x;" is broken code. It always has been, and always will be - even if it does what the programmer wanted on old or limited compilers.

There were compilers in the 1990's that did type-based alias analysis, and many other "modern" optimisations - I have used at least one.

It's okay to be conservative in a compiler (especially when high optimisation is really difficult!). It's okay to have command-line switches or pragmas to support additional language semantics such as supporting access via any lvalue type, or giving signed integer arithmetic two's complement wrapping behaviour. It's okay to make these the defaults.

But it is not okay to encourage code to make these compiler-specific assumptions without things like a pre-processor check for the specific compiler and pragmas to explicitly set the required compiler switches. It is not okay to excuse bad code as "traditional style" - that's an insult to people who have been writing good C code for decades.

>
> Granted, it is a tradeoff that a lot of this code needs to be modified
> to work on GCC and Clang (absent the usual need for "-fwrapv
> -fno-strict-aliasing" options).
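
As a minimal sketch of what such modified code can look like, the object representation of a float can be copied with memcpy() instead of being read through a cast pointer, and wrapping arithmetic can be done on unsigned types. Written that way, the code relies on neither "-fno-strict-aliasing" nor "-fwrapv". The helper names float_bits and wrapping_add_u32 are hypothetical, chosen only for illustration, and a 32-bit float is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float occupies exactly 32 bits. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Hypothetical helper: return the bits of a float without accessing
   the float object through an incompatible lvalue type. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copies the object representation */
    return u;
}

/* Hypothetical helper: 32-bit addition that wraps modulo 2^32.
   The conversion of the result to uint32_t is defined to reduce
   modulo 2^32, so no compiler switch is needed for wrapping. */
static uint32_t wrapping_add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}
```

On an implementation with IEEE 754 single-precision floats, float_bits(1.0f) gives 0x3F800000, and wrapping_add_u32(UINT32_MAX, 1) gives 0. Whether the memcpy() call is reduced to a plain register move depends on the compiler and its optimisation settings.
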
>
> Granted, there is a command-line option to enable TBAA semantics, just
> it is not the default option in this case (so, in BGBCC, TBAA is opt-in;
> rather than opt-out in GCC and Clang).
>
> BGBCC's handling of memcpy is intermediate:
> It can turn it into loads and stores;
> But, it can't turn it into a plain register move;
> Taking the address of a variable will also cause the variable to be
> loaded/stored every time it is accessed in this function (regardless of
> where it is accessed in said function).
>
> So:
> memcpy(&i, &f, 8);
> Will still use memory ops and wreck the performance of both the i and f
> variables.

Well, there you have scope for some useful optimisations (more useful than type-based alias analysis). memcpy does not need to use memory accesses unless real memory accesses are actually needed to give the

========== REMAINDER OF ARTICLE TRUNCATED ==========