Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <433-929-6894@kylheku.com>
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 17:20:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 139
Message-ID: <20240322094449.555@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de> <20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me> <20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com> <20240322083648.539@kylheku.com> <87le6az0s8.fsf@nosuchdomain.example.com>
Injection-Date: Fri, 22 Mar 2024 17:20:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390"; logging-data="3210798"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19hDZd8jZOex4BLrJvylQhxB/1JK8zVhDM="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:lnLseEdtpdHEI/elmvn6HGDer7U=
Bytes: 6665

On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>> Since ISO C says that the semantic analysis has been done (that
>> unit having gone through phase 7), we can take it for granted as a
>> done-and-dusted property of that translation unit that it calls bar
>> whenever its foo is invoked.
>
> We can take it for granted that the output performed by the printf call
> will be performed, because output is observable behavior. If the
> external function bar is modified, the LTO step has to be redone.

That's what undeniably has to be done in the LTO world. Nothing that
is done brings that world into conformance, though.

>>> Say I have a call to foo in main, and the definition of foo is in
>>> another translation unit. In the absence of LTO, the compiler will have
>>> to generate a call to foo.
>>> If LTO is able to determine that foo doesn't
>>> do anything, it can remove the code for the function call, and the
>>> resulting behavior of the linked program is unchanged.
>>
>> There are always situations in which optimizations that have been forbidden
>> don't cause a problem, and are even desirable.
>>
>> If you have LTO turned on, you might be programming in GNU C or Clang C
>> or whatever, not standard C.
>>
>> Sometimes programs have the same interpretation in GNU C and standard
>> C, or the same interpretation to someone who doesn't care about certain
>> differences.
>
> Are you claiming that a function call is observable behavior?

Yes. It is the observable behavior of an unlinked translation unit.
It can be observed by linking a harness to it, with a main()
function and all else that is required to make it a complete program.

That harness becomes an instrument for observation.

> Consider:
>
> main.c:
> #include "foo.h"
> int main(void) {
>     foo();
> }
>
>
> foo.h:
> #ifndef FOO_H
> #define FOO_H
> void foo(void);
> #endif
>
>
> foo.c:
> void foo(void) {
>     // do nothing
> }
>
>
> Are you saying that the "call" instruction generated for the function
> call is *observable behavior*?

Of course; it can be observed externally, without doing any reverse
engineering on the translated unit. External linkage is called
"external" for a reason!

> If an implementation doesn't generate
> that "call" instruction because it's able to determine at link time that
> the call does nothing, that optimization is forbidden?

The text says so. Translation units are separate; semantic analysis is
finished in translation phase 7; linking in 8.

Out of translation phases 1-7 we get a concrete artifact: the
translated unit. That has externally visible features, like
what symbols it requires. Its behavior with regard to those symbols
can be empirically observed, validated by tests and expected to hold
thereafter.
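
To make the harness idea concrete, here is a minimal sketch in C. It is
only an illustration under stated assumptions: the names foo and bar come
from the discussion, while bar_calls and observe_foo are hypothetical
helpers invented for this sketch. In a real setup foo would live in a
separately translated unit and the harness would be linked against it;
it is defined in the same file here only so the sketch compiles on its own.

```c
/* harness.c: a hypothetical observation harness (sketch only).
 *
 * Assumption: in a real setup, foo() would be defined in a separately
 * translated unit and merely declared here. It is defined in this file
 * solely so that the sketch is one self-contained source file.
 */
#include <assert.h>

/* Harness-side instrumentation: how many times the unit called bar. */
static int bar_calls;

/* The harness supplies the external definition that the unit requires. */
void bar(void)
{
    bar_calls++;
}

/* Stand-in for the unit under observation: its foo calls bar. */
void foo(void)
{
    bar();
}

/* Invoke foo `times` times and report how many bar calls were observed. */
int observe_foo(int times)
{
    assert(times >= 0);   /* a negative count makes no sense here */
    bar_calls = 0;
    for (int i = 0; i < times; i++)
        foo();
    return bar_calls;
}
```

A main() in the harness that calls observe_foo(3) should get 3 back. The
harness never inspects the unit's code; it only supplies bar and counts
the calls that arrive through the external name.
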
Since semantic analysis is complete, any observable behavior can be
taken to be a fact about that translated unit, a property of it, which
will not change when it is subject to linkage.

The truth cannot be clawed back, according to the way things are
defined in the standard, and this is a good thing.

> I presume you'd agree that omitting the "call" instruction is allowed if
> the call and the function definition are in the same translation unit.

Yes. And that's a way to get the effect of LTO portably, in a
conforming way, in any implementation going back decades. Instead of
linkage use #include "foo.c", #include "bar.c" (taking steps to ensure
your internal names don't clash).

LTO is more convenient in that you don't have to use an unusual program
structure, and keeps your internal linkage scopes separate. Just don't
pretend it's conforming to standard C, any more than -ffast-math.

LTO is "voodoo" though. The translation units contain intermediate
code, not target code. The intermediate code continues to be subject
to compiler passes when the translation units are brought together.
Thus translation is going on, but the units are gone.

> What wording in the standard requires a "call" instruction to be
> generated if they're in different translation units?
>
> That's a trivial example, but other link time optimizations that don't
> change a program's observable behavior (insert weasel words about
> unspecified behavior) are also allowed.

An example would be the removal of material that is not referenced,
like functions not called anywhere, or entire translation units
whose external names are not referenced. That can cause issues too,
and I've run into them, but I can't call that nonconforming.
Nothing is semantically analyzed across translation units, only the
linkage graph itself, which may be found to be disconnected.

> In phase 8:
>     All external object and function references are resolved.
>     Library
>     components are linked to satisfy external references to functions
>     and objects not defined in the current translation. All such
>     translator output is collected into a program image which contains
>     information needed for execution in its execution environment.
>
> I don't see anything about required CPU instructions.

I don't see anything about /removing/ instructions that have to be
there according to the semantic analysis performed in order to
translate those units from phases 1 - 7, and that can be confirmed to
be present with a test harness.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca