From: David Brown
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 13:51:53 +0100

On 21/03/2024 18:41, Kaz Kylheku wrote:
> On 2024-03-21, David Brown wrote:
>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>> On 2024-03-20, Stefan Ram wrote:
>>>> A "famous security bug":
>>>>
>>>> void f( void )
>>>> { char buffer[ MAX ];
>>>>   /* . . . */
>>>>   memset( buffer, 0, sizeof( buffer )); }
>>>>
>>>> . Can you see what the bug is?
>>>
>>> I don't know about "the bug", but conditions can be identified under
>>> which that would have a problem executing, like MAX being in excess
>>> of available automatic storage.
>>>
>>> If the /*...*/ comment represents the elision of some security sensitive
>>> code, where the memset is intended to obliterate secret information,
>>> of course, that obliteration is not required to work.
>>>
>>> After the memset, the buffer has no next use, so all the assignments
>>> performed by memset to the bytes of buffer are dead assignments that can
>>> be elided.
>>>
>>> To securely clear memory, you have to use a function for that purpose
>>> that is not susceptible to optimization.
>>>
>>> If you're not doing anything stupid, like link time optimization, an
>>> external function in another translation unit (a function that the
>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>> ought to suffice.
>>
>> Using LTO is not "stupid". Relying on people /not/ using LTO, or not
>> using other valid optimisations, is "stupid".
>
> LTO is a nonconforming optimization.

Really? That is news to me, and I suspect to the folks at gcc and
clang/llvm that developed LTO for these compilers. (I have worked with
embedded compilers that have had LTO-type optimisations for decades, but
these are not often concerned with the minutiae of the standards.)

> It destroys the concept that
> when a translation unit is translated, the semantic analysis is
> complete, such that the only remaining activity is resolution of
> external references (linkage), and that the semantic analysis of one
> translation unit does not use information about another translation
> unit.

Where is it described in the C standards that semantic information from
one translation unit cannot be used (for optimisation, for static error
checking, for other analysis or any other purposes) in another
translation unit?

What makes you think that LTO, as implemented in compilers like gcc and
clang/llvm, does not generate code according to the "as if" rules? (That
is, they can generate code that is more optimal, but produces the same
observable effects "as if" they were strict dumb translators of the
functioning of the C abstract machine.)

I believe there is very little where the behaviour of a C program is
different if parts of the code are in one translation unit, or if they
are in several. There are utilities that merge multiple C files into
single C files (for easier deployment, or for better optimisation).
They have to take into account renaming static objects and functions to
file-local names, and remove duplicate type definitions, but as long as
certain reasonable rules are followed by the programmer, it all goes
fine.

(You could, I suppose, hit complications if you relied on compatibility
of struct or union types across translation units where the identifiers
were different and they are compatible across TU's but not within TU's,
according to the 6.2.7p1 rules. But that would be unlikely, and I
expect LTO compilers to handle those cases.)

>
> This has not yet changed in last April's N3096 draft, where
> translation phases 7 and 8 are:
>
>   7. White-space characters separating tokens are no longer significant.
>      Each preprocessing token is converted into a token. The resulting
>      tokens are syntactically and semantically analyzed and translated
>      as a translation unit.
>
>   8. All external object and function references are resolved. Library
>      components are linked to satisfy external references to functions
>      and objects not defined in the current translation. All such
>      translator output is collected into a program image which contains
>      information needed for execution in its execution environment.
>
> and before that, the Program Structure section says:
>
>   The separate translation units of a program communicate by (for
>   example) calls to functions whose identifiers have external linkage,
>   manipulation of objects whose identifiers have external linkage, or
>   manipulation of data files. Translation units may be separately
>   translated and then later linked to produce an executable program.
>

All of that is irrelevant. It says nothing against sharing other
information.

> LTO deviates from the model that translation units are separate,
> and the conceptual steps of phases 7 and 8.

No, it does not. These paragraphs are requirements, not limitations.

>
> The translation unit that is prepared for LTO is not fully cooked.
> You have no idea what its code will turn into when the interrupted
> compilation is resumed during linkage, under the influence of other
> translation units it is combined with.

You have as much and as little idea of what the generated code will be
as you always do during compilation. Compilers can do all kinds of
manipulations of the source code you write - as long as the observable
behaviour of the program is the same as a dumb translation. They can,
and do, use all kinds of inter-procedural optimisations for inlining
code, outlining it, breaking functions into pieces, cloning them, using
constant propagation, and so on. LTO lets them do this across
translation units.

>
> So in fact, the language allows us to take it for granted that, given
>
>   my_memset(array, 0, sizeof(array)); }
>
> at the end of a function, and my_memset is an external definition
> provided by another translation unit, the call may not be elided.
>

No, the C language standards make no such guarantee.

> The one who may be acting recklessly is he who turns on nonconforming
> optimizations that are not documented as supported by the code base.
>
> Another example would be something like gcc's -ffast-math.

That is /completely/ different. That option is clearly documented as
potentially violating some of the rules of the ISO C standards. This is
why it is not enabled by default or by any common optimisation levels
(except "-Ofast", which is also documented as potentially violating
standards).

> You wouldn't unleash that on numerical code written by experts,
> and expect the same correct results.
>

I would not expect identical results to floating point calculations, no.
Depending on the code in question, I would still expect correct results.
I use "-ffast-math" in all my code in order to get correct results a
good deal faster (for my targets, and my type of code) than I would get
without it.
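
Coming back to the memset() at the top of the thread: below is a minimal
sketch of a clearing routine that does not rely on the call living in
another translation unit. The name secure_zero and the value of MAX are
made up for illustration; neither comes from the thread or from any
library. The routine writes through a volatile-qualified pointer, and
gcc and clang treat such stores as accesses they have to perform.
Whether the standard strictly requires that when the underlying object
was not itself declared volatile has been debated, so read this as an
assumption about common implementations, not a guarantee. Where they
are available, C23's memset_explicit() or explicit_bzero() (glibc, the
BSDs) are likely the better choice.

```c
#include <stddef.h>

#define MAX 64  /* assumed size, for illustration only */

/* Hypothetical helper: zero n bytes at p through a volatile-qualified
   pointer, so that each store is a volatile access the compiler is
   expected to keep even if the buffer is never read again. */
void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;

    while (n--) {
        *vp++ = 0;
    }
}

/* Same shape as the function quoted at the top of the thread, with the
   final memset() replaced by the helper above. */
void f(void)
{
    char buffer[MAX];

    /* . . . security sensitive work on buffer . . . */

    secure_zero(buffer, sizeof buffer);
}
```

This only addresses the stores to the buffer itself. It says nothing
about copies of the secret that the compiler may have left in registers
or in other stack slots.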