<utkftr$32ahu$1@dont-email.me>

Path: ...!weretis.net!feeder6.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 18:42:19 +0100
Organization: A noiseless patient Spider
Lines: 225
Message-ID: <utkftr$32ahu$1@dont-email.me>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
 <20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
 <20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
 <20240322083648.539@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Mar 2024 17:42:19 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="07a44ddb34b47981e78dc5e82c11a0d6";
	logging-data="3222078"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18uefwZcUm2cCEj3jrdgM4JxbwXK+Mld/c="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:OCnegnn6Ziepk9fiXvxjAK1lonc=
In-Reply-To: <20240322083648.539@kylheku.com>
Content-Language: en-GB
Bytes: 12068

On 22/03/2024 16:50, Kaz Kylheku wrote:
> On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>>>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>>>     A "famous security bug":
>>>>>>
>>>>>> void f( void )
>>>>>> { char buffer[ MAX ];
>>>>>>     /* . . . */
>>>>>>     memset( buffer, 0, sizeof( buffer )); }
>>>>>>
>>>>>>     . Can you see what the bug is?
>>>>>
>>>>> I don't know about "the bug", but conditions can be identified under
>>>>> which that would have a problem executing, like MAX being in excess
>>>>> of available automatic storage.
>>>>>
>>>>> If the /*...*/ comment represents the elision of some security sensitive
>>>>> code, where the memset is intended to obliterate secret information,
>>>>> of course, that obliteration is not required to work.
>>>>>
>>>>> After the memset, the buffer has no next use, so the all the assignments
>>>>> performed by memset to the bytes of buffer are dead assignments that can
>>>>> be elided.
>>>>>
>>>>> To securely clear memory, you have to use a function for that purpose
>>>>> that is not susceptible to optimization.
>>>>>
>>>>> If you're not doing anything stupid, like link time optimization, an
>>>>> external function in another translation unit (a function that the
>>>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>>>> ought to suffice.
>>>>
>>>> Using LTO is not "stupid".  Relying on people /not/ using LTO, or not
>>>> using other valid optimisations, is "stupid".
>>>
>>> LTO is a nonconforming optimization. It destroys the concept that
>>> when a translation unit is translated, the semantic analysis is
>>> complete, such that the only remaining activity is resolution of
>>> external references (linkage), and that the semantic analysis of one
>>> translation unit deos not use information about another translation
>>> unit.
>>>
>>> This has not yet changed in last April's N3096 draft, where
>>> translation phases 7 and 8 are:
>>>
>>>    7. White-space characters separating tokens are no longer significant.
>>>       Each preprocessing token is converted into a token. The resulting
>>>       tokens are syntactically and semantically analyzed and translated
>>>       as a translation unit.
>>>
>>>    8. All external object and function references are resolved. Library
>>>       components are linked to satisfy external references to functions
>>>       and objects not defined in the current translation. All such
>>>       translator output is collected into a program image which contains
>>>       information needed for execution in its execution environment.
>>>
>>> and before that, the Program Structure section says:
>>>
>>>    The separate translation units of a program communicate by (for
>>>    example) calls to functions whose identifiers have external linkage,
>>>    manipulation of objects whose identifiers have external linkage, or
>>>    manipulation of data files. Translation units may be separately
>>>    translated and then later linked to produce an executable program.
>>>
>>> LTO deviates from the the model that translation units are separate,
>>> and the conceptual steps of phases 7 and 8.
>> [...]
>>
>> Link time optimization is as valid as cross-function optimization *as
>> long as* it doesn't change the defined behavior of the program.
> 
> It always does; the interaction of a translation unit with another
> is an externally visible aspect of the C program.

The C standards don't define a term "externally visible".  They define 
"observable behaviour", and require that a conforming implementation 
generates a program that matches the "observable behaviour".  This is in 
5.1.2.3p6.  Interaction between translation units is not part of the 
observable behaviour of a program, because it is not relevant to the 
concept of /running/ a program - it is only relevant when translating 
the source to the program image.

Thus the "as if" rules apply - the compiler can do whatever it wants - 
up to and including asking ChatGPT for an exe file - as long as the 
result is a /program/ that gives the same "observable behaviour" as you 
would get from an abstract machine.
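
As a rough illustration of the "as if" rule applied to the function 
under discussion, here is a minimal sketch.  The value of MAX, the 
function names and the byte-summing body are my assumptions, since the 
original elides them.  Both functions return the same value for every 
input, so a compiler may translate the first as if it were the second.

```c
#include <string.h>

#define MAX 64  /* assumed value; the original post leaves MAX unspecified */

/* Hypothetical example: sums the bytes of msg via a local buffer,
   then clears the buffer before returning. */
unsigned sum_with_clear(const char *msg)
{
    char buffer[MAX];
    unsigned sum = 0;

    strncpy(buffer, msg, sizeof buffer - 1);
    buffer[sizeof buffer - 1] = '\0';
    for (size_t i = 0; buffer[i] != '\0'; i++)
        sum += (unsigned char)buffer[i];

    /* buffer is never read again, so this is a dead store.  Dropping
       it changes nothing that the standard counts as observable. */
    memset(buffer, 0, sizeof buffer);
    return sum;
}

/* The same function with the clearing removed by hand. */
unsigned sum_without_clear(const char *msg)
{
    char buffer[MAX];
    unsigned sum = 0;

    strncpy(buffer, msg, sizeof buffer - 1);
    buffer[sizeof buffer - 1] = '\0';
    for (size_t i = 0; buffer[i] != '\0'; i++)
        sum += (unsigned char)buffer[i];

    return sum;
}
```

Whether a given compiler actually removes the memset depends on the 
compiler and its options; the point is only that it is allowed to.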

You should read the footnotes to 5.1.1.2 "Translation phases". 
Footnotes are not normative, but they are helpful in explaining the 
meaning of the text.  They note that compilers don't have to follow the 
details of the translation phases, and that source files, translation 
units, and translated translation units don't have to have one-to-one 
correspondences.

The standard also does not say what the output of "translation" is - it 
does not have to be assembly or machine code.  It can happily be an 
internal format, as used by gcc and clang/llvm.  It does not define what 
"linking" is, or how the translated translation units are "collected 
into a program image" - combining the partially compiled units, 
optimising, and then generating a program image is well within that 
definition.

> (That can be inferred
> from the rules which forbid semantic analysis across translation
> units, only linkage.)

The rules do not forbid semantic analysis across translation units - 
they merely do not /require/ it.  You are making an inference without 
any justification that I can see.

> 
> That's why we can have a real world security issue caused by zeroing
> being optimized away.

No, it is not.  We have real-world security issues for all sorts of 
reasons, including people mistakenly thinking they can force particular 
types of code generation by calling functions in different source files.

(To be clear here, before LTO became common, that was a strategy that 
worked.  There is a long history in C programming of dilemmas between 
writing code that you know works efficiently on current tools, or 
writing code that you know is guaranteed correct by the standards but is 
inefficient with current tools.)
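
For comparison, here is a sketch of a clearing routine that leans on 
observable behaviour rather than on translation unit boundaries.  The 
name secure_zero is hypothetical, not a standard function.  It writes 
through a volatile-qualified lvalue; whether such writes to an object 
that was not itself defined volatile are strictly guaranteed has been 
debated for older standards, so treat this as commonly used practice 
rather than a proof.  C23 adds memset_explicit for this purpose, where 
the library provides it.

```c
#include <stddef.h>

/* Hypothetical helper: zero n bytes starting at p using volatile
   writes, so the stores are not ordinary dead stores. */
void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;

    while (n--)
        *vp++ = 0;
}
```

A caller would use secure_zero(buffer, sizeof buffer) in place of the 
final memset.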

> 
> The rules spelled out in ISO C allow us to unit test a translation
> unit by linking it to some harness, and be sure it has exactly the
> same behaviors when linked to the production program.
> 

No, they don't.

If the unit you are testing calls something outside that unit, you may 
get different behaviours when testing and when used in production.  The 
only thing you can be sure of from testing is that if you find a bug 
during testing, you have a bug in the code.  You can never use testing 
to be sure that the code works (with the exception of exhaustive testing 
of all possible inputs, which is rarely practical).

> If I have some translation unit in which there is a function foo, such
> that when I call foo, it then calls an external function bar, that's
> observable. 

5.1.2.3p6 lists the three things that C defines as "observable 
behaviour".  Function calls - internal or external - are not amongst these.

> I can link that unit to a program which supplies bar,
> containing a printf call, then call foo and verify that the printf call
> is executed.

Yes, you can.  The printf call - or, more exactly, the "input and output 
dynamics" - are observable behaviour.  The call to "bar", however, is not.

The compiler, when compiling the source of "foo", will include a call to 
"bar" when it does not have the source code (or other detailed semantic 
information) for "bar" available at the time.  But you are mistaken to 
think it does so because the call is "observable" or required by the C 
standard.  It does so because it cannot prove that /running/ the 
function "bar" involves no observable behaviour and does not otherwise 
affect the observable behaviour of the program.  The compiler cannot skip the 
call unless it can be sure it is safe to do so - and if it knows nothing 
about the implementation of "bar", it must assume the worst.
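
A small sketch of the distinction, using the names "foo" and "bar" from 
the discussion; the helper "twice" and the function bodies are my own 
assumptions.  Here "bar" is defined in the same file only to keep the 
example self-contained; picture it living in another translation unit.

```c
#include <stdio.h>

/* Hypothetical helper: no side effects, and the result depends only
   on the argument. */
static int twice(int x)
{
    return 2 * x;
}

/* Stand-in for a function from another translation unit.  It performs
   output, which is part of the observable behaviour. */
int bar(int x)
{
    printf("bar(%d)\n", x);
    return x + 1;
}

int foo(int x)
{
    twice(x);       /* result unused and no side effects: the compiler
                       can see this and may drop the call entirely */
    return bar(x);  /* the output must appear, but the call instruction
                       itself need not exist, e.g. if bar is inlined */
}
```

What must survive translation is the line of output and the returned 
value, not the call to "bar" as such.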

Sometimes the compiler may have additional information - such as when 
"bar" is declared with the gcc "const" or "pure" attributes (or the 
standardised "unsequenced" and "reproducible" attributes added in C23). 
This may allow a compiler to re-arrange calls, duplicating them, 
eliminating them, or re-ordering them in various ways.  (The C23 draft 
mentions running such functions once at startup for each input value, 
and preserving the results for later use, as a permissible 
optimisation.  It does this without having changed the description of 
translation phases or observable behaviour.  But of 
========== REMAINDER OF ARTICLE TRUNCATED ==========