Article <20240322105321.365@kylheku.com>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <433-929-6894@kylheku.com>
Newsgroups: comp.lang.c
Subject: Re: A Famous Security Bug
Date: Fri, 22 Mar 2024 18:55:15 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 228
Message-ID: <20240322105321.365@kylheku.com>
References: <bug-20240320191736@ram.dialup.fu-berlin.de>
 <20240320114218.151@kylheku.com> <uthirj$29aoc$1@dont-email.me>
 <20240321092738.111@kylheku.com> <87a5mr1ffp.fsf@nosuchdomain.example.com>
 <20240322083648.539@kylheku.com> <utkftr$32ahu$1@dont-email.me>
Injection-Date: Fri, 22 Mar 2024 18:55:15 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5ea1ed0f61c2acaab32e111ed755f390";
	logging-data="3255872"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18Xx5s3CrvOw/8Km61E0Nz2frTUIdwrCsg="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:wf5Qsrpq5fDQGrr2KPJy+kkzAr4=
Bytes: 11469

On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
> You should read the footnotes to 5.1.1.2 "Translation phases". 
> Footnotes are not normative, but they are helpful in explaining the 
> meaning of the text.  They note that compilers don't have to follow the 
> details of the translation phases, and that source files, translation 
> units, and translated translation units don't have to have one-to-one 
> correspondences.

Yes, I'm aware of that. For instance, preprocessing can all be jumbled
into one process. But it has to produce the same result as the separate
phases would.

Even if translation phases 7 and 8 are combined, the semantic analysis
of the individual translation unit has to appear to be settled before
linkage. So for instance a translation unit could incrementally emerge
from the semantic analysis steps, and those parts of it already analyzed
(phase 7) could start to be linked to other translation units (phase 8).

I'm just saying that certain information leakage is clearly permitted,
regardless of how the phases are integrated.

> The standard also does not say what the output of "translation" is - it 
> does not have to be assembly or machine code.  It can happily be an 
> internal format, as used by gcc and clang/llvm.  It does not define what 
> "linking" is, or how the translated translation units are "collected 
> into a program image" - combining the partially compiled units, 
> optimising, and then generating a program image is well within that 
> definition.
>
>> (That can be inferred
>> from the rules which forbid semantic analysis across translation
>> units, only linkage.)
>
> The rules do not forbid semantic analysis across translation units - 
> they merely do not /require/ it.  You are making an inference without 
> any justification that I can see.

Translation phase 7 is clearly about a single translation unit in
isolation:

"The resulting tokens are syntactically and semantically analyzed
 and translated as a translation unit."

Not: "as a combination of multiple translation units".

5.1.1.1 clearly refers to "[t]he separate translation units of a
program".

LTO pretends that the program is still divided into the same translation
units, while mingling them together in ways contrary to all those
chapter 5 descriptions.

The conforming way to obtain LTO is to actually combine multiple
preprocessing translation units into one.

>> That's why we can have a real world security issue caused by zeroing
>> being optimized away.
>
> No, it is not.  We have real-world security issues for all sorts of 
> reasons, including people mistakenly thinking they can force particular 
> types of code generation by calling functions in different source files.

In fact, that code generation is forced, when people do not use LTO,
which is not enabled by default.
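
To make the zeroing case concrete, here is a minimal sketch. The names
secure_zero and handle_secret are hypothetical, not taken from the bug
under discussion. A plain memset of a buffer that is never read again is
a dead store which a compiler may remove; storing through a
volatile-qualified pointer is one common workaround, because compilers
in practice do not elide such stores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero a buffer through a volatile-qualified
   pointer, so the stores are not treated as removable dead stores. */
static void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}

/* Hypothetical caller: key is dead once this function returns, so a
   plain memset(key, 0, sizeof key) at the end could be optimized away. */
static int handle_secret(void)
{
    unsigned char key[16];
    int sum = 0;

    memset(key, 0xAA, sizeof key);      /* stand-in for real key material */
    for (size_t i = 0; i < sizeof key; i++)
        sum += key[i];                  /* stand-in for using the key */

    secure_zero(key, sizeof key);       /* scrub before returning */
    return sum;
}
```

Putting the zeroing function into a separate translation unit has a
similar effect only as long as the units are translated separately,
which is the point in dispute here.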

>> The rules spelled out in ISO C allow us to unit test a translation
>> unit by linking it to some harness, and be sure it has exactly the
>> same behaviors when linked to the production program.
>
> No, they don't.
>
> If the unit you are testing calls something outside that unit, you may 
> get different behaviours when testing and when used in production.

Yes; if you do nonconforming things.

> The only thing you can be sure of from testing is that if you find a bug 
> during testing, you have a bug in the code.  You can never use testing 
> to be sure that the code works (with the exception of exhaustive testing 
> of all possible inputs, which is rarely practical).

LTO will break translation units that are simple enough to be trivially
proven to have a certain behavior.

>> If I have some translation unit in which there is a function foo, such
>> that when I call foo, it then calls an external function bar, that's
>> observable. 
>
> 5.1.2.3p6 lists the three things that C defines as "observable 
> behaviour".  Function calls - internal or external - are not amongst these.

External calls are de facto observable, because we take it for granted
that when a translation unit calls a certain function, we can supply
another translation unit which defines that function. In that function
we can communicate with the host environment to confirm that it was
called.

>> I can link that unit to a program which supplies bar,
>> containing a printf call, then call foo and verify that the printf call
>> is executed.
>
> Yes, you can.  The printf call - or, more exactly, the "input and output 
> dynamics" - are observable behaviour.  The call to "bar", however, is not.

If foo does not call bar, then the observable behavior of
printf doesn't occur either; they are linked by logic / cause-and-effect.

A behavior that is not itself formally classified as observable can be
discovered by logical linkage to be necessary for the production of
observable behavior. It can be an "if, and only if" linkage.

If an observable behavior B occurs if, and only if, some behavior A
occurs, then the fact of whether A occurs or not is de facto observable.
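
A minimal sketch of that arrangement, using the foo and bar named above.
The counter bar_calls and the file names in the comments are
hypothetical. Both parts are shown in one listing so that it compiles as
is; in the scenario being argued, they would be separate translation
units.

```c
#include <stdio.h>

/* --- unit under test (would live in, say, foo.c) --- */
void bar(void);             /* external: defined by some other unit */

void foo(void)
{
    bar();                  /* the call whose occurrence is in question */
}

/* --- test harness (would live in, say, harness.c) --- */
static int bar_calls;       /* hypothetical counter, for checking */

void bar(void)
{
    bar_calls++;
    printf("bar called\n"); /* output: observable behavior */
}
```

The output appears if, and only if, foo reaches bar. Splitting the
listing into two files and linking them gives the separately translated
setup described above.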

> The compiler, when compiling the source of "foo", will include a call to 
> "bar" when it does not have the source code (or other detailed semantic 
> information) for "bar" available at the time.

Translation phases 1 to 7 forbid processing material from another
translation unit. Conforming semantic analysis of a translation unit has
nothing but that translation unit.

> But you are mistaken to 
> think it does so because the call is "observable" or required by the C 
> standard.

Sure; let's say that the call can be tied to observable behavior
elsewhere such that the call occurs if and only if the observable
behavior occurs.

> It does so because it cannot prove that /running/ the 
> function "bar" contains no observable behaviour, or otherwise affects 
> the observable behaviour of the program.  The compiler cannot skip the 
> call unless it can be sure it is safe to do so - and if it knows nothing 
> about the implementation of "bar", it must assume the worst.

The compiler cannot do any of this if it is in a conforming mode.

But sure, that holds in the nonconforming LTO paradigm, which does have
to adhere to sane rules that more or less follow what would have to
happen if multiple preprocessing translation units were merged at the
token level and thus analyzed together.

> Sometimes the compiler may have additional information - such as if it 
> is declared with the gcc "const" or "pure" attributes (or the standardised 
> "unsequenced" and "reproducible" attributes in the draft for the next C 
> version after C23).

If the declarations are available only in another translation unit,
they cannot be taken into account when analyzing this translation unit.
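
For contrast, a sketch of the case where the information is visible in
this translation unit. The names table_lookup, twice and the PURE macro
are hypothetical. With the gcc "pure" attribute on the declaration, the
compiler is allowed to evaluate the two identical calls once; without
the attribute, and without sight of the definition, it has to emit both.

```c
/* Use the gcc attribute where available; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define PURE __attribute__((pure))
#else
#define PURE
#endif

/* Declaration as it might appear in a header seen by this unit. */
PURE int table_lookup(int key);

int twice(int key)
{
    /* With "pure" visible here, the two calls may be merged into one. */
    return table_lookup(key) + table_lookup(key);
}

/* Stand-in definition so that the sketch links on its own; in a real
   layout it would live in another translation unit. */
int table_lookup(int key)
{
    return key * 3 + 1;
}
```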

>> Since ISO C says that the semantic analysis has been done (that
>> unit having gone through phase 7), we can take it for granted as a
>> done-and-dusted property of that translation unit that it calls bar
>> whenever its foo is invoked.
>
> No, we can't - see above.  Nothing in the C standards forbids any 
> additional analysis, or using other information in code generation.

Any semantic analysis performed must be that which is stated in translation
phase 7, which happens for one translation unit, before considering
linkage to other translation units.

What forbids it is that no semantic analysis activity is described as
taking place in translation phase 8, other than linkage.

>>> Say I have a call to foo in main, and the definition of foo is in
>>> another translation unit.  In the absence of LTO, the compiler will have
>>> to generate a call to foo.  If LTO is able to determine that foo doesn't
>>> do anything, it can remove the code for the function call, and the
>>> resulting behavior of the linked program is unchanged.
>> 
>> There are always situations in which optimizations that have been forbidden
>> don't cause a problem, and are even desirable.
>> 
>
> Can you give examples?
>
> You already mentioned "-ffast-math" (and by implication, its various 
> subflags in gcc, clang and icc).  These are clearly documented as 
> allowing some violations of the C standards (and not least, the IEEE 
> floating point standards, which are stricter than those of C).
========== REMAINDER OF ARTICLE TRUNCATED ==========