Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: int a = a
Date: Fri, 21 Mar 2025 10:44:05 +0100
Organization: A noiseless patient Spider
Lines: 154
Message-ID: <vrjcd5$18m5n$1@dont-email.me>
References: <vracit$178ka$1@dont-email.me> <vrc2d5$1jjrf$1@paganini.bofh.team>
 <vrc4eb$2p28t$1@dont-email.me> <vrc75b$2r4lt$1@dont-email.me>
 <vrccjb$b3m6$1@news.xmission.com> <vrcef2$33076$1@dont-email.me>
 <vrelvn$12ddq$1@dont-email.me> <87sen8u5d5.fsf@nosuchdomain.example.com>
 <86zfhgni2a.fsf@linuxsc.com> <87cyect356.fsf@nosuchdomain.example.com>
 <vrhbsf$3e7sn$4@dont-email.me> <87msdfscxj.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 21 Mar 2025 10:44:06 +0100 (CET)
Injection-Info: dont-email.me; posting-host="7dde97a724a107a78eed0ca7f7e82295";
	logging-data="1333431"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18FYQR5UUAEE6TTHdIh4pSI8M5AeF1IxFM="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:6vFzV1Sk3GpkX9f5P3Z1nxcjoBE=
Content-Language: en-GB
In-Reply-To: <87msdfscxj.fsf@nosuchdomain.example.com>
Bytes: 7637

On 20/03/2025 20:46, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 20/03/2025 11:20, Keith Thompson wrote:
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>>>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>>>> The "could have been declared with the register storage class"
>>>>> seems quite odd.  And in fact it is quite odd.
>>>>
>>>> I don't have the same reaction.  The point of this phrase is that
>>>> undefined behavior occurs only for variables that don't have
>>>> their address taken.  The phrase used describes that nicely.
>>>> Any questions related to "registerness" can be ignored, because
>>>> 'register' in C really has nothing to do with hardware registers,
>>>> despite the name.
>>> DR 338 is explicitly motivated by an IA-64 feature that applies only to
>>> CPU registers.  An object whose address is taken can't be stored (only)
>>> in a register, so it can't have a NaT representation.
>>> The phrase used is "could have been declared with register storage class
>>> (never had its address taken)".  Surely "never had its address taken"
>>> would have been clear enough if CPU registers weren't a big part of the
>>> motivation.
>>
>> I too think the phrasing is a bit odd.
>>
>> Just because a variable's address is taken, does not mean it cannot be
>> put in a cpu register by the compiler.  If the variable is not
>> accessed in a way that actually requires putting it in memory, then
>> the compiler can put it in a cpu register (or otherwise optimise it).
>> So simply taking the address of a variable on IA-64 does not mean it
>> cannot be in a register, and thus does not necessarily mean it cannot
>> be NaT.  Taking the address of a variable means the variable cannot be
>> declared "register", but it does not mean it cannot be /in/ a
>> register.
> 
> Sure, any variable that's stored in memory can be mirrored by holding
> its value in a register.
> 
>      int n = 42; // Assume n is assigned a memory address
>      printf("n+1=%d n+2=%d\n", n+1, n+2);
> 
> A compiler could plausibly store the value of n in a register before
> computing n+1, and then reuse the register value to compute n+2.

Yes, of course.  But there is also no requirement for variables to be in 
memory at all, nor for their location to stay the same.  "Assume n is 
assigned a memory address" is a completely unwarranted assumption for 
almost all local variables.  Only if the address is taken, and used in 
some way that is beyond the optimiser, does the variable actually have 
to go in a fixed place in memory.  Otherwise optimisers can and do keep 
data in registers, or move them in and out of registers and different 
stack slots as is convenient for generating efficient code.


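#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a (32-bit) float as a uint32_t. */
uint32_t float_to_uint(float f) {
     uint32_t u;
     memcpy(&u, &f, sizeof u);
     return u;
}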

gcc (x86-64, with optimisation enabled, Intel syntax) compiles that to:

float_to_uint:
         movd    eax, xmm0
         ret

So even though the addresses of the variable "u" and the parameter "f" 
are taken, converted to pointers, and passed to memcpy (a function with 
external linkage), nothing is actually put in memory at all.
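The same technique works in the other direction.  A minimal sketch, 
where uint_to_float is a hypothetical inverse added for illustration; 
the bit patterns below assume float is a 32-bit IEEE 754 binary32 type, 
which the C standard does not require.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile if float and uint32_t differ in size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical inverse of float_to_uint: reinterpret 32 bits as a float.
   The address of f is taken, but only to pass it to memcpy. */
float uint_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

On such a platform uint_to_float(0x3F800000u) yields 1.0f and 
uint_to_float(0xC0200000u) yields -2.5f.  With gcc on x86-64 and 
optimisation enabled, this should likewise reduce to a single register 
move, though that is worth checking rather than assuming.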

Thus the standard's wording, which treats the legality of the 
"register" storage-class specifier as if it corresponded to actual cpu 
register usage, is at best wildly out of date.

(And there are some architectures where the cpu registers are directly 
mapped to memory, and can be accessed as memory locations or registers.)

> 
> My understanding is that IA-64 NaT (Not a Thing) representations
> exist only for registers, and the NaT bit should be cleared when
> a value is stored in the register.
> 
> The odd wording in the standard allows an IA-64 C compiler to
> take advantage of NaT representations for their intended purpose.
> It might impose some minor constraints on what machine code can be
> generated, but *most* of the cases where a NaT could be accessed
> are undefined behavior in C.
> 

I see that, but I believe it would be much simpler and clearer if 
attempting to read an uninitialised and unassigned local variable were 
undefined behaviour in every case.

Alternatively, it could have said that the value is unspecified in every 
case.  Then on the IA-64, the compiler would have to ensure that 
registers do not have their NaT bit set even if they are not initialised; 
this would not be a difficult task.  Enabling use of the NaT bit for 
detection of bugs could then be a compiler option if implementations 
wanted to provide that feature.
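Under the current rules, a programmer can sidestep the whole question 
portably by giving automatic objects a value before reading them, which 
is roughly what the "unspecified value" alternative would have required 
the compiler to do behind the scenes.  A minimal sketch, based on the 
foo1/foo2 examples quoted below, with an initialiser added (the value 0 
is an arbitrary choice, and the names foo1_init/foo2_init are 
hypothetical):

```c
/* Initialised version of foo1: the read of x is well-defined. */
int foo1_init(void)
{
    int x = 0;
    return x;
}

/* Initialised version of foo2: taking the address no longer matters. */
int foo2_init(void)
{
    int x = 0;
    int *p = &x;
    *p += 1;        /* use p, so the pointer is not left unused */
    return x;
}
```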

>> It seems very strange to me that this is UB:
>>
>> 	int foo1(void) {
>> 		int x;
>>
>> 		return x;
>> 	}
>>
>> while this is not:
>>
>> 	int foo2(void) {
>> 		int x;
>>
>> 		int * p = &x;
>>
>> 		return x;
>> 	}
>>
>> (Unfortunately, godbolt.org doesn't seem to have a gcc IA-64 compiler
>> in its list.)
>>
>> It strikes me that it would have been far simpler for the standard
>> simply to say that using the value of an uninitialised and unassigned
>> variable is undefined behaviour.
> 
> In C90, it was.  C99 changed that, making the behavior defined if the
> representation is not a trap representation.
> 
> For C99, a conforming IA-64 C compiler would have had to go out of its
> way to avoid accessing NaT representations.  For example, if you wrote
> 
>      {
>          int n;
>          n;
>      }
> 
> the most straightforward IA-64 code would store n in a register and
> not initialize it, resulting in a trap when the register is read.
> A compiler might have to generate code to store an arbitrary value
> in the register to avoid the trap.
> 
> I'm undecided on whether reading the value of an uninitialized
> automatic object *should* be undefined behavior, but given that
> it isn't, the C11 committee made the smallest possible change to
> cater to IA-64 semantics.
> 

IMHO, having it as UB is the best option, with unspecified behaviour as 
the second-best option.  The jumble in C11 is not necessary for the 
IA-64, and is clearly worse than the other two choices for architectures 
that don't have a NaT equivalent.