From: David Brown
Newsgroups: comp.lang.c
Subject: Re: int a = a
Date: Fri, 21 Mar 2025 10:44:05 +0100

On 20/03/2025 20:46, Keith Thompson wrote:
> David Brown writes:
>> On 20/03/2025 11:20, Keith Thompson wrote:
>>> Tim Rentsch writes:
>>>> Keith Thompson writes:
>>>>> The "could have been declared with the register storage class"
>>>>> seems quite odd. And in fact it is quite odd.
>>>>
>>>> I don't have the same reaction. The point of this phrase is that
>>>> undefined behavior occurs only for variables that don't have
>>>> their address taken. The phrase used describes that nicely.
>>>> Any questions related to "registerness" can be ignored, because
>>>> 'register' in C really has nothing to do with hardware registers,
>>>> despite the name.
>>> DR 338 is explicitly motivated by an IA-64 feature that applies only
>>> to CPU registers. An object whose address is taken can't be stored
>>> (only) in a register, so it can't have a NaT representation.
>>> The phrase used is "could have been declared with register storage
>>> class (never had its address taken)". Surely "never had its address
>>> taken" would have been clear enough if CPU registers weren't a big
>>> part of the motivation.
>>
>> I too think the phrasing is a bit odd.
>>
>> Just because a variable's address is taken, does not mean it cannot be
>> put in a cpu register by the compiler. If the variable is not
>> accessed in a way that actually requires putting it in memory, then
>> the compiler can put it in a cpu register (or otherwise optimise it).
>> So simply taking the address of a variable on IA-64 does not mean it
>> cannot be in a register, and thus does not necessarily mean it cannot
>> be NaT. Taking the address of a variable means the variable cannot be
>> declared "register", but it does not mean it cannot be /in/ a
>> register.
>
> Sure, any variable that's stored in memory can be mirrored by holding
> its value in a register.
>
>     int n = 42; // Assume n is assigned a memory address
>     printf("n+1=%d n+2=%d\n", n+1, n+2);
>
> A compiler could plausibly store the value of n in a register before
> computing n+1, and then reuse the register value to compute n+2.

Yes, of course. But there is also no requirement that variables be in
memory at all, or that their location be consistent. "Assume n is
assigned a memory address" is a completely unwarranted assumption for
almost all local variables. It is only if the address is taken, and
used in some way that is beyond the optimiser, that the variable
actually has to go in a fixed place in memory. Otherwise optimisers can
and do keep data in registers, or move it in and out of registers and
different stack slots according to convenience for efficient code.

    uint32_t float_to_uint(float f) {
        uint32_t u;
        memcpy(&u, &f, 4);
        return u;
    }

gcc compiles that to:

float_to_uint:
        movd    eax, xmm0
        ret

So even though the addresses of the variable "u" and the parameter "f"
are taken, and converted to void pointers, and passed to a function with
external linkage, nothing is actually put in memory at all.

Thus the standard's wording, which treats the legality of using the
"register" storage-class specifier as though it corresponds to cpu
register usage, is, at best, wildly out of date.

(And there are some architectures where the cpu registers are directly
mapped to memory, and can be accessed as memory locations or registers.)

>
> My understanding is that IA-64 NaT (Not a Thing) representations
> exist only for registers, and the NaT bit should be cleared when
> a value is stored in the register.
>
> The odd wording in the standard allows an IA-64 C compiler to
> take advantage of NaT representations for their intended purpose.
> It might impose some minor constraints on what machine code can be
> generated, but *most* of the cases where a NaT could be accessed
> are undefined behavior in C.
>

I see that, but I believe it would be much simpler and clearer if
attempting to read an uninitialised and unassigned local variable were
undefined behaviour in every case.

Alternatively, it could have said that the value is unspecified in every
case. Then on the IA-64, the compiler would have to ensure that
registers do not have their NaT bit set even if they are not
initialised; this would not be a difficult task. Enabling use of the
NaT bit for detection of bugs could then be a compiler option if
implementations wanted to provide that feature.

>> It seems very strange to me that this is UB:
>>
>> int foo1(void) {
>>     int x;
>>
>>     return x;
>> }
>>
>> while this is not :
>>
>> int foo2(void) {
>>     int x;
>>
>>     int * p = &x;
>>
>>     return x;
>> }
>>
>> (Unfortunately, godbolt.org doesn't seem to have a gcc IA-64 compiler
>> in its list.)
>>
>> It strikes me that it would have been far simpler for the standard
>> simply to say that using the value of an uninitialised and unassigned
>> variable is undefined behaviour.
>
> In C90, it was. C99 changed that, making the behavior defined if the
> representation is not a trap representation.
>
> For C99, a conforming IA-64 C compiler would have had to go out of its
> way to avoid accessing NaT representations. For example, if you wrote
>
>     {
>         int n;
>         n;
>     }
>
> the most straightforward IA-64 code would store n in a register and
> not initialize it, resulting in a trap when the register is read.
> A compiler might have to generate code to store an arbitrary value
> in the register to avoid the trap.
>
> I'm undecided on whether reading the value of an uninitialized
> automatic object *should* be undefined behavior, but given that
> it isn't, the C11 committee made the smallest possible change to
> cater to IA-64 semantics.
>

IMHO, having it as UB is the best option, with unspecified behaviour as
a second best option. The jumble that C11 has is not necessary for the
IA-64, and clearly worse than the other two choices for architectures
that don't have a NaT equivalent.
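
For anyone who wants to try the float_to_uint example from earlier in
this post on their own compiler, here is a minimal self-contained
sketch. It is only an illustration under stated assumptions: the
static_assert and the inverse helper uint_to_float are my additions
(hypothetical, not part of the snippet above), and it assumes float is
a 32-bit IEEE 754 type, as on common x86-64 targets. Whether a given
compiler reduces it to a single register move depends on the compiler
version and optimisation flags.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size (true for IEEE 754
   binary32 on common targets).  Fail at compile time otherwise. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the object representation of f into a uint32_t.  The addresses
   of both u and f are taken, yet an optimiser is free to keep both
   entirely in registers. */
uint32_t float_to_uint(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical inverse helper, added for illustration only. */
float uint_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Compiling this with optimisation enabled and looking at the generated
assembly is an easy way to check whether "u" and "f" ever reach the
stack on your own toolchain.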