From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: "undefined behavior"?
Date: Thu, 13 Jun 2024 15:58:39 -0700
Organization: None to speak of
Message-ID: <87le385u1s.fsf@nosuchdomain.example.com>
References: <666a095a$0$952$882e4bbb@reader.netnews.com>
 <8t3k6j5ikf5mvimvksv2t91gbt11ljdfgb@4ax.com>
 <666a18de$0$958$882e4bbb@reader.netnews.com>
 <87y1796bfn.fsf@nosuchdomain.example.com>
 <666a2a30$0$952$882e4bbb@reader.netnews.com>
 <87tthx65qu.fsf@nosuchdomain.example.com>

bart writes:
> On 13/06/2024 16:39, Scott Lurndal wrote:
>> Malcolm McLean writes:
>>> On 13/06/2024 01:33, Keith Thompson wrote:
>>>>
>>>> printf is a variadic function, so the types of the arguments after
>>>> the format string are not specified in its declaration.  The printf
>>>> function has to *assume* that arguments have the types specified
>>>> by the format string.  This:
>>>>     printf("%d\n", foo);
>>>> (probably) has undefined behavior if foo is of type size_t.
>>>>
>>> And isn't that a nightmare?
>> No, because compilers have been able to diagnose mismatches
>> for more than two decades.
>
> What about the previous 3 decades?

They're over.  Sheesh.

> What about the compilers that can't do that?

Use a compiler that can.
If you're using a compiler for production that can't produce the
warnings, you can even use a different compiler just to get the
warnings (if your code isn't too dependent on the compiler you're
using).  Code reviews also help.

> What about even the latest gcc 14.1 that won't diagnose it even with
> -Wpedantic -Wextra?

I don't know.  The default gcc on my system diagnoses it by default,
but various versions of gcc I've built from source do not.  Perhaps
Ubuntu configures gcc differently.  (Ubuntu 22.04.4, gcc 11.4.0.)

I'm building gcc 11.4.0 from source, and I'll compare its behavior to
that of Ubuntu's gcc 11.4.0-1ubuntu1~22.04.

The "-pedantic" option enables diagnostics that are required by the C
standard.  I wouldn't expect it to enable optional warnings like those
for format strings.

The "-Wextra" option enables additional warnings that are not enabled
by "-Wall".  Common usage if you want a lot of warnings is
"-Wall -Wextra".  It doesn't make much sense to use "-Wextra" by
itself.

You've used an unusual set of options that avoid enabling format
string warnings.  Format string warnings are enabled by "-Wformat",
which is included in "-Wall".

On serious projects, gcc is rarely invoked with default options.  If
you don't like the default settings, I'm likely to agree with you, but
specifying the options you want is a lot more effective than
complaining.

But the mechanism for enabling the warning, and whether it's enabled
by default, is a gcc issue, not a C issue.

> What about when the format string is a variable?

Then the compiler probably won't be able to diagnose it.  (How often
do you use a variable format string?)

> What about the example given below?
>
> It is definitely a language problem. Dealing with some of it with some
> compilers with some options isn't a solution, it's just a workaround.

Because C doesn't have the language features necessary for the library
to provide something as flexible as printf with more type safety.
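
To make the size_t case from earlier in the thread concrete, here is a
minimal sketch.  The helper name format_size is made up for
illustration; it is not from the thread or from any library.  It uses
"%zu", the conversion C99 added for size_t.  The commented-out line
shows the mismatched form; compiling that variant with "-Wall" (and
therefore "-Wformat") should get gcc to complain about it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: writes the decimal form
   of a size_t into buf and returns whatever snprintf returns. */
int format_size(char *buf, size_t bufsize, size_t value)
{
    /* Mismatched form: %d expects an int, so passing a size_t here
       (probably) has undefined behavior.  Left commented out.
       return snprintf(buf, bufsize, "%d", value);  */

    /* Matching form: %zu is the conversion for size_t. */
    return snprintf(buf, bufsize, "%zu", value);
}
```

Calling format_size(buf, sizeof buf, strlen(s)) with a suitably sized
char array fills buf with the length of s in decimal.  Note that the
check works only because the format string is a literal the compiler
can see.
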

> Meanwhile for over 4 decades I've been able to just write 'print foo'
> with no format mismatch, because such a silly concept doesn't exist.
> THAT's how you deal with it.

By using a different language, which perhaps you should consider
discussing in a different newsgroup.  We discuss C here.

If foo is an int, for example, printf lets you decide how to print it
(leading zeros or spaces, decimal vs. hex vs. octal (or binary in
C23), upper vs. lower case for hex).  Perhaps "print foo" in your
language has similar features.

Yes, the fact that incorrect printf format strings cause undefined
behavior, and that that's sometimes difficult to diagnose, is a
language problem.  I don't recall anyone saying it isn't.  But it's
really not that hard to deal with it as a programmer.

If you have ideas (other than abandoning C) for a flexible type-safe
printing function, by all means share them.  What are your
suggestions?  Adding `print` as a new keyword so you can use
`print foo` is unlikely to be considered practical; I'd want a much
more general mechanism that's not limited to stdio files.

Reasonable new language features that enable type-safe printf-like
functions could be interesting.  I'm not aware of any such proposals
for C.

>>> We just can't have size_t variables swilling around in programs for these
>>> reasons.
>> POSIX defines a set of strings that can be used by a programmer to
>> specify the format string for size_t on any given implementation.
>
> And here it just gets even uglier. You also get situations like this:
>
>     uint64_t i=0;
>     printf("%lld\n", i);
>
> This compiles OK with gcc -Wall, on Windows64. But compile under
> Linux64 and it complains the format should be %ld. Change it to %ld,
> and it complains under Windows.
>
> It can't tell you that you should be using one of those ludicrous macros.

And you know why, right?

uint64_t is a typedef (an alias) for some existing type, typically
either unsigned long or unsigned long long.
If uint64_t is a typedef for unsigned long long, then i is of type
unsigned long long, and the format string is correct.

Sure, that's a language problem.  It's unfortunate that code can be
either valid or a constraint violation depending on how the current
implementation defines uint64_t.  I just don't spend much time
complaining about it.

I wouldn't mind seeing a new kind of typedef that creates a new type
rather than an alias.  Then uint64_t could be a distinct type.  That
could cause some problems for _Generic, for example.

C99 added <stdint.h>, defining fixed-width and other integer types
using existing language features.  Sure, there are some disadvantages
in the way it was done.  The alternative, creating new language
features, would likely have resulted in the proposal not being
accepted until some time after C99, if ever.

> I've also just noticed that 'i' is unsigned but the format calls for
> signed. That may or may not be deliberate, but the compiler didn't say
> anything.

The standard allows using an argument of an integer type with a format
of the corresponding type of the other signedness, as long as the
value is in the range of both.  (I vaguely recall the standard's
wording being a bit vague on this point.)

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */