Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: "undefined behavior"?
Date: Fri, 14 Jun 2024 02:18:45 +0100
Organization: A noiseless patient Spider
Lines: 143
Message-ID: <v4g5pm$2hsvk$1@dont-email.me>
References: <666a095a$0$952$882e4bbb@reader.netnews.com>
 <8t3k6j5ikf5mvimvksv2t91gbt11ljdfgb@4ax.com>
 <666a18de$0$958$882e4bbb@reader.netnews.com>
 <87y1796bfn.fsf@nosuchdomain.example.com>
 <666a2a30$0$952$882e4bbb@reader.netnews.com>
 <87tthx65qu.fsf@nosuchdomain.example.com>
 <v4dtlt$23m6i$1@dont-email.me>
 <NoEaO.2646$J8n7.2264@fx12.iad>
 <v4fc5j$2cksu$1@dont-email.me>
 <87le385u1s.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 14 Jun 2024 03:18:46 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3cbeaf55936e5f695a7fb47506a986f6"; logging-data="2683892"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+0xPKkFSj/hxMeMeGTCZgd"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:zAvDglu6l1gbl3oledsESf+YMZo=
In-Reply-To: <87le385u1s.fsf@nosuchdomain.example.com>
Content-Language: en-GB
Bytes: 6864

On 13/06/2024 23:58, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:
>> Meanwhile for over 4 decades I've been able to just write 'print foo'
>> with no format mismatch, because such a silly concept doesn't exist.
>> THAT's how you deal with it.
>
> By using a different language, which perhaps you should consider
> discussing in a different newsgroup.  We discuss C here.

That was my point about the 3 decades it took to do something about it. In the end nothing really changed.

> If foo is an int, for example, printf lets you decide how to print
> it (leading zeros or spaces, decimal vs. hex vs. octal (or binary
> in C23), upper vs. lower case for hex).
> Perhaps "print foo" in your language has similar features.

The format string specifies two things. One is to do with the type of an expression, which the compiler knows. After all, that's how it can sometimes tell you you've got it wrong. And if it can do that, it could also put in the format for you.

> Yes, the fact that incorrect printf format strings cause undefined
> behavior, and that that's sometimes difficult to diagnose, is a
> language problem.  I don't recall anyone saying it isn't.  But it's
> really not that hard to deal with it as a programmer.
>
> If you have ideas (other than abandoning C) for a flexible
> type-safe printing function, by all means share them.  What are your
> suggestions?

A few years ago I played with a "%?" format code in my 'bcc' compiler and demonstrated it here. The ? gets replaced by some suitable format code. This is done within the compiler, not the printf library.

Other display control, such as hex output, and other info, such as width, still need to be provided as they are now.

This would cover most of my points except variable format strings, which you said were not worth worrying about. Here is a demo:

--------------------------
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    uint64_t a = 0xFFFFFFFF00000000;
    float b = 1.46;
    int c = -67;
    char* d = "Hello";
    int* e = &c;
    for (int i=0; i<100000000; ++i);
    clock_t f = clock();

    printf("%=? %=? %=? %=? %=? %=?\n", a, b, c, d, e, f);
    printf("%=? %=? %=? %=? %=? %=?\n", f, e, d, c, b, a);
}
--------------------------

This prints 6 variables of diverse types with a suitable default format. Then it prints them in reverse order, without having to change those format codes.

The '=' is an extra feature which displays the name of the argument.
The output from this was:

A=18446744069414584320 B=1.460000 C=-67 D=Hello E=000000000080FF08 F=219
F=219 E=000000000080FF08 D=Hello C=-67 B=1.460000 A=18446744069414584320

It's not quite as good as my language where it's just:

    println =a, =b, =c, =d, =e, =f

but I think it was an interesting experiment. This required 50 lines of code within my C compiler; a bit more for a full treatment.

> Adding `print` as a new keyword so you can use `print foo` is
> unlikely to be considered practical; I'd want a much more
> general mechanism that's not limited to stdio files.  Reasonable new
> language features that enable type-safe printf-like functions could
> be interesting.  I'm not aware of any such proposals for C.
>
>>>> We just can't have size_t variables swilling around in programs for these
>>>> reasons.
>>> POSIX defines a set of strings that can be used by a programmer to
>>> specify the format string for size_t on any given implementation.
>>
>> And here it just gets even uglier. You also get situations like this:
>>
>>     uint64_t i=0;
>>     printf("%lld\n", i);
>>
>> This compiles OK with gcc -Wall, on Windows64. But compile under
>> Linux64 and it complains the format should be %ld. Change it to %ld,
>> and it complains under Windows.
>>
>> It can't tell you that you should be using one of those ludicrous macros.
>
> And you know why, right?  uint64_t is a typedef (an alias) for some
> existing type, typically either unsigned long or unsigned long long.
> If uint64_t is a typedef for unsigned long long, then i is of type
> unsigned long long, and the format string is correct.
>
> Sure, that's a language problem.  It's unfortunate that code can be
> either valid or a constraint violation depending on how the current
> implementation defines uint64_t.  I just don't spend much time
> complaining about it.
>
> I wouldn't mind seeing a new kind of typedef that creates a new type
> rather than an alias.  Then uint64_t could be a distinct type.
> That could cause some problems for _Generic, for example.
>
> C99 added <stdint.h>, defining fixed-width and other integer types using
> existing language features.  Sure, there are some disadvantages in the
> way it was done.  The alternative, creating new language features, would
> likely have resulted in the proposal not being accepted until some time
> after C99, if ever.
>
>> I've also just noticed that 'i' is unsigned but the format calls for
>> signed. That may or may not be deliberate, but the compiler didn't say
>> anything.
>
> The standard allows using an argument of an integer type with a format
> of the corresponding type of the other signedness, as long as the value
> is in the range of both.  (I vaguely recall the standard's wording being
> a bit vague on this point.)