From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Hex string literals (was Re: C23 thoughts and opinions)
Date: Mon, 17 Jun 2024 17:19:50 -0700
Organization: None to speak of
Message-ID: <87cyof14rd.fsf@nosuchdomain.example.com>
References: <v2l828$18v7f$1@dont-email.me> <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com> <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com> <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com> <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me> <87y18047jk.fsf@nosuchdomain.example.com> <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me> <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me> <87plt8yxgn.fsf@nosuchdomain.example.com> <v31rj5$o20$1@dont-email.me> <87cyp6zsen.fsf@nosuchdomain.example.com> <v34gi3$j385$1@dont-email.me> <874jahznzt.fsf@nosuchdomain.example.com> <v36nf9$12bei$1@dont-email.me> <87v82b43h6.fsf@nosuchdomain.example.com> <87iky830v7.fsf_-_@nosuchdomain.example.com> <v4p0dv$jeb2$1@dont-email.me>

David Brown <david.brown@hesbynett.no> writes:
> On 17/06/2024 01:48, Keith Thompson wrote:
[...]
> For binary,
> the compaction is irrelevant and indeed counter-productive - binary
> literals became a lot more practical with the introduction of digit
> separators.
> (For standard C, these are from C23, but for C++ they came
> in C++14, and compilers have supported them as extensions in C.)

I forgot about digit separators. C23 adds the option to use
apostrophes as separators in numeric constants: 123'456'789 or
0xdead'beef, for example. (This is borrowed from C++. Commas are
more commonly used in real life, at least in my experience, but that
wouldn't work given the other meanings of commas.)

I briefly considered that, for consistency, we might want to use
apostrophes rather than spaces in hex string literals: 0x"de'ad'be'ef".
But since digit separators are purely decorative, and spaces in my
proposed hex string literals are semantically significant (they
terminate a byte), I'll stick with spaces. You could even write
0x"0 0 0 0" to denote 4 zero bytes (whereas 0x"0000" is 2 bytes), but
0x"00 00 00 00" or 0x"00000000" is probably clearer.

I think allowing both spaces and apostrophes would be too confusing.

>> Octal
>> string literals 0"012 345 670" *might* be worth considering.
>
> Most situations where octal could be useful died out many decades ago
> - it is vastly more likely that "012" is intended to mean 12 than 10.
> No serious programming language supports a leading 0 as an indication
> of octal unless they are forced to do so by backwards compatibility,
> and many that used to support them have dropped them.
>
> Having /some/ way to write octal can be helpful to old *nix
> programmers who prefer 0640 to "S_IRUSR | S_IWUSR | S_IRGRP" in their
> chmod calls. (And to be fair, the constant names made in ancient
> history with short identifier length limits are pretty ugly.) But it
> is not something to be encouraged, and I think there is no simple
> syntax that is obviously octal, and not easily mistaken for something
> else.

There is: the proposed "0o" prefix. It's already supported in both
Perl and Python, and likely other languages.
>> <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
>> proposes a new "0o123" syntax for octal constants; if that's adopted,
>> I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
>> to suggest hex only, or doing hex, octal, and binary for the sake
>> of completeness.
>
> Binary support is useless, and octal support would be worse than
> useless - even using an 0o rather than 0 prefix. Completeness is not
> a justification for repeating old mistakes or complicating a good idea
> with features that will never be used.

I like binary integer constants (0b11001001), but I suppose I agree
that binary notation isn't useful for larger chunks of data. I'm fine
with supporting only hex string literals, not binary or octal -- but
I'd have no problem with having all three if anyone thinks that would
be sufficiently useful.

>> What I'm trying to design here is a more straightforward way to
>> represent raw (unsigned char[]) data in C code, largely but not
>> exclusively for use by #embed.
>
> Personally, I'd see it as useful when /not/ using #embed. I really do
> not think programmers will care what format #embed uses. I don't
> share your concerns about efficiency of implementation, or that
> programmers need to know when it is efficient or not. In almost all
> circumstances, C programmers never see or need to think about a
> separation between a C preprocessor and a post-processed C compiler -
> they are seen as a single entity, and can use whatever format is
> convenient between them. And once you ignore the implementation
> details, which are an SEP, the way #embed is defined is better than a
> definition using these new hex blob strings.

I think my main problem with the current #embed is that it's
conceptually messy. I'm probably an outlier in how much I care about
that. It's not clear whether the problems with the current definition
of #embed are as serious as I suggest; you clearly think they aren't.
But even if the current #embed is OK, I think it would be useful to add
hex string literals, along with a language-defined embed parameter that
specifies expansion to a hex string literal rather than to a list of
integer constant expressions. Among other things, such a parameter lets
the programmer specify that a given #embed is only to be used to
initialize an array of unsigned char.

For example, given a 4-byte foo.dat containing bytes 1, 2, 3, and 4,
this:

    const unsigned char buf[] = {
    #embed "foo.dat"
    };

would expand to something like:

    const unsigned char buf[] = {
    1, 2, 3, 4
    };

(and the same if buf is of type int[] or double[]), while this:

    const unsigned char buf[] =
    #embed "foo.dat" hex(true) // proposed new parameter
    ;

would expand to something like:

    const unsigned char buf[] =
    0x"01020304"
    ;

(and would result in an error if buf is of type int[] or double[]).

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */