From: David Brown
Newsgroups: comp.lang.c
Subject: Re: Hex string literals (was Re: C23 thoughts and opinions)
Date: Tue, 18 Jun 2024 15:54:15 +0200

On 18/06/2024 02:19, Keith Thompson wrote:
> David Brown writes:
>> On 17/06/2024 01:48, Keith Thompson wrote:
> [...]
>> For binary,
>> the compaction is irrelevant and indeed counter-productive - binary
>> literals became a lot more practical with the introduction of digit
>> separators. (For standard C, these are from C23, but for C++ they came
>> in C++14, and compilers have supported them as extensions in C.)
>
> I forgot about digit separators.
>
> C23 adds the option to use apostrophes as separators in numeric
> constants: 123'456'789 or 0xdead'beef, for example. (This is
> borrowed from C++. Commas are more commonly used in real life,
> at least in my experience, but that wouldn't work given the other
> meanings of commas.)

Commas would be entirely unsuitable here, since half the world uses
decimal commas rather than decimal points. I think underscores are a
nicer choice, used by many languages, but C++ could not use underscores
due to their use in user-defined literals, and C followed C++.

>
> I briefly considered that, for consistency, we might want to
> use apostrophes rather than spaces in hex string constants:
> 0x"de'ad'be'ef". But since digit separators are purely decorative,
> and spaces in my proposed hex string literals are semantically
> significant (they terminate a byte), I'll stick with spaces.

I think you were using spaces as byte separators, whereas apostrophes
should be completely ignored when parsing.

>
> You could even write 0x"0 0 0 0" to denote 4 zero bytes (where
> 0x"0000" is 2 bytes) but 0x"00 00 00 00" or 0x"00000000" is probably
> clearer.
>
> I think allowing both spaces and apostrophes would be too confusing.
>

Fair enough.

>>> Octal
>>> string literals 0"012 345 670" *might* be worth considering.
>>
>> Most situations where octal could be useful died out many decades ago
>> - it is vastly more likely that "012" is intended to mean 12 than 10.
>> No serious programming language supports a leading 0 as an indication
>> of octal unless they are forced to do so by backwards compatibility,
>> and many that used to support them have dropped them.
>>
>> Having /some/ way to write octal can be helpful to old *nix
>> programmers who prefer 0640 to "S_IRUSR | S_IWUSR | S_IRGRP" in their
>> chmod calls. (And to be fair, the constant names made in ancient
>> history with short identifier length limits are pretty ugly.)
>> But it is not something to be encouraged, and I think there is no
>> simple syntax that is obviously octal, and not easily mistaken for
>> something else.
>
> There is, the proposed "0o" prefix. It's already supported in both Perl
> and Python, and likely other languages.

Some languages apparently use 0q, because 0o might be confusing in some
fonts. I'm not sure I agree, and 0q is not very intuitive. I'd rate 0o
as vastly better than 0, but I would not bother with supporting it in a
new feature like this.

>
>>>
>>> proposes a new "0o123" syntax for octal constants; if that's adopted,
>>> I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
>>> to suggest hex only, or doing hex, octal, and binary for the sake
>>> of completeness.
>>
>> Binary support is useless, and octal support would be worse than
>> useless - even using an 0o rather than 0 prefix. Completeness is not
>> a justification for repeating old mistakes or complicating a good idea
>> with features that will never be used.
>
> I like binary integer constants (0b11001001), but I suppose I
> agree that they're not useful for larger chunks of data.

Perhaps I am so used to binary and hex that I convert without thinking,
and thus rarely need binary. The one place I find binary useful is for
bitmap fonts. I use these a lot less than I used to, but sometimes you
need to make new characters for an old-style low resolution LCD screen,
and then binary constants can be useful. Often, however, I prefer
characters like . and @ rather than 0 and 1 as it makes the contrast
much higher.

> I have no
> problem supporting only hex string literals, not binary or octal --
> but I'd have no problem with having all three if anyone thinks that
> would be sufficiently useful.
>

Fair enough.

>>> What I'm trying to design here is a more straightforward way to
>>> represent raw (unsigned char[]) data in C code, largely but not
>>> exclusively for use by #embed.
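
To make the bitmap-font remark above concrete, here is a minimal sketch of one glyph written both ways: once with binary constants and once as a picture built from . and @ characters. The glyph itself, the names glyph_bits, glyph_art and pack_row, and the one-byte-per-row layout with the most significant bit on the left are all assumptions made for illustration, not anything taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 8x8 glyph, one byte per row, most significant bit leftmost.
   Binary constants are standard in C23 and a common compiler extension
   before that. */
static const unsigned char glyph_bits[8] = {
    0b00111100,
    0b01000010,
    0b10100101,
    0b10000001,
    0b10100101,
    0b10011001,
    0b01000010,
    0b00111100,
};

/* The same glyph as a picture: '@' is a set pixel, '.' is a clear one. */
static const char *const glyph_art[8] = {
    "..@@@@..",
    ".@....@.",
    "@.@..@.@",
    "@......@",
    "@.@..@.@",
    "@..@@..@",
    ".@....@.",
    "..@@@@..",
};

/* Pack one 8-character picture row into a byte.
   Assumes the row holds at least 8 characters. */
static unsigned char pack_row(const char *row)
{
    unsigned char byte = 0;
    for (size_t i = 0; i < 8; i++) {
        /* Shift in one pixel per character, left to right. */
        byte = (unsigned char)((byte << 1) | (row[i] == '@'));
    }
    return byte;
}
```

Calling pack_row(glyph_art[i]) for each row gives the same bytes as glyph_bits[i]. The picture form needs a conversion step at start-up or in a build script, which is the price paid for the higher contrast.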
>>
>> Personally, I'd see it as useful when /not/ using #embed. I really do
>> not think programmers will care what format #embed uses. I don't
>> share your concerns about efficiency of implementation, or that
>> programmers need to know when it is efficient or not. In almost all
>> circumstances, C programmers never see or need to think about a
>> separation between a C preprocessor and a post-processed C compiler -
>> they are seen as a single entity, and can use whatever format is
>> convenient between them. And once you ignore the implementation
>> details, which are an SEP, the way #embed is defined is better than a
>> definition using these new hex blob strings.
>
> I think my main problem with the current #embed is that it's
> conceptually messy. I'm probably an outlier in how much I care about
> that.
>
> It's not clear whether the problems with the current definition of
> #embed are as serious as I suggest; you clearly think they aren't.

I am still not convinced that there /are/ problems, never mind serious
problems, nor that it is "conceptually messy". (I'd care about that
too, at least to some extent.) I don't think the feature will lead to
any dramatic changes in the way I work, but it could sometimes be
convenient and avoid the need of external scripts or programs in a
build file.

> But
> even if the current #embed is ok, I think adding hex string literals and
> adding a language defined embed parameter that specifies using hex
> string literals rather than a list of integer constant expressions would
> be useful.

Agreed.

> Among other things, it lets the programmer specify that a
> given #embed is only to be used to initialize an array of unsigned char.
>
> For example, given a 4-byte foo.dat containing bytes 1, 2, 3, and 4:
>     const unsigned char buf[] = {
>     #embed "foo.dat"
>     };
> would expand to something like:
>     const unsigned char buf[] = {
>         1, 2, 3, 4
>     };
> (and the same if buf is of type int[] or double[]), while this:
>     const unsigned char buf[] =
>     #embed "foo.dat" hex(true) // proposed new parameter
>     ;
> would expand to something like:
>     const unsigned char buf[] =
>         0x"01020304"
>     ;
> (and would result in an error if buf is of type int[] or double[]).
>
> [...]
>

I don't see the benefit here. This is C - the programmer is expected to
get the type right, and I think it would be rare to get it wrong (or
wrong in a worse way than forgetting "unsigned") in a case like this.
So the extra type checking here has little or no benefit. (In general,
I am a

========== REMAINDER OF ARTICLE TRUNCATED ==========
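
The type-checking difference discussed in the quoted example can be approximated in existing C, as a rough sketch only. Neither 0x"..." literals nor a hex(true) parameter exist in any standard, so an ordinary string literal with hex escapes stands in for the proposed form here; the array names are hypothetical, and the values mirror the 4-byte foo.dat from the quoted example.

```c
#include <assert.h>
#include <string.h>

/* Brace-enclosed list, the shape the quoted example says #embed expands to.
   It initializes any arithmetic array type without complaint. */
static const unsigned char list_bytes[] = { 1, 2, 3, 4 };
static const int           list_ints[]  = { 1, 2, 3, 4 };
static const double        list_reals[] = { 1, 2, 3, 4 };

/* Stand-in for the proposed 0x"01020304": an ordinary string literal.
   Unlike the list form it appends a terminating null character, so this
   array has 5 elements, not 4. */
static const unsigned char str_bytes[] = "\x01\x02\x03\x04";

/* A narrow string literal cannot initialize int[] or double[]; a compiler
   rejects the following line if it is uncommented:
   static const int str_ints[] = "\x01\x02\x03\x04";
*/
```

The commented-out declaration is the analogue of the error case in the quoted example: the list form is accepted for every element type, while the string form is accepted only for character arrays.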