Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: Hex string literals (was Re: C23 thoughts and opinions)
Date: Tue, 18 Jun 2024 15:54:15 +0200
Organization: A noiseless patient Spider
Lines: 173
Message-ID: <v4s3i8$1cjdr$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
<00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
<v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
<f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
<v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
<87y18047jk.fsf@nosuchdomain.example.com>
<87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
<87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
<87plt8yxgn.fsf@nosuchdomain.example.com> <v31rj5$o20$1@dont-email.me>
<87cyp6zsen.fsf@nosuchdomain.example.com> <v34gi3$j385$1@dont-email.me>
<874jahznzt.fsf@nosuchdomain.example.com> <v36nf9$12bei$1@dont-email.me>
<87v82b43h6.fsf@nosuchdomain.example.com>
<87iky830v7.fsf_-_@nosuchdomain.example.com> <v4p0dv$jeb2$1@dont-email.me>
<87cyof14rd.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 18 Jun 2024 15:54:16 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="000ac22a82b477e7b73d30c4bbbc814d";
logging-data="1461691"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18LuLvF6f8d8XEVGEfN4dzXL9u2iS7BylU="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:/QDyZBvOWtwLiIDlKzv0xDCSLDk=
In-Reply-To: <87cyof14rd.fsf@nosuchdomain.example.com>
Content-Language: en-GB
Bytes: 9554
On 18/06/2024 02:19, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 17/06/2024 01:48, Keith Thompson wrote:
> [...]
>> For binary,
>> the compaction is irrelevant and indeed counter-productive - binary
>> literals became a lot more practical with the introduction of digit
>> separators. (For standard C, these are from C23, but for C++ they came
>> in C++14, and compilers have supported them as extensions in C.)
>
> I forgot about digit separators.
>
> C23 adds the option to use apostrophes as separators in numeric
> constants: 123'456'789 or 0xdead'beef, for example. (This is
> borrowed from C++. Commas are more commonly used in real life,
> at least in my experience, but that wouldn't work given the other
> meanings of commas.)
Commas would be entirely unsuitable here, since half the world uses
decimal commas rather than decimal points. I think underscores are a
nicer choice, used by many languages, but C++ could not use underscores
due to their use in user-defined literals, and C followed C++.
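
As a rough illustration of separators that are purely decorative, here is a minimal sketch that does not need a C23 compiler. The helper name parse_separated is hypothetical; it simply drops every apostrophe and hands the rest to strtoull. It does not check that separators sit between digits, which a real C23 lexer would.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of a numeric constant, ignoring
   apostrophes. Base 0 lets strtoull handle a 0x prefix (and, as in C
   itself, treats a leading 0 as octal). Returns false on failure. */
bool parse_separated(const char *text, unsigned long long *out)
{
    char digits[64];
    size_t n = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\'')
            continue;                 /* separators carry no meaning */
        if (n + 1 >= sizeof digits)
            return false;             /* too long for this sketch */
        digits[n++] = *p;
    }
    digits[n] = '\0';

    char *end;
    *out = strtoull(digits, &end, 0);
    return n > 0 && *end == '\0';     /* everything must be consumed */
}
```

With this, the texts 123'456'789 and 0xdead'beef parse to the same values as 123456789 and 0xdeadbeef.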
>
> I briefly considered that, for consistency, we might want to
> use apostrophes rather than spaces in hex string constants:
> 0x"de'ad'be'ef". But since digit separators are purely decorative,
> and spaces in my proposed hex string literals are semantically
> significant (they terminate a byte), I'll stick with spaces.
I think you were using spaces as byte separators, whereas apostrophes
should be completely ignored when parsing.
>
> You could even write 0x"0 0 0 0" to denote 4 zero bytes (where
> 0x"0000" is 2 bytes) but 0x"00 00 00 00" or 0x"00000000" is probably
> clearer.
>
> I think allowing both spaces and apostrophes would be too confusing.
>
Fair enough.
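
To make the "a space terminates a byte" rule concrete, here is a sketch of how the contents of such a literal might be decoded. This is my reading of the proposal, not a specification: the function name parse_hex_string is hypothetical, and treating a lone digit at the end of the string as a complete byte is an assumption.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical decoder for the text between the quotes of 0x"...".
   Digits are taken in pairs; a space (or, by assumption, the end of
   the string) terminates a byte early. Returns the number of bytes
   written, or -1 on an invalid character or if out is too small. */
long parse_hex_string(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    int ndigits = 0;
    unsigned value = 0;

    for (;; s++) {
        int d = (*s != '\0') ? hex_value(*s) : -1;
        if (d >= 0) {
            value = value * 16 + (unsigned)d;
            ndigits++;
        } else if (*s != ' ' && *s != '\0') {
            return -1;                /* neither digit, space, nor end */
        }
        /* Emit a byte after two digits, or when a space or the end of
           the string cuts a one-digit byte short. */
        if (ndigits == 2 || (d < 0 && ndigits == 1)) {
            if (n == cap)
                return -1;
            out[n++] = (unsigned char)value;
            value = 0;
            ndigits = 0;
        }
        if (*s == '\0')
            break;
    }
    return (long)n;
}
```

Under these assumptions "0 0 0 0" decodes to four bytes and "0000" to two, matching the counts discussed above.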
>>> Octal
>>> string literals 0"012 345 670" *might* be worth considering.
>>
>> Most situations where octal could be useful died out many decades ago
>> - it is vastly more likely that "012" is intended to mean 12 than 10.
>> No serious programming language supports a leading 0 as an indication
>> of octal unless they are forced to do so by backwards compatibility,
>> and many that used to support them have dropped them.
>>
>> Having /some/ way to write octal can be helpful to old *nix
>> programmers who prefer 0640 to "S_IRUSR | S_IWUSR | S_IRGRP" in their
>> chmod calls. (And to be fair, the constant names made in ancient
>> history with short identifier length limits are pretty ugly.) But it
>> is not something to be encouraged, and I think there is no simple
>> syntax that is obviously octal, and not easily mistaken for something
>> else.
>
> There is, the proposed "0o" prefix. It's already supported in both Perl
> and Python, and likely other languages.
Some languages apparently use 0q, because 0o might be confusing in some
fonts. I'm not sure I agree, and 0q is not very intuitive. I'd rate 0o
as vastly better than 0, but I would not bother with supporting it in a
new feature like this.
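
For anyone who has not been bitten by the leading zero, a small sketch of both sides of the octal question. It assumes a POSIX system for <sys/stat.h>; the function names are only for illustration.

```c
#include <sys/stat.h>   /* POSIX header, assumed to be available */

/* The leading zero makes this an octal constant: the value is ten. */
int looks_like_twelve(void)
{
    return 012;
}

/* The symbolic spelling of the permission bits rw-r-----,
   which octal fans would write as 0640. */
unsigned long owner_rw_group_r(void)
{
    return S_IRUSR | S_IWUSR | S_IRGRP;
}
```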
>
>>> <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
>>> proposes a new "0o123" syntax for octal constants; if that's adopted,
>>> I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
>>> to suggest hex only, or doing hex, octal, and binary for the sake
>>> of completeness.
>>
>> Binary support is useless, and octal support would be worse than
>> useless - even using an 0o rather than 0 prefix. Completeness is not
>> a justification for repeating old mistakes or complicating a good idea
>> with features that will never be used.
>
> I like binary integer constants (0b11001001), but I suppose I
> agree that they're not useful for larger chunks of data.
Perhaps I am so used to binary and hex that I convert without thinking,
and thus rarely need binary.
The one place I find binary useful is for bitmap fonts. I use these a
lot less than I used to, but sometimes you need to make new characters
for an old-style low resolution LCD screen, and then binary constants
can be useful. Often, however, I prefer characters like . and @ rather
than 0 and 1 as it makes the contrast much higher.
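
A minimal sketch of that . and @ approach, assuming 8-pixel-wide rows with the leftmost pixel in the most significant bit (the bit order and the name pack_row are my assumptions; a real display may want it the other way round):

```c
/* Pack one row of up to 8 pixels into a byte: '@' is a lit pixel,
   any other character is dark. The leftmost character becomes the
   most significant bit. */
unsigned char pack_row(const char *row)
{
    unsigned char bits = 0;

    for (int i = 0; i < 8 && row[i] != '\0'; i++) {
        if (row[i] == '@')
            bits |= (unsigned char)(0x80u >> i);
    }
    return bits;
}
```

A row written as "..@@@@.." then packs to the same byte as the binary constant 0b00111100, but the shape is much easier to see in the source.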
> I have no
> problem supporting only hex string literals, not binary or octal --
> but I'd have no problem with having all three if anyone thinks that
> would be sufficiently useful.
>
Fair enough.
>>> What I'm trying to design here is a more straightforward way to
>>> represent raw (unsigned char[]) data in C code, largely but not
>>> exclusively for use by #embed.
>>
>> Personally, I'd see it as useful when /not/ using #embed. I really do
>> not think programmers will care what format #embed uses. I don't
>> share your concerns about efficiency of implementation, or that
>> programmers need to know when it is efficient or not. In almost all
>> circumstances, C programmers never see or need to think about a
>> separation between a C preprocessor and a post-processed C compiler -
>> they are seen as a single entity, and can use whatever format is
>> convenient between them. And once you ignore the implementation
>> details, which are an SEP, the way #embed is defined is better than a
>> definition using these new hex blob strings.
>
> I think my main problem with the current #embed is that it's
> conceptually messy. I'm probably an outlier in how much I care about
> that.
>
> It's not clear whether the problems with the current definition of
> #embed are as serious as I suggest; you clearly think they aren't.
I am still not convinced that there /are/ problems, never mind serious
problems, nor that it is "conceptually messy". (I'd care about that
too, at least to some extent.) I don't think the feature will lead to
any dramatic changes in the way I work, but it could sometimes be
convenient and avoid the need of external scripts or programs in a build
file.
> But
> even if the current #embed is ok, I think adding hex string literals and
> adding a language defined embed parameter that specifies using hex
> string literals rather than a list of integer constant expressions would
> be useful.
Agreed.
> Among other things, it lets the programmer specify that a
> given #embed is only to be used to initialize an array of unsigned char.
>
> For example, given a 4-byte foo.dat containing bytes 1, 2, 3, and 4:
> const unsigned char buf[] = {
> #embed "foo.dat"
> };
> would expand to something like:
> const unsigned char buf[] = {
> 1, 2, 3, 4
> };
> (and the same if buf is of type int[] or double[]), while this:
> const unsigned char buf[] =
> #embed "foo.dat" hex(true) // proposed new parameter
> ;
> would expand to something like:
> const unsigned char buf[] =
> 0x"01020304"
> ;
> (and would result in an error if buf is of type int[] or double[]).
>
> [...]
>
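
For comparison, here is roughly what the two expansions correspond to in C as it exists today, written out by hand for the same 4-byte file. The array names are made up, and the escaped string is only the nearest existing spelling of the proposed 0x"01020304"; whether the proposed literal would carry a trailing null byte is not settled in the text quoted here.

```c
/* What the default #embed expansion amounts to: a plain list of
   integer constants. */
const unsigned char buf_list[] = { 1, 2, 3, 4 };

/* The same list initializes other element types just as happily,
   which is the lack of type checking being discussed. */
const int buf_int[] = { 1, 2, 3, 4 };

/* Closest existing string form. The explicit size 4 leaves no room
   for the terminating null, which C (but not C++) permits. */
const unsigned char buf_str[4] = "\x01\x02\x03\x04";
```

A string literal cannot initialize an int[] or double[] array, which is where the extra error in the hex(true) case would come from.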
I don't see the benefit here. This is C - the programmer is expected to
get the type right, and I think it would be rare to get it wrong (or
wrong in any worse way than forgetting "unsigned") in a case like this. So the
extra type checking here has little or no benefit. (In general, I am a
========== REMAINDER OF ARTICLE TRUNCATED ==========