Article <87cyof14rd.fsf@nosuchdomain.example.com>

Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Hex string literals (was Re: C23 thoughts and opinions)
Date: Mon, 17 Jun 2024 17:19:50 -0700
Organization: None to speak of
Lines: 116
Message-ID: <87cyof14rd.fsf@nosuchdomain.example.com>
References: <v2l828$18v7f$1@dont-email.me>
	<00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
	<v2lji1$1bbcp$1@dont-email.me>
	<87msoh5uh6.fsf@nosuchdomain.example.com>
	<f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
	<v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
	<87y18047jk.fsf@nosuchdomain.example.com>
	<87msoe1xxo.fsf@nosuchdomain.example.com>
	<v2sh19$2rle2$2@dont-email.me>
	<87ikz11osy.fsf@nosuchdomain.example.com>
	<v2v59g$3cr0f$1@dont-email.me>
	<87plt8yxgn.fsf@nosuchdomain.example.com> <v31rj5$o20$1@dont-email.me>
	<87cyp6zsen.fsf@nosuchdomain.example.com>
	<v34gi3$j385$1@dont-email.me>
	<874jahznzt.fsf@nosuchdomain.example.com>
	<v36nf9$12bei$1@dont-email.me>
	<87v82b43h6.fsf@nosuchdomain.example.com>
	<87iky830v7.fsf_-_@nosuchdomain.example.com>
	<v4p0dv$jeb2$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Tue, 18 Jun 2024 02:19:54 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="aed299878570cb32e21d076f9aa05b90";
	logging-data="1057279"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18RNf1vJcii1e4piesyPN36"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:bGJt27ei+rTLRue1QEpO5AVSmTI=
	sha1:x9N5eQuUyPMrvRgE0Qh08cEnKzM=
Bytes: 7139

David Brown <david.brown@hesbynett.no> writes:
> On 17/06/2024 01:48, Keith Thompson wrote:
[...]
>                                                            For binary,
> the compaction is irrelevant and indeed counter-productive - binary
> literals became a lot more practical with the introduction of digit
> separators. (For standard C, these are from C23, but for C++ they came
> in C++14, and compilers have supported them as extensions in C.)

I forgot about digit separators.

C23 adds the option to use apostrophes as separators in numeric
constants: 123'456'789 or 0xdead'beef, for example.  (This is
borrowed from C++.  Commas are more commonly used in real life,
at least in my experience, but that wouldn't work given the other
meanings of commas.)
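
A minimal sketch of what the separators mean: the value is whatever
remains once the apostrophes are dropped.  The helper name
parse_separated is made up for illustration, and it works on text at
run time so that it compiles without C23 support.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a numeric constant that may contain C23-style apostrophe
   separators, e.g. "123'456'789" or "0xdead'beef".
   Hypothetical helper: it drops every apostrophe and hands the rest to
   strtoul, which is enough to show that the separators carry no value.
   It does not enforce the C23 rule that a separator must sit between
   two digits, and input longer than 63 characters is truncated. */
unsigned long parse_separated(const char *text)
{
    char digits[64];
    size_t n = 0;

    assert(text != NULL);

    for (const char *p = text; *p != '\0' && n < sizeof digits - 1; p++) {
        if (*p != '\'')
            digits[n++] = *p;
    }
    digits[n] = '\0';

    /* Base 0 lets strtoul recognise a 0x prefix. */
    return strtoul(digits, NULL, 0);
}
```

With this helper, "0xdead'beef" and "0xdeadbeef" parse to the same
value.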

I briefly considered that, for consistency, we might want to
use apostrophes rather than spaces in hex string constants:
0x"de'ad'be'ef".  But since digit separators are purely decorative,
and spaces in my proposed hex string literals are semantically
significant (they terminate a byte), I'll stick with spaces.

You could even write 0x"0 0 0 0" to denote 4 zero bytes (where
0x"0000" is 2 bytes), but 0x"00 00 00 00" or 0x"00000000" is probably
clearer.
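
The byte-splitting rule can be sketched as a small run-time decoder for
the text between the quotes.  This is only an illustration of the
proposed semantics, not an implementation of any compiler feature.  The
function name hex_string_decode is hypothetical, and treating a lone
trailing digit as a complete byte is an assumption drawn from the
0x"0 0 0 0" example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static unsigned hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return (unsigned)(c - '0');
    return (unsigned)(tolower(c) - 'a') + 10u;
}

/* Decode the body of a proposed hex string literal (the text between
   the quotes of 0x"...") into out[0..cap).  Returns the number of bytes
   produced, or -1 on an invalid character or if out is too small.
   Assumption: a byte ends after two digits, or at a space or the end
   of the body when a single digit is pending. */
long hex_string_decode(const char *body, unsigned char *out, size_t cap)
{
    size_t n = 0;      /* bytes written so far */
    unsigned acc = 0;  /* value of the byte being built */
    int ndigits = 0;   /* digits seen for the current byte: 0, 1 or 2 */

    assert(body != NULL && out != NULL);

    for (const char *p = body; ; p++) {
        int c = (unsigned char)*p;

        if (isxdigit(c)) {
            acc = acc * 16u + hex_value(c);
            ndigits++;
        } else if (c != ' ' && c != '\0') {
            return -1;  /* e.g. an apostrophe: not allowed here */
        }

        if (ndigits == 2 || (ndigits == 1 && !isxdigit(c))) {
            if (n == cap)
                return -1;
            out[n++] = (unsigned char)acc;
            acc = 0;
            ndigits = 0;
        }

        if (c == '\0')
            break;
    }
    return (long)n;
}
```

Under these assumptions "0 0 0 0" decodes to four bytes and "0000" to
two, matching the examples above.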

I think allowing both spaces and apostrophes would be too confusing.

>> Octal
>> string literals 0"012 345 670" *might* be worth considering.
>
> Most situations where octal could be useful died out many decades ago
> - it is vastly more likely that "012" is intended to mean 12 than 10.
> No serious programming language supports a leading 0 as an indication
> of octal unless they are forced to do so by backwards compatibility,
> and many that used to support them have dropped them.
> 
> Having /some/ way to write octal can be helpful to old *nix
> programmers who prefer 046 to "S_IRUSR | S_IWUSR | S_IRGRP" in their
> chmod calls. (And to be fair, the constant names made in ancient
> history with short identifier length limits are pretty ugly.)  But it
> is not something to be encouraged, and I think there is no simple
> syntax that is obviously octal, and not easily mistaken for something
> else.

There is: the proposed "0o" prefix.  It's already supported in both Perl
and Python, and likely other languages.
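
As a rough illustration of why an explicit prefix is harder to misread,
here is a sketch of a run-time parser that accepts only the "0o" form.
The name parse_0o is hypothetical.  For contrast, the standard strtoul
with base 0 already treats a bare leading 0 as octal, so "012" comes
back as 10.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse text written with the proposed "0o" octal prefix, e.g. "0o640".
   Hypothetical helper, not a library function.  Returns 1 and stores
   the value in *out on success; returns 0 if the prefix is missing or
   anything other than octal digits follows it. */
int parse_0o(const char *text, unsigned long *out)
{
    char *end;

    assert(text != NULL && out != NULL);

    if (strncmp(text, "0o", 2) != 0)
        return 0;
    /* Require a digit straight after the prefix, so that strtoul
       cannot skip white space or accept a sign. */
    if (text[2] < '0' || text[2] > '7')
        return 0;

    *out = strtoul(text + 2, &end, 8);
    return *end == '\0';
}
```

A call such as parse_0o("0o12", &v) yields 10, while the unprefixed
"012" is rejected by this helper rather than silently read as octal.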

>> <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
>> proposes a new "0o123" syntax for octal constants; if that's adopted,
>> I propose allowing 0o"..." and *not* 0"...".  I'm not sure whether
>> to suggest hex only, or doing hex, octal, and binary for the sake
>> of completeness.
>
> Binary support is useless, and octal support would be worse than
> useless - even using an 0o rather than 0 prefix.  Completeness is not
> a justification for repeating old mistakes or complicating a good idea 
> with features that will never be used.

I like binary integer constants (0b11001001), but I suppose I
agree that they're not useful for larger chunks of data.  I have no
problem supporting only hex string literals, not binary or octal --
but I'd have no problem with having all three if anyone thinks that
would be sufficiently useful.

>> What I'm trying to design here is a more straightforward way to
>> represent raw (unsigned char[]) data in C code, largely but not
>> exclusively for use by #embed.
>
> Personally, I'd see it as useful when /not/ using #embed.  I really do
> not think programmers will care what format #embed uses.  I don't
> share your concerns about efficiency of implementation, or that
> programmers need to know when it is efficient or not.  In almost all
> circumstances, C programmers never see or need to think about a
> separation between a C preprocessor and a post-processed C compiler -
> they are seen as a single entity, and can use whatever format is
> convenient between them.  And once you ignore the implementation
> details, which are an SEP, the way #embed is defined is better than a
> definition using these new hex blob strings.

I think my main problem with the current #embed is that it's
conceptually messy.  I'm probably an outlier in how much I care about
that.

It's not clear whether the problems with the current definition of
#embed are as serious as I suggest; you clearly think they aren't.  But
even if the current #embed is ok, I think adding hex string literals and
adding a language-defined embed parameter that specifies using hex
string literals rather than a list of integer constant expressions would
be useful.  Among other things, it lets the programmer specify that a
given #embed is only to be used to initialize an array of unsigned char.

For example, given a 4-byte foo.dat containing bytes 1, 2, 3, and 4:
    const unsigned char buf[] = {
        #embed "foo.dat"
    };
would expand to something like:
    const unsigned char buf[] = {
        1, 2, 3, 4
    };
(and the same if buf is of type int[] or double[]), while this:
    const unsigned char buf[] =
        #embed "foo.dat" hex(true) // proposed new parameter
    ;
would expand to something like:
    const unsigned char buf[] =
        0x"01020304"
    ;
(and would result in an error if buf is of type int[] or double[]).
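
To make the second expansion concrete, here is a sketch of how a tool
might render a file's bytes in the proposed form.  Both the hex(true)
parameter and the 0x"..." syntax are proposals, not part of C23, and
the function name format_hex_literal is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the bytes data[0..n) as a proposed hex string literal, e.g.
   0x"01020304", into out.  Returns the length of the text written
   (not counting the terminating null), or -1 if out is too small.
   Hypothetical helper; it emits two lowercase digits per byte and no
   spaces. */
long format_hex_literal(const unsigned char *data, size_t n,
                        char *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    size_t need = 3 + 2 * n + 1 + 1;  /* 0x" + digits + " + null */
    size_t pos = 0;

    assert(out != NULL && (data != NULL || n == 0));

    if (cap < need)
        return -1;

    memcpy(out, "0x\"", 3);
    pos = 3;
    for (size_t i = 0; i < n; i++) {
        out[pos++] = digits[(data[i] >> 4) & 0x0f];
        out[pos++] = digits[data[i] & 0x0f];
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return (long)pos;
}
```

Given the four bytes 1, 2, 3, 4 of foo.dat, this produces the text
0x"01020304" shown in the expansion above.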

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */