Article <v4pd8t$m52o$1@dont-email.me>

Path: ...!2.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: Hex string literals (was Re: C23 thoughts and opinions)
Date: Mon, 17 Jun 2024 14:21:32 +0100
Organization: A noiseless patient Spider
Lines: 108
Message-ID: <v4pd8t$m52o$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
 <87plt8yxgn.fsf@nosuchdomain.example.com> <v31rj5$o20$1@dont-email.me>
 <87cyp6zsen.fsf@nosuchdomain.example.com> <v34gi3$j385$1@dont-email.me>
 <874jahznzt.fsf@nosuchdomain.example.com> <v36nf9$12bei$1@dont-email.me>
 <87v82b43h6.fsf@nosuchdomain.example.com>
 <87iky830v7.fsf_-_@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 15:21:33 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="10d436d755e4bfde8066d45503d65232";
	logging-data="726104"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/PgHG615ySukLL2pxPVzxP"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:mQhmTcP3pdlp3y1vDUQYeG6xaBk=
In-Reply-To: <87iky830v7.fsf_-_@nosuchdomain.example.com>
Content-Language: en-GB
Bytes: 6591

On 17/06/2024 00:48, Keith Thompson wrote:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> [...]
>> uc"..." string literals might be made even simpler, for example allowing
>> only hex digits and not requiring \x (uc"01020304" rather than
>> uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
>> could be useful in other contexts, and programmers will want
>> flexibility.  Maybe something like hex"01020304" (embedded spaces could
>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
> [...]
> 
> *If* hexadecimal string literals were to be added to a future version
> of the language, I think I have a syntax that I like better than
> what I suggested.
> 
> Inspired by the existing syntax for integer and floating-point
> hex constants, I propose using a "0x" prefix.  0x"deadbeef" is an
> expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
> with values 0xde, 0xad, 0xbe, 0xef in that order.  Byte order is
> irrelevant; we're specifying byte values in order, not bytes of
> the representation of some larger type.  memcpy()ing 0x"deadbeef"
> to a uint32 might yield either 0xdeadbeef or 0xefbeadde (or other
> more exotic possibilities).
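
To make the quoted byte-order point concrete, here is a minimal sketch. It assumes CHAR_BIT==8 and a host that is either little- or big-endian. The array argument stands in for the proposed 0x"deadbeef", which no current compiler accepts, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy four bytes, in the order written, into a 32-bit integer.
   The resulting value depends on the host's byte order. */
uint32_t load_u32_native(const unsigned char bytes[4])
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Report whether the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Given the bytes de, ad, be, ef in that order, load_u32_native() returns 0xefbeadde on a little-endian host and 0xdeadbeef on a big-endian one. The byte sequence itself is the same in both cases.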

Some points:

* Can the hex string span multiple lines? (You say space is the only
   white-space allowed)

* If not, would adjacent hex strings be concatenated, as happens with
   ordinary strings? Hex data for a single char array can be large.

* Your examples use only digits a-f but I assume A-F will work too.

* Can individual byte values end early, so allowing B to mean 0B? (My
   scheme requires hex digits to be in pairs.)
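
As a rough illustration of the stricter reading in these points (digits in pairs, either letter case, spaces allowed only between bytes), here is a sketch of a decoder. It assumes 8-bit bytes. The names hex_digit_value and parse_hex_pairs are hypothetical, and returning 0 for an empty string is a choice made here, not part of either proposal.

```c
#include <stddef.h>

/* Value of one hex digit in either case, or -1 if c is not a hex digit. */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode text into out[], writing at most cap bytes.
   Spaces between bytes are skipped; hex digits must come in pairs.
   Returns the number of bytes written, or -1 on an unpaired digit,
   an invalid character, or insufficient room in out[]. */
long parse_hex_pairs(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*text != '\0') {
        if (*text == ' ') {
            text++;
            continue;
        }
        int hi = hex_digit_value((unsigned char)text[0]);
        if (hi < 0) return -1;
        /* text[1] may be the terminator, which is rejected as a non-digit */
        int lo = hex_digit_value((unsigned char)text[1]);
        if (lo < 0) return -1;
        if (n == cap) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

With this rule a lone "B" is rejected rather than read as 0B, and a call such as parse_hex_pairs("de" "ad", buf, sizeof buf) already benefits from the ordinary concatenation of adjacent string literals.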


> Again, unlike other string literals, there is no implicit terminating
> null byte.  And I suggest making them const, since there's no
> existing code to break.
> 
> If CHAR_BIT==8, each byte is represented by two hex digits.  More
> generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
> the absence of whitespace.  Added whitespace marks the end of a byte.
> 0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
> respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
> 0x"" is a syntax error, since C doesn't support zero-length arrays.
> Anything between the quotes other than hex digits and spaces is a
> syntax error.
> 
> 0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
> end of a byte, but the usage of spaces doesn't have to be consistent.
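
The quoted counting rule can be sketched as follows. The character width is passed as a parameter so that different CHAR_BIT values can be tried on one machine. The name count_hex_bytes is hypothetical, and treating a short final group of digits as one byte is an assumption inferred from the 12-bit example in the quote.

```c
#include <stddef.h>

/* Non-zero if c is a hex digit in either case. */
static int is_hex_digit(int c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* Number of bytes the text would denote for a given character width.
   Each byte takes (char_bit + 3) / 4 digits unless a space ends it early.
   Returns -1 if the text contains anything but hex digits and spaces. */
long count_hex_bytes(const char *text, unsigned char_bit)
{
    const unsigned digits_per_byte = (char_bit + 3) / 4;
    long bytes = 0;
    unsigned run = 0; /* digits seen so far in the current byte */

    for (; *text != '\0'; text++) {
        if (*text == ' ') {
            if (run > 0) { /* a space closes a partly filled byte */
                bytes++;
                run = 0;
            }
        } else if (is_hex_digit((unsigned char)*text)) {
            if (++run == digits_per_byte) { /* byte is full */
                bytes++;
                run = 0;
            }
        } else {
            return -1;
        }
    }
    if (run > 0) bytes++; /* short final group still counts as a byte */
    return bytes;
}
```

Under these assumptions "deadbeef" counts as 4, 3, 2, or 1 bytes for widths 8, 12, 16, and 32, while "de ad be ef" counts as 4 bytes for every one of those widths.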

Here it gets confusing. First, I understand that CHAR_BIT could be 64,
where hex literals get long enough that they could do with separators.
But spaces are now significant: they mark the early end of a 64-bit
value, so they cannot also be used purely as separators.

What I have in mind is that somebody might write 0x"12 34 56 78" to 
designate 4 8-bit values totalling 32 bits, and wants the spaces for 
readability. Compiled for a machine with 16-bit characters, it will now 
represent (in little-endian) the 64-bit value 0x0078005600340012 instead 
of 0x78563412.
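
The arithmetic behind those two values can be checked with a small sketch that simulates the element width, since a machine with 16-bit characters is unlikely to be at hand. The function name pack_little_endian is hypothetical, and the code assumes count * char_bit does not exceed 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine count elements, each char_bit bits wide, into one integer the
   way a little-endian machine would see them: element 0 occupies the
   lowest bits. Assumes count * char_bit <= 64. */
uint64_t pack_little_endian(const unsigned *elems, size_t count,
                            unsigned char_bit)
{
    uint64_t value = 0;
    for (size_t i = 0; i < count; i++)
        value |= (uint64_t)elems[i] << (i * char_bit);
    return value;
}
```

For the elements 0x12, 0x34, 0x56, 0x78, a width of 8 gives 0x78563412 and a width of 16 gives 0x0078005600340012, matching the figures above.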

I assume the hex string can only be used to initialise a char[] array? 
(The feature I presented elsewhere, 'data-strings', could be used to 
initialise any array type, just like #embed IIUC.)


> 
> This could be made more flexible by allowing various backslash
> escapes, but I'm not inclined to complicate it too much.
> 
> Note that the value of a (proposed) hex string literal is not a
> string unless it happens to end in zero.  I still use the term
> "string literal" because it's closely tied to existing string
> literal syntax, and existing string literals don't necessarily
> represent strings anyway ("embedded\0null\0characters").
> 
> Binary string literals 0b"11001001" might also be worth
> considering (that's of type `const unsigned char[1]`).

You mean, values that can only be one byte long? I don't get it. How 
many use-cases are there for char-arrays that are only a byte long?

Assuming that [1] was a typo for [], then I still have trouble finding 
uses for this.

Perhaps initialise a char[][] table representing a one-bit-per-pixel 
image? Bit-order becomes critical here.

Here, C already has 64-bit binary literals; using those might be a
better idea, since a char[][] is the wrong type anyway, unless you can
have bool[][] which is guaranteed to use 1-bit bools.
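
A sketch of the one-bit-per-pixel case using ordinary binary integer literals, which need C23 or a compiler that accepts the 0b prefix as an extension. The glyph data and both accessor names are made up for illustration. The two accessors show why bit order has to be pinned down.

```c
#include <stdint.h>

/* An 8x8 one-bit-per-pixel glyph (a rough letter L), one row per byte. */
static const uint8_t glyph[8] = {
    0b10000000,
    0b10000000,
    0b10000000,
    0b10000000,
    0b10000000,
    0b10000000,
    0b11111110,
    0b00000000,
};

/* Pixel at column x (0 = leftmost) of row y, taking the most
   significant bit of each row as the leftmost pixel. */
int pixel_msb_first(const uint8_t rows[], unsigned x, unsigned y)
{
    return (int)((rows[y] >> (7u - x)) & 1u);
}

/* The same lookup, taking the least significant bit as leftmost. */
int pixel_lsb_first(const uint8_t rows[], unsigned x, unsigned y)
{
    return (int)((rows[y] >> x) & 1u);
}
```

The same data gives a mirrored image depending on which convention the reader of the table uses: the top-left pixel is set under the MSB-first reading and clear under the LSB-first one.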

>  Octal
> string literals 0"012 345 670" *might* be worth considering.

AFAIK nobody uses octal anymore.


> What I'm trying to design here is a more straightforward way to
> represent raw (unsigned char[]) data in C code, largely but not
> exclusively for use by #embed.

Sorry, I thought this was an alternative to #embed, for smaller amounts 
of data directly written in source code.