From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Hex string literals (was Re: C23 thoughts and opinions)
Date: Sun, 16 Jun 2024 16:48:44 -0700
Organization: None to speak of
Message-ID: <87iky830v7.fsf_-_@nosuchdomain.example.com>

Keith Thompson writes:
[...]
> uc"..." string literals might be made even simpler, for example allowing
> only hex digits and not requiring \x (uc"01020304" rather than
> uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..." literals
> could be useful in other contexts, and programmers will want
> flexibility.  Maybe something like hex"01020304" (embedded spaces could
> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
[...]

*If* hexadecimal string literals were to be added to a future version
of the language, I think I have a syntax that I like better than what
I suggested.

Inspired by the existing syntax for integer and floating-point hex
constants, I propose using a "0x" prefix.

0x"deadbeef" is an expression of type `const unsigned char[4]`
(assuming CHAR_BIT==8), with values 0xde, 0xad, 0xbe, 0xef in that
order.  Byte order is irrelevant; we're specifying byte values in
order, not bytes of the representation of some larger type.
memcpy()ing 0x"deadbeef" to a uint32_t might yield either 0xdeadbeef
or 0xefbeadde (or other more exotic possibilities).

Again, unlike other string literals, there is no implicit terminating
null byte.  And I suggest making them const, since there's no existing
code to break.

If CHAR_BIT==8, each byte is represented by two hex digits.  More
generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
the absence of whitespace.  Added whitespace marks the end of a byte.
0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.

0x"" is a syntax error, since C doesn't support zero-length arrays.
Anything between the quotes other than hex digits and spaces is a
syntax error.

0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
end of a byte, but the usage of spaces doesn't have to be consistent.

This could be made more flexible by allowing various backslash
escapes, but I'm not inclined to complicate it too much.

Note that the value of a (proposed) hex string literal is not a string
unless it happens to end in zero.  I still use the term "string
literal" because it's closely tied to existing string literal syntax,
and existing string literals don't necessarily represent strings
anyway ("embedded\0null\0characters").

Binary string literals 0b"11001001" might also be worth considering
(that's of type `const unsigned char[1]`).

Octal string literals 0"012 345 670" *might* be worth considering.
There is a proposal for a new "0o123" syntax for octal constants; if
that's adopted, I propose allowing 0o"..." and *not* 0"...".

I'm not sure whether to suggest hex only, or doing hex, octal, and
binary for the sake of completeness.

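The digit-grouping rule above can be sketched in today's C as an
ordinary run-time function.  This is only an illustration under stated
assumptions, not part of the proposal: hex_literal_bytes() is a
hypothetical name, it takes CHAR_BIT as a parameter so the other byte
widths can be tried, it accepts only ' ' as whitespace, and it treats
a short trailing group as one byte (which is what makes 0x"deadbeef"
3 bytes when CHAR_BIT is 12).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the text between the quotes of a proposed
   0x"..." literal into byte values.  char_bit is assumed to be in the
   range 8..32.  Returns the number of bytes stored in out[], or -1 on
   a syntax error or if out[] (capacity cap) is too small.  Whether a
   value fits in char_bit bits is not checked; the post does not say
   what should happen when CHAR_BIT is not a multiple of 4. */
long hex_literal_bytes(const char *s, unsigned char_bit,
                       unsigned long *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    unsigned per_byte = (char_bit + 3) / 4; /* hex digits per full byte */
    unsigned ndigits = 0;                   /* digits in current byte   */
    unsigned long value = 0;
    size_t n = 0;

    for (;; s++) {
        unsigned char c = (unsigned char)*s;
        int at_end = (c == '\0');

        if (!at_end && isxdigit(c)) {
            /* strchr lookup avoids assuming 'a'..'f' are contiguous */
            const char *p = strchr(digits, tolower(c));
            value = value * 16 + (unsigned long)(p - digits);
            ndigits++;
        } else if (!at_end && c != ' ') {
            return -1;          /* only hex digits and spaces allowed */
        }

        /* A byte ends when its group is full, at a space, or at the
           end of the literal. */
        if (ndigits > 0 && (ndigits == per_byte || c == ' ' || at_end)) {
            if (n == cap)
                return -1;
            out[n++] = value;
            value = 0;
            ndigits = 0;
        }
        if (at_end)
            break;
    }
    return n == 0 ? -1 : (long)n;   /* 0x"" is a syntax error */
}
```

With an 8-element buffer, "deadbeef" comes out as 4, 3, 2, or 1 bytes
for char_bit 8, 12, 16, or 32, "de ad be ef" is 4 bytes for any of
them, and "" is rejected.
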
What I'm trying to design here is a more straightforward way to
represent raw (unsigned char[]) data in C code, largely but not
exclusively for use by #embed.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */