Path: ...!2.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: Hex string literals (was Re: C23 thoughts and opinions)
Date: Mon, 17 Jun 2024 14:21:32 +0100
Organization: A noiseless patient Spider
Lines: 108
Message-ID: <v4pd8t$m52o$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me>
 <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me>
 <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com>
 <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com>
 <v2v59g$3cr0f$1@dont-email.me>
 <87plt8yxgn.fsf@nosuchdomain.example.com>
 <v31rj5$o20$1@dont-email.me>
 <87cyp6zsen.fsf@nosuchdomain.example.com>
 <v34gi3$j385$1@dont-email.me>
 <874jahznzt.fsf@nosuchdomain.example.com>
 <v36nf9$12bei$1@dont-email.me>
 <87v82b43h6.fsf@nosuchdomain.example.com>
 <87iky830v7.fsf_-_@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 17 Jun 2024 15:21:33 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="10d436d755e4bfde8066d45503d65232"; logging-data="726104"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/PgHG615ySukLL2pxPVzxP"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:mQhmTcP3pdlp3y1vDUQYeG6xaBk=
In-Reply-To: <87iky830v7.fsf_-_@nosuchdomain.example.com>
Content-Language: en-GB
Bytes: 6591

On 17/06/2024 00:48, Keith Thompson wrote:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> [...]
>> uc"..." string literals might be made even simpler, for example allowing
>> only hex digits and not requiring \x (uc"01020304" rather than
>> uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
>> could be useful in other contexts, and programmers will want
>> flexibility. Maybe something like hex"01020304" (embedded spaces could
>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
> [...]
>
> *If* hexadecimal string literals were to be added to a future version
> of the language, I think I have a syntax that I like better than
> what I suggested.
>
> Inspired by the existing syntax for integer and floating-point
> hex constants, I propose using a "0x" prefix. 0x"deadbeef" is an
> expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
> with values 0xde, 0xad, 0xbe, 0xef in that order. Byte order is
> irrelevant; we're specifying byte values in order, not bytes of
> the representation of some larger type. memcpy()ing 0x"deadbeef"
> to a uint32 might yield either 0xdeadbeef or 0xefbeadde (or other
> more exotic possibilities).

Some points:

* Can the hex string span multiple lines? (You say space is the only
  white-space allowed.)

* If not, would adjacent hex strings be concatenated, as happens with
  ordinary strings? Hex data for one char array can be large.

* Your examples use only digits a-f but I assume A-F will work too.

* Can individual byte values end early, so allowing B to mean 0B? (My
  scheme requires hex digits to be in pairs.)

> Again, unlike other string literals, there is no implicit terminating
> null byte. And I suggest making them const, since there's no
> existing code to break.
>
> If CHAR_BIT==8, each byte is represented by two hex digits. More
> generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
> the absence of whitespace. Added whitespace marks the end of a byte,
> 0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
> respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
> 0x"" is a syntax error, since C doesn't support zero-length arrays.
> Anything between the quotes other than hex digits and spaces is a
> syntax error.
>
> 0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
> end of a byte, but the usage of spaces doesn't have to be consistent.

Here it gets confusing. But first, I understand that CHAR_BIT could be
64, where hex literals get long enough that they could do with
separators. But spaces are now significant, in that they mark the early
end of a 64-bit value.

What I have in mind is that somebody might write 0x"12 34 56 78" to
designate four 8-bit values totalling 32 bits, wanting the spaces only
for readability.

Compiled for a machine with 16-bit characters, it will now represent
(in little-endian) the 64-bit value 0x0078005600340012 instead of
0x78563412.

I assume the hex string can only be used to initialise a char[] array?
(The feature I presented elsewhere, 'data-strings', could be used to
initialise any array type, just like #embed IIUC.)

>
> This could be made more flexible by allowing various backslash
> escapes, but I'm not inclined to complicate it too much.
>
> Note that the value of a (proposed) hex string literal is not a
> string unless it happens to end in zero. I still use the term
> "string literal" because it's closely tied to existing string
> literal syntax, and existing string literals don't necessarily
> represent strings anyway ("embedded\0null\0characters").
>
> Binary string literals 0b"11001001" might also be worth
> considering (that's of type `const unsigned char[1]`).

You mean, values that can only be one byte long? I don't get it. How
many use-cases are there for char arrays that are only a byte long?

Assuming that [1] was a typo for [], I still have trouble finding uses
for this. Perhaps initialising a char[][] table representing a
one-bit-per-pixel image? Bit-order becomes critical here.

C already has 64-bit binary literals; using those might be a better
idea, since a char[][] is the wrong type anyway, unless you can have a
bool[][] which is guaranteed to use 1-bit bools.

> Octal
> string literals 0"012 345 670" *might* be worth considering.

AFAIK nobody uses octal anymore.

> What I'm trying to design here is a more straightforward way to
> represent raw (unsigned char[]) data in C code, largely but not
> exclusively for use by #embed.

Sorry, I thought this was an alternative to #embed, for smaller amounts
of data directly written in source code.