From: David Brown
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sat, 15 Jun 2024 19:17:23 +0200
Organization: A noiseless patient Spider

On 15/06/2024 00:39, bart wrote:
> On 14/06/2024 22:30, Keith Thompson wrote:
>
>> Now that it's too late to change the definition, I've thought of
>> something that I think would have been a better way to specify #embed.
>>
>> Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
>> of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if that's not
>> too radical.)  Unlike other string literals, there is no implicit
>> terminating '\0'.  Arbitrary byte values can of course be specified in
>> hexadecimal: uc"\x01\x02\x03\x04".  Since there's no terminating null
>> character and C doesn't support zero-sized objects, uc"" is a syntax
>> error.
>>
>> uc"..." string literals might be made even simpler, for example allowing
>> only hex digits and not requiring \x (uc"01020304" rather than
>> uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..." literals
>> could be useful in other contexts, and programmers will want
>> flexibility.  Maybe something like hex"01020304" (embedded spaces could
>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
>
> That's something I added to string literals in my language within the
> last few months. Nothing to do with embedding (but it can make hex
> sequences within strings more efficient, if that approach was used).
>
> Writing byte-at-a-time hex data was always a bit fiddly:
>
>     0x12, 0x34, 0xAB, ...
>     "\x12\x34\xAB...
>
> It was made worse by my preference for `x` being in lower case, and the
> hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
>
> What I did was create a new, variable-length string escape sequence
> that looks like this:
>
>   "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq
>
> Hex digits after \h or \H are read in pairs. White space is allowed
> between pairs:
>
>   "ABC\H 12 34 AB ...\nopq"
>
> The only thing I wasn't sure about was the closing backslash, which
> looks at first like another escape code. But I think it is sound,
> although it can still be tweaked.
>

How often would something like that be useful?  I would have thought
that it is rare to see something that is basically text but has enough
odd non-printing characters (other than the common \n, \t, \e) to make
it worth the fuss.

If you want to have binary data in something that looks like a string
literal, then just use straight-up two hex digits per byte:
"4142431234ab".  It's simpler to generate and parse.  I don't see the
benefit of something that mixes binary and text data.
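
To give an idea of how little is involved in parsing the plain two-hex-digits-per-byte form, here is a rough sketch in C.  It is only an illustration, not anything from the standard or from the proposals quoted above: the names hex_value and hex_decode, the -1 error convention, and the decision to reject white space between pairs are all assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return the value (0..15) of one hex digit, or -1 if c is not a hex
   digit.  Uses a lookup string, so it does not depend on the letters
   'a'..'f' being contiguous in the execution character set. */
int hex_value(char c)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')
        return -1;          /* strchr would otherwise match the terminator */
    p = strchr(digits, tolower((unsigned char)c));
    return p ? (int)(p - digits) : -1;
}

/* Hypothetical helper: decode text such as "4142431234ab" into out[].
   Returns the number of bytes written, or -1 if the text contains a
   non-hex character, has an odd number of digits, or does not fit in
   out_size bytes. */
long hex_decode(const char *text, unsigned char *out, size_t out_size)
{
    size_t n = 0;

    while (*text != '\0') {
        int hi = hex_value(text[0]);
        if (hi < 0)
            return -1;
        /* text[0] is not the terminator, so text[1] is safe to read;
           an odd digit count shows up here as text[1] == '\0'. */
        int lo = hex_value(text[1]);
        if (lo < 0)
            return -1;
        if (n >= out_size)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Calling hex_decode("4142431234ab", buf, sizeof buf) with a large enough buffer should give six bytes, the first three being 0x41, 0x42, 0x43.  Generating the text form is just a loop of printf("%02x", byte).  Allowing white space between pairs, as in the hex"..." idea, would be a small change to skip isspace() characters at the top of the loop.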