Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sat, 15 Jun 2024 19:17:23 +0200
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <v4kib3$3icus$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
 <87plt8yxgn.fsf@nosuchdomain.example.com> <v31rj5$o20$1@dont-email.me>
 <87cyp6zsen.fsf@nosuchdomain.example.com> <v34gi3$j385$1@dont-email.me>
 <874jahznzt.fsf@nosuchdomain.example.com> <v36nf9$12bei$1@dont-email.me>
 <87v82b43h6.fsf@nosuchdomain.example.com> <v4igql$32qts$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 15 Jun 2024 19:17:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f678a482ffafce70c2ceef8ecfac3e10";
	logging-data="3748828"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+tbLa3F3/a1zCceHa4ABVVKJJkA7hPFKA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:gEFMS72RFKeaY+EsCJ6jQaJuLZA=
Content-Language: en-GB
In-Reply-To: <v4igql$32qts$1@dont-email.me>
Bytes: 4354

On 15/06/2024 00:39, bart wrote:
> On 14/06/2024 22:30, Keith Thompson wrote:
> 
>> Now that it's too late to change the definition, I've thought of
>> something that I think would have been a better way to specify #embed.
>>
>> Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
>> of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if that's not
>> too radical.)  Unlike other string literals, there is no implicit
>> terminating '\0'.  Arbitrary byte values can of course be specified in
>> hexadecimal: uc"\x01\x02\x03\x04".  Since there's no terminating null
>> character and C doesn't support zero-sized objects, uc"" is a syntax
>> error.
>>
>> uc"..." string literals might be made even simpler, for example allowing
>> only hex digits and not requiring \x (uc"01020304" rather than
>> uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
>> could be useful in other contexts, and programmers will want
>> flexibility.  Maybe something like hex"01020304" (embedded spaces could
>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
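For comparison, here is a sketch of the closest thing today's C already offers. It relies only on the standard rule that when a character array is declared with exactly as many elements as its string literal has characters, no terminating '\0' is stored. C++ rejects this. The names `bytes` and `bytes_sum` are made up for illustration. The drawback is that the size has to be written out by hand, which a uc"..." literal would make unnecessary.

```c
#include <assert.h>   /* static_assert: a macro before C23, a keyword in C23 */
#include <stddef.h>

/* Declared size (4) equals the number of characters in the literal, so C
   drops the terminating '\0'. The size must be counted by hand. */
static const unsigned char bytes[4] = "\x01\x02\x03\x04";

/* Compile-time check that no hidden null byte was added. */
static_assert(sizeof bytes == 4, "no terminating null byte");

/* Hypothetical helper, only here so the array is actually used. */
unsigned bytes_sum(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof bytes; i++)
        sum += bytes[i];
    return sum;
}
```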
> 
> That's something I added to string literals in my language within the 
> last few months. Nothing to do with embedding (but it can make hex 
> sequences within strings more efficient, if that approach was used).
> 
> Writing byte-at-a-time hex data was always a bit fiddly:
> 
>      0x12, 0x34, 0xAB, ...
>      "\x12\x34\xAB...
> 
> It was made worse by my preference for `x` being in lower case, and the 
> hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
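When such lists are generated rather than typed, that case style is easy to get from printf-family formatting: in "0x%02X" the `0x` is literal text and `%02X` prints two upper-case hex digits. This is a minimal sketch; `format_byte_list` is a hypothetical helper, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format n bytes as a C initializer list such as "0x12, 0x34, 0xAB":
   lower-case x, upper-case hex digits. Returns the string length, or 0 if
   the buffer is too small (0 is also returned for n == 0). */
size_t format_byte_list(const unsigned char *in, size_t n,
                        char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t pos = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* First item has no leading separator. */
        int w = snprintf(out + pos, cap - pos, i ? ", 0x%02X" : "0x%02X",
                         (unsigned)in[i]);
        if (w < 0 || (size_t)w >= cap - pos)
            return 0;               /* output was truncated */
        pos += (size_t)w;
    }
    return pos;
}
```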
> 
> What I did was create a new, variable-length string escape sequence 
> that looks like this:
> 
>    "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq
> 
> Hex digits after \h or \H are read in pairs. White space is allowed 
> between pairs:
> 
>    "ABC\H 12 34 AB ...\nopq"
> 
> The only thing I wasn't sure about was the closing backslash, which 
> looks at first like another escape code. But I think it is sound, 
> although it can still be tweaked.
> 
> 
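A rough sketch of how a lexer might decode such a sequence. It assumes the rules as described: hex digits after \h or \H are read in pairs, whitespace is allowed between pairs but not inside one, and a backslash ends the sequence and is consumed. The function name and interface are hypothetical and not taken from bart's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* src points just past "\h" or "\H". Writes at most cap bytes to out,
   stores the count in *nout, and returns a pointer just past the closing
   backslash, or NULL on malformed input or overflow. */
const char *decode_h_escape(const char *src, unsigned char *out,
                            size_t cap, size_t *nout)
{
    assert(src != NULL && out != NULL && nout != NULL);
    size_t n = 0;
    for (;;) {
        while (isspace((unsigned char)*src))
            src++;                      /* whitespace between pairs */
        if (*src == '\\') {             /* closing backslash */
            *nout = n;
            return src + 1;
        }
        int hi = hex_value(src[0]);
        if (hi < 0)
            return NULL;                /* non-hex character or end of string */
        int lo = hex_value(src[1]);
        if (lo < 0 || n == cap)
            return NULL;                /* unpaired digit or no room left */
        out[n++] = (unsigned char)(hi * 16 + lo);
        src += 2;
    }
}
```

With the input `12 34 AB\nopq` this yields the three bytes 0x12, 0x34, 0xAB and leaves `nopq` as ordinary text, matching the example given.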

How often would something like that be useful?  I would have thought 
that it is rare to see something that is basically text but has enough 
odd non-printing characters (other than the common \n, \t, \e) to make 
it worth the fuss.  If you want to have binary data in something that 
looks like a string literal, then just use straight-up two hex digits 
per byte - "4142431234ab".  It's simpler to generate and parse.  I 
don't see the benefit of something that mixes binary and text data.
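To illustrate the "simpler to generate and parse" point, here is a sketch of both directions for a plain hex string with exactly two digits per byte. The function names are hypothetical. Because every byte is always two characters, a length check and a single loop are all the parsing needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encode n bytes as 2*n lower-case hex digits plus a terminating '\0'.
   out must have room for 2*n + 1 chars. */
void hex_encode(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0F];
    }
    out[2 * n] = '\0';
}

/* Decode a hex string, two digits per byte. Returns the number of bytes
   written, or -1 for odd length, a non-hex character, or overflow. */
long hex_decode(const char *s, unsigned char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len % 2 != 0 || len / 2 > cap)
        return -1;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_digit(s[2 * i]);
        int lo = hex_digit(s[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)(hi * 16 + lo);
    }
    return (long)(len / 2);
}
```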