Path: ...!weretis.net!feeder9.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sat, 15 Jun 2024 20:27:41 +0100
Organization: A noiseless patient Spider
Lines: 114
Message-ID: <v4kpvc$3jrmr$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
 <87plt8yxgn.fsf@nosuchdomain.example.com> <v31rj5$o20$1@dont-email.me>
 <87cyp6zsen.fsf@nosuchdomain.example.com> <v34gi3$j385$1@dont-email.me>
 <874jahznzt.fsf@nosuchdomain.example.com> <v36nf9$12bei$1@dont-email.me>
 <87v82b43h6.fsf@nosuchdomain.example.com> <v4igql$32qts$1@dont-email.me>
 <v4kib3$3icus$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 15 Jun 2024 21:27:41 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="70a9fc796b84cb08329413872ec51cfa";
	logging-data="3796699"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/NJweA8LntS5YICSRX9I3j"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:0dKBbdEPeM4RBKxQNTFr78n5FN0=
In-Reply-To: <v4kib3$3icus$1@dont-email.me>
Content-Language: en-GB
Bytes: 6955

On 15/06/2024 18:17, David Brown wrote:
> On 15/06/2024 00:39, bart wrote:
>> On 14/06/2024 22:30, Keith Thompson wrote:
>>
>>> Now that it's too late to change the definition, I've thought of
>>> something that I think would have been a better way to specify #embed.
>>>
>>> Define a new kind of string literal, with a "uc" prefix.  `uc"foo"` is
>>> of type `unsigned char[3]`.  (Or `const unsigned char[3]`, if that's not
>>> too radical.)  Unlike other string literals, there is no implicit
>>> terminating '\0'.  Arbitrary byte values can of course be specified in
>>> hexadecimal: uc"\x01\x02\x03\x04".  Since there's no terminating null
>>> character and C doesn't support zero-sized objects, uc"" is a syntax
>>> error.
>>>
>>> uc"..." string literals might be made even simpler, for example allowing
>>> only hex digits and not requiring \x (uc"01020304" rather than
>>> uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
>>> could be useful in other contexts, and programmers will want
>>> flexibility.  Maybe something like hex"01020304" (embedded spaces could
>>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
>>
>> That's something I added to string literals in my language within the 
>> last few months. Nothing to do with embedding (but it can make hex 
>> sequences within strings more efficient, if that approach was used).
>>
>> Writing byte-at-a-time hex data was always a bit fiddly:
>>
>>      0x12, 0x34, 0xAB, ...
>>      "\x12\x34\xAB...
>>
>> It was made worse by my preference for `x` being in lower case, and 
>> the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
>>
>> What I did was create a new, variable-length string escape sequence 
>> that looks like this:
>>
>>    "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq
>>
>> Hex digits after \h or \H are read in pairs. White space is allowed 
>> between pairs:
>>
>>    "ABC\H 12 34 AB ...\nopq"
>>
>> The only thing I wasn't sure about was the closing backslash, which 
>> looks at first like another escape code. But I think it is sound, 
>> although it can still be tweaked.
>>
>>
> 
> How often would something like that be useful?  I would have thought 
> that it is rare to see something that is basically text but has enough 
> odd non-printing characters (other than the common \n, \t, \e) to make 
> it worth the fuss.  If you want to have binary data in something that 
> looks like a string literal, then just use straight-up two hex digits 
> per character - "4142431234ab".  It's simpler to generate and parse.  I 
> don't see the benefit of something that mixes binary and text data.

That's not the same thing. That sequence "...1234..." occupies 4 bytes 
(with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or 
18 and 52).
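A small C sketch of that difference (the array names are made up for illustration): the text "1234" holds four digit characters, while "\x12\x34" holds two raw bytes, and sizeof counts the implicit terminator in both.

```c
#include <assert.h>
#include <string.h>

/* Four text characters: '1' '2' '3' '4' (49 50 51 52 in ASCII), then '\0'. */
static const char text_digits[] = "1234";

/* Two raw bytes: 0x12 (18) and 0x34 (52), then '\0'. */
static const char raw_bytes[] = "\x12\x34";

/* sizeof includes the implicit terminator in both cases. */
static_assert(sizeof text_digits == 5, "four characters plus '\\0'");
static_assert(sizeof raw_bytes == 3, "two bytes plus '\\0'");
```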

Here's an example of wanting to print '€4.99', first in C (note that my 
editor doesn't support Unicode so this stuff is needed):

    puts("\xE2\x82\xAC" "4.99");

The euro symbol occupies three bytes in UTF-8. It's awkward to type: it 
has loads of backslashes, it keeps switching case and it needs more 
concentration.
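Where those three bytes come from: the euro sign is U+20AC, which falls in the three-byte UTF-8 range (U+0800 to U+FFFF) with the pattern 1110xxxx 10xxxxxx 10xxxxxx. Splitting 0x20AC into 4 + 6 + 6 bits gives 0x2, 0x02 and 0x2C, so the bytes are 0xE0|0x2 = E2, 0x80|0x02 = 82 and 0x80|0x2C = AC. A sketch of a general encoder (utf8_encode is a hypothetical helper name, not a standard library function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the UTF-8 encoding of code point cp to out
   and return its length in bytes (1 to 4), or 0 if cp is a surrogate
   (U+D800..U+DFFF) or lies above U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

Calling utf8_encode(0x20AC, buf) fills buf with the same three bytes as the "\xE2\x82\xAC" literal above.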

Plus I had to split the string, since apparently \x doesn't stop at two 
hex digits but keeps going: it would have read \xAC4, which overflows 
the 8-bit width of a character anyway, so I don't see the point of 
reading more than two hex digits.
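A C sketch of the two usual workarounds (array names are illustrative). Splitting works because escapes are converted before adjacent literals are concatenated, so "\xAC" "4" stays two elements; an out-of-range escape such as \xAC4 in a plain string is diagnosed by compilers. The greedy rule is more useful for wide literals, where on platforms with a wide wchar_t L"\x20AC" is a single element. C11 also offers universal character names: \u takes exactly four hex digits, so no split is needed and the source stays pure ASCII, which suits an editor without Unicode support.

```c
#include <assert.h>
#include <string.h>

/* Split literal: the escape ends at the closing quote, and the two
   literals are joined at compile time into 7 bytes plus '\0'. */
static const char price_split[] = "\xE2\x82\xAC" "4.99";

/* \u20AC consumes exactly four hex digits, so the following '4' is
   ordinary text. The u8 prefix (C11 and later) requests UTF-8 bytes. */
static const unsigned char price_ucn[] = u8"\u20AC4.99";

static_assert(sizeof price_split == sizeof price_ucn,
              "both spellings give the same number of bytes");
```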

Using my feature, it looks like this:

     println "\H E2 82 AC\4.99"

There must be loads of examples of wanting to write many byte values 
within strings, which in C can also be used to initialise byte arrays (a 
useful feature I've now adopted; see below).
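For comparison, a C sketch of initialising a byte array from string literals, using the first 32 bytes of the EXE stub shown in this post (the 32-byte size is chosen just for illustration). When the literal exactly fills an array of explicit size, C drops the terminating '\0', which is roughly what a terminator-free string gives you; with [] instead of [32] the array would get a 33rd element for the terminator. Adjacent literals are concatenated at compile time.

```c
#include <assert.h>

/* 16 + 16 escapes exactly fill 32 elements, so no '\0' is stored.
   Initialising unsigned char from a plain string literal is allowed. */
static const unsigned char stubdata[32] =
    "\x4D\x5A\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xFF\xFF\x00\x00"
    "\xB8\x00\x00\x00\x00\x00\x00\x00\x40\x00\x00\x00\x00\x00\x00\x00";

static_assert(sizeof stubdata == 32, "no room reserved for a terminator");
```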

Here's another example, in my language: the first 128 bytes of an EXE 
file, which are constant. They are currently defined like this, 
probably created with a script:

   []byte stubdata = (
     0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,
     0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00,
     ...

Using the new escape, I can just copy and paste a dump and use a text 
editor to add the string syntax around it, which took under a minute:

[]byte stubdata=
   b"\H 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00\"+
   b"\H B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00\"+
   b"\H 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\"+
   b"\H 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00\"+
   b"\H 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68\"+
   b"\H 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F\"+
   b"\H 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20\"+
   b"\H 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00\"+
   b"\H 50 45 00 00 64 86 04 00 00 00 00 00 00 00 00 00\"

(In C terms, the 's'/'b' prefixes are needed for a string to have type 
char[] rather than char*, a detail that C glosses over via some magic. 
's' gives you a zero terminator; 'b', as used here, doesn't. The "+" 
performs compile-time concatenation of strings and data strings.)

In short, more is possible without needing to resort to tools: you can 
work directly from a hex dump.
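To make the \H rule concrete, here is a run-time C sketch of decoding such a dump: digits are read in pairs, with white space allowed only between pairs. This is an assumption-laden illustration, not how the escape is implemented in the compiler described above; hex_value and parse_hex_pairs are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of hex digit c (either case), or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')
        return -1;                      /* strchr would match the terminator */
    p = strchr(digits, tolower(c));
    return p ? (int)(p - digits) : -1;
}

/* Hypothetical helper mimicking the \H rule at run time: decode text such
   as "4D 5A 90 00" into bytes. Digits are read in pairs and white space is
   allowed only between pairs. Returns the number of bytes stored, or -1
   for a malformed pair or when more than cap bytes would be needed. */
static long parse_hex_pairs(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    for (;;) {
        while (isspace((unsigned char)*s))
            s++;                        /* skip white space between pairs */
        if (*s == '\0')
            return (long)n;             /* clean end of input */

        int hi = hex_value((unsigned char)*s);
        int lo = hi < 0 ? -1 : hex_value((unsigned char)s[1]);
        if (lo < 0 || n == cap)
            return -1;                  /* bad or odd digit, or no room */

        out[n++] = (unsigned char)(hi * 16 + lo);
        s += 2;
    }
}
```

Here the decoding happens at run time, whereas the \H escape does it at compile time; in C the usual compile-time routes are a generator script or C23's #embed.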