From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sun, 16 Jun 2024 20:00:45 +0100
Organization: A noiseless patient Spider
Message-ID: <v4ncor$66d3$1@dont-email.me>
In-Reply-To: <v4mubu$3jg8$1@dont-email.me>

On 16/06/2024 15:54, David Brown wrote:
> On 15/06/2024 21:27, bart wrote:
>> On 15/06/2024 18:17, David Brown wrote:
>>> On 15/06/2024 00:39, bart wrote:
>>>> On 14/06/2024 22:30, Keith Thompson wrote:
>>>>
>>>>> Now that it's too late
>>>>> to change the definition, I've thought of
>>>>> something that I think would have been a better way to specify #embed.
>>>>>
>>>>> Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
>>>>> of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
>>>>> too radical.) Unlike other string literals, there is no implicit
>>>>> terminating '\0'. Arbitrary byte values can of course be specified in
>>>>> hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
>>>>> character and C doesn't support zero-sized objects, uc"" is a syntax
>>>>> error.
>>>>>
>>>>> uc"..." string literals might be made even simpler, for example allowing
>>>>> only hex digits and not requiring \x (uc"01020304" rather than
>>>>> uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
>>>>> could be useful in other contexts, and programmers will want
>>>>> flexibility. Maybe something like hex"01020304" (embedded spaces could
>>>>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
>>>>
>>>> That's something I added to string literals in my language within
>>>> the last few months. Nothing to do with embedding (but it can make hex
>>>> sequences within strings more efficient, if that approach was used).
>>>>
>>>> Writing byte-at-a-time hex data was always a bit fiddly:
>>>>
>>>>     0x12, 0x34, 0xAB, ...
>>>>     "\x12\x34\xAB...
>>>>
>>>> It was made worse by my preference for `x` being in lower case, and
>>>> the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look
>>>> wrong.
>>>>
>>>> What I did was create a new, variable-length string escape sequence
>>>> that looks like this:
>>>>
>>>>     "ABC\h1234AB...\nopq"       // hex sequence between ABC & nopq
>>>>
>>>> Hex digits after \h or \H are read in pairs. White space is allowed
>>>> between pairs:
>>>>
>>>>     "ABC\H 12 34 AB ...\nopq"
>>>>
>>>> The only thing I wasn't sure about was the closing backslash, which
>>>> looks at first like another escape code.
>>>> But I think it is sound,
>>>> although it can still be tweaked.
>>>>
>>>
>>> How often would something like that be useful? I would have thought
>>> that it is rare to see something that is basically text but has
>>> enough odd non-printing characters (other than the common \n, \t, \e)
>>> to make it worth the fuss. If you want to have binary data in
>>> something that looks like a string literal, then just use straight-up
>>> two hex digits per character - "4142431234ab". It's simpler to
>>> generate and parse. I don't see the benefit of something that mixes
>>> binary and text data.
>>
>> That's not the same thing. That sequence "...1234..." occupies 4 bytes
>> (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34,
>> or 18 and 52).
>>
>> Here's an example of wanting to print '€4.99', first in C (note that
>> my editor doesn't support Unicode so this stuff is needed):
>>
>>     puts("\xE2\x82\xAC" "4.99");
>>
>> The euro symbol occupies three bytes in UTF8. It's awkward to type: it
>> has loads of backslashes, it keeps switching case and it needs more
>> concentration.
>>
>> Plus I had to split the string since apparently \x doesn't stop at two
>> hex digits, it keeps going: it would have read \xAC4, which overflows
>> the 8-bit width of a character anyway, so I don't know what the point
>> is of reading more than 2 hex characters.
>>
>> Using my feature, it looks like this:
>>
>>     println "\H E2 82 AC\4.99"
>>
>
> I don't see any improvement of significance. The improvement, if any,
> is very minor.

The difference is that it can be typed fluently without that annoying \x
between every number. Plus I can add white space for grouping without it
affecting the data.

> (I gather you have other conveniences for your language's printing
> features when converting various types, but that's a different matter.)
>
> The obvious answer to writing this kind of thing is simply to switch to
> an editor that supports UTF-8.

It never happens that you want to type a bunch of hex byte values to
initialise a byte array? OK.

> Why bother with the \H stuff? That's my point - use hex data for data,
> and text for text. Mixing these is not common enough to make it worth
> the extra fuss you have to give such negligible extra convenience.
>
> My suggestion is that it could be helpful to have binary blobs written
> as hex digits without escapes anywhere, because it is /just/ binary
> data. I don't object to having optional spaces - that's a fine idea.
> But just write:
>
>     b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
>     b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
>
> The extra "\H" adds nothing useful.

Is this a separate feature using 'b'? Because in my scheme, \H is just
another string escape code, which can be used in ordinary strings, and
b"" strings define char[] data which can include normal text data too.

So my example could have been written as b"MZ\h 90 00 03 ..."

I did look at having a separate feature, but I didn't want that. I ended
up with this scheme for data-strings, here expressed using C types:

                      Can initialise:
    "abcd"            char* only
    s"abcd"           char*, char[] or any T[]; zero-terminated
    b"abcd"           char*, char[] or any T[]
    sinclude"file"    char*, char[] or any T[]; zero-terminated
    binclude"file"    char*, char[] or any T[]

The first 3 can include any string escapes including \H...\

The last two embed file data, binary or text. But if a normal C-style
string is needed with no embedded zeros except at the end, sinclude
should be used with a text file.

>
>>
>> (The 's'/'b' prefixes are needed for strings to have a type of (in C
>> terms) char[] rather than char*, a detail that C glosses over via some
>> magic. 's' gives you a zero terminator, 'b' as used here doesn't. The
>> "+" is used for compile-time string/data-string concatenation.)
>>
>> In short, more is possible without needing to resort to tools. You can
>> directly work from a hex dump.
>