Path: news.eternal-september.org!eternal-september.org!news.eternal-september.org!feeder3.eternal-september.org!news.quux.org!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: Richard Damon <richard@damon-family.org>
Newsgroups: comp.lang.c
Subject: Re: multi bytes character - how to make it defined behavior?
Date: Tue, 13 Aug 2024 23:44:24 -0400
Organization: i2pn2 (i2pn.org)
Message-ID: <1ffb2244967a28423c968f4b4a9fec5a2553f356@i2pn2.org>
References: <v9frim$3u7qi$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 14 Aug 2024 03:44:24 -0000 (UTC)
Injection-Info: i2pn2.org;
	logging-data="2503679"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="diqKR1lalukngNWEqoq9/uFtbkm5U+w3w6FQ0yesrXg";
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <v9frim$3u7qi$1@dont-email.me>
X-Spam-Checker-Version: SpamAssassin 4.0.0

On 8/13/24 10:45 AM, Thiago Adams wrote:
> static_assert('×' == 50071);
> 
> GCC -  warning multi byte
> CLANG - error character too large
> 
> I think instead of "multi bytes" we need "multi characters" - not bytes.
> 
> We decode utf8 then we have the character to decide if it is multi char 
> or not.
> 
> decoding '×' would consume bytes 195 and 151 the result is the decoded 
> Unicode value of 215.
> 
> It is not multi byte : 256*195 + 151 = 50071
> 
> On the other hand 'ab' is "multi character" resulting
> 
> 256 * 'a' + 'b' = 256*97+98= 24930
> 
> One consequence is that
> 
> 'ab' == '𤤰'
> 
> But I don't think this is a problem. At least everything is defined.

When you use single quotes by themselves ('), you are specifying
characters in the narrow character set, typically ASCII, though it might
be some other 8-bit character encoding. Such a constant cannot specify
extended characters beyond that set.

You can (if the implementation allows it) place multiple characters in
the constant to get an integer value with those characters packed into
it; the resulting value is implementation-defined.
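
As a rough sketch of that packing, the arithmetic from the quoted post
can be written out as below. The helper name pack2 is hypothetical, and
the base-256, first-character-most-significant layout is only what the
quoted post describes; another implementation may pack differently or
reject such constants.

```c
#include <assert.h>

/* Hypothetical helper (not from the post): packs two narrow characters
   base-256 with the first character most significant. This mirrors the
   arithmetic in the quoted post; the value of a real multi-character
   constant such as 'ab' is implementation-defined. */
int pack2(unsigned char first, unsigned char second)
{
    return 256 * first + second;
}

/* The two sums from the quoted post, checked at compile time. */
static_assert(256 * 97 + 98 == 24930, "'a' and 'b' as ASCII codes");
static_assert(256 * 195 + 151 == 50071, "the two UTF-8 bytes of U+00D7");
```

With ASCII codes, pack2(97, 98) yields the 24930 quoted for 'ab'.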

When you use double quotes by themselves ("), you are specifying a
string of these narrow characters, although this form might allow for
multi-byte encodings of some characters, as is done with UTF-8.
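
A small sketch of the multi-byte case follows. It uses the u8 prefix
rather than a plain "..." string, which is a substitution on my part so
that the encoding is UTF-8 regardless of the execution character set;
the function names are hypothetical. The character is written as the
universal character name \u00D7 so the example does not depend on the
source file's encoding.

```c
#include <assert.h>
#include <stddef.h>

/* U+00D7 (the multiplication sign) takes two bytes in UTF-8, so the
   array holds those two bytes plus the terminating null. */
static_assert(sizeof u8"\u00D7" == 3, "two UTF-8 bytes plus terminator");

/* Hypothetical helpers: byte count and individual bytes of the string. */
size_t times_len(void)
{
    return sizeof u8"\u00D7" - 1;
}

unsigned times_byte(size_t i)
{
    return (unsigned char)u8"\u00D7"[i];
}
```

The two bytes are 195 and 151, the same values the quoted post feeds
into its 256*195 + 151 calculation.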

You can specify wide character constants with the syntax L'x', u'x',
or U'x'.

L'x' will give you whatever the implementation calls its "wide
character set". This MIGHT be UCS-2/UTF-16 or UCS-4/UTF-32 encoded, but
doesn't need to be.

The u'x' form will always be UCS-2/UTF-16, and U'x' will always be
UCS-4/UTF-32.

Like the plain 'x' form, the result from a single character cannot be
a multi-unit value, so u'x' can't generate a surrogate pair for a
single source character.
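
For the character from the original question, a minimal sketch of the
prefixed forms might look like this. It assumes an implementation that
encodes char16_t as UTF-16 and char32_t as UTF-32 (C11 signals this with
the __STDC_UTF_16__ and __STDC_UTF_32__ macros); the function names are
hypothetical, and the L'x' value is left unchecked because it depends on
the implementation's wide character set.

```c
#include <assert.h>
#include <uchar.h>
#include <wchar.h>

/* With the u and U prefixes the constant is the decoded code point
   (215 for U+00D7), not the packed UTF-8 bytes. */
static_assert(u'\u00D7' == 0x00D7, "one UTF-16 code unit");
static_assert(U'\u00D7' == 215, "one UTF-32 code unit");

/* Hypothetical accessors showing the type each prefix produces. */
char16_t times16(void) { return u'\u00D7'; }
char32_t times32(void) { return U'\u00D7'; }

/* Value depends on the implementation's wide character set. */
wchar_t timesw(void) { return L'\u00D7'; }
```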

Change the ' to a " and you get wide strings, analogous to the
character forms, but now u"xx" and L"xx" can generate characters that
use surrogate pairs (or other multi-part encodings for L"xx").
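
To illustrate the string case, here is a sketch using U+1F600, a code
point I picked arbitrarily because it lies outside the Basic
Multilingual Plane. It again assumes UTF-16 for char16_t and UTF-32 for
char32_t; the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* One source character, written as a universal character name. */
const char16_t s16[] = u"\U0001F600";
const char32_t s32[] = U"\U0001F600";

/* Number of code units in each string, excluding the terminator. */
size_t units16(void) { return sizeof s16 / sizeof s16[0] - 1; }
size_t units32(void) { return sizeof s32 / sizeof s32[0] - 1; }
```

Under those assumptions the u"..." string holds the surrogate pair
0xD83D 0xDE00, while the U"..." string holds the single unit 0x1F600.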