<v9frim$3u7qi$1@dont-email.me>


Path: ...!feeds.phibee-telecom.net!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Thiago Adams <thiago.adams@gmail.com>
Newsgroups: comp.lang.c
Subject: multi bytes character - how to make it defined behavior?
Date: Tue, 13 Aug 2024 11:45:42 -0300
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <v9frim$3u7qi$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 13 Aug 2024 16:45:43 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2603b61a75fd950a96d23fa3a308b9a4";
	logging-data="4136786"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18lAYQS5395nnzNSUNf69ias/auMoC8UfE="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:2nKf1OLOXz0nbjQegeN9LU2jkq4=
Content-Language: en-US
Bytes: 1595

static_assert('×' == 50071);

GCC: warning, the constant is treated as multi-character
Clang: error, character too large

I think the concept we need is "multi-character", not "multi-byte": count characters, not bytes.

First we decode the UTF-8; only then, with the decoded characters in hand,
do we decide whether the constant is multi-character or not.

Decoding '×' would consume bytes 195 and 151, and the result is the decoded
Unicode value 215.
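
To make the decoding step concrete, here is a minimal sketch of a UTF-8 decoder for one character. The name utf8_decode and its interface are made up for illustration; this is not how GCC or Clang do it, and it skips the overlong-encoding and surrogate checks a real front end would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any compiler): decodes one UTF-8 sequence
   from s, with n bytes available. Stores the code point in *out and returns
   the number of bytes consumed, or 0 if the input is malformed or truncated.
   Assumption: overlong forms and surrogates are NOT rejected here. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    if (n == 0)
        return 0;

    unsigned char b0 = s[0];
    size_t len;
    uint32_t cp;

    if (b0 < 0x80)                { len = 1; cp = b0; }        /* 0xxxxxxx */
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; } /* 110xxxxx */
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; } /* 1110xxxx */
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; } /* 11110xxx */
    else
        return 0; /* stray continuation byte or invalid lead byte */

    if (n < len)
        return 0;

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *out = cp;
    return len;
}

/* Usage: the two bytes of U+00D7 decode to a single character, 215. */
static void utf8_decode_example(void)
{
    const unsigned char times[] = { 195, 151 };
    uint32_t cp = 0;
    assert(utf8_decode(times, sizeof times, &cp) == 2);
    assert(cp == 215);
}
```

For bytes 195 and 151 the lead byte contributes 3 and the continuation byte contributes 23, so the result is 3*64 + 23 = 215.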

It would not be treated as multi-byte, i.e. as 256*195 + 151 = 50071.

On the other hand, 'ab' is "multi-character", resulting in

256 * 'a' + 'b' = 256*97 + 98 = 24930
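
The arithmetic can be checked with a small sketch. The helper fold_units is hypothetical; it just folds a sequence of values in base 256, first one most significant. Fed raw bytes it gives the "multi-byte" reading, fed decoded characters it gives the "multi-character" reading. What should happen when a decoded character is larger than 255 is left open here; the sketch simply applies the same formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two sums from the text, checked at compile time. */
static_assert(256 * 97 + 98 == 24930, "'ab' folded as two characters");
static_assert(256 * 195 + 151 == 50071, "bytes of U+00D7 folded one by one");

/* Hypothetical helper: folds units into one value in base 256,
   first unit most significant. Unsigned arithmetic wraps on overflow. */
static uint32_t fold_units(const uint32_t *units, size_t count)
{
    uint32_t value = 0;
    for (size_t i = 0; i < count; i++)
        value = value * 256u + units[i];
    return value;
}
```

With this, folding {97, 98} gives 24930, folding the raw bytes {195, 151} gives 50071, and folding the single decoded character {215} gives 215.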

One consequence is that

'ab' == '慢'   (U+6162, which is 24930 in decimal)

But I don't think this is a problem. At least everything is defined.