From: Thiago Adams
Newsgroups: comp.lang.c
Subject: multi bytes character - how to make it defined behavior?
Date: Tue, 13 Aug 2024 11:45:42 -0300

static_assert('×' == 50071);

GCC: warning, "multi-character character constant"
Clang: error, "character too large for enclosing character literal type"

I think that instead of "multibyte" we need "multi-character". What matters
is characters, not bytes. We decode the UTF-8 first, and then we know the
characters and can decide whether the constant is multi-character or not.

Decoding '×' consumes bytes 195 and 151, and the result is the decoded
Unicode value 215 (U+00D7). It is not a multi-character constant, so its
value should not be the bytewise 256*195 + 151 = 50071.

On the other hand, 'ab' is a genuine multi-character constant, resulting in

256*'a' + 'b' = 256*97 + 98 = 24930

One consequence is that 'ab' == '慢' (U+6162, decimal 24930), because that
single character decodes to the same value. But I don't think this is a
problem. At least every case gets a well-defined value.