From: Lawrence D'Oliveiro
Newsgroups: comp.lang.fortran
Subject: Re: OT: unicode (Was: Re: Upcoming gfortran 15 will contain unsigned numbers)
Date: Mon, 25 Nov 2024 23:35:34 -0000 (UTC)

On Mon, 25 Nov 2024 08:35:48 -0300, Wolfgang Agnes wrote:

> It's a bit difficult to understand ``surrogates''.

The Unicode folks just decided that the ranges 0xD800-0xDBFF (1024 codes of “high surrogates”) and 0xDC00-0xDFFF (1024 codes of “low surrogates”) would be used in pairs to represent codes above 0xFFFF in UTF-16 encoding. This gives an additional 1024×1024 = 1048576 different codes, which should be enough to cover the entire (current) Unicode range, which officially goes up to 0x10FFFF. At least, that’s what they’re saying right now.

In the full UCS-4 encoding, those ranges are considered invalid.
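
To make the pairing arithmetic concrete, here is a minimal sketch in Python of how a code above 0xFFFF could be split into a high/low surrogate pair and put back together. The function names `to_surrogates` and `from_surrogates` are made up for illustration and are not part of any library; the only facts assumed are the two ranges quoted above and the 0x10000 offset that makes the 1024×1024 pairs line up with 0x10000-0x10FFFF.

```python
HIGH_START = 0xD800   # first high surrogate (range 0xD800-0xDBFF)
LOW_START = 0xDC00    # first low surrogate (range 0xDC00-0xDFFF)
BASE = 0x10000        # first code that needs a surrogate pair


def to_surrogates(cp):
    """Split a code in 0x10000..0x10FFFF into a (high, low) surrogate pair."""
    if not BASE <= cp <= 0x10FFFF:
        raise ValueError("code is outside the range that needs surrogates")
    offset = cp - BASE                       # 20-bit value, 0..0xFFFFF
    high = HIGH_START + (offset >> 10)       # top 10 bits pick 1 of 1024 highs
    low = LOW_START + (offset & 0x3FF)       # bottom 10 bits pick 1 of 1024 lows
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into the original code."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    return BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(hex(high), hex(low))               # 0xd83d 0xde00
    print(hex(from_surrogates(high, low)))   # 0x1f600
```

The two 10-bit halves are why each range holds exactly 1024 codes: the first pair (0xD800, 0xDC00) maps to 0x10000 and the last pair (0xDBFF, 0xDFFF) maps to 0x10FFFF.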