Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Lawrence D'Oliveiro <ldo@nz.invalid>
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Tue, 28 May 2024 01:08:06 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <v33apl$9cst$3@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <2024May18.072920@mips.complang.tuwien.ac.at>
 <jwved9t656u.fsf-monnier+comp.arch@gnu.org> <v31ddp$3u8om$1@dont-email.me>
 <v3283t$1use$2@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 28 May 2024 03:08:06 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f9fa71f6bded3e8519d33d87ee221dff";
 logging-data="308125"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX19Jzl3hnvrJEaHQPfdScv5l"
User-Agent: Pan/0.158 (Avdiivka; )
Cancel-Lock: sha1:gfwo5aATEk07gpN5IYcaqWFfOkc=
Bytes: 1841

On Mon, 27 May 2024 15:16:13 -0000 (UTC), John Levine wrote:

> According to Lawrence D'Oliveiro <ldo@nz.invalid>:
>>On Wed, 22 May 2024 15:38:51 -0400, Stefan Monnier wrote:
>>
>>> I don't know of any language (or even library) that supports the
>>> notion of "character" for Unicode strings. 🙁
>>
>> Surely a “character” (or “grapheme” I think is (one of) the Unicode
>> terms) is (represented by) a non-combining code point combined with all
>> the immediately-following combining code points.
>
> Take another look at the table I referred to yesterday. When you have
> ZWJ the rules of what combines with what get awfully complicated.

ZWJ is classed as a “format” character (general category Cf), and has no
combining class (its canonical combining class is 0). So it forms a
“character” or “grapheme” in its own right.
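
For anyone who wants to poke at this, here is a minimal Python sketch of the rule as stated above: a non-combining code point plus every immediately-following code point with a nonzero canonical combining class. The function name naive_clusters is made up for illustration, and this is only that simple rule, not the grapheme cluster algorithm from Unicode's text segmentation annex (UAX #29).

```python
import unicodedata

def naive_clusters(text):
    """Group each code point with the combining code points that follow it.

    "Combining" here means a nonzero canonical combining class, which is
    the simple rule from the post, not the UAX #29 segmentation rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach combining mark to the current base
        else:
            clusters.append(ch)     # combining class 0: start a new cluster
    return clusters

ZWJ = "\u200d"

# What the Unicode Character Database (via Python) reports for ZWJ.
print(unicodedata.category(ZWJ), unicodedata.combining(ZWJ))

# "e" + COMBINING ACUTE ACCENT + "x" gives two clusters.
print(naive_clusters("e\u0301x"))

# MAN + ZWJ + WOMAN + ZWJ + GIRL: no code point here has a nonzero
# combining class, so each one, ZWJ included, becomes its own cluster.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
print(len(naive_clusters(family)))
```

Under this rule the ZWJ does stand alone, since Python reports category Cf and combining class 0 for it. The catch is that the emoji sequence comes out as five pieces, whereas UAX #29 has separate rules that forbid a break before a ZWJ and keep emoji ZWJ sequences together, which is presumably the complication John Levine is pointing at.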