From: Stefan Monnier <monnier@iro.umontreal.ca>
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Wed, 29 May 2024 10:44:21 -0400
Organization: A noiseless patient Spider
Message-ID: <jwv5xuwwuqe.fsf-monnier+comp.arch@gnu.org>
References: <v0s17o$2okf4$2@dont-email.me> <v1kifk$17qh0$1@dont-email.me>
 <2024May10.182047@mips.complang.tuwien.ac.at> <v1ns43$2260p$1@dont-email.me>
 <2024May11.173149@mips.complang.tuwien.ac.at> <v1preb$2jn47$1@dont-email.me>
 <2024May12.110053@mips.complang.tuwien.ac.at>
 <jwvjzjwid50.fsf-monnier+comp.arch@gnu.org>
 <2024May18.072920@mips.complang.tuwien.ac.at>
 <jwved9t656u.fsf-monnier+comp.arch@gnu.org>
 <2024May25.174807@mips.complang.tuwien.ac.at>
 <jwvy17ty8v7.fsf-monnier+comp.arch@gnu.org>
 <2024May29.085955@mips.complang.tuwien.ac.at>

> Confirmed.  So Emacs Lisp has a codepoint-oriented interface and then
> needs to compensate for that elsewhere.  This does not indicate that a
> codepoint-oriented interface is a good idea, rather the opposite.

Note that the "round to the next character boundary" is actually
generalized to non-Unicode concepts: you can mark a chunk of text as
being "intangible" or make it invisible and the "round up" will
correspondingly move to the next boundary to avoid the cursor being in
the middle of an invisible or intangible chunk of text.
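
To make the "round up" idea concrete, here is a minimal sketch in Python. It is not how Emacs implements it: it assumes a buffer stored as UTF-8 bytes, and both function names (`round_up_to_char_boundary`, `round_up_past_spans`) and the span representation are hypothetical, chosen only for illustration. The first function moves a byte position forward to the start of the next character; the second applies the same rule to arbitrary half-open ranges, which is roughly the shape of the generalization to invisible or intangible text.

```python
def round_up_to_char_boundary(buf: bytes, pos: int) -> int:
    """Return the smallest index >= pos that starts a UTF-8 character.

    Assumes buf holds valid UTF-8. Returns len(buf) if pos lies inside
    the last character.
    """
    pos = max(0, min(pos, len(buf)))
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def round_up_past_spans(pos: int, spans: list[tuple[int, int]]) -> int:
    """Move pos to the end of any half-open span [start, end) it falls inside.

    The spans stand in for invisible or intangible chunks of text
    (hypothetical representation). A position exactly at a span's start
    is left alone, since it is not in the middle of the chunk.
    """
    for start, end in spans:
        if start < pos < end:
            return end
    return pos


text = "na\u00efve \u20ac"      # 7 codepoints
buf = text.encode("utf-8")      # 10 bytes: U+00EF takes 2, U+20AC takes 3
```

With this data, `round_up_to_char_boundary(buf, 3)` gives 4 (byte 3 is the second byte of U+00EF) and `round_up_to_char_boundary(buf, 8)` gives 10 (byte 8 is inside U+20AC, the last character). A multi-byte character is then just one more kind of chunk the cursor may not sit inside.
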

I'm not sure the codepoint-oriented API is the best option, but it's not
completely clear what *is* the best option.  You mention a byte-oriented
API, and you might be right that it's a better option.  In the case of
Emacs, though, that's what we used in Emacs-20.1, and it worked really
poorly because of backward compatibility issues.  I think if we started
from scratch now (i.e. without having to contend with backward
compatibility, and with a better understanding of Unicode, which barely
existed back then), it might indeed work better, but that's not been an
option 🙁


        Stefan