Article <v3d0hj$2amga$1@dont-email.me>


Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Fri, 31 May 2024 12:14:19 -0500
Organization: A noiseless patient Spider
Lines: 44
Message-ID: <v3d0hj$2amga$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me>
 <2024May11.173149@mips.complang.tuwien.ac.at> <v1preb$2jn47$1@dont-email.me>
 <2024May12.110053@mips.complang.tuwien.ac.at>
 <jwvjzjwid50.fsf-monnier+comp.arch@gnu.org>
 <2024May18.072920@mips.complang.tuwien.ac.at>
 <jwved9t656u.fsf-monnier+comp.arch@gnu.org>
 <2024May25.174807@mips.complang.tuwien.ac.at>
 <jwvy17ty8v7.fsf-monnier+comp.arch@gnu.org>
 <2024May29.085955@mips.complang.tuwien.ac.at>
 <jwv5xuwwuqe.fsf-monnier+comp.arch@gnu.org>
 <2024May30.182546@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 31 May 2024 19:14:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="6ea1dc31a293772695e7d714cf6f6549";
	logging-data="2447882"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/2TyZY+mi7+gaizhLojwaSKND4leB7b6E="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:WixLjhBA7QpMOtGvxA+0eMkUczg=
In-Reply-To: <2024May30.182546@mips.complang.tuwien.ac.at>
Content-Language: en-US
Bytes: 3525

On 5/30/2024 11:25 AM, Anton Ertl wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> I'm not sure the codepoint-oriented API is the best option, but it's not
>> completely clear what *is* the best option.  You mention a byte-oriented
>> API and you might be right that it's a better option, but in the case of
>> Emacs that's what we used in Emacs-20.1 but it worked really poorly
>> because of backward compatibility issues.  I think if we started from
>> scratch now (i.e. without having to contend with backward compatibility,
>> and with a better understanding of Unicode (which barely existed back
>> then)) it might work better, indeed, but that's not been an option
> 
> Plus, editors are among the very few uses where you have to deal with
> individual characters, so the "treat it as opaque string" approach
> that works so well for most other code is not good enough there.  The
> command-line editor of Gforth is one case where we use the xchar words
> (those for dealing with code points of UTF-8).
> 

Yeah.

For text editors, this is one of the few cases where it makes sense to 
use 32- or 64-bit characters (say, combining the 'character' with some 
additional metadata, such as formatting).
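
As a rough illustration of such a cell, here is a minimal C sketch. The 
21/11 bit split and all names (`cell_pack`, `ATTR_BOLD`, etc.) are 
hypothetical. The one fixed fact it relies on is that every Unicode code 
point (at most U+10FFFF) fits in 21 bits, which leaves 11 bits of a 
32-bit cell for formatting flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit editor cell: the low 21 bits hold the Unicode code
   point (U+10FFFF is the maximum, which fits in 21 bits) and the upper
   11 bits hold formatting flags. The split is an illustrative choice. */
typedef uint32_t cell_t;

#define CELL_CP_BITS    21u
#define CELL_CP_MASK    ((1u << CELL_CP_BITS) - 1u)      /* 0x1FFFFF */
#define CELL_ATTR_LIMIT (1u << (32u - CELL_CP_BITS))     /* 11 flag bits */

/* Example formatting flags (hypothetical). */
#define ATTR_BOLD      (1u << 0)
#define ATTR_ITALIC    (1u << 1)
#define ATTR_UNDERLINE (1u << 2)
#define ATTR_SELECTED  (1u << 3)

/* Combine a code point and its formatting flags into one cell. */
static cell_t cell_pack(uint32_t cp, uint32_t attrs)
{
    assert(cp <= 0x10FFFFu);         /* must be a valid Unicode code point */
    assert(attrs < CELL_ATTR_LIMIT); /* flags must fit in the upper 11 bits */
    return (cell_t)((attrs << CELL_CP_BITS) | cp);
}

/* Extract the code point and the flags again. */
static uint32_t cell_codepoint(cell_t c) { return c & CELL_CP_MASK; }
static uint32_t cell_attrs(cell_t c)     { return c >> CELL_CP_BITS; }
```

A 64-bit cell would leave room for richer metadata (color indices, 
syntax-highlighting state, and so on), at twice the memory per character.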

Though, one thing that makes sense for text editors is to fully unpack 
only the lines currently being edited, while the others remain in a more 
compact form (such as UTF-8) and are unpacked as they come into view 
(say, treating the editor window as a 32-entry modulo cache or similar).
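
A minimal C sketch of the lazy-unpacking idea, assuming the packed lines 
are available as an array of NUL-terminated UTF-8 strings and assuming a 
hypothetical fixed limit of 256 characters per unpacked line. Line `n` 
always lands in slot `n % 32`, so scrolling naturally evicts lines that 
have gone out of view. All names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 32    /* the "32-entry modulo cache" */
#define MAX_CELLS   256   /* hypothetical per-line limit for this sketch */

typedef struct {
    long     line_no;             /* line held in this slot; -1 = empty */
    size_t   len;                 /* number of decoded code points */
    uint32_t cells[MAX_CELLS];    /* unpacked 32-bit characters */
} line_slot;

static line_slot cache[CACHE_SLOTS];

/* Decode UTF-8 into code points. Malformed lead bytes and truncated
   sequences become U+FFFD; overlong forms and surrogates are not
   rejected in this sketch. */
static size_t utf8_decode(const unsigned char *s, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (*s && n < max) {
        uint32_t cp;
        int extra;
        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
        else { out[n++] = 0xFFFD; s++; continue; }
        s++;
        for (int i = 0; i < extra; i++) {
            if ((*s & 0xC0) != 0x80) { cp = 0xFFFD; break; } /* truncated */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }
        out[n++] = cp;
    }
    return n;
}

/* Mark every slot empty (a zeroed slot would falsely "hold" line 0). */
static void cache_init(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        cache[i].line_no = -1;
}

/* Return the unpacked form of line `line_no`, decoding it from its
   packed UTF-8 text only on a cache miss. */
static const line_slot *get_line(const char *const *packed, long line_no)
{
    assert(line_no >= 0);
    line_slot *slot = &cache[line_no % CACHE_SLOTS];
    if (slot->line_no != line_no) {          /* miss: evict and unpack */
        slot->len = utf8_decode((const unsigned char *)packed[line_no],
                                slot->cells, MAX_CELLS);
        slot->line_no = line_no;
    }
    return slot;
}
```

A real editor would also need to re-encode a modified slot back to UTF-8 
before evicting it; that step is left out here.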

For the rest, one can have, say, a big buffer, with an array of lines 
giving the location and size of each line's text in the buffer.

If a line is modified, it can be reallocated at the end of the buffer, 
and if the buffer gets full, it can be "repacked" and/or expanded as 
needed. When writing back to a file, the lines can simply be emitted in 
order.
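
The buffer-plus-line-array scheme might look roughly like the following 
C sketch. All names are hypothetical. Editing a line appends the new 
text at the end of the buffer and leaves the old bytes as garbage; when 
the buffer fills up, `store_repack` copies the live lines back in order 
and doubles the capacity if that is still not enough; saving just walks 
the line array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Where one line's bytes live inside the shared buffer. */
typedef struct { size_t off, len; } line_ref;

/* All line text (UTF-8) in one big buffer, plus per-line (offset, length). */
typedef struct {
    char     *buf;
    size_t    used, cap;   /* bytes in use / allocated in buf */
    line_ref *lines;
    size_t    nlines;
} text_store;

/* Copy the live lines, in line order, into a fresh buffer with room for
   at least `need` more bytes; stale copies left by edits are dropped. */
static void store_repack(text_store *ts, size_t need)
{
    size_t live = 0;
    for (size_t i = 0; i < ts->nlines; i++)
        live += ts->lines[i].len;
    size_t cap = ts->cap ? ts->cap : 64;
    while (cap < live + need)
        cap *= 2;                               /* expand if still too small */
    char *nb = malloc(cap);
    assert(nb != NULL);
    size_t pos = 0;
    for (size_t i = 0; i < ts->nlines; i++) {
        memcpy(nb + pos, ts->buf + ts->lines[i].off, ts->lines[i].len);
        ts->lines[i].off = pos;
        pos += ts->lines[i].len;
    }
    free(ts->buf);
    ts->buf = nb;
    ts->used = pos;
    ts->cap = cap;
}

/* Append bytes at the end of the buffer, repacking first if full.
   `text` must not point into ts->buf, since repacking frees it. */
static size_t store_put(text_store *ts, const char *text, size_t len)
{
    if (ts->buf == NULL || ts->used + len > ts->cap)
        store_repack(ts, len);
    size_t off = ts->used;
    memcpy(ts->buf + off, text, len);
    ts->used += len;
    return off;
}

/* Add a new line at the end of the document. */
static void store_add_line(text_store *ts, const char *text, size_t len)
{
    ts->lines = realloc(ts->lines, (ts->nlines + 1) * sizeof *ts->lines);
    assert(ts->lines != NULL);
    size_t off = store_put(ts, text, len);
    ts->lines[ts->nlines++] = (line_ref){ off, len };
}

/* Replace line i: the new text goes to the end of the buffer. */
static void store_set_line(text_store *ts, size_t i, const char *text, size_t len)
{
    assert(i < ts->nlines);
    ts->lines[i].len = 0;          /* old copy is dead; repack skips it */
    size_t off = store_put(ts, text, len);
    ts->lines[i] = (line_ref){ off, len };
}

/* Emit the lines in order, each followed by '\n'. Returns 0 on success. */
static int store_write(const text_store *ts, FILE *f)
{
    for (size_t i = 0; i < ts->nlines; i++) {
        const line_ref *l = &ts->lines[i];
        if (fwrite(ts->buf + l->off, 1, l->len, f) != l->len)
            return -1;
        if (fputc('\n', f) == EOF)
            return -1;
    }
    return 0;
}
```

Growing by doubling keeps repacks rare, and since a repack copies lines 
in line order, the buffer ends up laid out just like the file on disk.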

Not entirely sure how other text editors manage things here; I haven't 
really looked into it.


> - anton