Article <2024May18.104040@mips.complang.tuwien.ac.at>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Sat, 18 May 2024 08:40:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 85
Message-ID: <2024May18.104040@mips.complang.tuwien.ac.at>
References: <v0s17o$2okf4$2@dont-email.me> <4e0557bec2acda4df76f1ed01ebcbdf6@www.novabbs.org> <v1ep4i$1ptf$1@gal.iecc.com> <v1eruj$3o1r8$1@dont-email.me> <v1h8l6$1ttd$1@gal.iecc.com> <v1kifk$17qh0$1@dont-email.me> <2024May10.182047@mips.complang.tuwien.ac.at> <v1ns43$2260p$1@dont-email.me> <2024May11.173149@mips.complang.tuwien.ac.at> <v1preb$2jn47$1@dont-email.me> <2024May12.110053@mips.complang.tuwien.ac.at> <6124140226e28fd4afec0b435bdbeca1@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 18 May 2024 11:26:26 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="452a7a22aa81cceac80468f54b7242fc";
	logging-data="2868669"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19y/nd8RhYIQCLCGuviWLN1"
Cancel-Lock: sha1:w+yXyA/S/sDMlT5q+NkXwUC7sak=
X-newsreader: xrn 10.11
Bytes: 5184

mitchalsup@aol.com (MitchAlsup1) writes:
>It seems to me (in my vast ignorance) that names for things should be
>written in the most appropriate set of characters in the language of
>the person/thing being named.
>
>Then when such a name is "sent out to be displayed" that it is a property
>of the display what character set(s) it can properly emit, and thereby
>alter the string of characters as appropriate to its capabilities.
>
>For example:: Take > "K\u0316\u0308nig" cr type ==> K̖̈nig
>When displayed on a ASCII only line printer it would be written Koenig
>When displayed on a enhanced ASCII printer  it would be written König
>When displayed on a full functional printer it would be written K̖̈nig

Why do you think that K̖̈nig should be written as Koenig or König?

However, for König Unicode specifies that the precomposed form is
König.  And if you want a transcription into ASCII with the knowledge
that it's German, the result would be Koenig.
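
To make the distinction concrete, here is a minimal sketch using Python's
standard unicodedata module.  The helper to_ascii_german and its mapping
table are made up for this illustration; Unicode defines no such German
transcription.  NFC composes o + U+0308 into the precomposed ö, but leaves
K + U+0316 + U+0308 alone, because no precomposed character exists for it.
The transcription into ASCII is a separate, language-dependent step.

```python
import unicodedata

# Hypothetical German-specific ASCII transcription table (an assumption for
# this sketch; it is not part of Unicode or of the Python standard library).
GERMAN_ASCII = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
})

def to_ascii_german(text: str) -> str:
    """Compose to NFC first, then apply the German transcription table."""
    return unicodedata.normalize("NFC", text).translate(GERMAN_ASCII)

decomposed = "Ko\u0308nig"   # o followed by COMBINING DIAERESIS
odd = "K\u0316\u0308nig"     # K followed by two combining marks, no o at all

composed = unicodedata.normalize("NFC", decomposed)   # precomposed U+00F6
unchanged = unicodedata.normalize("NFC", odd)         # nothing to compose

print(composed == "K\u00f6nig")       # True
print(unchanged == odd)               # True
print(to_ascii_german(decomposed))    # Koenig
```

The string without an o stays as it is in both steps, which is why it has
no obvious reading as Koenig or König.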

>Only the display device needs to understand this mapping and NOT the 
>program/software/device holding the string.

Yes, that's why treating string data as opaque works for most of the
code.
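
As a small sketch of what "opaque" means here (the function and variable
names are invented for the example): code that stores, looks up, and
concatenates a name can work on the UTF-8 bytes without ever decoding
them.

```python
# Hypothetical in-memory directory; keys are compared byte for byte.
directory = {}

def remember(name_utf8: bytes, room: int) -> None:
    """Store a room number under a name without inspecting the name."""
    directory[name_utf8] = room

def greeting(name_utf8: bytes) -> bytes:
    """Build an output line; concatenation never looks inside the name."""
    return b"Hello, " + name_utf8 + b"\n"

name = "K\u00f6nig".encode("utf-8")
remember(name, 42)
line = greeting(name)

print(directory[name])   # 42
```

One caveat of this approach: byte-wise comparison treats the precomposed
and the decomposed spelling of the same name as different keys, so any
normalization has to happen where the data enters the system.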

>I think people in Japan should be able to use printf by using プリントフ
>There is way to much "english" in the way computers are being used.

I don't know how Japanese feel about that, but I certainly don't want
to have to use some Germanized form of C or Forth.  This kind of
catering for different natural-language programmers has been tried and
has not taken over the world.  I guess that's because

1) You need to learn a lot about what "printf" means and how it is
   used; remembering the name is only a minor aspect.

2) Having a name that is common all over the world allows you to read programs
   from all over the world, use reference material from all over the
   world, etc.

A similar concept was implemented in COBOL, where the designers thought
that having to write

ADD A TO B GIVING C

or somesuch makes programming easier than writing

C = A+B

in FORTRAN.  That has not found many followers, either.  Interestingly,
the BCPL (and later B and C) syntax, which, e.g., replaced 'or' with
|| or | and was otherwise more symbolic and less
natural-language-oriented than its ancestor Algol 60, became the most
successful syntax style among the Algol descendants.  It even spread
to languages like Java that are closer to Algol 60 or Pascal in other
respects.

I have seen programmers define their own names based on their native
language, however.  But if they use names in their own language, these
names should not depend on the environment.

In the macro language of a game I play, you can refer to things
through their name or through their numeric id.  Unfortunately, the
names are localized, so the only way to write portable macros is by
using the unmnemonic numeric ids:-(.

What is more common than localized programming languages is producing
error messages in localized languages.  I find this annoying, too,
because it makes it harder to find out how others have solved the same
problem.

And, e.g., ENOTSUP in Unix has such a specific meaning that the
localized text does not help the person unfamiliar with Unix, while it
makes life harder for people who know Unix enough to make sense of the
message; i.e., even though my native language is German, I find
"Operation not supported" easier to understand than "Operation wird
nicht unterstützt"; in the latter case I first have to guess what the
English error message would have been and then I can start analysing
the problem.
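
One way a program could soften this, sketched here with Python's standard
errno and os modules (the helper name describe is made up for the
example): print the symbolic error name next to the message.  The name
does not depend on the locale, while the message text may be localized
depending on the process locale.  On Linux, ENOTSUP and EOPNOTSUPP share
one value, so the lookup may report either name.

```python
import errno
import os

def describe(err: int) -> str:
    """Return 'NAME: message', keeping the locale-independent symbolic name."""
    # errno.errorcode maps numeric error values to their symbolic names.
    name = errno.errorcode.get(err, f"errno {err}")
    return f"{name}: {os.strerror(err)}"

report = describe(errno.ENOTSUP)
print(report)
```

The symbolic name is what one can search for, whatever language the rest
of the line is in.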

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>