From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Unicode in strings
Date: Sat, 18 May 2024 08:40:40 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024May18.104040@mips.complang.tuwien.ac.at>
References: <v0s17o$2okf4$2@dont-email.me>
 <4e0557bec2acda4df76f1ed01ebcbdf6@www.novabbs.org>
 <v1ep4i$1ptf$1@gal.iecc.com> <v1eruj$3o1r8$1@dont-email.me>
 <v1h8l6$1ttd$1@gal.iecc.com> <v1kifk$17qh0$1@dont-email.me>
 <2024May10.182047@mips.complang.tuwien.ac.at>
 <v1ns43$2260p$1@dont-email.me>
 <2024May11.173149@mips.complang.tuwien.ac.at>
 <v1preb$2jn47$1@dont-email.me>
 <2024May12.110053@mips.complang.tuwien.ac.at>
 <6124140226e28fd4afec0b435bdbeca1@www.novabbs.org>

mitchalsup@aol.com (MitchAlsup1) writes:
>It seems to me (in my vast ignorance) that names for things should be
>written in the most appropriate set of characters in the language of
>the person/thing being named.
>
>Then when such a name is "sent out to be displayed" that it is a property
>of the display what character set(s) it can properly emit, and thereby
>alter the string of characters as appropriate to its capabilities.
>
>For example:: Take
> "K\u0316\u0308nig" cr type ==> K̖̈nig
>When displayed on a ASCII only line printer it would be written Koenig
>When displayed on a enhanced ASCII printer it would be written König
>When displayed on a full functional printer it would be written K̖̈nig

Why do you think that K̖̈nig should be written as Koenig or König?

However, for König Unicode specifies that the precomposed form is
König. And if you want a transcription into ASCII with the knowledge
that it's German, the result would be Koenig.

>Only the display device needs to understand this mapping and NOT the
>program/software/device holding the string.

Yes, that's why treating string data as opaque works for most of the
code.

>I think people in Japan should be able to use printf by using プリントフ
>There is way to much "english" in the way computers are being used.

I don't know how Japanese feel about that, but I certainly don't want
to have to use some Germanized form of C or Forth. This kind of
catering for different natural-language programmers has been tried and
has not taken over the world. I guess that's because

1) You need to learn a lot about what "printf" means and how it is
used; remembering the name is only a minor aspect.

2) Having a name that is common all over the world allows you to read
programs from all over the world, use reference material from all over
the world, etc.

A similar concept was implemented in COBOL, where the designers
thought that having to write ADD A TO B GIVING C or somesuch makes
programming easier than writing C = A+B in FORTRAN. That has not found
many followers, either.

Interestingly, among the Algol descendants, the BCPL (and later B and
C) syntax, which, e.g., replaced 'or' with || or |, and was otherwise
more symbolic and less natural-language-oriented than its ancestor
Algol 60, was the most successful syntax style, including spreading to
languages like Java that are closer to Algol 60 or Pascal in other
respects.
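The König point earlier in this post can be checked with a small Python
sketch. It assumes the first spelling of König is the decomposed one
(o followed by U+0308 COMBINING DIAERESIS), which the rendered text does
not show. The helper german_to_ascii and its table GERMAN_ASCII are
hypothetical names made up for illustration, not a standard API. NFC
composes o + U+0308 into U+00F6, while K followed by U+0316 and U+0308
has no precomposed form and stays as it is.

```python
import unicodedata

# Assumed decomposed spelling: 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = "Ko\u0308nig"
# The string from the quoted example: 'K' + U+0316 + U+0308, then "nig".
odd = "K\u0316\u0308nig"

# NFC is the canonical composed ("precomposed") normalization form.
nfc_koenig = unicodedata.normalize("NFC", decomposed)
nfc_odd = unicodedata.normalize("NFC", odd)

# Hypothetical German-specific transcription table (illustration only).
GERMAN_ASCII = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
}

def german_to_ascii(text):
    """Compose first, then replace German umlauts and sharp s.

    Characters outside the table are passed through unchanged, so the
    result is only ASCII if the input was German text to begin with.
    """
    composed = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_ASCII.get(ch, ch) for ch in composed)

if __name__ == "__main__":
    print(len(decomposed), len(nfc_koenig))  # code points before/after NFC
    print(nfc_odd == odd)                    # no precomposed form exists
    print(german_to_ascii(decomposed))
```

The transcription is deliberately tied to one language: the same ö
would be transcribed differently (or not at all) under other
conventions, which is why it needs "the knowledge that it's German".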
I have seen programmers define their own names based on their native
language, however.

But if they use names in their own language, these names should not
depend on the environment. In the macro language of a game I play,
you can refer to things through their name or through their numeric
id. Unfortunately, the names are localized, so the only way to write
portable macros is by using the unmnemonic numeric ids :-(.

What is more common than localized programming languages is producing
error messages in localized languages. I find this annoying, too,
because it makes it harder to find out how others have solved the same
problem. And, e.g., ENOTSUP in Unix has such a specific meaning that
the localized text does not help the person unfamiliar with Unix,
while it makes life harder for people who know Unix enough to make
sense of the message; i.e., even though my native language is German,
I find "Operation not supported" easier to understand than "Operation
wird nicht unterstützt"; in the latter case I first have to guess what
the English error message would have been and then I can start
analysing the problem.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>