From: pozz <pozzugno@gmail.com>
Newsgroups: comp.arch.embedded
Subject: Re: How to add the second (or other) languages
Date: Sun, 16 Feb 2025 19:59:58 +0100
Organization: A noiseless patient Spider
Lines: 160
Message-ID: <votcl3$nc20$1@dont-email.me>
References: <voii3i$28jmm$1@dont-email.me>
 <voioe3.598.1@stefan.msgid.phost.de> <voiu1q$2f5ap$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
In-Reply-To: <voiu1q$2f5ap$1@dont-email.me>
Content-Language: it

On 12/02/2025 20:50, David Brown wrote:
> On 12/02/2025 18:14, Stefan Reuther wrote:
>> On 12.02.2025 at 17:26, pozz wrote:
>>> #if LANGUAGE_ITALIAN
>>> #  define STRING123            "Evento %d: accensione"
>>> #elif LANGUAGE_ENGLISH
>>> #  define STRING123            "Event %d: power up"
>>> #endif
>> [...]
>>> Another approach is giving the user the possibility to change the
>>> language at runtime, maybe with an option on the display. In some cases,
>>> I have enough memory to store all the strings in all languages.
>>
>> Put the strings into a structure.
>>
>>    struct Strings {
>>        const char* power_up_message;
>>    };
>>
>> I hate global variables, so I pass a pointer to the structure to every
>> function that needs it (but of course you can also make a global 
>> variable).
>>
>> Then, on language change, just point your structure pointer elsewhere,
>> or load the strings from secondary storage.
>>
>> One disadvantage is that this loses you the compiler warnings for
>> mismatching printf specifiers.
>>
>>> I know there are many possible solutions, but I'd like to know some
>>> suggestions from you. For example, it could be nice if there was some
>>> tool that automatically extracts all the strings used in the source code
>>> and helps managing more languages.
>>
>> There are packages like gettext. You tag your strings as
>> 'printf(_("Event %d"), e)', and the 'xgettext' command will extract them
>> all into a .po file. Other tools help you manage these files (e.g.
>> 'msgmerge'; Emacs 'po-mode'), and gcc knows how to do proper printf
>> warnings.
>>
>> The .po file is a mapping from English to Whateverish strings. So you
>> would convert that into some space-efficient resource file, and
>> implement the '_' macro/function to perform the mapping. The
>> disadvantage is that this takes a lot of memory because your app needs to
>> have both the English and the translated strings in memory. But unless
>> you also use a fancy preprocessor that translates your code to
>> 'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
>> you might come up with some compile-time hashing...
>>
>> I wouldn't use that on a microcontroller, but it's nice for desktop apps.
>>
>>
>>    Stefan
> 
> 
> You don't need a very fancy pre-processor to handle this yourself, if 
> you are happy to make a few changes to the code.  Have your code use 
> something like :
> 
> #define DisplayPrintf(id, desc, args...) \
>      display_printf(strings[language][string_ ## id], ## args)
> 
> Use it like :
> 
>      DisplayPrintf(event_type_on, "Event on", ev->idx);
> 
> 
> A little Python preprocessor script can chew through all your C files 
> and identify each call to "DisplayPrintf".  

Little, yes, but not simple, at least for me. How do I write a correct 
C preprocessor in Python?

The script would have to read each C source file after the standard C 
preprocessor has already processed it for the specific build, because 
the set of strings depends on the build configuration.

For example, you could have a C source file that contains:

#if BUILD == BUILD_FULL
   DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
   x = wait_keypress();
   if (x == '1') do_simple();
   if (x == '2') do_adv();
#elif BUILD == BUILD_LIGHT
   do_simple();
#endif

If I'm building the project as BUILD_FULL, there's at least one 
additional string to translate.

Another big problem is that the Python preprocessor has to understand C 
syntax; it can't simply search for DisplayPrintf occurrences.
For example:

/* DisplayPrintf(old_string, "This is an old message"); */
DisplayPrintf(new_string, "This is a new message");

Of course, only one string is actually used here, since the first call 
is commented out, but extracting just that one is not simple.
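
A rough sketch of how far a small script might get with the commented-out 
case above, without a full C parser. The names strip_comments and 
extract_strings are hypothetical, not part of any existing tool. It 
assumes the id is a plain identifier and the description is one or more 
adjacent string literals. It does not evaluate #if/#elif, and it would 
still be fooled by the text DisplayPrintf(...) appearing inside a string 
literal.

```python
import re

# Literals are listed before comments so that "/*" inside a string or
# character literal is not mistaken for the start of a comment.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'       # string literal
    r"|'(?:\\.|[^'\\])*'"      # character literal
    r"|/\*.*?\*/"              # block comment
    r"|//[^\n]*",              # line comment
    re.DOTALL,
)

# DisplayPrintf(<identifier>, "<literal>" ["<literal>" ...]
_CALL = re.compile(
    r'\bDisplayPrintf\s*\(\s*([A-Za-z_]\w*)\s*,\s*((?:"(?:\\.|[^"\\])*"\s*)+)'
)


def strip_comments(src):
    """Replace every C comment with a single space, keeping literals intact."""
    def repl(match):
        text = match.group(0)
        if text.startswith("/*") or text.startswith("//"):
            return " "
        return text
    return _TOKEN.sub(repl, src)


def extract_strings(src):
    """Return (id, description) pairs for each DisplayPrintf call found."""
    found = []
    for match in _CALL.finditer(strip_comments(src)):
        ident = match.group(1)
        # Adjacent string literals are concatenated, as the C compiler does.
        parts = re.findall(r'"((?:\\.|[^"\\])*)"', match.group(2))
        found.append((ident, "".join(parts)))
    return found
```

Run on the two-line example above, this should report only new_string. 
Handling the BUILD_FULL / BUILD_LIGHT case would still need the real C 
preprocessor or some other way of evaluating the conditionals.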


> It can collect together all 
> the id's and generate a header with something like :
> 
>      typedef enum {
>          string_event_type_on, ...
>      } string_index;
>      enum { no_of_strings = ... };
> 
>      typedef enum {
>          lang_English, lang_Italian, ...
>      } language_index;
>      enum { no_of_languages = ... };
> 
>      extern language_index language;        // global var :-)
>      extern const char* strings[no_of_languages][no_of_strings];
> 
> Then it will have a C file :
> 
>      #include "language.h"
> 
>      language_index language;
>      const char* strings[no_of_languages][no_of_strings] = {
>      {    // English
>          "Event %d: power up",        // Event on
>          ...
>      },
>      {    // Italian
>          "Evento %d: accensione",    // Event on
>      }
>      };
> 
> It would generate the strings based on language files:
> 
>      # english.txt
>      event_type_on : Event %d: power up
>      ...
> 
> If the preprocessor finds a use of DisplayPrintf where the id (which can 
> be as long or short as you want, but can't have spaces or awkward 
> characters) does not match the description, it should give an error - 
> duplicate uses of the same pair are skipped.  (You could just use an id 
> and no description if you prefer.)
> 
> Any ids that are not in the language files will be printed out or put in 
> a file, ids that are in the language files but not used in the program 
> will give warnings, etc.
> 
> It can all be done in a manner that makes it easy to get right, hard to 
> get wrong, and will not cause trouble as strings are added or removed.
> 
> It would be a lot simpler than gettext, and use minimal runtime space 
> and time.  And it should be straightforward to change if you want to 
> have string tables stored externally or something like that.  (I've made 
> systems with string tables in an external serial eprom, for example.)

Thanks for the suggestion, the idea is great. However, I'm not able to 
write a Python preprocessor that works well.
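
As a possible starting point, the generation and checking steps David 
describes might be sketched like this. All function names here 
(collect_ids, parse_language_file, check_ids, generate_header) are 
hypothetical. The language file format is assumed to be the 
"id : text" lines shown in the quoted english.txt example. The list of 
(id, description) pairs is assumed to come from some earlier extraction 
step over the C sources.

```python
def collect_ids(pairs):
    """Return unique ids in first-seen order.

    Duplicate uses of the same (id, description) pair are skipped; the
    same id with a different description is reported as an error.
    """
    seen = {}
    for ident, desc in pairs:
        if ident in seen and seen[ident] != desc:
            raise ValueError(f"id {ident!r} used with two descriptions")
        seen.setdefault(ident, desc)
    return list(seen)


def parse_language_file(text):
    """Parse 'id : translated text' lines; blank and '#' lines are ignored."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ':' only, since the text may contain colons.
        ident, _, value = line.partition(":")
        table[ident.strip()] = value.strip()
    return table


def check_ids(used_ids, table):
    """Return (ids missing from the language file, ids never used)."""
    used = set(used_ids)
    missing = sorted(used - set(table))
    unused = sorted(set(table) - used)
    return missing, unused


def generate_header(ids, languages):
    """Build the text of a language.h along the lines of the quoted layout."""
    lines = ["typedef enum {"]
    lines += [f"    string_{ident}," for ident in ids]
    lines += ["} string_index;",
              f"enum {{ no_of_strings = {len(ids)} }};",
              "",
              "typedef enum {"]
    lines += [f"    lang_{name}," for name in languages]
    lines += ["} language_index;",
              f"enum {{ no_of_languages = {len(languages)} }};",
              "",
              "extern language_index language;",
              "extern const char* strings[no_of_languages][no_of_strings];"]
    return "\n".join(lines) + "\n"
```

Missing ids would be written out for the translator, and unused ones 
reported as warnings, as suggested in the quoted text.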