<tvg6sj54gssn4m4ao7m0g48e0nflbfsda8@4ax.com>


Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Geoff <geoff@invalid.invalid>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Sat, 01 Mar 2025 09:31:55 -0800
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <tvg6sj54gssn4m4ao7m0g48e0nflbfsda8@4ax.com>
References: <vp9oml$3a0k5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 01 Mar 2025 18:31:58 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9b43f30ed9e0206d315715016d408f16";
	logging-data="353104"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/5JWn1zLmn7mj5tyNa4FGdl6O/Q/z97xk="
User-Agent: ForteAgent/7.20.32.1218
Cancel-Lock: sha1:4YigdY8ZWaLReKa75Z5oTcoz/Dk=
Bytes: 2727

On Fri, 21 Feb 2025 12:40:06 +0100, pozz <pozzugno@gmail.com> wrote:

>I want to write a simple function that converts UCS2 string into ISO8859-1:
>
>void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
>ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing 
>size because ucs2 isn't null terminated.
>
>I know I can use iconv() feature, but I'm on an embedded platform 
>without an OS and without iconv() function.
>
>It is trivial to convert "0000"-"007F" chars: it's a simple cast from 
>unsigned int to char.
>
>It isn't so simple to convert higher codes. For example, the small e 
>with grave "00E8" can be converted to 0xE8 in ISO8859-1, so it's trivial 
>again. But I saw the code "2019" (apostrophe) that can be rendered as 
>0x27 in ISO8859-1.
>
>Is there a simplified mapping table that can be written with if/switch?
>
>if (code < 0x80) {
>   *dst++ = (char)code;
>} else {
>   switch (code) {
>     case 0x2019: *dst++ = 0x27; break;  // Apostrophe
>     case 0x...: *dst++ = ...; break;
>     default: *dst++ = ' ';
>   }
>}
>
>I'm not looking for a very detailed and correct mapping, just a 
>"sufficient" implementation.
>

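#include <stdint.h>
#include <stddef.h>

// Convert an array of UCS2 code units to a null-terminated ISO8859-1 string.
// ucs2:     input code units, already decoded to 16-bit values
// length:   number of code units in ucs2
// iso88591: output buffer; must have room for length + 1 bytes
void UCS2ToISO88591(const uint16_t* ucs2, size_t length, char* iso88591) {
    for (size_t i = 0; i < length; ++i) {
        uint16_t ucs2_char = ucs2[i];
        if (ucs2_char <= 0x00FF) {
            // U+0000..U+00FF share their code values with ISO8859-1
            iso88591[i] = (char)(unsigned char)ucs2_char;
        } else {
            // Handle characters that cannot be represented in ISO8859-1
            iso88591[i] = '?'; // Replace with a placeholder character
        }
    }
    // Null-terminate the ISO8859-1 string
    iso88591[length] = '\0';
}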
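
If the input really arrives as hex text such as "00480065006C006C006F", as
described in the question, the same idea can be sketched as below. This is
only a rough sketch under a few assumptions: each code unit is exactly four
hex digits, most significant digit first; the function and helper names are
made up for illustration; and the small switch of "close enough"
substitutions (curly quotes, dashes, bullet) is an arbitrary selection, not a
standard mapping. Anything else above U+00FF becomes '?'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: map one UCS2 code unit to an ISO8859-1 byte.
 * The substitutions are an illustrative choice, not a complete table. */
static unsigned char ucs2_unit_to_latin1(uint16_t code) {
    if (code <= 0x00FF) return (unsigned char)code; /* same code values */
    switch (code) {
        case 0x2018: case 0x2019: return 0x27; /* curly single quotes -> ' */
        case 0x201C: case 0x201D: return 0x22; /* curly double quotes -> " */
        case 0x2013: case 0x2014: return 0x2D; /* en/em dash -> - */
        case 0x2022:              return 0xB7; /* bullet -> middle dot */
        default:                  return '?';  /* not representable */
    }
}

/* Convert hex text (4 digits per code unit, not null-terminated) into a
 * null-terminated ISO8859-1 string. Writes at most out_size bytes including
 * the terminator and returns the number of characters written (excluding
 * the terminator). Trailing digits that do not form a full group of four
 * are ignored; a group containing a non-hex character becomes '?'. */
size_t ucs2_hex_to_latin1(const char *hex, size_t hex_len,
                          char *out, size_t out_size) {
    size_t n = 0;
    if (out_size == 0) return 0;
    for (size_t i = 0; i + 4 <= hex_len && n + 1 < out_size; i += 4) {
        uint16_t code = 0;
        int valid = 1;
        for (size_t j = 0; j < 4; ++j) {
            int v = hex_digit_value(hex[i + j]);
            if (v < 0) { valid = 0; break; }
            code = (uint16_t)((code << 4) | (uint16_t)v);
        }
        out[n++] = (char)(valid ? ucs2_unit_to_latin1(code) : '?');
    }
    out[n] = '\0';
    return n;
}
```

The output never needs more than hex_len / 4 + 1 bytes, so on a small target
a fixed buffer of that size is enough. Extending the switch is just a matter
of adding cases for whatever characters actually show up in your data.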