Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Geoff <geoff@invalid.invalid>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Sat, 01 Mar 2025 09:31:55 -0800
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <tvg6sj54gssn4m4ao7m0g48e0nflbfsda8@4ax.com>
References: <vp9oml$3a0k5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 01 Mar 2025 18:31:58 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9b43f30ed9e0206d315715016d408f16";
logging-data="353104"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/5JWn1zLmn7mj5tyNa4FGdl6O/Q/z97xk="
User-Agent: ForteAgent/7.20.32.1218
Cancel-Lock: sha1:4YigdY8ZWaLReKa75Z5oTcoz/Dk=
Bytes: 2727
On Fri, 21 Feb 2025 12:40:06 +0100, pozz <pozzugno@gmail.com> wrote:
>I want to write a simple function that converts UCS2 string into ISO8859-1:
>
>void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
>ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing
>size because ucs2 isn't null terminated.
>
>I know I can use iconv() feature, but I'm on an embedded platform
>without an OS and without iconv() function.
>
>It is trivial to convert "0000"-"007F" chars: it's a simple cast from
>unsigned int to char.
>
>It isn't so simple to convert higher codes. For example, the small e
>with grave "00E8" can be converted to 0xE8 in ISO8859-1, so it's trivial
>again. But I saw the code "2019" (apostrophe) that can be rendered as
>0x27 in ISO8859-1.
>
>Is there a simplified mapping table that can be written with if/switch?
>
>if (code < 0x80) {
>    *dst++ = (char)code;
>} else {
>    switch (code) {
>        case 0x2019: *dst++ = 0x27; break;  // Apostrophe
>        case 0x...:  *dst++ = ...;  break;
>        default:     *dst++ = ' ';
>    }
>}
>
>I'm not looking for a very detailed and correct mapping, just a
>"sufficient" implementation.
>
#include <stdint.h>
#include <stddef.h>

// Function to convert UCS2 to ISO8859-1.
// ucs2 holds `length` 16-bit code units; iso88591 must have room for
// length + 1 bytes (the converted characters plus the terminating null).
void UCS2ToISO88591(const uint16_t* ucs2, size_t length, char* iso88591) {
    for (size_t i = 0; i < length; ++i) {
        uint16_t ucs2_char = ucs2[i];
        if (ucs2_char <= 0x00FF) {
            // U+0000..U+00FF map one-to-one onto ISO8859-1
            iso88591[i] = (char)ucs2_char;
        } else {
            // Handle characters that cannot be represented in ISO8859-1
            iso88591[i] = '?'; // Replace with a placeholder character
        }
    }
    // Null-terminate the ISO8859-1 string
    iso88591[length] = '\0';
}
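
If the input really is text made of hex digits, as the example
"00480065006C006C006F" suggests, the function above needs a decoding step
in front of it. What follows is only a sketch under that assumption: four
hex digits per character, most significant digit first. The names
hex_value, latin1_fallback and ucs2_hex_to_latin1 are made up for
illustration, and the substitution table is a small, arbitrary selection of
"good enough" replacements (curly quotes and dashes), not a complete or
standard mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: best-effort ISO8859-1 byte for a UCS2 code.
   Codes up to 0x00FF pass through unchanged; a few common punctuation
   marks get an ASCII look-alike; everything else becomes '?'. */
static unsigned char latin1_fallback(uint16_t code)
{
    if (code <= 0x00FF) return (unsigned char)code;
    switch (code) {
    case 0x2018: case 0x2019: return '\'';  /* single curly quotes */
    case 0x201C: case 0x201D: return '"';   /* double curly quotes */
    case 0x2013: case 0x2014: return '-';   /* en dash, em dash */
    default:                  return '?';
    }
}

/* Converts hex-digit UCS2 text to a null-terminated ISO8859-1 string.
   hex need not be null-terminated; hex_len is its length in characters.
   dst must have room for hex_len / 4 + 1 bytes. A group containing a
   non-hex character yields '?'; trailing characters that do not fill a
   group of four are ignored. Returns the number of bytes written, not
   counting the terminator. */
size_t ucs2_hex_to_latin1(const char *hex, size_t hex_len, char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i + 4 <= hex_len; i += 4) {
        uint16_t code = 0;
        int ok = 1;
        for (size_t j = 0; j < 4; ++j) {
            int v = hex_value(hex[i + j]);
            if (v < 0) { ok = 0; break; }
            code = (uint16_t)((code << 4) | (unsigned)v);
        }
        dst[n++] = (char)(ok ? latin1_fallback(code) : '?');
    }
    dst[n] = '\0';
    return n;
}
```

Called as ucs2_hex_to_latin1("00480065006C006C006F", 20, out), this should
leave "Hello" in out. Extending the table is a matter of adding case labels
to latin1_fallback; which substitutions count as "sufficient" depends on
the text you expect to receive.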