From: Geoff
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Sat, 01 Mar 2025 09:31:55 -0800

On Fri, 21 Feb 2025 12:40:06 +0100, pozz wrote:

>I want to write a simple function that converts UCS2 string into ISO8859-1:
>
>void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
>ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing
>size because ucs2 isn't null terminated.
>
>I know I can use iconv() feature, but I'm on an embedded platform
>without an OS and without iconv() function.
>
>It is trivial to convert "0000"-"007F" chars: it's a simple cast from
>unsigned int to char.
>
>It isn't so simple to convert higher codes. For example, the small e
>with grave "00E8" can be converted to 0xE8 in ISO8859-1, so it's trivial
>again. But I saw the code "2019" (apostrophe) that can be rendered as
>0x27 in ISO8859-1.
>
>Is there a simplified mapping table that can be written with if/switch?
>
>if (code < 0x80) {
>  *dst++ = (char)code;
>} else {
>  switch (code) {
>    case 0x2019: *dst++ = 0x27; break;  // Apostrophe
>    case 0x...: *dst++ = ...; break;
>    default: *ds++ = ' ';
>  }
>}
>
>I'm not searching a very detailed and correct mapping, but just a
>"sufficient" implementation.
>

#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint16_t */

// Function to convert UCS2 to ISO8859-1.
// iso88591 must have room for length + 1 bytes (output plus terminator).
void UCS2ToISO88591(const uint16_t* ucs2, size_t length, char* iso88591)
{
    for (size_t i = 0; i < length; ++i) {
        uint16_t ucs2_char = ucs2[i];

        if (ucs2_char <= 0x00FF) {
            // U+0000..U+00FF have the same values in ISO8859-1
            iso88591[i] = (char)ucs2_char;
        } else {
            // Handle characters that cannot be represented in ISO8859-1
            iso88591[i] = '?'; // Replace with a placeholder character
        }
    }

    // Null-terminate the ISO8859-1 string
    iso88591[length] = '\0';
}
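
The function above takes an array of uint16_t, while the question describes the input as ASCII hex digits ("00480065006C006C006F"), four per character. Below is a minimal sketch of a variant that reads that form directly and adds a small switch for a few look-alike punctuation marks, as the question suggested. The names hex_val, fallback and ucs2_hex_to_latin1 are made up for illustration, and which code points belong in the table is my assumption, not any standard mapping. The output buffer must hold size / 4 + 1 bytes; a trailing group of fewer than four digits is ignored.

```c
#include <stddef.h>

/* Hypothetical helper: value of one ASCII hex digit, or -1 if invalid. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical fallback table: a few punctuation marks above U+00FF
   mapped to ASCII look-alikes. Extend as needed; anything else is '?'. */
static unsigned char fallback(unsigned int code)
{
    switch (code) {
    case 0x2018: case 0x2019: return '\'';  /* single quotation marks */
    case 0x201C: case 0x201D: return '"';   /* double quotation marks */
    case 0x2013: case 0x2014: return '-';   /* en dash, em dash */
    default:                  return '?';   /* not representable */
    }
}

/* Converts `size` hex digits (4 per UCS2 character) to ISO8859-1.
   Writes a null terminator and returns the number of characters written. */
size_t ucs2_hex_to_latin1(const char *hex, size_t size, unsigned char *dst)
{
    size_t n = 0;

    for (size_t i = 0; i + 4 <= size; i += 4) {
        unsigned int code = 0;
        int valid = 1;

        for (size_t k = 0; k < 4; ++k) {
            int v = hex_val(hex[i + k]);
            if (v < 0) { valid = 0; break; }
            code = (code << 4) | (unsigned int)v;
        }

        if (!valid)
            dst[n++] = '?';                   /* malformed group */
        else if (code <= 0x00FF)
            dst[n++] = (unsigned char)code;   /* same value in ISO8859-1 */
        else
            dst[n++] = fallback(code);
    }

    dst[n] = '\0';
    return n;
}
```

Called with the 20 digits from the question it should produce "Hello", and "00E8" followed by "2019" should give the byte 0xE8 and then an ASCII apostrophe.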