Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Sat, 22 Feb 2025 14:18:03 +0100
Organization: A noiseless patient Spider
Lines: 28
Message-ID: <vpciqb$3unkp$3@dont-email.me>
References: <vp9oml$3a0k5$1@dont-email.me> <87bjuvm68v.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 22 Feb 2025 14:18:04 +0100 (CET)
Injection-Info: dont-email.me; posting-host="c7fa0a28977b5f488f5523ebf65c845d"; logging-data="4152985"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19FKpHgn3vPgBK/f8CjR4AkJBJWrKV5Ddg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:1J42ilAbqGKV/awpRtEZBEAuhlo=
In-Reply-To: <87bjuvm68v.fsf@nosuchdomain.example.com>
Content-Language: en-GB

On 21/02/2025 20:45, Keith Thompson wrote:
> pozz <pozzugno@gmail.com> writes:
>> I want to write a simple function that converts UCS2 string into ISO8859-1:
>>
>> void ucs2_to_iso8859p1(char *ucs2, size_t size);
>>
>> ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm
>> passing size because ucs2 isn't null terminated.
>
> Is the UCS-2 really represented as a sequence of ASCII hex digits?
>
> In actual UCS-2, each character is 2 bytes. The representation for
> "Hello" would be 10 bytes, either "\0H\0e\0l\0l\0o" or
> "H\0e\0l\0l\0o\0", depending on endianness. (UCS-2 is a subset of
> UTF-16; the latter uses longer sequences to represent characters
> outside the Basic Multilingual Plane.)
>

My understanding here is that the OP is getting the UCS-2 encoded string from a modem, almost certainly on a serial line. The UCS-2 encoded data is itself a binary sequence of 16-bit code units, and the modem firmware sends each of those code units as four hex digits.
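A minimal sketch of the decoding step, assuming the layout described above: four hex digits per code unit, most significant digit first, as in the OP's "00480065006C006C006F" example. ISO-8859-1 matches the first 256 Unicode code points, so each code unit of 0x00FF or below maps directly to one output byte. Anything above that has no ISO-8859-1 equivalent. The function name `ucs2hex_to_iso8859_1`, the separate output buffer and the substitution byte are illustrative choices of mine, not the OP's prototype.

```c
#include <stddef.h>

/* Value of one ASCII hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper (not the OP's prototype): decode `size` hex digits,
 * four per UCS-2 code unit with the most significant digit first, into
 * ISO-8859-1 bytes in `out`.  Code units 0x0000-0x00FF map directly to
 * ISO-8859-1; anything above 0x00FF has no equivalent and becomes `subst`.
 * `out` must have room for size / 4 + 1 bytes; on success the result is
 * null-terminated.  Returns the number of characters written, or -1 if
 * `size` is not a multiple of 4 or a non-hex character is found (the
 * contents of `out` are then unspecified). */
static long ucs2hex_to_iso8859_1(const char *hex, size_t size,
                                 unsigned char *out, unsigned char subst)
{
    size_t n = 0;

    if (size % 4 != 0)
        return -1;

    for (size_t i = 0; i < size; i += 4) {
        unsigned int cu = 0;            /* one 16-bit UCS-2 code unit */
        for (size_t j = 0; j < 4; j++) {
            int v = hex_value(hex[i + j]);
            if (v < 0)
                return -1;
            cu = (cu << 4) | (unsigned int)v;
        }
        out[n++] = (cu <= 0xFFu) ? (unsigned char)cu : subst;
    }
    out[n] = '\0';
    return (long)n;
}
```

Using a separate output buffer avoids the question of how the caller learns the new length. An in-place version along the lines of the OP's `ucs2_to_iso8859p1(char *ucs2, size_t size)` would also work, since the output is never longer than a quarter of the input. Characters outside Latin-1 (Greek, Cyrillic, the euro sign U+20AC, and so on) simply cannot be represented, so the caller has to pick some substitution policy.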
This is a very common way to handle transmission of binary data in such systems: there is no need for escapes or other complications to delimit the binary data. I would expect the entire incoming message to consist of comma-separated fields with the time and date, the sender's telephone number, and so on, as well as the text itself as this long hex string.
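To illustrate why the hex encoding avoids delimiter problems, here is a minimal sketch of pulling the text field out of such a line. The line layout is a guess based on the description above, for example `"+391234567890","25/02/21","20:45:00","00480065006C006C006F"`. The function name `find_hex_field` is likewise hypothetical. A real modem's response format should be checked against its documentation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the hex-encoded text in one received line,
 * assuming (a guess, not a documented modem format) that the text is the
 * last comma-separated field, optionally wrapped in double quotes.
 * The text field holds only hex digits, never a comma or a quote, so the
 * last comma in the line always marks its start and no escape handling is
 * needed.  On success, stores the start of the hex digits in *field and
 * their count in *len, and returns 0.  Returns -1 if the line has no comma
 * or the field contains a character that is not a hex digit. */
static int find_hex_field(const char *line, const char **field, size_t *len)
{
    const char *p = strrchr(line, ',');
    size_t n = 0;

    if (p == NULL)
        return -1;
    p++;                                /* step past the comma */
    if (*p == '"')
        p++;                            /* optional opening quote */

    /* Stop at the closing quote, the line ending or the terminator. */
    while (p[n] != '\0' && p[n] != '"' && p[n] != '\r' && p[n] != '\n') {
        if (!isxdigit((unsigned char)p[n]))
            return -1;
        n++;
    }
    *field = p;
    *len = n;
    return 0;
}
```

The pointer and length returned here describe exactly the "not null terminated, so pass the size" situation the OP described. They can be handed to whatever routine decodes the four-digit groups. Commas inside earlier quoted fields, such as a date and time, do not matter, because only the last comma is used.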