Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: pozz <pozzugno@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Mon, 24 Feb 2025 16:57:24 +0100
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <vpi4t3$10fsl$3@dont-email.me>
References: <vp9oml$3a0k5$1@dont-email.me>
 <87bjuvm68v.fsf@nosuchdomain.example.com> <vpciqb$3unkp$3@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 24 Feb 2025 16:57:24 +0100 (CET)
Injection-Info: dont-email.me; posting-host="dfae41dd43c6538fdb144c4f4771af80";
	logging-data="1064853"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+D2QQcuu+iXTBwN2DxoHpejzRnyA+tBlU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:YVHNFqzef6Hkq5T2HUWr+vseG/c=
In-Reply-To: <vpciqb$3unkp$3@dont-email.me>
Content-Language: it
Bytes: 2643

On 22/02/2025 14:18, David Brown wrote:
> On 21/02/2025 20:45, Keith Thompson wrote:
>> pozz <pozzugno@gmail.com> writes:
>>> I want to write a simple function that converts UCS2 string into 
>>> ISO8859-1:
>>>
>>> void ucs2_to_iso8859p1(char *ucs2, size_t size);
>>>
>>> ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm
>>> passing size because ucs2 isn't null terminated.
>>
>> Is the UCS-2 really represented as a sequence of ASCII hex digits?
>>
>> In actual UCS-2, each character is 2 bytes.  The representation for
>> "Hello" would be 10 bytes, either "\0H\0e\0l\0l\0o" or
>> "H\0e\0l\0l\0o\0", depending on endianness.  (UCS-2 is a subset of
>> UTF-16; the latter uses longer sequences to represent characters
>> outside the Basic Multilingual Plane.)
>>
> 
> My understanding here is that the OP is getting the UCS-2 encoded string 
> in from a modem, almost certainly on a serial line.  The UCS-2 encoded 
> data is itself a binary sequence of 16-bit code units, and the modem 
> firmware is sending those as four hex digits.  This is a very common way 
> to handle transmission of binary data in such systems - there is no need 
> for escapes or other complications to delimit the binary data.  I would 
> expect that the entire incoming message will be comma-separated fields 
> with the time and date, sender's telephone number, and so on, as well as 
> the text itself as this long hex string.
> 

Exactly. This is the reply to the AT+CMGR command, which is standardized in 
3GPP TS 27.005.
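
For what it's worth, here is a minimal sketch of the conversion, assuming the text field really is four ASCII hex digits per UCS-2 code unit, most significant digit first (as in "0048" for 'H'). The name ucs2hex_to_iso8859p1, the size_t return value, and the choice of '?' for anything that doesn't fit in ISO 8859-1 are my own assumptions, not something from the original prototype or from the spec. It relies on ISO 8859-1 being identical to the first 256 Unicode code points, so a code unit up to 0x00FF maps to a single byte with the same value.

```c
#include <stddef.h>

/* Value of one ASCII hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Convert hex-encoded UCS-2 to ISO 8859-1 in place (hypothetical helper).
 *
 * buf  : 'size' ASCII hex digits, four per code unit, not null-terminated.
 * Returns the number of ISO 8859-1 bytes written to the start of buf.
 *
 * Assumptions: code units above 0x00FF and groups containing a non-hex
 * character are replaced by '?'; a trailing group shorter than four
 * digits is ignored. The output is not null-terminated.
 */
size_t ucs2hex_to_iso8859p1(char *buf, size_t size)
{
    size_t out = 0;

    for (size_t in = 0; in + 4 <= size; in += 4) {
        unsigned unit = 0;
        int ok = 1;

        /* Read all four digits before writing, so in-place use is safe. */
        for (size_t k = 0; k < 4; k++) {
            int v = hex_value(buf[in + k]);
            if (v < 0) {
                ok = 0;
                break;
            }
            unit = (unit << 4) | (unsigned)v;
        }

        buf[out++] = (ok && unit <= 0xFF) ? (char)unit : '?';
    }
    return out;
}
```

With the "Hello" example from the original post, calling it on the 20-character buffer "00480065006C006C006F" should leave the 5 bytes "Hello" at the start of the buffer and return 5. Working in place is possible because each output byte is written at or before the group it was read from; if the caller wants a C string, it has to add the terminator itself at buf[returned length].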