From: pozz <pozzugno@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Mon, 24 Feb 2025 16:57:24 +0100
Organization: A noiseless patient Spider
Message-ID: <vpi4t3$10fsl$3@dont-email.me>
References: <vp9oml$3a0k5$1@dont-email.me> <87bjuvm68v.fsf@nosuchdomain.example.com> <vpciqb$3unkp$3@dont-email.me>
In-Reply-To: <vpciqb$3unkp$3@dont-email.me>

On 22/02/2025 14:18, David Brown wrote:
> On 21/02/2025 20:45, Keith Thompson wrote:
>> pozz <pozzugno@gmail.com> writes:
>>> I want to write a simple function that converts UCS2 string into
>>> ISO8859-1:
>>>
>>> void ucs2_to_iso8859p1(char *ucs2, size_t size);
>>>
>>> ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm
>>> passing size because ucs2 isn't null terminated.
>>
>> Is the UCS-2 really represented as a sequence of ASCII hex digits?
>>
>> In actual UCS-2, each character is 2 bytes. The representation for
>> "Hello" would be 10 bytes, either "\0H\0e\0l\0l\0o" or
>> "H\0e\0l\0l\0o\0", depending on endianness. (UCS-2 is a subset of
>> UTF-16; the latter uses longer sequences to represent characters
>> outside the Basic Multilingual Plane.)
>>
>
> My understanding here is that the OP is getting the UCS-2 encoded string
> in from a modem, almost certainly on a serial line. The UCS-2 encoded
> data is itself a binary sequence of 16-bit code units, and the modem
> firmware is sending those as four hex digits. This is a very common way
> to handle transmission of binary data in such systems - there is no need
> for escapes or other complications to delimit the binary data. I would
> expect that the entire incoming message will be comma-separated fields
> with the time and date, sender's telephone number, and so on, as well as
> the text itself as this long hex string.
>

Exactly. This is the reply to the AT+CMGR command, which is standardized in
3GPP TS 27.005.
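
For reference, here is a minimal sketch of how such a hex-encoded UCS-2 field could be converted in place. It is only an illustration, not a tested modem driver. The function name ucs2hex_to_iso8859p1 and the helper hex_digit_value are hypothetical, and the return value (the number of characters produced) is an addition to the void signature quoted above. It assumes an ASCII execution character set, four hex digits per code unit with the most significant digit first (as in "0048" for 'H'), and it replaces any code unit above 0xFF with '?', since ISO8859-1 cannot represent it. Conversion stops at the first non-hex character.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one ASCII hex digit, or -1 if c is not a hex digit.
 * Assumes an ASCII-compatible execution character set. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: converts `size` ASCII hex digits in buf (four per
 * UCS-2 code unit, most significant digit first) to ISO8859-1, in place.
 * The result is null terminated when size > 0, which always fits because
 * every four input characters yield one output character.
 * Returns the number of ISO8859-1 characters written. */
size_t ucs2hex_to_iso8859p1(char *buf, size_t size)
{
    size_t in = 0, out = 0;

    assert(buf != NULL || size == 0);

    while (size - in >= 4) {
        unsigned code = 0;
        int ok = 1;

        /* Read all four digits before writing, so in-place is safe. */
        for (size_t i = 0; i < 4; i++) {
            int v = hex_digit_value(buf[in + i]);
            if (v < 0) { ok = 0; break; }
            code = (code << 4) | (unsigned)v;
        }
        if (!ok)
            break;              /* malformed input: stop converting */

        /* U+0000..U+00FF map one-to-one onto ISO8859-1. */
        buf[out++] = (code <= 0xFF) ? (char)(unsigned char)code : '?';
        in += 4;
    }

    if (size > 0)
        buf[out] = '\0';
    return out;
}
```

Called on a writable buffer holding "00480065006C006C006F" with size 20, this should leave "Hello" in the buffer and return 5. Converting in place avoids a second buffer, at the cost of destroying the original hex text; a version with a separate output buffer would be a small change.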