Article <20250221171137.949@kylheku.com>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Sat, 22 Feb 2025 01:20:20 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <20250221171137.949@kylheku.com>
References: <vp9oml$3a0k5$1@dont-email.me>
Injection-Date: Sat, 22 Feb 2025 02:20:20 +0100 (CET)
Injection-Info: dont-email.me; posting-host="f95c66af52f902865e27058a6dd1d91a";
	logging-data="3841502"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/jT01K924EsUSnTF7OU4emN1rZj5/UwJM="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:pKCnDwurqQsJbKQ9GTBAbybtJqM=
Bytes: 2450

On 2025-02-21, pozz <pozzugno@gmail.com> wrote:
> I want to write a simple function that converts UCS2 string into ISO8859-1:
>
> void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
> ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing 
> size because ucs2 isn't null terminated.

This kind of normalizing is a good way of introducing injection
exploits.

Suppose the input is in some syntax and has already been validated, and
that validation decision is trusted from then on. Converting it to the
8-bit character set can then produce characters which are special in
that syntax, changing its meaning.

Microsoft Windows has an example of such a problem. Programs which use
GetCommandLineA to get the argument string before parsing it into
arguments are vulnerable to argument injection. The attacker supplies a
datum to be used by program A as an argument when calling program B,
crafted so that when the datum is decimated to the 8-bit character set,
quotes appear in it, creating additional arguments to program B.

> again. But I saw the code "2019" (apostrophe) that can be rendered as 
> 0x27 in ISO8859-1.

... and that's a common quoting character in various data syntaxes, oops!
What could go wrong?

I think in 2025 we shouldn't have to be crippling Unicode data to fit
some ISO Latin (or any other 8-bit) character set; we should be rooting
out technologies and situations which do that.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca