From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.lang.c
Subject: Re: Simple string conversion from UCS2 to ISO8859-1
Date: Sat, 22 Feb 2025 01:20:20 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <20250221171137.949@kylheku.com>
References: <vp9oml$3a0k5$1@dont-email.me>

On 2025-02-21, pozz <pozzugno@gmail.com> wrote:
> I want to write a simple function that converts UCS2 string into ISO8859-1:
>
> void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
> ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing
> size because ucs2 isn't null terminated.

This kind of normalizing is a good way of introducing injection exploits.

Suppose the input is some syntax that has been validated; the decision
is trusted after that. The normalization to the 8 bit character set can
produce characters which are special in the syntax, changing its meaning.

In Microsoft Windows, there is an example of such a problem. Programs
which use GetCommandLineA to get the argument string before parsing it
into arguments are vulnerable to argument injection. The attacker
specifies a datum to be used by program A as an argument in calling
program B, such that when the datum is decimated to the 8 bit character
set, quotes appear in it, creating additional arguments to program B.

> again. But I saw the code "2019" (apostrophe) that can be rendered as
> 0x27 in ISO8859-1.

.... and that's a common quoting character in various data syntaxes, oops!
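
To make the hazard concrete, here is a minimal sketch in C; it is not
code from the thread. It assumes the hex-digit input format from the
quoted question (four hex digits per UCS-2 code unit, no terminator).
The names ucs2_hex_convert, fold_best_fit and fold_strict are
hypothetical, and the single best-fit entry (U+2019 to 0x27) only
stands in for whatever table a real converter might carry. The strict
fold refuses anything above U+00FF instead of inventing a look-alike.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one hex digit; returns -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical "best fit" fold: U+0000..U+00FF map 1:1 onto
   ISO 8859-1, and U+2019 is folded to 0x27 (the hazard under
   discussion). Returns -1 for any other code unit. */
int fold_best_fit(unsigned unit)
{
    if (unit <= 0xFF) return (int)unit;
    if (unit == 0x2019) return 0x27;
    return -1;
}

/* Strict fold: only the 1:1 range is accepted. */
int fold_strict(unsigned unit)
{
    return unit <= 0xFF ? (int)unit : -1;
}

/* Convert hex-encoded UCS-2 text of `size` characters into `out`,
   which must have room for size / 4 bytes. Returns the number of
   bytes written, or -1 if the input is malformed or contains a
   code unit that `fold` rejects. */
long ucs2_hex_convert(const char *ucs2, size_t size,
                      unsigned char *out, int (*fold)(unsigned))
{
    assert(ucs2 != NULL && out != NULL && fold != NULL);

    if (size % 4 != 0)
        return -1;

    size_t n = 0;
    for (size_t i = 0; i < size; i += 4) {
        unsigned unit = 0;
        for (size_t j = 0; j < 4; j++) {
            int d = hex_val(ucs2[i + j]);
            if (d < 0)
                return -1;
            unit = (unit << 4) | (unsigned)d;
        }
        int ch = fold(unit);
        if (ch < 0)
            return -1;      /* unrepresentable: fail, do not guess */
        out[n++] = (unsigned char)ch;
    }
    return (long)n;
}
```

Given the 16-character input "0049007420190073" ("It", U+2019, "s"),
the best-fit fold writes four bytes with 0x27 in the third position,
so a check for quotes made on the UCS-2 form no longer holds for the
output. The strict fold returns -1 for the same input.
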
What could go wrong?

I think in 2025 we shouldn't have to be crippling Unicode data to fit
some ISO Latin (or any other 8 bit) character set; we should be rooting
out technologies and situations which do that.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca