Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Nuno Silva <nunojsilva@invalid.invalid>
Newsgroups: comp.unix.shell
Subject: Re: How to convert <binaryGlowMixedWithASCII> to pure ASCII
Date: Mon, 12 May 2025 14:18:24 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <vvssf0$13ls6$1@dont-email.me>
References: <vv8asa$2nscb$1@news.xmission.com>
	<slrn1023i84.2s2es.cmartin+usenetYYMMDD@nyx2.nyx.net>
	<vvsp2r$33sch$1@news.xmission.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Mon, 12 May 2025 15:18:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ee5cae4ee37f8bab3d9175f0bc39da5e";
	logging-data="1169286"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19tlurIz+Z3Angs0xI0TE/A"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Cancel-Lock: sha1:Lp8mwD1ML3wRUuuWKwtm4Lw1m2o=

On 2025-05-12, Kenny McCormack wrote:

> In article <slrn1023i84.2s2es.cmartin+usenetYYMMDD@nyx2.nyx.net>,
> Chuck Martin  <cmartin+usenetYYMMDD@nyx.net> wrote:
> ...
>>Try piping it into the following command:
>>
>>perl -CS -MEncode -ne 'print decode("MIME-Header", $_)'
>
> Bingo!  Thanks very much.
>
> That worked well on the other example I gave in the OP too (which was
> actually more the focus of my query).
>
> I still get the binary glop in place of the single quotes (The text
> contains the word "They're", rendered as "They<binaryglop>re").  Note that
> in the input string, <binaryglop> is (the literal string): =E2=80=99
> which gets converted by your Perl program into the 3
> characters with octal codes (as displayed by "od -bc"): 342 200 231
>
> I can deal with this latter problem myself via brute force with AWK, but it
> would be nice if I didn't have to - i.e., if there were a complete solution
> (i.e., one that does also the other half of the job).

My guess is that this isn't an apostrophe but a "right single quotation
mark", which is sadly a common sight in such a context. Emacs tells me
that this character (U+2019) is encoded as E2 80 99 in UTF-8, which
matches the octal 342 200 231 you are seeing.

Are there good ways to convert such characters to something more
reasonable? The only thing that occurs to me right now is passing the
text through iconv to a more limited charset using transliteration, e.g.

    iconv -f UTF-8 -t ISO-8859-1//TRANSLIT -c

and then back to the desired encoding and charset.

(But I suppose if this is already involving perl, then perhaps such a
modification can be done through perl too.)

-- 
Nuno Silva