Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Nuno Silva <nunojsilva@invalid.invalid>
Newsgroups: comp.unix.shell
Subject: Re: How to convert <binaryGlowMixedWithASCII> to pure ASCII
Date: Mon, 12 May 2025 14:18:24 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <vvssf0$13ls6$1@dont-email.me>
References: <vv8asa$2nscb$1@news.xmission.com> <slrn1023i84.2s2es.cmartin+usenetYYMMDD@nyx2.nyx.net> <vvsp2r$33sch$1@news.xmission.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Mon, 12 May 2025 15:18:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ee5cae4ee37f8bab3d9175f0bc39da5e"; logging-data="1169286"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19tlurIz+Z3Angs0xI0TE/A"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Cancel-Lock: sha1:Lp8mwD1ML3wRUuuWKwtm4Lw1m2o=

On 2025-05-12, Kenny McCormack wrote:

> In article <slrn1023i84.2s2es.cmartin+usenetYYMMDD@nyx2.nyx.net>,
> Chuck Martin <cmartin+usenetYYMMDD@nyx.net> wrote:
> ...
>>Try piping it into the following command:
>>
>>perl -CS -MEncode -ne 'print decode("MIME-Header", $_)'
>
> Bingo! Thanks very much.
>
> That worked well on the other example I gave in the OP too (which was
> actually more the focus of my query).
>
> I still get the binary glop in place of the single quotes (The text
> contains the word "They're", rendered as "They<binaryglop>re"). Note that
> in the input string, <binaryglop> is (the literal string): =E2=80=99
> which gets converted by your Perl program into the 3
> characters with octal codes (as displayed by "od -bc"): 342 200 231
>
> I can deal with this later problem myself via brute force with AWK, but it
> would be nice if I didn't have to - i.e., if there were a complete solution
> (i.e., one that does also the other half of the job).

My guess is that this isn't an apostrophe, but a "right single quotation
mark", which is sadly a common sight in such a context, and Emacs tells
me that this (UCS codepoint 0x2019) is represented as E2 80 99 in UTF-8.

Are there good ways to convert such chars to something more reasonable?
The only thing that occurs to me right now is passing it through iconv
to a more limited charset using transliteration
(e.g. "iconv -f utf8 -t iso8859-1//TRANSLIT -c") and then back to the
desired encoding and charset.

(But I suppose if this is already involving perl, then perhaps such a
modification can be done through perl too.)

-- 
Nuno Silva
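
Since //TRANSLIT is an extension of GNU iconv rather than something
POSIX guarantees, a brute-force variant that replaces the quote bytes
directly may be more portable. What follows is only a minimal sketch,
assuming a POSIX shell, printf and sed; the function name ascii_quotes
is made up for illustration, and it handles just the four common
typographic quotes (U+2018, U+2019, U+201C, U+201D), nothing else.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from any tool):
# map UTF-8 typographic quotes on stdin to plain ASCII quotes on stdout.
# The body runs in a subshell "( ... )" so the variables stay local.
ascii_quotes() (
    # Build the raw UTF-8 byte sequences from octal escapes.
    lsq=$(printf '\342\200\230')   # U+2018 LEFT SINGLE QUOTATION MARK
    rsq=$(printf '\342\200\231')   # U+2019 RIGHT SINGLE QUOTATION MARK
    ldq=$(printf '\342\200\234')   # U+201C LEFT DOUBLE QUOTATION MARK
    rdq=$(printf '\342\200\235')   # U+201D RIGHT DOUBLE QUOTATION MARK

    # LC_ALL=C makes sed match plain bytes regardless of the user's locale.
    LC_ALL=C sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
                 -e "s/$ldq/\"/g" -e "s/$rdq/\"/g"
)
```

It would go at the end of the pipeline, after the perl command quoted
above has produced UTF-8, for example:

    printf 'They\342\200\231re\n' | ascii_quotes

which turns the three bytes 342 200 231 into a plain "'".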