<mailman.84.1730841650.4695.python-list@python.org>

Path: ...!2.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: Cameron Simpson <cs@cskk.id.au>
Newsgroups: comp.lang.python
Subject: Re: Printing UTF-8 mail to terminal
Date: Wed, 6 Nov 2024 08:20:44 +1100
Lines: 38
Message-ID: <mailman.84.1730841650.4695.python-list@python.org>
References: <871pzrmcky.fsf@zedat.fu-berlin.de>
 <ZyqMLOUxvnwARS2e@cskk.homeip.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
X-Trace: news.uni-berlin.de jIbULRnIjMmC8kIz6LzvqQ/9DA9+55Ijbk2TzNf74JuA==
Cancel-Lock: sha1:8ginHsUQk9nkalDSVgVUAqnZqXI= sha256:AcLfFAjFJI7MMBw2Lso/QyctG4rCvy6fSJfGugZeuMk=
Return-Path: <cameron@cskk.id.au>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=none reason="no signature";
 dkim-adsp=none (unprotected policy); dkim-atps=neutral
X-Spam-Status: OK 0.005
X-Spam-Evidence: '*H*': 0.99; '*S*': 0.00; 'this:': 0.03; 'string':
 0.07; 'cc:addr:python-list': 0.09; 'set.': 0.09; 'cheers,': 0.11;
 'cc:no real name:2**0': 0.14; '(even': 0.16; 'bennett': 0.16;
 'cameron': 0.16; 'encoding': 0.16; 'encoding.': 0.16;
 'from:addr:cs': 0.16; 'from:addr:cskk.id.au': 0.16;
 'from:name:cameron simpson': 0.16; 'message-id:@cskk.homeip.net':
 0.16; 'received:13.237': 0.16; 'received:13.237.201': 0.16;
 'received:13.237.201.189': 0.16; 'received:cskk.id.au': 0.16;
 'received:id.au': 0.16; 'received:mail.cskk.id.au': 0.16;
 'simpson': 0.16; 'wildly': 0.16; 'wrote:': 0.16; "can't": 0.17;
 'cc:addr:python.org': 0.20; 'cc:2**0': 0.25; 'bit': 0.27;
 'present': 0.30; 'whole': 0.30; 'header:User-Agent:1': 0.30;
 "doesn't": 0.32; 'requiring': 0.32; 'but': 0.32; 'header:In-Reply-
 To:1': 0.34; 'printing': 0.34; 'meaning': 0.35; 'received:au':
 0.35; "it's": 0.37; 'example': 0.37; 'means': 0.38; 'text': 0.39;
 'valid': 0.39; 'want': 0.40; 'identified': 0.62; 'subject': 0.63;
 'email': 0.63; 'received:13': 0.64; 'thus': 0.64;
 'received:userid': 0.66; 'further': 0.69; 'content': 0.72;
 'little': 0.73; 'lines,': 0.84; 'surprised': 0.84; 'subject:UTF':
 0.91; 'subject:mail': 0.95
Mail-Followup-To: Loris Bennett <loris.bennett@fu-berlin.de>,
 python-list@python.org
Content-Disposition: inline
In-Reply-To: <871pzrmcky.fsf@zedat.fu-berlin.de>
User-Agent: Mutt/2.2.13 (2024-03-09)
X-BeenThere: python-list@python.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: General discussion list for the Python programming language
 <python-list.python.org>
List-Unsubscribe: <https://mail.python.org/mailman/options/python-list>,
 <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive: <https://mail.python.org/pipermail/python-list/>
List-Post: <mailto:python-list@python.org>
List-Help: <mailto:python-list-request@python.org?subject=help>
List-Subscribe: <https://mail.python.org/mailman/listinfo/python-list>,
 <mailto:python-list-request@python.org?subject=subscribe>
X-Mailman-Original-Message-ID: <ZyqMLOUxvnwARS2e@cskk.homeip.net>
X-Mailman-Original-References: <871pzrmcky.fsf@zedat.fu-berlin.de>
Bytes: 4722

On 04Nov2024 13:02, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>OK, so I can do:
>
>######################################################################
>if args.verbose:
>    for k in mail.keys():
>        print(f"{k}: {mail.get(k)}")
>    print('')
>    print(mail.get_content())
>######################################################################
>
>prints what I want and is not wildly clunky, but I am a little surprised
>that I can't get a string representation of the whole email in one go.

A string representation of the whole message needs to be correctly 
encoded so that its components can be identified mechanically. So it 
needs to be a syntactically valid RFC5322 message. Thus the encoding.

As an example (slightly contrived) of why this is important, multipart 
messages are delimited with distinct lines, and their content must not 
contain such a line (even if it's in the "raw" original data).
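
A minimal sketch of that delimiting, assuming Python's stdlib email 
package; the two part texts are made up for illustration. The boundary 
is generated when the message is flattened, so the sketch reads it back 
from the re-parsed text:

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Hypothetical two-part message; the part texts are illustration only.
msg = EmailMessage()
msg.set_content("first part\n")
msg.add_attachment("second part\n")  # turns msg into multipart/mixed

wire = msg.as_string()

# Re-parse the flattened text and ask for the boundary it uses.
parsed = message_from_string(wire, policy=policy.default)
boundary = parsed.get_boundary()

lines = wire.splitlines()
opening = lines.count("--" + boundary)        # one per part
closing = lines.count("--" + boundary + "--")  # one at the very end

# No part's decoded content contains the boundary string.
clash = any(boundary in part.get_content() for part in parsed.iter_parts())

print(boundary, opening, closing, clash)
```

A part whose raw text did contain that line would have to be encoded 
(or a different boundary chosen) before the message could be written 
out safely.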

So printing a whole message transcribes it in the encoded form so that 
it can be decoded mechanically. And conservatively, this is usually an 
ASCII-compatible encoding so that it can traverse various systems 
undamaged. This means that text requiring UTF-8 encoding gets further 
encoded as quoted-printable to avoid ambiguity about the meaning of 
bytes/octets which have their high bit set.
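
To make that concrete, here is a small sketch, again assuming the 
stdlib email package. The addresses and text are invented, and the 
quoted-printable transfer encoding is requested explicitly (cte=...) 
rather than left for the library to choose:

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Invented message; only the non-ASCII characters matter here.
msg = EmailMessage()
msg["Subject"] = "Grüße"
msg["From"] = "sender@example.com"
msg["To"] = "rcpt@example.com"
# Ask for quoted-printable explicitly so the result is predictable.
msg.set_content("Schöne Grüße aus Berlin\n", cte="quoted-printable")

# The "whole message" form: pure ASCII, mechanically decodable.
wire = msg.as_string()
print(wire)

# The decoded view of the same message.
decoded_body = msg.get_content()

# Round trip: the encoded form carries enough to recover the text.
parsed = message_from_string(wire, policy=policy.default)
roundtrip_body = parsed.get_content()
roundtrip_subject = str(parsed["Subject"])
```

The printed form shows the umlauts as =C3=B6 style escapes in the body 
and as an encoded word in the Subject: line, while get_content() hands 
back the original text.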

BTW, doesn't this:

     for k in mail.keys():
         print(f"{k}: {mail.get(k)}")

print the quoted printable (i.e. not decoded) form of subject lines?
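
I believe the answer depends on the policy the message was parsed 
with. A sketch with a made-up message, assuming the stdlib parser: 
the legacy compat32 policy (the default for message_from_string) hands 
back the header as stored, while policy.default decodes it.

```python
from email import message_from_string, policy
from email.header import decode_header, make_header

# Made-up raw message with an RFC 2047 encoded Subject: line.
raw = (
    "Subject: =?utf-8?q?Gr=C3=BC=C3=9Fe?=\n"
    "From: sender@example.com\n"
    "\n"
    "body\n"
)

legacy = message_from_string(raw)  # no policy given: compat32
modern = message_from_string(raw, policy=policy.default)

legacy_subject = legacy.get("Subject")       # still encoded
modern_subject = str(modern.get("Subject"))  # decoded by the policy

# With the legacy policy the decoding has to be done by hand.
manual_subject = str(make_header(decode_header(legacy_subject)))

print(repr(legacy_subject), repr(modern_subject), repr(manual_subject))
```

Since get_content() only exists on EmailMessage objects, your mail was 
presumably parsed with one of the modern policies, in which case the 
loop would print decoded headers.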

Cheers,
Cameron Simpson <cs@cskk.id.au>