Path: ...!3.eu.feeder.erje.net!feeder.erje.net!news.in-chemnitz.de!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: "Loris Bennett" <loris.bennett@fu-berlin.de>
Newsgroups: comp.lang.python
Subject: Re: Printing UTF-8 mail to terminal
Date: Fri, 01 Nov 2024 10:10:03 +0100
Organization: FUB-IT, Freie Universität Berlin
Lines: 105
Message-ID: <875xp7nwus.fsf@zedat.fu-berlin.de>
References: <878qu49tii.fsf@zedat.fu-berlin.de>
<ZyPtsLSme7IJ-q4j@cskk.homeip.net>
<mailman.63.1730408232.4695.python-list@python.org>
<87msijo2cd.fsf@zedat.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de d0A9IH+Li+s7M7wAQ4teiQvaoAmxBcmdPOpzuGedwo6n+a
Cancel-Lock: sha1:cmZwrIDAcdUi4798XRpapXgmCm8= sha1:F3eo7XCCZDS30Wg/LzckvwylPFU= sha256:DfogCWjb8/NrMJhlFEtx+qapWnDoWnvJ5FhLz2TKgy4=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Bytes: 4089
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:
> Cameron Simpson <cs@cskk.id.au> writes:
>
>> On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>>I have a command-line program which creates an email containing German
>>>umlauts. On receiving the mail, my mail client displays the subject and
>>>body correctly:
>> [...]
>>>So far, so good. However, when I use the --verbose option to print
>>>the mail to the terminal via
>>>
>>>    if args.verbose:
>>>        print(mail)
>>>
>>>I get:
>>>
>>>    Subject: Übungsbetreff
>>>
>>>    Sehr geehrter Herr Dr. Bennett,
>>>
>>>    Dies ist eine =C3=9Cbung.
>>>
>>>What do I need to do to prevent the body from getting mangled?
>>
>> That looks to me like quoted-printable. This is an encoding for binary
>> transport of text to make it robust against transports that are not
>> 8-bit clean. So your Unicode text is encoded as UTF-8, and then that
>> is encoded in quoted-printable for transport through the email system.
>
> As I mentioned, I think the problem is to do with the way the salutation
> text provided by the "salutation server" and the mail body from a file
> are encoded. This seems to be different.
>
>> Your terminal probably accepts UTF-8 - I imagine other German text
>> renders correctly?
>
> Yes, it does.
>
>> You need to get the text and undo the quoted-printable encoding.
>>
>> If you're using the Python email module to parse (or construct) the
>> message as a `Message` object I'd expect that to happen automatically.
>
> I am using
>
> email.message.EmailMessage
>
> as, from the Python documentation
>
> https://docs.python.org/3/library/email.examples.html
>
> I gathered that that is the standard approach.
>
> And you are right that the encoding of the mail which is actually
> received is sorted out automatically. If I display the raw email in my
> client I get the following:
>
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
> ...
> Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
> ...
> Dies ist eine =C3=9Cbung.
>
> I would interpret that as meaning that the subject and body are encoded
> in the same way.
>
> The problem just occurs with the unsent string representation printed to
> the terminal.

If I log the body like this

    body = f"{salutation},\n\n{text}\n{signature}"
    logger.debug("body: " + body)

and look at the log file in my terminal I see

    2024-11-01 09:59:12,318 - DEBUG - mailer:create_body - body: Sehr geehrter Herr Dr. Bennett,
    Dies ist eine Übung.
    ...

as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...
    if args.verbose:
        print(mail)

which is presumably also correct.

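
For reference, the behaviour described above can be reproduced in a few lines. This is only a sketch: the subject and body strings are copied from the example output, not taken from the real program. The explanation in the comments reflects my reading of CPython's email/message.py, where EmailMessage.__str__ appears to serialise with a policy cloned with utf8=True. That would explain why the subject prints readably while the body keeps the transfer encoding requested in set_content().

```python
import quopri
from email.message import EmailMessage

# Sample text taken from the example above; not the real program's data.
body = "Sehr geehrter Herr Dr. Bennett,\n\nDies ist eine Übung.\n"

mail = EmailMessage()
mail["Subject"] = "Übungsbetreff"
mail.set_content(body, cte="quoted-printable")

# print(mail) shows str(mail): headers are left as UTF-8, but the payload
# is still the quoted-printable text stored by set_content().
flattened = str(mail)

# Undoing the transfer encoding by hand: quoted-printable -> bytes -> str.
raw_payload = mail.get_payload()  # str containing "=C3=9C"
decoded = quopri.decodestring(raw_payload.encode("ascii")).decode("utf-8")

print(decoded)
```

The manual quopri step is shown only to make the two layers (UTF-8, then quoted-printable) visible.
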
The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?
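
One possible answer, sketched under assumptions: rather than printing the EmailMessage itself, print the headers and the decoded content separately. EmailMessage.get_content() undoes the Content-Transfer-Encoding and decodes the bytes with the declared charset, and with the default policy the header values come back as decoded strings. The helper name render_for_terminal and the sample strings are hypothetical, and the sketch only handles a message with a plain-text body.

```python
from email.message import EmailMessage


def render_for_terminal(mail: EmailMessage) -> str:
    """Return the headers plus the decoded plain-text body (hypothetical helper)."""
    # With the default EmailMessage policy, header values are decoded strings.
    headers = "\n".join(f"{name}: {value}" for name, value in mail.items())
    # get_body() returns the message itself for a simple text/plain mail,
    # or None if there is no plain-text part.
    body_part = mail.get_body(preferencelist=("plain",))
    # get_content() reverses quoted-printable/base64 and decodes the charset.
    body = body_part.get_content() if body_part is not None else ""
    return f"{headers}\n\n{body}"


# Sample message, assumed for illustration only.
mail = EmailMessage()
mail["Subject"] = "Übungsbetreff"
mail.set_content(
    "Sehr geehrter Herr Dr. Bennett,\n\nDies ist eine Übung.\n",
    cte="quoted-printable",
)

rendered = render_for_terminal(mail)
print(rendered)
```

Note that the printed headers will still include "Content-Transfer-Encoding: quoted-printable", since only the display is decoded; the message that gets sent is unchanged.
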
Cheers,
Loris
--
This signature is currently under construction.