From: "Loris Bennett" <loris.bennett@fu-berlin.de>
Newsgroups: comp.lang.python
Subject: Re: Printing UTF-8 mail to terminal
Date: Fri, 01 Nov 2024 10:10:03 +0100
Organization: FUB-IT, Freie Universität Berlin
Message-ID: <875xp7nwus.fsf@zedat.fu-berlin.de>
References: <878qu49tii.fsf@zedat.fu-berlin.de> <ZyPtsLSme7IJ-q4j@cskk.homeip.net>
 <mailman.63.1730408232.4695.python-list@python.org> <87msijo2cd.fsf@zedat.fu-berlin.de>

"Loris Bennett" <loris.bennett@fu-berlin.de> writes:

> Cameron Simpson <cs@cskk.id.au> writes:
>
>> On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
>>>I have a command-line program which creates an email containing German
>>>umlauts.  On receiving the mail, my mail client displays the subject and
>>>body correctly:
>> [...]
>>>So far, so good.  However, when I use the --verbose option to print
>>>the mail to the terminal via
>>>
>>>    if args.verbose:
>>>        print(mail)
>>>
>>>I get:
>>>
>>>    Subject: Übungsbetreff
>>>
>>>    Sehr geehrter Herr Dr. Bennett,
>>>
>>>    Dies ist eine =C3=9Cbung.
>>>
>>>What do I need to do to prevent the body from getting mangled?
>>
>> That looks to me like quoted-printable.  This is an encoding for binary
>> transport of text to make it robust against transports that are not
>> 8-bit clean.  So your Unicode text is encoded as UTF-8, and then that
>> is encoded in quoted-printable for transport through the email system.
>
> As I mentioned, I think the problem is to do with the way the salutation
> text provided by the "salutation server" and the mail body from a file
> are encoded.  This seems to be different.
>
>> Your terminal probably accepts UTF-8 - I imagine other German text
>> renders correctly?
>
> Yes, it does.
>
>> You need to get the text and undo the quoted-printable encoding.
>>
>> If you're using the Python email module to parse (or construct) the
>> message as a `Message` object I'd expect that to happen automatically.
>
> I am using
>
>   email.message.EmailMessage
>
> as, from the Python documentation
>
>   https://docs.python.org/3/library/email.examples.html
>
> I gathered that that is the standard approach.
>
> And you are right that encoding for the actual mail which is received is
> automatically sorted out.  If I display the raw email in my client I get
> the following:
>
>   Content-Type: text/plain; charset="utf-8"
>   Content-Transfer-Encoding: quoted-printable
>   ...
>   Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
>   ...
>   Dies ist eine =C3=9Cbung.
>
> I would interpret that as meaning that the subject and body are encoded
> in the same way.
>
> The problem just occurs with the unsent string representation printed to
> the terminal.

If I log the body like this

  body = f"{salutation},\n\n{text}\n{signature}"
  logger.debug("body: " + body)

and look at the log file in my terminal I see

  2024-11-01 09:59:12,318 - DEBUG - mailer:create_body - body: Sehr geehrter Herr Dr. Bennett,

  Dies ist eine Übung.
  ...

as expected.

The quoted-printable text appears when I do

  mail = EmailMessage()
  mail.set_content(body, cte="quoted-printable")
  ...

  if args.verbose:
      print(mail)

which is presumably also correct.

The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?

Cheers,

Loris

-- 
This signature is currently under constuction.
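
One possible approach to the question above, as a minimal sketch rather
than a definitive answer: print(mail) serialises the message with its
Content-Transfer-Encoding applied, so the body stays quoted-printable,
whereas get_content() on an EmailMessage hands back the decoded text.
The helper name render_for_terminal is hypothetical, and the sketch
assumes a message built with set_content() as shown, with a text/plain
body and no attachments worth displaying.

```python
from email.message import EmailMessage


def render_for_terminal(mail: EmailMessage) -> str:
    """Return the headers and the decoded plain-text body as one string.

    Hypothetical helper: it is meant for display only and does not
    produce a valid serialised message.
    """
    # With the default policy used by EmailMessage, header values come
    # back as decoded str objects (e.g. "Übungsbetreff").
    header_lines = [f"{name}: {value}" for name, value in mail.items()]

    # get_body() returns the message itself for a simple text/plain mail,
    # or the preferred text part of a multipart message, or None.
    part = mail.get_body(preferencelist=("plain",))
    if part is None:
        return "\n".join(header_lines) + "\n"

    # get_content() undoes the transfer encoding (quoted-printable here)
    # and decodes the bytes using the part's charset.
    body = part.get_content()
    return "\n".join(header_lines) + "\n\n" + body
```

With that in place, the verbose branch could call
print(render_for_terminal(mail)) instead of print(mail), while the
message that is actually sent stays unchanged.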