From: "Loris Bennett"
Newsgroups: comp.lang.python
Subject: Re: Printing UTF-8 mail to terminal
Date: Mon, 04 Nov 2024 11:57:37 +0100
Organization: FUB-IT, Freie Universität Berlin
Message-ID: <875xp3mfku.fsf@zedat.fu-berlin.de>
References: <875xp7nwus.fsf@zedat.fu-berlin.de> <87ed3rmg7g.fsf@zedat.fu-berlin.de>

"Loris Bennett" writes:

> Cameron Simpson writes:
>
>> On 01Nov2024 10:10, Loris Bennett wrote:
>>>as expected.  The non-UTF-8 text occurs when I do
>>>
>>>  mail = EmailMessage()
>>>  mail.set_content(body, cte="quoted-printable")
>>>  ...
>>>
>>>  if args.verbose:
>>>      print(mail)
>>>
>>>which is presumably also correct.
>>>
>>>The question is: What conversion is necessary in order to print the
>>>EmailMessage object to the terminal, such that the quoted-printable
>>>parts are turned (back) into UTF-8?
>>
>> Do you still have access to `body` ? That would be the original
>> message text? Otherwise maybe:
>>
>>     print(mail.get_content())
>>
>> The objective is to obtain the message body Unicode text (i.e. a
>> regular Python string with the original text, unencoded). And to print
>> that.
>
> With the following:
>
> ######################################################################
>
> import email.message
>
> m = email.message.EmailMessage()
>
> m['Subject'] = 'Übung'
>
> m.set_content('Dies ist eine Übung')
> print('== cte: default == \n')
> print(m)
>
> print('-- full mail ---')
> print(m)
> print('-- just content--')
> print(m.get_content())
>
> m.set_content('Dies ist eine Übung', cte='quoted-printable')
> print('== cte: quoted-printable ==\n')
> print('-- full mail --')
> print(m)
> print('-- just content --')
> print(m.get_content())
>
> ######################################################################
>
> I get the following output:
>
> ######################################################################
>
> == cte: default ==
>
> Subject: Übung
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: base64
> MIME-Version: 1.0
>
> RGllcyBpc3QgZWluZSDDnGJ1bmcK
>
> -- full mail ---
> Subject: Übung
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: base64
> MIME-Version: 1.0
>
> RGllcyBpc3QgZWluZSDDnGJ1bmcK
>
> -- just content--
> Dies ist eine Übung
>
> == cte: quoted-printable ==
>
> -- full mail --
> Subject: Übung
> MIME-Version: 1.0
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
>
> Dies ist eine =C3=9Cbung
>
> -- just content --
> Dies ist eine Übung
>
> ######################################################################
>
> So in both cases the subject is fine, but it is unclear to me how to
> print the body.  Or rather, I know how to print the body OK, but I don't
> know how to print the headers separately - there seems to be nothing
> like 'get_headers()'.  I can use get('Subject') etc. and reconstruct the
> headers, but that seems a little clunky.

Sorry, I am confusing the terminology here.  The 'body' seems to be the
headers plus the 'content'.
So I can print the *content* without the headers OK, but I can't easily
print all the headers separately.  If I just print the body, i.e. headers
plus content, the umlauts in the content are not resolved.

-- 
This signature is currently under constuction.
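
For reference, one possible way to get the headers and the decoded
content together might be to iterate over the message's items() and
append get_content().  This is only a sketch: format_decoded is a
made-up helper name, not something from the email package, and it
assumes a simple, non-multipart text message like the one in the
example above.

```python
import email.message


def format_decoded(msg):
    """Return all headers plus the decoded text content as one string.

    Hypothetical helper; assumes msg is a non-multipart text message.
    """
    # items() yields (name, value) pairs for every header in the message.
    header_lines = [f"{name}: {value}" for name, value in msg.items()]
    # get_content() undoes the content transfer encoding and returns a str.
    return "\n".join(header_lines) + "\n\n" + msg.get_content()


m = email.message.EmailMessage()
m['Subject'] = 'Übung'
m.set_content('Dies ist eine Übung', cte='quoted-printable')

print(format_decoded(m))
```

The Content-Transfer-Encoding header still says quoted-printable in
this output, since the headers are printed unchanged and only the
content is decoded.  For a multipart message get_content() would
probably not work on the top-level object, so each part would have to
be handled separately.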