Path: ...!news.nobody.at!2.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to manage accented characters in mail header?
Date: 4 Jan 2025 14:49:38 GMT
Organization: Stefan Ram
Lines: 56
Expires: 1 Jan 2026 11:59:58 GMT
Message-ID: <decode_header-20250104154914@ram.dialup.fu-berlin.de>
References: <satn4l-6sqh.ln1@q957.zbmc.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de QsY0BrpMt8eJKzMIpqRjFQtf+P7iJ8hioMhHWM5vvHAPHK
Cancel-Lock: sha1:MWrj05wheRgblKWC1UBRlrDqfas= sha256:860dXZQ7d0CJ/Zbtn8ICCHtCl+ssLJol2WtNC43eClA=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
services to mirror the article in the web. But the article may
be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Bytes: 3400
Chris Green <cl@isbd.net> wrote or quoted:
>From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>
In Python, decode_header from the email.header module returns a
list of parts, where each part is a tuple of (decoded value, charset).
When the header contains at least one encoded word, every value comes
back as bytes, and the charset is None for the parts that were not
encoded; a header without any encoded words comes back as a single
(str, None) pair. To combine the parts into one string, you’ll want
to loop through the list, decode each piece (if it needs it), and
then join them together.
Here’s a straightforward example of how to pull this off:

from email.header import decode_header

# Example header
header_example = \
    'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>'

# Decode the header
decoded_parts = decode_header(header_example)

# Kick off an empty list for the decoded strings
decoded_strings = []

for part, charset in decoded_parts:
    if isinstance(part, bytes):
        # Decode the bytes to a string using the charset
        decoded_string = part.decode(charset or 'utf-8')
    else:
        # If it’s already a string, just roll with it
        decoded_string = part
    decoded_strings.append(decoded_string)

# Join the parts into a single string
final_string = ''.join(decoded_strings)

print(final_string)  # From: Sébastien Crignon <sebastien.crignon@amvs.fr>

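
To see what the loop is actually working with, it can help to print
the raw tuples first. A minimal sketch, reusing the header from the
example; the variable name parts is just for illustration:

```python
from email.header import decode_header

header_example = (
    'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= '
    '<sebastien.crignon@amvs.fr>'
)

parts = decode_header(header_example)

for part, charset in parts:
    # repr() makes the bytes/str distinction and the charset visible
    print(repr(part), repr(charset))

# b'From: ' None
# b'S\xc3\xa9bastien Crignon' 'utf-8'
# b' <sebastien.crignon@amvs.fr>' None
```

Only the middle part carries a charset; the surrounding plain text
comes back as bytes with charset None.
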
Breakdown

decode_header(header_example): This call takes your email header
and breaks it down into a list of (value, charset) tuples.

Looping through decoded_parts: You check whether each part is
bytes. If it is, you decode it using its charset (defaulting to
'utf-8' when the charset is None).

Appending decoded strings: You add each decoded part to a list.

Joining strings: Finally, you use ''.join(decoded_strings) to glue
all the decoded strings into a single, coherent piece.
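
The steps above could be wrapped in a small helper so they can be
reused for any header line. This is only a sketch: decode_mime_header
is a made-up name, not part of the email package, and the 'utf-8'
default is an assumption carried over from the example.

```python
from email.header import decode_header


def decode_mime_header(raw, default_charset='utf-8'):
    """Hypothetical helper: return a header with encoded words decoded."""
    pieces = []
    for part, charset in decode_header(raw):
        if isinstance(part, bytes):
            # Encoded words carry their own charset; plain parts have None
            pieces.append(part.decode(charset or default_charset))
        else:
            # No encoded words in the header: decode_header hands back str
            pieces.append(part)
    return ''.join(pieces)


print(decode_mime_header(
    'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= <sebastien.crignon@amvs.fr>'
))
# From: Sébastien Crignon <sebastien.crignon@amvs.fr>

print(decode_mime_header('Subject: plain ASCII, nothing to decode'))
# Subject: plain ASCII, nothing to decode
```

The str branch matters for the second call: with no encoded words in
the input, decode_header returns the text unchanged as a single
(str, None) pair.
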
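Just a Heads Up

The charset is None for every part that was not an encoded word
(normally plain ASCII text, such as the 'From: ' prefix and the
address in the example). For those parts, fall back to 'utf-8' or
'ascii', as the example does with charset or 'utf-8'.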
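
One way to make the fallback a bit more defensive is sketched below.
It assumes you would rather get a replacement character than an
exception; decode_part and its fallback parameter are hypothetical
names. Besides None, it also covers a charset label Python does not
know (bytes.decode raises LookupError for those) and bytes that are
not valid in the declared charset.

```python
def decode_part(part, charset, fallback='utf-8'):
    """Hypothetical helper: decode one (part, charset) pair defensively."""
    if isinstance(part, str):
        # Already text, nothing to decode
        return part
    try:
        # errors='replace' turns undecodable bytes into U+FFFD
        return part.decode(charset or fallback, errors='replace')
    except LookupError:
        # The header named a charset Python has no codec for
        return part.decode(fallback, errors='replace')


print(decode_part(b'From: ', None))                      # From:
print(decode_part(b'S\xc3\xa9b', 'x-unknown-charset'))   # Séb
print(decode_part(b'\xff', 'utf-8') == '\ufffd')         # True
```

Whether silently replacing bad bytes is acceptable depends on what
the decoded header is used for; for display it is usually fine.
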