From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: How to manage accented characters in mail header?
Date: 4 Jan 2025 14:49:38 GMT
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved. Distribution through any means other than regular usenet channels is forbidden. It is forbidden to publish this article in the Web, to change URIs of this article into links, and to transfer the body without this notice, but quotations of parts in other Usenet posts are allowed.

Chris Green wrote or quoted:
>From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?=

In Python, when you run a header through decode_header from the email.header module, it gives you back a list of (data, charset) tuples. If the header contains at least one MIME encoded word, every piece of data is bytes that still has to be decoded with its charset, and pieces that were not encoded carry charset None. If the header contains no encoded words at all, you get a single (str, None) pair. To merge the pieces into one string, loop through the list, decode each piece that is still bytes, and then join them together.
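To make the shape of that list concrete, here is what decode_header returns for the quoted header, and for a header with no encoded words at all (the second header string is made up for illustration):

```python
from email.header import decode_header

# Header from the quoted message; the encoded word is Base64 ("B") in UTF-8.
mixed = 'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= '
# A made-up header with no encoded words at all.
plain = 'Subject: plain ASCII text'

mixed_parts = decode_header(mixed)
plain_parts = decode_header(plain)

# With at least one encoded word present, every piece comes back as bytes;
# the unencoded pieces ("From: " and the trailing space) get charset None.
print(mixed_parts)
# [(b'From: ', None), (b'S\xc3\xa9bastien Crignon', 'utf-8'), (b' ', None)]

# With no encoded words, the original str is handed back unchanged.
print(plain_parts)
# [('Subject: plain ASCII text', None)]
```

Because the same function can return either bytes or str depending on its input, a loop over the result needs to check the type of each piece before decoding it.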
Here’s a straightforward example of how to pull this off:

from email.header import decode_header

# Example header
header_example = \
'From: =?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?= '

# Decode the header into (data, charset) tuples
decoded_parts = decode_header(header_example)

# Start with an empty list for the decoded strings
decoded_strings = []

for part, charset in decoded_parts:
    if isinstance(part, bytes):
        # Decode the bytes to a string using the charset,
        # falling back to UTF-8 when charset is None
        decoded_string = part.decode(charset or 'utf-8')
    else:
        # If it's already a string, just use it as-is
        decoded_string = part
    decoded_strings.append(decoded_string)

# Join the parts into a single string
final_string = ''.join(decoded_strings)

print(final_string)  # From: Sébastien Crignon

Breakdown

decode_header(header_example): splits the header into its MIME-encoded and unencoded pieces and returns them as a list of (data, charset) tuples.

Looping through decoded_parts: for each part, check whether it is bytes. If it is, decode it with its charset, falling back to 'utf-8' when the charset is None.

Appending decoded strings: each decoded part goes into the decoded_strings list.

Joining strings: finally, ''.join(decoded_strings) glues the decoded pieces into a single string.

Just a Heads Up

Keep an eye out for parts whose charset is None. These are the pieces that were not MIME-encoded, such as "From: " and the trailing space in this example. In a standard header they are plain ASCII, so falling back to 'utf-8' (or 'ascii') decodes them correctly.
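The standard library can also do the decode-and-join step for you. Below is a minimal sketch of two alternatives, assuming Python 3.6 or later: str(make_header(decode_header(...))) from email.header, and parsing the whole message with email.policy.default, under which header values come back already decoded. The address sebastien@example.com and the message body are placeholders added for illustration.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.policy import default

encoded_name = '=?utf-8?B?U8OpYmFzdGllbiBDcmlnbm9u?='

# Alternative 1: make_header() builds a Header object from the
# (data, charset) pairs; str() of that Header is the decoded text.
name = str(make_header(decode_header(encoded_name)))
print(name)  # Sébastien Crignon

# Alternative 2: parse the whole message with policy=default.
# Address headers then expose structured, already-decoded fields.
# The address and body below are placeholders.
raw_message = (
    'From: ' + encoded_name + ' <sebastien@example.com>\n'
    'Subject: test\n'
    '\n'
    'Body text.\n'
)
msg = message_from_string(raw_message, policy=default)
sender = msg['From'].addresses[0]
print(sender.display_name)  # Sébastien Crignon
print(sender.addr_spec)     # sebastien@example.com
```

make_header is handy when all you have is a header string. The policy route fits better when you are reading whole messages anyway: with the older default compat32 policy, msg['From'] returns the raw encoded text, whereas policy.default does the decoding for you.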