From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: Decoding bytes to text strings in Python 2
Date: 21 Jun 2024 17:43:13 GMT
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved. Distribution through any means other than regular usenet channels is forbidden. It is forbidden to publish this article in the Web, to change URIs of this article into links, and to transfer the body without this notice, but quotations of parts in other Usenet posts are allowed.

Rayner Lucas wrote or quoted:
>What's Python 2 doing here? sys.getdefaultencoding() returns 'ascii',
>but it's clearly not attempting to display the bytes as ASCII (or
>cp1252, or ISO-8859-1). How is it deciding on some sort of almost-but-
>not-quite UTF-8 decoding?

I haven't looked into this thoroughly, and I'm not familiar with Tkinter under Python 2, so this is only a first impression and I might be wrong!

The Text widget typically expects text in the Tcl encoding, which is usually UTF-8. This is independent of the result returned by sys.getdefaultencoding()!

If a UTF-8 string is inserted directly as a bytes object, its code points will be displayed correctly by the Text widget as long as they are in the BMP (Basic Multilingual Plane), as you already found out yourself.
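
To make the BMP point concrete, here is a minimal sketch that never touches Tkinter. The helper names to_text and has_non_bmp_utf8 are made up for this post, and the sample byte strings are my own. It relies on one property of valid UTF-8: only code points above U+FFFF are encoded as four-byte sequences, and those always start with a lead byte of 0xF0 or higher. It also assumes the incoming bytes really are UTF-8.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers, not part of Tkinter or the standard library.

def to_text(data, encoding="utf-8"):
    """Decode a byte string explicitly; pass text strings through unchanged."""
    if isinstance(data, bytes):  # on Python 2, `bytes` is an alias of `str`
        return data.decode(encoding)
    return data


def has_non_bmp_utf8(data):
    """Return True if valid UTF-8 `data` contains a code point above U+FFFF.

    Such code points are encoded as four bytes whose lead byte is >= 0xF0.
    bytearray() yields integers on both Python 2 and Python 3.
    """
    return any(b >= 0xF0 for b in bytearray(data))


raw_bmp = b"caf\xc3\xa9 \xe2\x82\xac"   # U+00E9 and U+20AC, both in the BMP
raw_astral = b"\xf0\x9f\x98\x80"        # U+1F600, outside the BMP

text_bmp = to_text(raw_bmp)
bmp_flag = has_non_bmp_utf8(raw_bmp)
astral_flag = has_non_bmp_utf8(raw_astral)

# Assumed usage with an existing Text widget named `text`:
#     text.insert("end", to_text(raw))
```

Decoding explicitly before the insert means the widget receives a text string rather than raw bytes, so the result no longer depends on how the bytes happen to be interpreted on the Tcl side. Whether non-BMP characters then display correctly is a separate question that I have not tested.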