<Text-20240621184010@ram.dialup.fu-berlin.de>

Path: ...!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: Re: Decoding bytes to text strings in Python 2
Date: 21 Jun 2024 17:43:13 GMT
Organization: Stefan Ram
Lines: 21
Expires: 1 Feb 2025 11:59:58 GMT
Message-ID: <Text-20240621184010@ram.dialup.fu-berlin.de>
References: <MPG.40dfb14de0110a999896df@news.eternal-september.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de tFjDN6td2uKpGCjzgqeOIg3YmJWC3EiZk9EIxNS4SAqyMY
Cancel-Lock: sha1:jtaWNGKgUUrDDhU2rHvhdecBh3E= sha256:EycplbC7kEe+34X2m2hVDM5P2vlmeqAT1nnIu+SUEEY=
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
	Distribution through any means other than regular usenet
	channels is forbidden. It is forbidden to publish this
	article in the Web, to change URIs of this article into links,
        and to transfer the body without this notice, but quotations
        of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
	services to mirror the article in the web. But the article may
	be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Bytes: 2320

Rayner Lucas <usenet202101@magic-cookie.co.ukNOSPAMPLEASE> wrote or quoted:
>What's Python 2 doing here? sys.getdefaultencoding() returns 'ascii', 
>but it's clearly not attempting to display the bytes as ASCII (or 
>cp1252, or ISO-8859-1). How is it deciding on some sort of almost-but-
>not-quite UTF-8 decoding?

  I haven't looked into this thoroughly, and I'm not really
  familiar with Tkinter under Python 2, so this is only a
  first impression and I might be wrong!

  The Text widget typically expects text in Tcl encoding,
  which is usually UTF-8. 

  This is independent of the result returned by
  sys.getdefaultencoding()!

  If a UTF-8 encoded string is inserted directly as a byte
  string (a plain str in Python 2), its code points will be
  displayed correctly by the Text widget as long as they are
  in the BMP (Basic Multilingual Plane), as you already found
  out yourself.
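
  A minimal sketch of how one might sidestep the guesswork by
  decoding explicitly before handing anything to the widget.
  The helper names to_text and is_bmp_only are made up for
  illustration, and UTF-8 as the source encoding is an
  assumption. The BMP check is only meaningful on Python 3 or
  on a wide Python 2 build; a narrow build stores non-BMP
  characters as surrogate pairs, so it would always say True.

```python
# -*- coding: utf-8 -*-

def to_text(data, encoding="utf-8"):
    """Return a text string, decoding byte strings explicitly.

    Hypothetical helper; the encoding is an assumption about the input.
    """
    if isinstance(data, bytes):  # bytes is an alias for str on Python 2
        return data.decode(encoding)
    return data


def is_bmp_only(text):
    """True if every code point lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# UTF-8 encoded bytes containing only BMP characters.
raw = u"Gr\u00fc\u00dfe \u20ac".encode("utf-8")

text = to_text(raw)
print(repr(text), is_bmp_only(text))
```

  The decoded result could then be passed to the Text widget's
  insert method instead of the raw bytes, so that nothing
  depends on how Tcl interprets a byte string.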