From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Decoding bytes to text strings in Python 2
Date: Mon, 24 Jun 2024 09:30:30 +1000
Message-ID: <mailman.162.1719185446.2909.python-list@python.org>

On Mon, 24 Jun 2024 at 08:20, Rayner Lucas via Python-list
<python-list@python.org> wrote:
>
> In article <mailman.159.1718991773.2909.python-list@python.org>,
> rosuav@gmail.com says...
> >
> > If you switch to a Linux system, it should work correctly, and you'll
> > be able to migrate the rest of the way onto Python 3. Once you achieve
> > that, you'll be able to operate on Windows or Linux equivalently,
> > since Python 3 solved this problem. At least, I *think* it will; my
> > current system has a Python 2 installed, but doesn't have tkinter
> > (because I never bothered to install it), and it's no longer available
> > from the upstream Debian repos, so I only tested it in the console.
> > But the decoding certainly worked.
>
> Thank you for the idea of trying it on a Linux system. I did so, and my
> example code generated the error:
>
> _tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
> allowed by Tcl
>
> So it looks like the problem is ultimately due to a limitation of
> Tcl/Tk.

Yep, that seems to be the case. Not sure if that's still true on a
more recent Python, but it does look like you won't get astral
characters (code points above U+FFFF) in tkinter on the one you're
using.

> I'm still not sure why it doesn't give an error on Windows and

Because of the aforementioned weirdness of old (that is: pre-3.3)
Python versions on Windows. They were built to use a messy, buggy
hybrid of UCS-2 and UTF-16. Sometimes this got you around problems, or
at least masked them; but it wouldn't be reliable. That's why, in
Python 3.3, all that was fixed :)

> instead either works (when UTF-8 encoding is specified) or converts the
> out-of-range characters to ones it can display (when the encoding isn't
> specified). But now I know what the root of the problem is, I can deal
> with it appropriately (and my curiosity is at least partly satisfied).

Converting out-of-range characters is fairly straightforward, at least
as long as your Python interpreter is correctly built (so, Python 3,
or a Linux build of Python 2):

    "".join(c if ord(c) < 65536 else "?" for c in text)

> This has given me a much better understanding of what I need to do in
> order to migrate to Python 3 and add proper support for non-ASCII
> characters, so I'm very grateful for your help!
>

Excellent. Hopefully all this mess is just a transitional state and
you'll get to something that REALLY works, soon!

ChrisA
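A slightly expanded version of the join() one-liner, as a minimal sketch. It assumes Python 3.3 or later, where a str holds whole code points; the names BMP_LIMIT, replace_astral and utf16_units are made up for illustration and are not part of any library. The second helper shows why pre-3.3 narrow builds got confused: a character such as U+1F40D needs two UTF-16 code units (a surrogate pair), so those builds counted it as two characters.

```python
# Sketch only (assumes Python 3.3+): keep text inside the Basic Multilingual
# Plane, U+0000-U+FFFF, which is the range the Tcl error message says it allows.
# All names here are hypothetical helpers, not a library API.

BMP_LIMIT = 0x10000  # first code point outside the BMP (65536)


def replace_astral(text, replacement="?"):
    """Return text with every character above U+FFFF swapped for replacement."""
    return "".join(c if ord(c) < BMP_LIMIT else replacement for c in text)


def utf16_units(char):
    """Count the UTF-16 code units one character needs (1 for BMP, 2 otherwise)."""
    # utf-16-le has no BOM, so each code unit is exactly 2 bytes.
    return len(char.encode("utf-16-le")) // 2


if __name__ == "__main__":
    s = "snake: \U0001F40D"
    print(replace_astral(s))           # snake: ?
    print(len(s))                      # 8: the snake is one code point here
    print(utf16_units("\U0001F40D"))   # 2: a surrogate pair in UTF-16
```

Passing replacement="\ufffd" instead of "?" substitutes the standard Unicode replacement character, which is itself inside the BMP and so within the range Tcl accepts.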