From: Grant Edwards <grant.b.edwards@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
Date: Mon, 03 Jun 2024 14:58:26 -0400 (EDT)
Message-ID: <mailman.83.1717441107.2909.python-list@python.org>
References: <v3am2l$1qf6m$3@dont-email.me> <26202.4083.590062.42312@ixdm.fritz.box> <32b20599-1cf1-4aeb-904b-b9afa3dea3a3@wichmann.us> <mailman.81.1717270463.2909.python-list@python.org> <20240603104742.1664b37c@fedora> <4VtNKZ70YdznVGW@mail.python.org>
User-Agent: slrn/1.0.3 (Linux)

On 2024-06-03, Edward Teach via Python-list <python-list@python.org> wrote:

> The Gutenburg Project publishes "plain text". That's another
> problem, because "plain text" means UTF-8....and that means
> unicode...and that means running some sort of unicode-to-ascii
> conversion in order to get something like "words". A couple of
> hours....a couple of hundred lines of C....problem solved!

I'm curious. Why does it need to be converted from Unicode to ASCII?

When you read it into Python, it gets converted right back to Unicode...
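
To make the point concrete, here is a rough sketch (not anyone's actual solution from the thread) of finding the words that occur exactly once without any Unicode-to-ASCII step. The function names and the word pattern are my own assumptions: a "word" here is just a run of Unicode letters, so apostrophes and hyphens split words, and the filename is taken from the subject line.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a "word" is a maximal run of Unicode letters.
# [^\W\d_] means "word character that is not a digit or underscore".
WORD_RE = re.compile(r"[^\W\d_]+")

def words_occurring_once(text):
    """Return the sorted list of words that appear exactly once in text."""
    # Normalize so that composed and decomposed accents compare equal.
    text = unicodedata.normalize("NFC", text)
    counts = Counter(WORD_RE.findall(text.lower()))
    return sorted(word for word, n in counts.items() if n == 1)

def words_occurring_once_in_file(path):
    """Hypothetical helper: decode the file as UTF-8 and reuse the function above."""
    with open(path, encoding="utf-8") as f:
        return words_occurring_once(f.read())

# Small in-memory demo: UTF-8 bytes decode straight to a Unicode str.
raw = "Café au lait, café noir; naïve Stephen.".encode("utf-8")
print(words_occurring_once(raw.decode("utf-8")))
```

For the real text it would be something like `words_occurring_once_in_file("JoyceUlysses.txt")`, assuming the file really is UTF-8. Accented words such as "café" stay intact because a `str` pattern in `re` matches Unicode letters by default.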