<mailman.83.1717441107.2909.python-list@python.org>

Path: ...!news.mixmin.net!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail
From: Grant Edwards <grant.b.edwards@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
Date: Mon, 03 Jun 2024 14:58:26 -0400 (EDT)
Lines: 14
Message-ID: <mailman.83.1717441107.2909.python-list@python.org>
References: <v3am2l$1qf6m$3@dont-email.me>
 <26202.4083.590062.42312@ixdm.fritz.box>
 <32b20599-1cf1-4aeb-904b-b9afa3dea3a3@wichmann.us>
 <mailman.81.1717270463.2909.python-list@python.org>
 <20240603104742.1664b37c@fedora> <4VtNKZ70YdznVGW@mail.python.org>
X-Trace: news.uni-berlin.de PGnm2PorituhI1bgi5A0zwjPb8Pud2BnrlsrVxN6DhHQ==
Cancel-Lock: sha1:cDcvjmujbaSBcvnyp1zME3r0peY= sha256:t5ZBfBYKxhugaOUseR4loukCiqnA4gL48zQT7z2RpPs=
Return-Path: <grant.b.edwards@gmail.com>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
User-Agent: slrn/1.0.3 (Linux)
X-BeenThere: python-list@python.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: General discussion list for the Python programming language
 <python-list.python.org>
List-Unsubscribe: <https://mail.python.org/mailman/options/python-list>,
 <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive: <https://mail.python.org/pipermail/python-list/>
List-Post: <mailto:python-list@python.org>
List-Help: <mailto:python-list-request@python.org?subject=help>
List-Subscribe: <https://mail.python.org/mailman/listinfo/python-list>,
 <mailto:python-list-request@python.org?subject=subscribe>
Bytes: 3321

On 2024-06-03, Edward Teach via Python-list <python-list@python.org> wrote:

> The Gutenberg Project publishes "plain text".  That's another
> problem, because "plain text" means UTF-8....and that means
> unicode...and that means running some sort of unicode-to-ascii
> conversion in order to get something like "words".  A couple of
> hours....a couple of hundred lines of C....problem solved!

I'm curious.  Why does it need to be converted from Unicode to ASCII?

When you read it into Python, it gets converted right back to Unicode...
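
To illustrate, here is a minimal sketch of the "words occurring exactly once" task done directly on Unicode text, with no ASCII step. The function name words_occurring_once, the regex used to define a "word", and the sample string are my own assumptions for illustration, not anything from the earlier posts. For the real file you would get the same kind of str from open("JoyceUlysses.txt", encoding="utf-8").read().

```python
import re
from collections import Counter


def words_occurring_once(text: str) -> list[str]:
    """Return the words that appear exactly once in text, in first-seen order."""
    # For str patterns, \w matches Unicode letters and digits (and underscore),
    # so accented words stay intact. The optional apostrophe group keeps
    # contractions such as "don't" together as one word.
    words = re.findall(r"\w+(?:'\w+)*", text.casefold())
    counts = Counter(words)
    return [word for word, n in counts.items() if n == 1]


# Bytes as they would sit in a UTF-8 file on disk (hypothetical sample).
raw = "Café café naïve the The Ulysses".encode("utf-8")

# Decoding yields a Python str, which is already Unicode.
text = raw.decode("utf-8")

print(words_occurring_once(text))
```

casefold() is used instead of lower() so that case-insensitive matching also behaves sensibly for non-ASCII letters. What counts as a "word" is still a judgment call; the regex above is only one possible definition.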