<mailman.84.1717519110.2909.python-list@python.org>


Path: ...!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: dieter.maurer@online.de
Newsgroups: comp.lang.python
Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
Date: Tue, 4 Jun 2024 18:13:47 +0200
Lines: 12
Message-ID: <mailman.84.1717519110.2909.python-list@python.org>
References: <v3am2l$1qf6m$3@dont-email.me>
 <26202.4083.590062.42312@ixdm.fritz.box>
 <32b20599-1cf1-4aeb-904b-b9afa3dea3a3@wichmann.us>
 <mailman.81.1717270463.2909.python-list@python.org>
 <20240603104742.1664b37c@fedora>
 <26207.15675.710915.692146@ixdm.fritz.box>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: news.uni-berlin.de pJ/a4/9Fzzf6KeDC4HB3Hw85inBWEGeN4Z8kQ57s5dpQ==
Cancel-Lock: sha1:9zNt1SYZr44rqec4N0kAjv2HvPU= sha256:DEmntPTCxS1YTRgeNtsIZDXhWBIVXtoAhZMIqRBQ48Y=
Return-Path: <dieter.maurer@online.de>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=pass
 reason="2048-bit key; unprotected key"
 header.d=online.de header.i=dieter.maurer@online.de
 header.b=XmlMwmj3; dkim-adsp=pass; dkim-atps=neutral
X-Spam-Status: OK 0.015
X-UI-Sender-Class: 6003b46c-3fee-4677-9b8b-2b628d989298
In-Reply-To: <20240603104742.1664b37c@fedora>
X-Mailer: VM 8.0.12-devo-585 under 21.4 (patch 24) "Standard C" XEmacs Lucid
 (x86_64-linux-gnu)
X-Spam-Flag: NO
X-BeenThere: python-list@python.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: General discussion list for the Python programming language
 <python-list.python.org>
List-Unsubscribe: <https://mail.python.org/mailman/options/python-list>,
 <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive: <https://mail.python.org/pipermail/python-list/>
List-Post: <mailto:python-list@python.org>
List-Help: <mailto:python-list-request@python.org?subject=help>
List-Subscribe: <https://mail.python.org/mailman/listinfo/python-list>,
 <mailto:python-list-request@python.org?subject=subscribe>
X-Mailman-Original-Message-ID: <26207.15675.710915.692146@ixdm.fritz.box>
X-Mailman-Original-References: <v3am2l$1qf6m$3@dont-email.me>
 <26202.4083.590062.42312@ixdm.fritz.box>
 <32b20599-1cf1-4aeb-904b-b9afa3dea3a3@wichmann.us>
 <mailman.81.1717270463.2909.python-list@python.org>
 <20240603104742.1664b37c@fedora>
Bytes: 5701

Edward Teach wrote at 2024-6-3 10:47 +0100:
> ...
>The Gutenburg Project publishes "plain text".  That's another problem,
>because "plain text" means UTF-8....and that means unicode...and that
>means running some sort of unicode-to-ascii conversion in order to get
>something like "words".  A couple of hours....a couple of hundred lines
>of C....problem solved!

Unicode supports the notion of a "word" even better than ASCII does.
For example, the `\w` (word character) regular expression wildcard
works for Unicode just as it does for ASCII (with, of course, Unicode's
extended notions of letter, digit, punctuation, etc.).
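A minimal sketch of what this means for the "words occurring exactly once" task, assuming a "word" is simply a maximal run of `\w` characters compared case-insensitively. Note that `\w` also matches digits and the underscore, and that an apostrophe splits "don't" into "don" and "t", so this is a rough tokenizer, not a linguistic one. The function name `words_occurring_once` and the sample sentence are illustrative only. In Python 3, `str` patterns are Unicode-aware by default, so no Unicode-to-ASCII conversion is needed; the `re.ASCII` flag shows what happens if you force ASCII semantics.

```python
import re
import unicodedata
from collections import Counter


def words_occurring_once(text: str) -> list[str]:
    """Return the words that occur exactly once in text, in order of first appearance."""
    # Normalize to NFC so that, e.g., "é" is one code point rather than "e" + combining accent.
    text = unicodedata.normalize("NFC", text)
    # For str patterns, \w is Unicode-aware by default: accented letters count as word characters.
    words = re.findall(r"\w+", text.casefold())
    counts = Counter(words)  # Counter keeps insertion (first-seen) order
    return [word for word, n in counts.items() if n == 1]


sample = "Café au lait, café noir. Naïve Stephen, naïve Bloom."
print(words_occurring_once(sample))
# ['au', 'lait', 'noir', 'stephen', 'bloom']

# Restricting \w to ASCII (re.ASCII) splits words at the accented letters:
print(re.findall(r"\w+", "naïve café"))                  # ['naïve', 'café']
print(re.findall(r"\w+", "naïve café", flags=re.ASCII))  # ['na', 've', 'caf']

# Hypothetical use on the file from the subject line (path is an assumption):
# with open("JoyceUlysses.txt", encoding="utf-8") as f:
#     print(len(words_occurring_once(f.read())))
```

The design choice here is to let the regular expression engine decide what a word character is, rather than mapping the text down to ASCII first; accented words such as "café" then stay intact instead of being truncated to "caf".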