From: dieter.maurer@online.de
Newsgroups: comp.lang.python
Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
Date: Tue, 4 Jun 2024 18:13:47 +0200

Edward Teach wrote at 2024-6-3 10:47 +0100:
> ...
>The Gutenburg Project publishes "plain text". That's another problem,
>because "plain text" means UTF-8....and that means unicode...and that
>means running some sort of unicode-to-ascii conversion in order to get
>something like "words". A couple of hours....a couple of hundred lines
>of C....problem solved!

Unicode supports the notion of a "word" even better than ASCII does.
For example, the `\w` (word character) regular expression class
works for Unicode just as it does for ASCII (of course with
correspondingly extended notions of letter, digit, punctuation, etc.).
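
To make the point about `\w` concrete, here is a minimal sketch in Python 3, where `str` patterns are Unicode-aware by default. The sample text and the variable names are made up for illustration (nothing here is taken from JoyceUlysses.txt), and `\w+` is only a rough approximation of "word": it also matches digits and underscores and splits on apostrophes and hyphens.

```python
import re
from collections import Counter

# Hypothetical sample text, written with escapes so the code points are
# unambiguous; it renders as: "café naïve café Ελλάδα Ulysses"
text = "caf\u00e9 na\u00efve caf\u00e9 \u0395\u03bb\u03bb\u03ac\u03b4\u03b1 Ulysses"

# Default (Unicode) matching: \w covers accented and non-Latin letters too.
words = re.findall(r"\w+", text.lower())

# Count the words and keep those that occur exactly once.
counts = Counter(words)
once = [w for w, n in counts.items() if n == 1]

# For comparison: with re.ASCII, \w is limited to [a-zA-Z0-9_], so words
# containing non-ASCII letters are split or dropped entirely.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(once)
print(ascii_words)
```

With the default matching, no unicode-to-ascii conversion step is needed before counting; the `re.ASCII` variant shows what goes wrong when the text is treated as ASCII only.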