From: Thomas Passin <list1@tompassin.net>
Newsgroups: comp.lang.python
Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
Date: Wed, 5 Jun 2024 07:10:19 -0400
Message-ID: <mailman.93.1717699659.2909.python-list@python.org>
In-Reply-To: <3dedbc3b-7db0-4a39-863f-56324d434b12@DancesWithMice.info>

On 6/5/2024 12:33 AM, dn via Python-list wrote:
> On 31/05/24 14:26, HenHanna via Python-list wrote:
>> On 5/30/2024 2:18 PM, dn wrote:
>>> On 31/05/24 08:03, HenHanna via Python-list wrote:
>>>>
>>>> Given a text file of a novel (JoyceUlysses.txt) ...
>>>>
>>>> could someone give me a pretty fast (and simple) Python program
>>>> that'd give me a list of all words occurring exactly once?
>>>>
>>>> -- Also, a list of words occurring once, twice or 3
>>>> times
>>>>
>>>>
>>>>
>>>> re: hyphenated words (you can treat it anyway you like)
>>>>
>>>> but ideally, i'd treat [editor-in-chief]
>>>> [go-ahead] [pen-knife]
>>>> [know-how] [far-fetched] ...
>>>> as one unit.
>>
>>
>>>
>>> Split into words - defined as you will.
>>> Use Counter.
>>>
>>> Show some (of your) code and we'll be happy to critique...
>>
>>
>> hard to decide what to do with hyphens
>> and apostrophes
>> (I'd, he's, can't, haven't, A's and B's)
>>
>>
>> 2-step-Process
>>
>> 1. make a file listing all words (one word per line)
>>
>> 2. then, doing the counting. using
>> from collections import Counter
>
>
> Apologies for lateness - only just able to come back to this.
>
> This issue is not Python, and is not solved by code!
>
> If you/your teacher can't define a "word", the code, any code, will
> almost-certainly be wrong!
>
>
> One of the interesting aspects of our work is that we can write all
> manner of tests to try to ensure that the code is correct: unit tests,
> integration tests, system tests, acceptance tests, eye-tests, ...
>
> However, there is no such thing as a test (or proof) that statements of
> requirements are complete or correct!
> (nor for any other previous stages of the full project life-cycle)
>
> As coders we need to learn to require clear specifications and not
> attempt to read-between-the-lines, use our initiative, or otherwise 'not
> bother the ...'. When there is ambiguity, we should go back to the
> user/client/boss and seek clarification. They are the
> domain/subject-matter experts...
>
> I'm reminded of a cartoon, possibly from some IBM source, first seen in
> black-and-white but here in living-color:
> https://www.monolithic.org/blogs/presidents-sphere/what-the-customer-really-wants

That one's been kicking around for years ... good job in finding a link
for it!

> That has been the sad history of programming and dev.projects - wherein
> we are blamed for every short-coming, because no-one else understands
> the nuances of development projects.

========== REMAINDER OF ARTICLE TRUNCATED ==========
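
For reference, once a definition of "word" has been agreed, the counting step suggested upthread (split into words, use Counter) is short. The following is only a minimal sketch under one assumed definition, not anything settled in the thread: a word is a run of letters that may contain internal hyphens or apostrophes (so editor-in-chief and can't are each one unit), and case is ignored. The names WORD_RE, word_counts, words_occurring and count_file are hypothetical, and UTF-8 is an assumed file encoding.

```python
import re
from collections import Counter

# ASSUMPTION: a "word" is one or more letters, optionally joined to further
# runs of letters by a single internal hyphen or apostrophe (straight or curly).
# Digits, underscores and leading/trailing punctuation are not part of a word.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def word_counts(text):
    """Count words in text, ignoring case, under the WORD_RE definition."""
    return Counter(m.group(0).casefold() for m in WORD_RE.finditer(text))


def words_occurring(counts, lowest=1, highest=1):
    """Return the sorted words whose count lies in [lowest, highest]."""
    return sorted(w for w, n in counts.items() if lowest <= n <= highest)


def count_file(path, encoding="utf-8"):
    """Read a whole text file and count its words (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return word_counts(f.read())
```

With these helpers, words_occurring(count_file("JoyceUlysses.txt")) would list the words seen exactly once, and words_occurring(counts, 1, 3) those seen once, twice or three times. Changing the definition of a word means changing only WORD_RE, which keeps the requirements question separate from the counting code.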