From: "B. Pym" <No_spamming@noWhere_7073.org>
Newsgroups: comp.lang.lisp,comp.lang.scheme
Subject: Re: From JoyceUlysses.txt -- words occurring exactly once
Date: Fri, 31 May 2024 10:13:50 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <v3c7st$26biv$1@dont-email.me>
References: <v3ame4$1qf6m$5@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Injection-Date: Fri, 31 May 2024 12:13:51 +0200 (CEST)
User-Agent: XanaNews/1.18.1.6

On 5/30/2024, HenHanna wrote:

> 
> i'd not use Gauche for this, but maybe someone can change my mind.
> 
> 
> _______________________
> From JoyceUlysses.txt -- words occurring exactly once
> 
> 
> Given a text file of a novel (JoyceUlysses.txt) ...
> 
> could someone give me a pretty fast (and simple) program that'd give me a list of all words occurring exactly once?
> 
>               -- Also, a list of words occurring once, twice or 3 times
> 
> 
> 
> re: hyphenated words        (you can treat it any way you like)
> 
>        ideally, i'd treat  [editor-in-chief]
>                            [go-ahead]  [pen-knife]
>                            [know-how]  [far-fetched] ...
>        as one unit.

Gauche Scheme (run here on Alice.txt rather than JoyceUlysses.txt):

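(use file.util)  ;; file->string
(use srfi-13)    ;; string-tokenize
(use srfi-14)    ;; character sets

;; Table mapping each upcased word to its number of occurrences.
(define h (make-hash-table 'string=?))

;; A token is a maximal run of letters and hyphens, so hyphenated
;; words such as "know-how" stay in one piece.
(dolist
  (s
    (string-tokenize (file->string "Alice.txt")
      (char-set-adjoin char-set:letter #\-)))
  ;; Strip leading and trailing hyphens, then add 1 to the word's
  ;; count (a word not yet seen starts at 0).
  (hash-table-update! h
    (regexp-replace* (string-upcase s) #/^-+/ "" #/-+$/ "")
    (pa$ + 1) 0))

;; Keep the words that occur fewer than 3 times, i.e. once or twice.
(filter (lambda (kv) (< (cdr kv) 3))
  (hash-table->alist h))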

  ===>

(("LASTED" . 2) ("WAY--NEVER" . 1) ("VISIT" . 1) ("CHANCED" . 1)
 ("WILDLY" . 2) ("BEHEAD" . 1) ("PROMISE" . 1) ("MEANWHILE" . 1)
 ("ENGAGED" . 1) ("KNIFE" . 2) ("ROARED" . 1) ("RETIRE" . 1)
 ("BLACKING" . 1) ("HATED" . 1) ("BRIGHT-EYED" . 1)
 ("SHEEP-BELLS" . 1) ("PROTECTION" . 1) ("CRIES" . 1) ("ADA" . 1)
 ("ENJOY" . 1) ("WRITHING" . 1) ("RAW" . 1) ("APPEALED" . 1)
 ("RELIEVED" . 1) ("CHILDHOOD" . 1) ("WEPT" . 1) ("RACE-COURSE" . 1)
 ("THEIRS" . 1) ("MAD--AT" . 1) ("SPOKEN" . 1) ("PENCILS" . 1)
 ("CLEAR" . 2) ("TREADING" . 2) ("RETURNED" . 2) ("CHERRY-TART" . 1)
 ("UNEASY" . 1) ("LOW-SPIRITED" . 1) ("BONE" . 1) ("PROMISED" . 1)
 ("HAPPENING" . 1) ("OYSTER" . 1) ("PATIENTLY" . 2) ("NEEDS" . 1)
 ("LESSON-BOOK" . 1) ("PITIED" . 1) ("UNCOMFORTABLY" . 1)
 ("ANTIPATHIES" . 1) ("PICTURED" . 1) ("DESPERATE" . 1)
 ("ENGRAVED" . 1)
 ...
)
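
The filter above keeps words seen once or twice, while the question asks for the words occurring exactly once and for those occurring once, twice or 3 times. Below is a minimal sketch of one way to produce both lists, assuming Gauche 0.9.x; word-chars, count-words and words-with-count are hypothetical names introduced only for this sketch. It also assumes that a run of two or more hyphens is a dash rather than part of a compound (entries such as "WAY--NEVER" and "MAD--AT" in the output above suggest this), and it skips tokens that consisted of nothing but hyphens.

```scheme
(use srfi-1)   ;; filter-map
(use srfi-13)  ;; string-tokenize
(use srfi-14)  ;; character sets

;; Tokens are runs of letters and hyphens, so "know-how" and
;; "editor-in-chief" each count as one word.
(define word-chars (char-set-adjoin char-set:letter #\-))

;; Count the upcased words in TEXT.
;; Assumption: two or more hyphens in a row ("way--never") are a dash,
;; so each such run is turned into a space before tokenizing.
(define (count-words text)
  (let ((h (make-hash-table 'string=?)))
    (dolist (tok (string-tokenize (regexp-replace-all #/--+/ text " ")
                                  word-chars))
      ;; Strip stray edge hyphens; skip tokens that were only hyphens.
      (let ((w (regexp-replace* (string-upcase tok) #/^-+/ "" #/-+$/ "")))
        (unless (string-null? w)
          (hash-table-update! h w (pa$ + 1) 0))))
    h))

;; Alphabetically sorted list of the words whose count lies in [LO, HI].
(define (words-with-count h lo hi)
  (sort (filter-map (lambda (kv) (and (<= lo (cdr kv) hi) (car kv)))
                    (hash-table->alist h))
        string<?))
```

With (use file.util) loaded for file->string, (words-with-count (count-words (file->string "JoyceUlysses.txt")) 1 1) would list the words occurring exactly once, and passing 1 3 instead would list those occurring once, twice or 3 times.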