<878qwts8bd.fsf@bsb.me.uk>

From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.awk
Subject: Re: (Long post) Metaphone Algorithm In AWK
Date: Mon, 19 Aug 2024 00:46:46 +0100
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <878qwts8bd.fsf@bsb.me.uk>
References: <v9qbgh$1u7qe$1@dont-email.me>

porkchop@invalid.foo (Mike Sanders) writes:

> Hi folks, hope you all are doing well.
>
> Please excuse long post, wanted to share this, some might find
> it handy given a certain context. Must run, I'm very behind in
> my work (hey I'm always running behind!)

Using a word list, I found some odd matches.  For example:

$ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
drunkeness
indigestion

Are these really metaphone matches for "texas"?  It's possible (I don't
know the algorithm at all well) but I found it surprising.

> # metaphone.awk: Michael Sanders - 2024
> #
> # example invocation:
> #
> # echo "texas taxes taxi" | awk -f metaphone.awk -v find=texas
> #
> # notes:
> #
> # ever notice when you search for (say):
> #
> # 'i went to the zu'
> #
> # & your chosen search engine suggests something like:
> #
> # 'did you mean i went to the zoo'
> #
> # the metaphone algorithm handles such cases pretty well actually...
> #
> # Metaphone is a phonetic algorithm, published by Lawrence Philips in
> # 1990, for indexing words by their English pronunciation. It
> # fundamentally improves on the Soundex algorithm by using information
> # about variations and inconsistencies in English spelling and
> # pronunciation to produce a more accurate encoding, which does a
> # better job of matching words and names which sound similar.
> # https://en.wikipedia.org/wiki/Metaphone
> #
> # english only (sorry)
> #
> # not extensively tested, nevertheless a solid start, if you
> # improve this code please share your results
> #
> # other implementations...
> #
> # gist:  https://gist.github.com/Rostepher/b688f709587ac145a0b3
> #
> # BASIC: http://aspell.net/metaphone/metaphone.basic
> #
> # C:     http://aspell.net/metaphone/metaphone-kuhn.txt

I wanted a "reference" implementation I could try, but this is not a
useful C program.  It's in an odd dialect (it uses void but has K&R
function definitions) and has loads of undefined behaviours (strcpy of
overlapping strings, use of uninitialised variables etc).

> # check if a character is a vowel
> function isvowel(c, is_vowel) {
>   is_vowel = c ~ /[AEIOU]/
>   return is_vowel
> }

I was not going to comment on the code, but this hit me just before I
posted.  Given the odd way AWK functions have to define locals, I tend
to use them only when really needed.  Here I think I would just write

function isvowel(c) {
   return c ~ /[AEIOU]/
}
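
For anyone unfamiliar with the convention being discussed: AWK has no
local declarations, so a variable is local only if it appears as an extra
parameter.  A minimal sketch of the difference, assuming any POSIX awk;
the names locals_demo, leaky and tidy are made up for illustration and
are not part of metaphone.awk:

```shell
# Hypothetical demo (not from metaphone.awk): locals versus globals in AWK.
locals_demo() {
  awk '
  # tmp is not in the parameter list, so it is global and leaks to the caller
  function leaky(c) {
    tmp = c ~ /[AEIOU]/
    return tmp
  }
  # the extra parameter tmp is local to each call (the usual AWK idiom)
  function tidy(c, tmp) {
    tmp = c ~ /[AEIOU]/
    return tmp
  }
  BEGIN {
    tmp = "caller value"
    tidy("A");  print "after tidy:  " tmp
    leaky("A"); print "after leaky: " tmp
  }'
}
locals_demo
```

The first line printed still shows the caller's value, while the second
shows it overwritten with the match result.  Returning the expression
directly, as above, avoids needing the temporary at all.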

-- 
Ben.