<87r0alqpmo.fsf@bsb.me.uk>

Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.awk
Subject: Re: (Long post) Metaphone Algorithm In AWK
Date: Mon, 19 Aug 2024 02:15:43 +0100
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <87r0alqpmo.fsf@bsb.me.uk>
References: <v9qbgh$1u7qe$1@dont-email.me> <878qwts8bd.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Mon, 19 Aug 2024 03:15:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="153c0803c54c022691c586705843dea6";
	logging-data="2706798"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/TS82aeHhQssVf9Gg9ZnInmsDU2xy/cm0="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:EHNno+VF9OkB0HZvIZNVTjNcO5c=
	sha1:iIJBzgfqmrBN/C2ewLE0mHZDT40=
X-BSB-Auth: 1.2dec014a08557f7a8ee4.20240819021543BST.87r0alqpmo.fsf@bsb.me.uk
Bytes: 3767

A correction...

Ben Bacarisse <ben@bsb.me.uk> writes:

> porkchop@invalid.foo (Mike Sanders) writes:
>
>> Hi folks, hope you all are doing well.
>>
>> Please excuse long post, wanted to share this, some might find
>> it handy given a certain context. Must run, I'm very behind in
>> my work (hey I'm always running behind!)
>
> Using a word list, I found some odd matches.  For example:
>
> $ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
> drunkeness
> indigestion
>
> Are these really metaphone matches for "texas"?  It's possible (I don't
> know the algorithm at all well) but I found it surprising.

I got the C code to compile and, assuming it is working correctly, these
words should not match.

>> # metaphone.awk: Michael Sanders - 2024
>> #
>> # example invocation:
>> #
>> # echo "texas taxes taxi" | awk -f metaphone.awk -v find=texas
>> #
>> # notes:
>> #
>> # ever notice when you search for (say):
>> #
>> # 'i went to the zu'
>> #
>> # & your chosen search engine suggests something like:
>> #
>> # 'did you mean i went to the zoo'
>> #
>> # the metaphone algorithm handles such cases pretty well actually...
>> #
>> # Metaphone  is a phonetic algorithm, published by Lawrence Philips in
>> # 1990,   for  indexing  words  by  their  English  pronunciation.  It
>> # fundamentally improves on the Soundex algorithm by using information
>> # about   variations  and  inconsistencies  in  English  spelling  and
>> # pronunciation  to  produce  a  more  accurate encoding, which does a
>> # better job of matching words and names which sound similar.
>> # https://en.wikipedia.org/wiki/Metaphone
>> #
>> # english only (sorry)
>> #
>> # not extensively tested, nevertheless a solid start, if you
>> # improve this code please share your results
>> #
>> # other implementations...
>> #
>> # gist:  https://gist.github.com/Rostepher/b688f709587ac145a0b3
>> #
>> # BASIC: http://aspell.net/metaphone/metaphone.basic
>> #
>> # C:     http://aspell.net/metaphone/metaphone-kuhn.txt
>
> I wanted a "reference" implementation I could try, but this is not a
> useful C program.  It's in an odd dialect (it uses void but has K&R
> function definitions) and has loads of undefined behaviours (strcpy of
> overlapping strings, use of uninitialised variables etc).

The uninitialised variables were due to an undefined function.  Most
likely, that function was intended to initialise the array.  I've mocked
up the two undefined functions and can now get the code to run.  I don't
see any uninitialised variables being used now.  The code still has
undefined behaviour in some cases but I think that is limited to the use
of strcpy.
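
To illustrate the strcpy point, here is a minimal sketch of the usual
fix.  It is not taken from the C source; delete_char_at is a
hypothetical helper I made up for the example.  strcpy has undefined
behaviour when source and destination overlap, whereas memmove is
specified to cope with overlapping regions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from metaphone-kuhn.txt): delete the
 * character at index pos from the NUL-terminated string s, in place. */
void delete_char_at(char *s, size_t pos)
{
    size_t len = strlen(s);

    assert(pos < len);  /* pos must index a real character */

    /* strcpy(s + pos, s + pos + 1) would be undefined here because the
     * source and destination overlap.  memmove handles overlap.  The
     * count len - pos covers the tail plus the terminating NUL. */
    memmove(s + pos, s + pos + 1, len - pos);
}
```

With char buf[] = "texas", calling delete_char_at(buf, 1) leaves "txas"
in buf.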

-- 
Ben.