Path: ...!!!!.POSTED!not-for-mail
From: Ben Bacarisse <>
Newsgroups: comp.lang.awk
Subject: Re: (Long post) Metaphone Algorithm In AWK
Date: Wed, 21 Aug 2024 00:58:06 +0100
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <>
References: <v9qbgh$1u7qe$> <>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Wed, 21 Aug 2024 01:58:08 +0200 (CEST)
Injection-Info:; posting-host="fbbdc6bc93fc8e09c145618243cc2ab3";
	logging-data="3753466"; mail-complaints-to="";	posting-account="U2FsdGVkX19DwzqCNZqQHj01kedKi/suXis6yHXId2M="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:8CI9HVqHXsKZDOPYNc63jCEyGrM=
Bytes: 2016

(Mike Sanders) writes:

> Ben Bacarisse <> wrote:
>> Using a word list, I found some odd matches.  For example:
>> $ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
>> drunkeness
>> indigestion
>> Are these really metaphone matches for "texas"?  It's possible (I don't
>> know the algorithm at all well) but I found it surprising.
> Ben, give this a try when you can. Finally starting to wrap my mind around
> its usage a little more...

I don't know what you are asking for, as this (your latest AWK) is not
just an implementation of the metaphone algorithm.  With the extra
Levenshtein test, "texas" matches only a few words.

However, if I remove the extra condition (that levenshtein($x, find) <=
2), your AWK code matches a different set of words to the C
implementation.  Looking a bit deeper, your AWK code gives the code
"TKSS" to the word "texas" but the C code assigns it "TKS".
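
For readers unfamiliar with the extra condition, here is a minimal
sketch of what an edit-distance test such as levenshtein($x, find) <= 2
computes.  It is not the function from the posted script: the shell
wrapper and the names lev and min3 are assumptions made for
illustration, and the real code may well differ.

```shell
# lev A B: print the Levenshtein (edit) distance between strings A and B.
# Hypothetical helper for illustration; not taken from metaphone.awk.
lev() {
    awk -v a="$1" -v b="$2" '
    # Smallest of three numbers.
    function min3(x, y, z) { if (y < x) x = y; return (z < x) ? z : x }
    BEGIN {
        la = length(a); lb = length(b)
        # prev[j] is the distance between the first i-1 characters of a
        # and the first j characters of b (row i-1 of the usual table).
        for (j = 0; j <= lb; j++) prev[j] = j
        for (i = 1; i <= la; i++) {
            cur[0] = i
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                cur[j] = min3(prev[j] + 1,        # deletion
                              cur[j-1] + 1,       # insertion
                              prev[j-1] + cost)   # substitution or match
            }
            for (j = 0; j <= lb; j++) prev[j] = cur[j]
        }
        print prev[lb]
    }'
}
```

With this sketch, "lev texas taxes" prints 2 and "lev TKSS TKS" prints
1.  A filter of the form "distance <= 2" keeps only candidates that are
already close in spelling, so it can mask differences in the metaphone
codes themselves.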
