From: Ben Bacarisse
Newsgroups: comp.lang.awk
Subject: Re: (Long post) Metaphone Algorithm In AWK
Date: Wed, 21 Aug 2024 00:58:06 +0100
Message-ID: <87wmkapx0x.fsf@bsb.me.uk>
References: <878qwts8bd.fsf@bsb.me.uk>

porkchop@invalid.foo (Mike Sanders) writes:

> Ben Bacarisse wrote:
>
>> Using a word list, I found some odd matches.  For example:
>>
>>   $ echo "drunkeness indigestion" | awk -f metaphone.awk -v find=texas
>>   drunkeness
>>   indigestion
>>
>> Are these really metaphone matches for "texas"?  It's possible (I don't
>> know the algorithm at all well) but I found it surprising.
>
> Ben, give this try when you can. Finally starting to wrap my mind around
> its usage a little more...

I'm not sure what you are asking me to try, since this (your latest AWK)
is not just an implementation of the metaphone algorithm.  With the
extra Levenshtein test, "texas" matches only a few words.

However, if I remove the extra condition (levenshtein($x, find) <= 2),
your AWK code matches a different set of words from the C
implementation.  Looking a bit deeper, your AWK code gives "texas" the
code TKSS, but the C code assigns it TKS.

-- 
Ben.
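The levenshtein() function used in the latest AWK is not shown in this thread, so here is a hypothetical standalone version of just the extra condition. It uses the standard dynamic-programming edit distance (insert, delete and substitute each cost 1). The names lev_filter and min3 are made up for illustration, and no metaphone test is applied. It suggests why "drunkeness" and "indigestion" disappear once the condition is added: the edit distance is never less than the difference in length, so both words are at least 5 edits away from "texas".

```sh
#!/bin/sh
# Hypothetical helper (not from the original post): print each input word
# whose Levenshtein distance from $1 is at most $2 (default 2).
lev_filter() {
  awk -v find="$1" -v max="${2:-2}" '
  # Classic DP edit distance; extra parameters i, j, cost, d are locals.
  function levenshtein(s, t,    m, n, i, j, cost, d) {
    m = length(s); n = length(t)
    for (i = 0; i <= m; i++) d[i, 0] = i    # delete all of s[1..i]
    for (j = 0; j <= n; j++) d[0, j] = j    # insert all of t[1..j]
    for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++) {
        cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
        d[i, j] = min3(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost)
      }
    return d[m, n]
  }
  function min3(a, b, c) {
    if (b < a) a = b
    if (c < a) a = c
    return a
  }
  # Test every field on every line, as the condition in the thread does.
  {
    for (x = 1; x <= NF; x++)
      if (levenshtein($x, find) <= max) print $x
  }'
}

# Prints "texas" and "taxes", one per line; the other two words are dropped.
echo "drunkeness indigestion texas taxes" | lev_filter texas
```

Because this is only the distance check, it says nothing about whether the metaphone codes themselves (TKSS vs TKS for "texas") are right.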