From: porkchop@invalid.foo (Mike Sanders)
Newsgroups: comp.lang.awk
Subject: Soundex Algorithm in AWK
Date: Fri, 23 Aug 2024 05:11:33 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <va95m5$q367$1@dont-email.me>

# Soundex Algorithm in AWK: Michael Sanders 2024
# example usage: awk -f soundex.awk < words.txt
# see also: https://en.wikipedia.org/wiki/Soundex

{ print $0 " : " soundex($0) }

# map a single uppercase letter to its soundex digit ("" if it has none)
function sdxcode(c) {
    if (c ~ /[BFPV]/) return "1"
    if (c ~ /[CGJKQSXZ]/) return "2"
    if (c ~ /[DT]/) return "3"
    if (c == "L") return "4"
    if (c ~ /[MN]/) return "5"
    if (c == "R") return "6"
    return ""   # A, E, I, O, U, H, W, Y and non-letters
}

function soundex(word,    i, code, c, firstLetter, lastCode, buf) {

    word = toupper(word)   # convert word to uppercase
    firstLetter = substr(word, 1, 1)
    buf = ""

    # start from the first letter's own code so that a following
    # letter with the same code is dropped (Pfister -> P236)
    lastCode = sdxcode(firstLetter)

    for (i = 2; i <= length(word); i++) {
        c = substr(word, i, 1)
        code = sdxcode(c)
        if (code != "") {
            # ignore consecutive identical codes
            if (code != lastCode) buf = buf code
            lastCode = code
        } else if (c !~ /[HW]/) {
            # a vowel separates repeated codes (Tymczak -> T522);
            # H and W do not (Ashcraft -> A261)
            lastCode = ""
        }
    }

    # combine 1st letter with buf, pad with zeros or truncate to 4 characters
    return substr(firstLetter buf "000", 1, 4)
}

# eof

-- 
:wq
Mike Sanders