
From: porkchop@invalid.foo (Mike Sanders)
Newsgroups: comp.lang.awk
Subject: Soundex Algorithm in AWK
Date: Fri, 23 Aug 2024 05:11:33 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 39
Sender: Mike Sanders <busybox@sdf.org>
Message-ID: <va95m5$q367$1@dont-email.me>
Injection-Date: Fri, 23 Aug 2024 07:11:33 +0200 (CEST)
User-Agent: tin/2.6.2-20221225 ("Pittyvaich") (NetBSD/9.3 (amd64))

# Soundex Algorithm in AWK: Michael Sanders 2024
# example usage: awk -f soundex.awk < words.txt
# see also: https://en.wikipedia.org/wiki/Soundex

{ print $0 " : " soundex($0) }

function soundex(word, i, code, c, firstLetter, lastCode, buf) {
  word = toupper(word) # convert word to uppercase
  firstLetter = substr(word, 1, 1)
  buf = lastCode = ""

  # map letters to soundex digits; the loop starts at 1 so that the first
  # letter's digit seeds lastCode (that digit is never appended to buf)
  for (i = 1; i <= length(word); i++) {
    c = substr(word, i, 1)
    if (c ~ /[BFPV]/)          code = "1"
    else if (c ~ /[CGJKQSXZ]/) code = "2"
    else if (c ~ /[DT]/)       code = "3"
    else if (c ~ /[L]/)        code = "4"
    else if (c ~ /[MN]/)       code = "5"
    else if (c ~ /[R]/)        code = "6"
    else if (c ~ /[HW]/)       continue # H, W do not separate equal codes
    else code = "" # A, E, I, O, U, Y and non-letters carry no digit

    # ignore consecutive identical codes; a vowel resets lastCode, so equal
    # codes separated by a vowel are both kept (Tymczak -> T522)
    if (i > 1 && code != "" && code != lastCode) buf = buf code
    lastCode = code
  }

  # combine 1st letter with buf, pad with zeros or truncate to 4 characters
  return substr(firstLetter buf "000", 1, 4)
}

# eof
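
For anyone who wants to poke at the mapping without a words.txt, here is a
minimal sketch that swaps the if/else chain for a lookup over two parallel
strings. It is not the script above: the names soundex_of, sdx and digit are
made up for illustration, and it assumes the American Soundex rules as
described on the Wikipedia page cited in the header comments (the first
letter's own code suppresses a repeat, H and W do not separate equal codes,
vowels do).

```sh
#!/bin/sh
# Hypothetical wrapper: prints one Soundex code per argument.
soundex_of() {
  printf '%s\n' "$@" | awk '
    { print sdx($0) }

    # digit(c): Soundex digit for one uppercase letter, "" if it has none.
    # The two strings are parallel: position p in the first maps to
    # position p in the second.
    function digit(c,   p) {
      p = index("BFPVCGJKQSXZDTLMNR", c)
      return p ? substr("111122222222334556", p, 1) : ""
    }

    function sdx(word,   i, c, d, last, buf) {
      word = toupper(word)
      buf = last = ""
      for (i = 1; i <= length(word); i++) {
        c = substr(word, i, 1)
        if (c == "H" || c == "W") continue   # transparent: keep last as is
        d = digit(c)
        # i == 1 only seeds last; the first letter is kept as a letter
        if (i > 1 && d != "" && d != last) buf = buf d
        last = d                             # a vowel resets last to ""
      }
      return substr(substr(word, 1, 1) buf "000", 1, 4)
    }'
}

soundex_of Robert Rupert Tymczak Pfister Ashcraft Honeyman
```

Run as is, this should print R163, R163, T522, P236, A261 and H555, one per
line, which are the example values given on the Wikipedia page.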

-- 
:wq
Mike Sanders