Article <vt8j5u$1gmdg$1@news.xmission.com>

Path: ...!weretis.net!feeder9.news.weretis.net!xmission!nnrp.xmission!.POSTED.shell.xmission.com!not-for-mail
From: gazelle@shell.xmission.com (Kenny McCormack)
Newsgroups: comp.lang.awk
Subject: Re: Experiences with match() subexpressions?
Date: Thu, 10 Apr 2025 14:04:46 -0000 (UTC)
Organization: The official candy of the new Millennium
Message-ID: <vt8j5u$1gmdg$1@news.xmission.com>
References: <vt7qlq$2ge70$1@dont-email.me> <vt7qs4$2gior$1@dont-email.me> <vt88s7$1ghd2$1@news.xmission.com> <vt8bit$2uiq5$1@dont-email.me>
Injection-Date: Thu, 10 Apr 2025 14:04:46 -0000 (UTC)
Injection-Info: news.xmission.com; posting-host="shell.xmission.com:166.70.8.4";
	logging-data="1595824"; mail-complaints-to="abuse@xmission.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: gazelle@shell.xmission.com (Kenny McCormack)
Bytes: 3216
Lines: 51

In article <vt8bit$2uiq5$1@dont-email.me>,
Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
....
>> I have to admit that I (still) don't really understand how this match third
>> arg stuff works. 
....
>> I.e., I can never predict what will happen, so I always
>> just dump out the array and try to reverse-engineer it each time I need to
>> use it.
....
>Above output stuff appears because in 'arr' there's additional elements
>about the pattern positions stored.

Just to clarify, I wasn't looking for a tutorial (man page regurgitation).
I understand the man page description of match's 3rd arg as well as anyone;
I just find that it doesn't do as much in practice as (I think) it
should - and that it is unpredictable (by me, anyway) what it will do (you
have to dump out the array and trial-and-error it to get it to do what you
want).  It promises more than it delivers.  I have much the same comments
to make about the similar functionality in Tcl (Expect).

None of which is criticism of the feature; as you say below, it basically
does as much as the underlying regexp library allows it to do.

....
>I think I'll do the parsing the straightforward two-step way as I did
>before the GNU Awk specific functions were available; it's probably
>also the clearest way to program that functionality.

Probably so.  BTW, it is not really "GNU Awk specific"; lots of languages
have this general capability (handing back the subexpression matches from
a regexp match).

Incidentally, here is a function of mine that uses match's 3rd arg.  I find
it useful.  This addresses a common AWK issue, where you have a line with
fields (in the usual AWK whitespace-delimited sense), but you need to know
the actual character positions of the fields (since they can move around
from line to line of input).  Note also that I'm not really sure where the
name "splitMatch" came from; it was just what popped into my head when I
was writing this...

--- Cut Here ---
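# Find the character positions of each of the fields in string s.
# Note that s will usually be $0, and n will usually be NF.
# Needs gawk (3-arg match).  On return, field i starts at A[i,"start"]
# and is A[i,"length"] characters long; the return value is match()'s.
function splitMatch(s,n,A,	i,t) {
    # Build one "field, then optional trailing blanks" group per field.
    for (i=1; i<=n; i++) t = t "([^ \t]+)[ \t]*"
    return match(s,t,A)
    }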
--- Cut Here ---
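
As a usage sketch (not part of the function as posted): the snippet below
repeats splitMatch so that it runs on its own, then prints where each field
of a sample line starts.  It assumes gawk, since the 3-argument match() is
a gawk extension; the sample string and the names F and P are made up for
illustration.

--- Cut Here ---
# Repeated from above so that this snippet is self-contained.
function splitMatch(s,n,A,	i,t) {
    for (i=1; i<=n; i++) t = t "([^ \t]+)[ \t]*"
    return match(s,t,A)
    }

BEGIN {
    s = "  foo   bar baz"		# sample line (assumption)
    n = split(s, F)			# stands in for NF when s is $0
    if (splitMatch(s, n, P))
	for (i=1; i<=n; i++)
	    # gawk stores each subexpression's text in P[i] and its
	    # position in P[i,"start"] and P[i,"length"].
	    printf "%d: start=%d length=%d [%s]\n",
		i, P[i,"start"], P[i,"length"], P[i]
    }
--- Cut Here ---

Saved to a file and run with "gawk -f", this should print:

1: start=3 length=3 [foo]
2: start=9 length=3 [bar]
3: start=13 length=3 [baz]

The leading blanks are skipped because the pattern can only begin matching
at a non-blank character.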

-- 
In the corner of the room on the ceiling is a large vampire bat who
is obviously deranged and holding his nose.