Article <87zfv9mkpj.fsf@nosuchdomain.example.com>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Thu, 07 Mar 2024 20:06:00 -0800
Organization: None to speak of
Lines: 118
Message-ID: <87zfv9mkpj.fsf@nosuchdomain.example.com>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
	<usagql$j9bc$1@dont-email.me>
	<usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
	<usb6pa$ncok$1@dont-email.me>
	<usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
	<87bk7poa7u.fsf@nosuchdomain.example.com>
	<usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="ba4161de3c6afa3b79edb2bdfdc78ddd";
	logging-data="1595146"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19KFinAlTdW9wNwv5c2WrKr"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:iMaEYp5Kx6N/HqgDr/ctrD8w6Z8=
	sha1:eNFtYocnCVISnHsFja50eMdOQA8=
Bytes: 5175

Grant Taylor <gtaylor@tnetconsulting.net> writes:
> On 3/7/24 18:09, Keith Thompson wrote:
>> I know that's what awk does, but I don't think I would have expected
>> it if I didn't know about it.
>
> Okay.  I think that's a fair observation.
>
>> $0 is the current input line.
>
> Or $0 is the current /record/ in awk parlance.

Yes.

>> If you don't change anything, or if you modify $0 itself, whitespace
>> between fields is preserved.
>
>> If you modify any of the fields, $0 is recomputed and whitespace
>> between tokens is collapsed.
>
> I don't agree with that.
>
>    % echo 'one  two   three' | awk '{print $0; print $1,$2,$3}'
>    one  two   three
>    one two three
>
> I didn't /modify/ anything and awk does print the fields with
> different white space.

That's just the semantics of print with comma-delimited arguments, just
like:

    % awk 'BEGIN{a="foo"; b="bar"; print a, b}'
    foo bar
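
In a print statement, each comma between arguments is output as the value of OFS, the output field separator, which defaults to a single space. A small sketch, assuming a POSIX awk on PATH (the shell function names are made up for illustration), makes that visible by changing OFS:

```shell
#!/bin/sh
# In "print a, b" the comma is output as OFS, the output field
# separator, which is a single space unless you change it.
print_default() {
    awk 'BEGIN { a = "foo"; b = "bar"; print a, b }'
}

# Same print statement, with OFS set to "-" first.
print_with_ofs() {
    awk 'BEGIN { OFS = "-"; a = "foo"; b = "bar"; print a, b }'
}

print_default    # foo bar
print_with_ofs   # foo-bar
```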

Printing the values of $1, $2, and $3 doesn't change $0.  Writing to any
of $1, $2, $3, even with the same value, does change $0.

    $ echo 'one  two   three' | awk '{print $0; print $1,$2,$3; print $0; $2 = $2; print $0}'
    one  two   three
    one two three
    one  two   three
    one two three
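
The claim that editing $0 itself keeps the spacing, while assigning to a field does not, can be checked directly. A minimal sketch, assuming a POSIX awk; the function names are hypothetical:

```shell
#!/bin/sh
# Input with irregular spacing between fields.
line='one  two   three'

# sub() edits the text of $0 directly, so the spacing around the
# other fields survives.
edit_record() {
    printf '%s\n' "$line" | awk '{ sub(/two/, "fifteen"); print $0 }'
}

# Assigning to a field makes awk rebuild $0 from $1..$NF joined
# by OFS, which collapses the original spacing.
edit_field() {
    printf '%s\n' "$line" | awk '{ $2 = "fifteen"; print $0 }'
}

edit_record   # one  fifteen   three
edit_field    # one fifteen three
```

The familiar `$1 = $1` idiom relies on the same rebuild on purpose, to squeeze runs of blanks down to single OFS characters.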

>> awk *could* have been defined to preserve inter-field whitespace
>> even when you modify individual fields,
>
> I question the veracity of that.  Specifically when lengthening or
> shortening the value of a field.  E.g. replacing "two" with
> "fifteen". This is particularly germane when you look at $0 as a fixed
> width formatted output.

But awk doesn't work with fixed-width data.  The length of each field,
and the length of $0, is variable.
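
A quick check of that point: lengthening a field just makes the rebuilt $0 longer, with no attempt to keep columns aligned. A sketch assuming a POSIX awk; `grow_field` is a hypothetical name:

```shell
#!/bin/sh
# Print the record length, lengthen field 2, then print the
# rebuilt record and its new length.
grow_field() {
    printf '%s\n' 'one  two   three' |
        awk '{ print length($0); $2 = "fifteen"; print $0; print length($0) }'
}

grow_field
# 16
# one fifteen three
# 17
```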

If awk *purely* dealt with input lines only as lists of tokens, then
this:

    echo 'one  two   three' | awk '{print $0}'

would print "one two three" rather than "one  two   three" (and awk would
lose the ability to deal with arbitrarily formatted input).  The fact
that the inter-field whitespace is reset only when individual fields are
touched feels arbitrary to me.
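
What that alternative behavior might look like can be sketched in plain awk. The sketch below is hypothetical (`replace_field` and `setfield` are made-up names, not part of awk) and assumes a POSIX awk: it walks the text of $0, copies each run of blanks and tabs unchanged, and substitutes only the requested field, so it never assigns to a field and never triggers awk's rebuild of $0.

```shell
#!/bin/sh
# Hypothetical helper: "replace_field N VAL" replaces the N-th
# blank/tab-delimited field of each input line with VAL while
# copying every run of separators through unchanged.
replace_field() {
    awk -v n="$1" -v val="$2" '
    # setfield: return rec with field k replaced by v
    function setfield(rec, k, v,    out, i, fld) {
        out = ""
        i = 0
        # copy any leading blanks and tabs as is
        if (match(rec, /^[ \t]+/)) {
            out = substr(rec, 1, RLENGTH)
            rec = substr(rec, RLENGTH + 1)
        }
        while (rec != "") {
            # peel off the next field
            match(rec, /^[^ \t]+/)
            fld = substr(rec, 1, RLENGTH)
            rec = substr(rec, RLENGTH + 1)
            i++
            out = out (i == k ? v : fld)
            # copy the separator run that follows it, if any
            if (match(rec, /^[ \t]+/)) {
                out = out substr(rec, 1, RLENGTH)
                rec = substr(rec, RLENGTH + 1)
            }
        }
        return out
    }
    { print setfield($0, n + 0, val) }'
}

printf '%s\n' 'one  two   three' | replace_field 2 fifteen
# one  fifteen   three
```

GNU awk offers a more direct handle on the inter-field whitespace: its `split()` accepts an optional fourth array argument that receives the separators. That is a gawk extension, though, not POSIX awk.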

>> and I think I would have found that more intuitive.
>
> I don't agree.
>
>> (And ideally there would be a way to refer to that inter-field
>> whitespace.)
>
> Remember, awk is meant for working on fields of data in a record.  By
> default, the fields are delimited by white space characters.  I'll say 
> it this way, awk is meant for working on the non-white space
> characters.   Or yet another way, awk is not meant for working on
> white space characters.

Awk has strong builtin support for working on whitespace-delimited
fields, and that support tends to ignore the details of that whitespace.
But you can also write awk code that just deals with $0.

One trivial example:

    awk '{ count += length + 1 } END { print count }'

behaves similarly to `wc -c`, and counts whitespace characters just like
any other characters.
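
The one-liner can be compared against `wc -c` on a small sample. A sketch, assuming ASCII input whose last line ends in a newline (a final line without one is still counted as if it had one, and with multibyte text gawk in a UTF-8 locale counts characters, which matches `wc -m` rather than `wc -c`); `awk_count` is a hypothetical wrapper name:

```shell
#!/bin/sh
# Wrapper around the one-liner: add up each record length, plus 1
# for the newline that awk strips from every record.
awk_count() {
    awk '{ count += length + 1 } END { print count }'
}

printf 'one  two   three\nfour\n' | awk_count              # 22
# Some wc implementations pad the number with spaces; strip them.
printf 'one  two   three\nfour\n' | wc -c | tr -d ' '     # 22
```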

>> The fact that modifying a field has the side effect of messing up $0
>> seems counterintuitive.
>
> Maybe.
>
> But I think it's one that is acceptable for what awk is intended to do.

It's also the existing behavior, and changing it would break things, so
I wouldn't suggest changing it.

>> Perhaps the behavior matches your intuition better than it matches
>> mine.
>
> I sort of feel like you are wanting to / trying to use awk in places
> where sed might be better.  sed just sees a string of text and is 
> ignorant of any structure without a carefully crafted RE to provide it.

Not really.  I'm just remarking on one particular awk feature that I
find a bit counterintuitive.

Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields.  But it's
flexible enough to do *lots* of other things.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */