Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.awk
Subject: Re: "sed" question
Date: Thu, 07 Mar 2024 20:06:00 -0800
Organization: None to speak of
Lines: 118
Message-ID: <87zfv9mkpj.fsf@nosuchdomain.example.com>
References: <us9vka$fepq$1@dont-email.me> <usa01v$fj5h$1@dont-email.me>
 <usagql$j9bc$1@dont-email.me> <usb5jv$4qa$3@tncsrv09.home.tnetconsulting.net>
 <usb6pa$ncok$1@dont-email.me> <usdk6k$so1$1@tncsrv09.home.tnetconsulting.net>
 <87bk7poa7u.fsf@nosuchdomain.example.com> <usdtn4$j2n$1@tncsrv09.home.tnetconsulting.net>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="ba4161de3c6afa3b79edb2bdfdc78ddd";
 logging-data="1595146"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX19KFinAlTdW9wNwv5c2WrKr"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:iMaEYp5Kx6N/HqgDr/ctrD8w6Z8= sha1:eNFtYocnCVISnHsFja50eMdOQA8=
Bytes: 5175

Grant Taylor <gtaylor@tnetconsulting.net> writes:
> On 3/7/24 18:09, Keith Thompson wrote:
>> I know that's what awk does, but I don't think I would have expected
>> it if I didn't know about it.
>
> Okay. I think that's a fair observation.
>
>> $0 is the current input line.
>
> Or $0 is the current /record/ in awk parlance.

Yes.

>> If you don't change anything, or if you modify $0 itself, whitespace
>> between fields is preserved.
>
>> If you modify any of the fields, $0 is recomputed and whitespace
>> between tokens is collapsed.
>
> I don't agree with that.
>
> % echo 'one   two   three' | awk '{print $0; print $1,$2,$3}'
> one   two   three
> one two three
>
> I didn't /modify/ anything and awk does print the fields with
> different white space.

That's just the semantics of print with comma-delimited arguments, just
like:

    % awk 'BEGIN{a="foo"; b="bar"; print a, b}'
    foo bar

Printing the values of $1, $2, and $3 doesn't change $0. Writing to any
of $1, $2, $3, even with the same value, does change $0.

    $ echo 'one   two   three' | awk '{print $0; print $1,$2,$3; print $0; $2 = $2; print $0}'
    one   two   three
    one two three
    one   two   three
    one two three

>> awk *could* have been defined to preserve inter-field whitespace
>> even when you modify individual fields,
>
> I question the veracity of that. Specifically when lengthening or
> shortening the value of a field. E.g. replacing "two" with
> "fifteen". This is particularly germane when you look at $0 as a fixed
> width formatted output.

But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.

If awk *purely* dealt with input lines only as lists of tokens, then
this:

    echo 'one   two   three' | awk '{print $0}'

would print "one two three" rather than "one   two   three" (and awk
would lose the ability to deal with arbitrarily formatted input). The
fact that the inter-field whitespace is reset only when individual
fields are touched feels arbitrary to me.

>> and I think I would have found that more intuitive.
>
> I don't agree.
>
>> (And ideally there would be a way to refer to that inter-field
>> whitespace.)
>
> Remember, awk is meant for working on fields of data in a record. By
> default, the fields are delimited by white space characters. I'll say
> it this way, awk is meant for working on the non-white space
> characters. Or yet another way, awk is not meant for working on
> white space characters.

Awk has strong builtin support for working on whitespace-delimited
fields, and that support tends to ignore the details of that whitespace.
But you can also write awk code that just deals with $0.
One trivial example:

    awk '{ count += length + 1 } END { print count }'

behaves similarly to `wc -c`, and counts whitespace characters just like
any other characters.

>> The fact that modifying a field has the side effect of messing up $0
>> seems counterintuitive.
>
> Maybe.
>
> But I think it's one that is acceptable for what awk is intended to do.

It's also the existing behavior, and changing it would break things, so
I wouldn't suggest changing it.

>> Perhaps the behavior matches your intuition better than it matches
>> mine.
>
> I sort of feel like you are wanting to / trying to use awk in places
> where sed might be better. sed just sees a string of text and is
> ignorant of any structure without a carefully crafted RE to provide it.

Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.

Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields. But it's
flexible enough to do *lots* of other things.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
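
For anyone who wants to try the behavior discussed in this post, here is a minimal sketch, assuming a POSIX shell and a POSIX-conforming awk. The sample input and the variable names (input, unchanged, rebuilt, dashed, edited) are made up for illustration and do not come from the thread. It shows that reading a field leaves $0 alone, that assigning to a field rebuilds $0 with OFS, and that editing $0 directly with sub() keeps the original spacing.

```shell
# Illustrative input with three spaces between fields (an assumption, not from the thread).
input='one   two   three'

# Reading a field does not touch $0, so the spacing survives.
unchanged=$(printf '%s\n' "$input" | awk '{ x = $2; print $0 }')

# Assigning to a field, even with its own value, rebuilds $0 using OFS (a single space by default).
rebuilt=$(printf '%s\n' "$input" | awk '{ $2 = $2; print $0 }')

# With a different OFS, the rebuilt record uses that separator instead.
dashed=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = "-" } { $2 = $2; print $0 }')

# sub() with no target edits $0 directly, so the inter-field whitespace is kept.
edited=$(printf '%s\n' "$input" | awk '{ sub(/two/, "fifteen"); print $0 }')

# Print each result in brackets so the spacing is visible.
printf '[%s]\n' "$unchanged" "$rebuilt" "$dashed" "$edited"
```

The sub() variant is one way to lengthen a value (the "two" to "fifteen" case raised above) without collapsing the surrounding whitespace, at the cost of matching on text rather than on field position.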