From: Janis Papanagnou
Newsgroups: comp.unix.shell
Subject: Re: a sed question
Date: Sun, 22 Dec 2024 00:50:45 +0100

On 21.12.2024 22:41, Keith Thompson wrote:
> Janis Papanagnou writes:
>> On 21.12.2024 13:17, Salvador Mirzo wrote:
> [...]
>> As previously mentioned, 'sed' might not be the best choice for
>> developing such scripts; you might want to consider to learn 'awk'.
>>
>>> $ git log --oneline | head -1 | awk '{print $1}'
>>> 2566d31
>>
>> With Awk you don't need 'head', it can be done like this
>>
>>   $ git log --oneline | awk 'NR==1 {print $1}'
>>
>> (For long input files you may want an early exit
>>   ...| awk 'NR==1 { print $1 ; exit(0) }'
>> but that just as an aside.)
> [...]
>
> This raises another issue: it's often possible to replace a command in a
> pipeline that filters output with an option to the command that does the
> same thing.  There's no general rule for how to do this, since different
> commands do things differently, but for the example above:
>
>     git log --oneline -n 1 | awk '{print $1}'

Yes.
I just used the OP's sample to show the principle (rather than
make up an example of my own).

In practice it goes even further: typical pipeline sequences built
from utilities like cat, head, tail, grep, cut, sed, tr, wc, seq, tee,
etc. can usually all be expressed and combined in a single Awk
program. An additional benefit is that if you want to pass context
information from a tool near the front of the pipe to a tool near the
other end, you can instead keep arbitrary state within the Awk program.

Of course, if you can _reduce_ the amount of data at an early stage
(as in your 'git -n 1' sample), the earlier the better!

(My 'git', BTW, doesn't seem to support an option '-n', which might
be another reason to let a standard tool like Awk do the task for
which it was designed: text processing.)

>
> or even:
>
>     git log -n 1 --format=%h
>
> I haven't memorized the "--format" option, so I don't generally use it in
> ad-hoc one-liners, but I do use it in scripts.  Note both of the above
> commands avoid generating the entire list of log entries, which could
> save significant time on a large repo.
>
> Using unnecessary commands in pipelines is Mostly Harmless, but IMHO
> it's good to think about how to do things more efficiently.  See also
> "Useless use of cat" (UUOC).

Janis
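
A small sketch of the point about folding a pipeline into one Awk
program and keeping state across lines. The function name sample_log
and its log lines are made up for illustration (they merely stand in
for 'git log --oneline' output, so nothing here depends on a
repository); only ordinary grep, cut, head and awk behaviour is assumed.

```shell
#!/bin/sh
# Hypothetical sample input standing in for real command output;
# the hashes and subjects below are invented for this example.
sample_log() {
    printf '%s\n' \
        '2566d31 fix: handle empty input' \
        'a1b2c3d feat: add parser' \
        '9f8e7d6 fix: typo in docs' \
        '0123abc chore: bump version'
}

# Pipeline version: three extra processes after the producer.
pipeline_result=$(sample_log | grep 'fix:' | cut -d' ' -f1 | head -n 1)

# Awk version: pattern match, field selection and early exit in one process.
awk_result=$(sample_log | awk '/fix:/ { print $1; exit }')

# State kept across lines: remember the first hash, count the "fix:"
# lines, and report both at the end of input.
summary=$(sample_log | awk '
    NR == 1 { first = $1 }    # context taken from the front of the input
    /fix:/  { fixes++ }       # running count, like grep | wc -l
    END     { print first, fixes + 0 }
')

echo "$pipeline_result"
echo "$awk_result"
echo "$summary"
```

The last program is the kind of thing that gets clumsy with separate
tools: the first line and the count would otherwise need two passes
over the data or a temporary file.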