Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ed Morton <mortonspam@gmail.com>
Newsgroups: alt.comp.lang.awk,comp.lang.awk
Subject: Re: printing words without newlines?
Date: Thu, 16 May 2024 08:11:35 -0500
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <v250m9$1j3gp$1@dont-email.me>
References: <v1pi7c$2b87j$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 16 May 2024 15:11:38 +0200 (CEST)
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:3Q9aYAfiMrGwNnt1A9CzFdGp9qc=
X-Antivirus-Status: Clean
In-Reply-To: <v1pi7c$2b87j$1@dont-email.me>
X-Antivirus: Avast (VPS 240516-2, 5/16/2024), Outbound message
Content-Language: en-US
Bytes: 3892

On 5/11/2024 11:57 PM, David Chmelik wrote:
> I'm learning more AWK basics and wrote function to read file, sort,
> print.  I use GNU AWK (gawk) and its sort but printing is harder to get
> working than anything... separate lines work, but when I use printf() or
> set ORS then use print (for words one line) all awk outputs (on FreeBSD
> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
> before shell prompt)... 

Your input file probably has DOS line endings (CRLF). See
https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it
for what that means and how to deal with them. In short, either run
`dos2unix` on your file before calling awk, or add `sub(/\r$/,"")` as I
show below*.
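
As a minimal sketch of what the stray CR does (the two sample records and
the `crlf_input` helper are made up for illustration, not taken from your
file), compare the record lengths awk sees before and after stripping:

```shell
# Hypothetical stand-in for a data.txt saved with DOS (CRLF) line endings.
crlf_input() { printf '2 your\r\n1 all\r\n'; }

# Untouched records: each one still ends in a carriage return.
raw=$(crlf_input | awk '{ printf "%s%d", sep, length($0); sep = "," }')

# Same records after removing one trailing CR from each.
clean=$(crlf_input | awk '{ sub(/\r$/, ""); printf "%s%d", sep, length($0); sep = "," }')

echo "raw:   $raw"     # raw:   7,6
echo "clean: $clean"   # clean: 6,5
```

The extra character per record is the CR. When it is printed inside a line
it sends the cursor back to column one, which is why the output appears to
overwrite itself.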

> is this normal (and I made mistake?) or am I
> approaching it wrong?  I recall BASIC prints new lines, but as I learned
> basic C and some derivatives, I'm used to newlines only being specified...
> ------------------------------------------------------------------------
> # print_file_words.awk
> # pass filename to function
> BEGIN { print_file_words("data.txt"); }
> 
> # read two-column array from file and sort lines and print
> function print_file_words(file) {
> # set record separator then use print
> # ORS=" "

Move the above to a BEGIN section so it is executed once total instead 
of once per input line.

>    while(getline<file) arr[$1]=$0

The above would spin off into an infinite loop if getline failed (e.g. 
if the file can't be opened), since in that case it returns -1, which 
still evaluates to "true" when tested as a condition. It needs to be:

     while ( (getline < file) > 0 ) arr[$1] = $0

See http://awk.freeshell.org/AllAboutGetline for that and more info on 
using getline.
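
To see the three possible return values in action, here is a small sketch
that points getline at a file that does not exist (the path is
hypothetical; any unreadable file would behave the same way):

```shell
# Hypothetical path that is assumed not to exist, so getline fails.
missing=/nonexistent/data.txt

out=$(awk -v file="$missing" 'BEGIN {
    rc = (getline line < file)               # -1 on error, 0 at EOF, 1 on success
    n = 0
    while ((getline line < file) > 0) n++    # stops at once on -1 or 0
    printf "rc=%d records=%d", rc, n
}')

echo "$out"   # rc=-1 records=0
```

With the bare `while (getline line < file)` form the -1 would count as
true and the loop would never end.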

*This is where you'd strip CRs from the end of input lines. Do either of 
these; the first uses the non-POSIX extension function gensub() (which 
gawk has), the second works in any awk:

     a) while ( (getline < file) > 0 ) arr[$1] = gensub(/\r$/,"",1)

     b) while ( (getline < file) > 0 ) { sub(/\r$/,""); arr[$1] = $0 }


>    PROCINFO["sorted_in"]="@ind_num_asc"
>    for(i in arr)
>    {
>      split(arr[i],arr2)
>      # output all words or on one line with ORS
>      print arr2[2]
>      # output all words on one line without needing ORS
>      #printf("%s ",arr2[2])
>    }

Add `printf "\n"` after the loop if you had set ORS to a space (a plain 
`print` would append ORS, i.e. one more space, after whatever it prints), 
so the output ends in a newline and is therefore a valid POSIX text file. 
Otherwise YMMV with what subsequent text processing tools can do with it.
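
One way to get all the words on a single line with exactly one newline at
the end, without touching ORS at all, is sketched below. It assumes the
records are already in the order you want (the sample input and the
`words_on_one_line` helper are made up for illustration) and works in any
POSIX awk:

```shell
# Hypothetical, already-sorted sample records standing in for data.txt.
words_on_one_line() {
    printf '1 all\n2 your\n3 base\n' | awk '
        { printf "%s%s", sep, $2; sep = " " }  # space before every word except the first
        END { printf "\n" }                    # exactly one terminating newline
    '
}

out=$(words_on_one_line)
newlines=$(words_on_one_line | wc -l | tr -d ' ')

echo "$out"        # all your base
echo "$newlines"   # 1
```

Putting the separator before each word rather than after it also avoids
the trailing space you get from `printf("%s ", ...)`.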

     Ed.

> }
> ------------------------------------------------------------------------
> # sample data.txt
> 2 your
> 1 all
> 3 base
> 5 belong
> 4 are
> 7 us
> 6 to