From: Bart
Newsgroups: comp.lang.c
Subject: Re: Command line globber/tokenizer library for C?
Date: Wed, 18 Sep 2024 01:07:17 +0100

On 18/09/2024 00:46, Michael S wrote:
> On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
> antispam@fricas.org wrote:
>
>> Michael S wrote:
>>> On Fri, 13 Sep 2024 09:05:04 -0700
>>> Tim Rentsch wrote:
>>>
>>>> Michael S writes:
>>>>
>>>> [..iterate over words in a string..]
>>>>
>>>> I couldn't resist writing some code along similar lines.  The
>>>> entry point is words_do(), which returns one on success and
>>>> zero if the end of string is reached inside double quotes.
>>>>
>>>>
>>>> typedef struct gopher_s *Gopher;
>>>> struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
>>>>
>>>> static _Bool collect_word( const char *, const char *, _Bool, Gopher );
>>>> static _Bool is_space( char );
>>>>
>>>>
>>>> _Bool
>>>> words_do( const char *s, Gopher go ){
>>>>     char c = *s;
>>>>
>>>>     return
>>>>         is_space(c)  ?  words_do( s+1, go )
>>>>       : c            ?  collect_word( s, s, 1, go )
>>>>       : /***************/  1;
>>>> }
>>>>
>>>> _Bool
>>>> collect_word( const char *s, const char *r, _Bool w, Gopher go ){
>>>>     char c = *s;
>>>>
>>>>     return
>>>>         c == 0             ?  go->f( go, r, s ), w
>>>>       : is_space(c) && w   ?  go->f( go, r, s ), words_do( s, go )
>>>>       : /***************/  collect_word( s+1, r, w ^ c == '"', go );
>>>> }
>>>>
>>>> _Bool
>>>> is_space( char c ){
>>>>     return c == ' ' || c == '\t';
>>>> }
>>>
>>
>>> Tested on godbolt.
>>> gcc -O2 turns it into iteration starting from v.4.4
>>> clang -O2 turns it into iteration starting from v.4.0
>>> Latest icc still does not turn it into iteration along at least one
>>> of the code paths.
>>> Latest MSVC implements it as written, 100% recursion.
>>
>> I tested using gcc 12. AFAICS calls to 'go->f' in 'collect_word'
>> are not tail calls and gcc 12 compiles them as normal calls.
>
> Naturally.
>
>> The other calls are compiled to jumps. But the call to 'collect_word'
>> in 'words_do' is not a "sibcall" and, depending on the calling
>> convention, the compiler may treat it as a normal call. The two other
>> calls, that is the call to 'words_do' in 'words_do' and the call to
>> 'collect_word' in 'collect_word', are clearly tail self-recursion and
>> the compiler should always optimize them to jumps.
>>
>
> "Should" or not, MSVC does not eliminate them.
>
> The funny thing is that it does eliminate all four calls after I rewrote
> the code in a more boring style.
>
> static
> _Bool
> collect_word( const char *s, const char *r, _Bool w, Gopher go ){
>     char c = *s;
> #if 1
>     if (c == 0) {
>         go->f( go, r, s );
>         return w;
>     }
>     if (is_space(c) && w) {
>         go->f( go, r, s );
>         return words_do( s, go );
>     }
>     return collect_word( s+1, r, w ^ c == '"', go );
> #else
>     return
>         c == 0            ? go->f( go, r, s ), w :
>         is_space(c) && w  ? go->f( go, r, s ), words_do( s, go ) :
>         /***************/ collect_word( s+1, r, w ^ c == '"', go );
> #endif
> }

I find such a coding style pretty much impossible to grasp and unpleasant
to look at.
I had to refactor it like this:

---------------
static _Bool collect_word(const char *s, const char *r, _Bool w, Gopher go) {
    char c = *s;

#if 1
    if (c == 0) {
        go->f(go, r, s);
        return w;
    }

    if (is_space(c) && w) {
        go->f(go, r, s);
        return words_do(s, go);
    }

    return collect_word(s+1, r, w ^ (c == '"'), go);
#else
    if (c == 0) {
        go->f(go, r, s);
        return w;
    }
    else if (is_space(c) && w) {
        go->f(go, r, s);
        return words_do(s, go);
    }
    else {
        return collect_word(s+1, r, w ^ (c == '"'), go);
    }
#endif
}
---------------

When I'd finished, I realised that those two conditional blocks do more or
less the same thing! If that's what you mean by 'boring', then I'm all for it.
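For anyone who wants to try this, here is a minimal self-contained sketch (not code from the thread) that puts the if-statement collect_word together with Tim's Gopher interface, alongside a hand-written loop showing roughly what the tail-call discussion amounts to. Each tail call becomes a jump back to the top of a loop, while the calls through go->f remain ordinary calls because they are not in tail position. Since `==` binds more tightly than `^`, the original `w ^ c == '"'` means `w ^ (c == '"')`: flip the outside-quotes flag whenever a double quote is seen. The sketch writes those parentheses out. The names words_do_iter, struct collector, collect, word_is and MAX_WORDS are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The Gopher callback interface from Tim Rentsch's code above. */
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool is_space( char c ) { return c == ' ' || c == '\t'; }
static _Bool collect_word( const char *s, const char *r, _Bool w, Gopher go );

/* Recursive version written with if statements. w is 1 outside double
   quotes and 0 inside; it is flipped by w ^ (c == '"'). */
_Bool words_do( const char *s, Gopher go ) {
    char c = *s;
    if (is_space(c)) return words_do(s+1, go);
    if (c) return collect_word(s, s, 1, go);
    return 1;
}

static _Bool collect_word( const char *s, const char *r, _Bool w, Gopher go ) {
    char c = *s;
    if (c == 0) { go->f(go, r, s); return w; }
    if (is_space(c) && w) { go->f(go, r, s); return words_do(s, go); }
    return collect_word(s+1, r, w ^ (c == '"'), go);
}

/* Hypothetical hand-written loop equivalent: each tail call above becomes
   a jump to the top of one of the two loops. The go->f calls stay
   ordinary calls, since they are not in tail position. */
_Bool words_do_iter( const char *s, Gopher go ) {
    for (;;) {
        while (is_space(*s)) s++;          /* words_do( s+1, go ) */
        if (*s == 0) return 1;             /* end of string between words */
        const char *r = s;                 /* collect_word( s, s, 1, go ) */
        _Bool w = 1;
        for (;;) {
            char c = *s;
            if (c == 0) { go->f(go, r, s); return w; }
            if (is_space(c) && w) { go->f(go, r, s); break; }  /* words_do( s, go ) */
            w ^= (c == '"');               /* collect_word( s+1, r, ..., go ) */
            s++;
        }
    }
}

/* Hypothetical gopher that records each word as a [begin, end) range.
   base must be the first member so a Gopher can be cast back to it. */
#define MAX_WORDS 16
struct collector {
    struct gopher_s base;
    size_t n;
    const char *begin[MAX_WORDS], *end[MAX_WORDS];
};

static void collect( Gopher go, const char *b, const char *e ) {
    struct collector *c = (struct collector *)go;
    assert(c->n < MAX_WORDS);
    c->begin[c->n] = b;
    c->end[c->n] = e;
    c->n++;
}

/* Hypothetical helper: does recorded word i equal the string expect? */
static int word_is( const struct collector *c, size_t i, const char *expect ) {
    if (i >= c->n) return 0;
    size_t len = (size_t)(c->end[i] - c->begin[i]);
    return len == strlen(expect) && memcmp(c->begin[i], expect, len) == 0;
}
```

For example, passing `  ab "c d"  e` to either function, with a collector initialised as `{ { collect }, 0 }`, records the three words `ab`, `"c d"` and `e`, and both functions return 1. An unterminated quote such as `x "y` records `x` and `"y` and makes both return 0.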