From: Michael S
Newsgroups: comp.lang.c
Subject: Re: Command line globber/tokenizer library for C?
Date: Wed, 18 Sep 2024 02:46:11 +0300
Organization: A noiseless patient Spider
Message-ID: <20240918024611.000002f3@yahoo.com>
References: <20240912181625.00006e68@yahoo.com> <20240912223828.00005c10@yahoo.com> <861q1nfsjz.fsf@linuxsc.com> <20240915122211.000058b1@yahoo.com>

On Tue, 17 Sep 2024 22:34:33 -0000 (UTC)
antispam@fricas.org wrote:

> Michael S wrote:
> > On Fri, 13 Sep 2024 09:05:04 -0700
> > Tim Rentsch wrote:
> >
> >> Michael S writes:
> >>
> >> [..iterate over words in a string..]
> >>
> >> I couldn't resist writing some code along similar lines.  The
> >> entry point is words_do(), which returns one on success and
> >> zero if the end of string is reached inside double quotes.
> >>
> >>
> >> typedef struct gopher_s *Gopher;
> >> struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
> >>
> >> static _Bool collect_word( const char *, const char *, _Bool, Gopher );
> >> static _Bool is_space( char );
> >>
> >>
> >> _Bool
> >> words_do( const char *s, Gopher go ){
> >>   char c = *s;
> >>
> >>   return
> >>     is_space(c) ? words_do( s+1, go )
> >>     : c ? collect_word( s, s, 1, go )
> >>     : /***************/ 1;
> >> }
> >>
> >> _Bool
> >> collect_word( const char *s, const char *r, _Bool w, Gopher go ){
> >>   char c = *s;
> >>
> >>   return
> >>     c == 0 ? go->f( go, r, s ), w
> >>     : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
> >>     : /***************/ collect_word( s+1, r, w ^ c == '"', go );
> >> }
> >>
> >> _Bool
> >> is_space( char c ){
> >>   return c == ' ' || c == '\t';
> >> }
> >
> > Tested on godbolt.
> > gcc -O2 turns it into iteration starting from v.4.4
> > clang -O2 turns it into iteration starting from v.4.0
> > Latest icc still does not turn it into iteration along at least one
> > code path.
> > Latest MSVC implements it as written, 100% recursion.
>
> I tested using gcc 12.  AFAICS calls to 'go->f' in 'collect_word'
> are not tail calls and gcc 12 compiles them as normal calls.

Naturally.

> The other calls are compiled to jumps.  But the call to 'collect_word'
> in 'words_do' is not a "sibcall" and, depending on the calling
> convention, the compiler may treat it as a normal call.  The two other
> calls, that is the call to 'words_do' in 'words_do' and the call to
> 'collect_word' in 'collect_word', are clearly tail self recursion and
> the compiler should always optimize them to a jump.
>

"Should" or not, MSVC does not eliminate them.
The funny thing is that it does eliminate all four calls after I rewrote
the code in a more boring style.

_Bool
words_do( const char *s, Gopher go ){
  char c = *s;
#if 1
  if (is_space(c))
    return words_do( s+1, go );
  if (c)
    return collect_word( s, s, 1, go );
  return 1;
#else
  return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : /***************/ 1;
#endif
}

static
_Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
  char c = *s;
#if 1
  if (c == 0) {
    go->f( go, r, s );
    return w;
  }
  if (is_space(c) && w) {
    go->f( go, r, s );
    return words_do( s, go );
  }
  return collect_word( s+1, r, w ^ c == '"', go );
#else
  return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : /***************/ collect_word( s+1, r, w ^ c == '"', go );
#endif
}
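
For comparison, here is a rough sketch of the loop that the four eliminated calls amount to when the recursion is unrolled by hand. It is not taken from any compiler's output and is not part of the code posted in this thread. The names words_do_iter, struct counter, count_word and scan are made up for illustration; only Gopher, struct gopher_s and is_space follow the code above. It assumes the callback receives the word as a half-open range [begin, end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

/* Hypothetical callback state: counts words, remembers the last length. */
struct counter {
  struct gopher_s base;   /* first member, so a Gopher can be cast back */
  size_t words;
  size_t last_len;
  bool ok;                /* result of the scan, filled in by scan() */
};

static void
count_word( Gopher go, const char *begin, const char *end ){
  struct counter *c = (struct counter *)go;
  c->words++;
  c->last_len = (size_t)(end - begin);
}

static bool
is_space( char c ){
  return c == ' ' || c == '\t';
}

/* Hand-written loop form of words_do()/collect_word(): every tail call
   becomes an update of the "parameters" plus another pass of a loop. */
static bool
words_do_iter( const char *s, Gopher go ){
  assert( s != NULL && go != NULL );
  for (;;) {
    while (is_space(*s))          /* words_do -> words_do( s+1, go ) */
      s++;
    if (*s == 0)
      return true;                /* clean end of string */

    const char *r = s;            /* words_do -> collect_word( s, s, 1, go ) */
    bool w = true;                /* true while outside double quotes */
    for (;;) {
      char c = *s;
      if (c == 0) {
        go->f( go, r, s );
        return w;                 /* false if the string ended inside quotes */
      }
      if (is_space(c) && w) {
        go->f( go, r, s );
        break;                    /* collect_word -> words_do( s, go ) */
      }
      w ^= (c == '"');            /* collect_word -> collect_word( s+1, ... ) */
      s++;
    }
  }
}

/* Hypothetical helper: run the scan and return the collected statistics. */
static struct counter
scan( const char *s ){
  struct counter c = { { count_word }, 0, 0, false };
  c.ok = words_do_iter( s, &c.base );
  return c;
}
```

Calling scan() on a string gives the number of words seen, the length of the last one (quote characters included, since they are not stripped), and whether the quotes were balanced. The calls to go->f stay ordinary calls here too, because work remains to be done after they return.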