From: Michael S
Newsgroups: comp.lang.c
Subject: Re: Command line globber/tokenizer library for C?
Date: Sun, 15 Sep 2024 12:22:11 +0300
Message-ID: <20240915122211.000058b1@yahoo.com>
References: <20240912181625.00006e68@yahoo.com> <20240912223828.00005c10@yahoo.com> <861q1nfsjz.fsf@linuxsc.com>

On Fri, 13 Sep 2024 09:05:04 -0700
Tim Rentsch wrote:

> Michael S writes:
>
> [..iterate over words in a string..]
>
> > #include <stddef.h>
> >
> > void parse(const char* src,
> >   void (*OnToken)(const char* beg, size_t len, void* context),
> >   void* context) {
> >   char c0 = ' ', c1 = '\t';
> >   const char* beg = 0;
> >   for (;;src++) {
> >     char c = *src;
> >     if (c == c0 || c == c1 || c == 0) {
> >       if (beg) {
> >         OnToken(beg, src-beg, context);
> >         c0 = ' ', c1 = '\t';
> >         beg = 0;
> >       }
> >       if (c == 0)
> >         break;
> >     } else if (!beg) {
> >       beg = src;
> >       if (c == '"') {
> >         c0 = c1 = c;
> >         ++beg;
> >       }
> >     }
> >   }
> > }
>
> I couldn't resist writing some code along similar lines.  The
> entry point is words_do(), which returns one on success and
> zero if the end of string is reached inside double quotes.
>
>
> typedef struct gopher_s *Gopher;
> struct gopher_s { void (*f)( Gopher, const char *, const char * ); };
>
> static _Bool collect_word( const char *, const char *, _Bool, Gopher );
> static _Bool is_space( char );
>
>
> _Bool
> words_do( const char *s, Gopher go ){
>   char c = *s;
>
>   return
>     is_space(c) ? words_do( s+1, go )
>     : c ? collect_word( s, s, 1, go )
>     : /***************/ 1;
> }
>
> _Bool
> collect_word( const char *s, const char *r, _Bool w, Gopher go ){
>   char c = *s;
>
>   return
>     c == 0 ? go->f( go, r, s ), w
>     : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
>     : /***************/ collect_word( s+1, r, w ^ c == '"', go );
> }
>
> _Bool
> is_space( char c ){
>   return c == ' ' || c == '\t';
> }

Can you give an example implementation of go->f()?
It seems to me that it would have to use CONTAINING_RECORD or
container_of or an analogous non-standard macro.

Also, while formally the program is written in C, in spirit it's
something else. Maybe Lisp.
Lisp compilers are known to be very good at tail call elimination.
C compilers can also do it, but not reliably. In this particular case I
was afraid that common C compilers would implement it as written, i.e.
without turning recursion into iteration.

Tested on godbolt:
gcc -O2 turns it into iteration starting from v.4.4.
clang -O2 turns it into iteration starting from v.4.0.
Latest icc still does not turn it into iteration along at least one
code path.
Latest MSVC implements it as written, 100% recursion.
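
To make the go->f() question concrete, here is a minimal sketch of the
kind of callback I have in mind. Everything outside the quoted code is
my own assumption for illustration: the CONTAINER_OF macro (a simplified
offsetof-based variant, not the Linux or Windows one), struct
word_counter, count_word() and count_words() are hypothetical names.
The tokenizer itself is copied from the quoted post, with parentheses
added around c == '"'.

```c
#include <stddef.h>   /* offsetof, size_t */

/* --- reproduced from the quoted post --- */
typedef struct gopher_s *Gopher;
struct gopher_s { void (*f)( Gopher, const char *, const char * ); };

static _Bool collect_word( const char *, const char *, _Bool, Gopher );
static _Bool is_space( char );

static _Bool
words_do( const char *s, Gopher go ){
  char c = *s;
  return
    is_space(c) ? words_do( s+1, go )
    : c ? collect_word( s, s, 1, go )
    : 1;
}

static _Bool
collect_word( const char *s, const char *r, _Bool w, Gopher go ){
  char c = *s;
  return
    c == 0 ? go->f( go, r, s ), w
    : is_space(c) && w ? go->f( go, r, s ), words_do( s, go )
    : collect_word( s+1, r, w ^ (c == '"'), go );
}

static _Bool
is_space( char c ){
  return c == ' ' || c == '\t';
}

/* --- hypothetical caller side, not from the quoted post --- */

/* Simplified container_of: recover the enclosing struct from a
   pointer to one of its members. No type checking is done. */
#define CONTAINER_OF(ptr, type, member) \
  ((type *)((char *)(ptr) - offsetof(type, member)))

/* Caller state wrapped around the gopher. The gopher is deliberately
   not the first member, so a plain cast would not be enough. */
struct word_counter {
  size_t words;            /* number of words reported */
  size_t chars;            /* total length of all words */
  struct gopher_s go;
};

/* Callback: [beg, end) is one word, quotes included. */
static void
count_word( Gopher go, const char *beg, const char *end ){
  struct word_counter *wc = CONTAINER_OF(go, struct word_counter, go);
  wc->words += 1;
  wc->chars += (size_t)(end - beg);
}

/* Runs the tokenizer over s and reports the totals.
   Returns what words_do() returns: 0 if s ends inside quotes. */
static _Bool
count_words( const char *s, size_t *words, size_t *chars ){
  struct word_counter wc = { 0, 0, { count_word } };
  _Bool ok = words_do( s, &wc.go );
  *words = wc.words;
  *chars = wc.chars;
  return ok;
}
```

For instance, count_words("foo \"bar baz\" qux", &w, &c) should return 1
with w == 3 and c == 15, because this tokenizer reports the quoted word
with its quotes included. If struct gopher_s were the first member of
struct word_counter, a plain cast from Gopher would do the same job, but
that constrains the layout of the caller's struct.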