From: Michael S
Newsgroups: comp.lang.c
Subject: Re: Buffer contents well-defined after fgets() reaches EOF ?
Date: Sun, 16 Feb 2025 11:05:46 +0200
Organization: A noiseless patient Spider
Message-ID: <20250216110546.00003fb7@yahoo.com>
References: <20250210124911.00006b31@yahoo.com> <86ldu9zxkb.fsf@linuxsc.com> <20250214165108.00002984@yahoo.com> <20250214085627.815@kylheku.com> <20250215192911.0000793d@yahoo.com> <20250215225202.179@kylheku.com>

On Sun, 16 Feb 2025 07:32:23 -0000 (UTC)
Kaz Kylheku <643-408-1753@kylheku.com> wrote:

> On 2025-02-15, Michael S wrote:
> > On Fri, 14 Feb 2025 20:51:38 +0100
> > Janis Papanagnou wrote:
> >>
> >> Actually, in the same code, I'm also using the strtok() function
> >
> > strtok() is one of the relatively small set of more problematic
> > functions in C library that are not thread-safe.
>
> The design of the strtok() API is not inherently unsafe against
> threads; but it requires thread-local storage to be safe.
>
> Since ISO C has threads now, it now takes the opportunity to
> explicitly remove any requirements for thread safety in strtok.
>
> However, it is possible for an implementation to step forward and
> make it thread safe. For instance, in a POSIX system, a
> thread-specific key can be allocated for strtok on library
> initialization, or the first use of strtok (via pthread_once).
>
>   static pthread_key_t strtok_key;
>
>   // ...
>
>   if (pthread_key_create(&strtok_key, NULL))
>     ...
>
> Then strtok does
>
>   char *strtok (char * restrict str, const char * restrict delim)
>   {
>      if (str == NULL)
>        str = pthread_getspecific(strtok_key);
>
>      ...
>
>      // all return paths do this, if str has changed:
>      pthread_setspecific(strtok_key, str);
>      return ...;
>   }
>
> Only problem is that this will not perform anywhere near as well as
> strtok_r, which specifies an inexpensive location for the context
> pointer.
>
> > If you only care about POSIX target, then I'd recommend to avoid
> > strtok and to use strtok_r().
>
> I would recommend learning about strspn and strcspn, and writing
> your own tokenizing loop:
>
>   /* strtok-like loop: input variables are str and delim */
>
>   for (;;) {
>     /* skip delim chars to find start of tok */
>     char *tok = str + strspn(str, delim);
>
>     /* tokens must be nonempty */
>     if (*tok == 0)
>       break;
>
>     /* OK; tok points to non-delim char.
>        Find end of token: skip span of non-delim chars,
>        starting from tok (not from str). */
>     char *end = tok + strcspn(tok, delim);
>
>     /* Record whether the end of the token is the end
>        of the string. */
>     char more = *end;
>
>     /* null-terminate token */
>     *end = 0;
>
>     { /* process tok here */ }
>
>     if (!more)
>       break;
>
>     /* If there is more material after the tok, point
>        str there and continue */
>     str = end + 1;
>   }
>
> The strtok function is ill-suited to many situations. For instance,
> there are situations in which you do want empty tokens, like CSV, such
> that ",abc,def," shows four tokens, two of them empty.
>
> With the strspn and strcspn building blocks, you can easily whip up a
> custom tokenizing loop that has the right semantics for the situation.
>
> We can also write our loop such that it restores the original
> character that was overwritten in order to null-terminate the token,
> simply by adding *end = more. Thus when the loop ends, the string
> is restored to its original state.
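
To make the empty-token case concrete, here is a minimal sketch of a
splitter that keeps empty fields, built on strcspn alone. The name
split_fields and its signature are invented for illustration; they come
from neither the quoted post nor any library. It treats every delimiter
character as a separator and knows nothing about CSV quoting, so take it
as a sketch of the loop variation, not as a CSV parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: splits str in place on any
   single character from delim, keeping empty fields. Stores up to max
   field pointers in out and returns the total number of fields found
   (which may exceed max). No CSV quoting is handled. */
size_t split_fields(char *str, const char *delim, char **out, size_t max)
{
    size_t count = 0;

    for (;;) {
        /* No strspn() step here: a delimiter at str means an empty field. */
        char *end = str + strcspn(str, delim);
        char more = *end;       /* 0 when this is the last field */

        *end = 0;               /* null-terminate the field */
        if (count < max)
            out[count] = str;
        count++;

        if (!more)
            break;
        str = end + 1;          /* continue after the delimiter */
    }
    return count;
}
```

Called on a writable copy of ",abc,def," with delim ",", this reports
four fields, the first and last of them empty. Unlike the restoring
variant, it leaves the terminators in place so that the stored pointers
stay usable after the call.
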
>
> I can understand code like that above without having to look up
> anything, but if I see strtok or strtok_r code after many years of not
> working with strtok, I will need a refresher on how exactly they
> define a token.
>

For parsing of something important and relatively well-defined, like
CSV, I'd very seriously consider the option of not using the standard
str* utilities at all, with the exception of those where coding your own
requires special expertise, i.e. primarily strtod().
BTW, even strtod() can't be blindly relied on for .csv, because it
accepts hex floats, while a standard CSV parser has to reject them.
Most likely, avoiding fgets() is also a good idea in this case.
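
One way to keep strtod() for the conversion itself while rejecting hex
floats is to validate the field against a plain decimal grammar first.
Below is a minimal sketch. The name parse_csv_number and the grammar it
accepts (optional sign, digits with an optional '.', optional decimal
exponent) are assumptions made for illustration, not something a CSV
specification mandates. It also assumes the "C" locale, where the decimal
point is '.', and treats ERANGE as a rejection, which is a policy choice.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: accepts only [+-]digits[.digits][(e|E)[+-]digits]
   with at least one mantissa digit, then lets strtod() do the conversion.
   Rejects hex floats, "inf", "nan", leading whitespace and trailing junk. */
bool parse_csv_number(const char *s, double *out)
{
    const char *p = s;
    size_t digits = 0;

    if (*p == '+' || *p == '-')
        p++;
    while (isdigit((unsigned char)*p)) { p++; digits++; }
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p)) { p++; digits++; }
    }
    if (digits == 0)
        return false;           /* no mantissa digits at all */

    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!isdigit((unsigned char)*p))
            return false;       /* exponent marker without digits */
        while (isdigit((unsigned char)*p))
            p++;
    }
    if (*p != '\0')
        return false;           /* trailing junk, e.g. the 'x' in "0x1p3" */

    char *end;
    errno = 0;
    double v = strtod(s, &end);
    if (end != p || errno == ERANGE)
        return false;           /* strtod disagreed, or value out of range */

    *out = v;
    return true;
}
```

With this, "1.5" and "-2e3" are accepted, while "0x1p3", "inf" and " 1"
are rejected before strtod() ever sees them.
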