Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.lang.c
Subject: Re: Buffer contents well-defined after fgets() reaches EOF ?
Date: Sat, 15 Feb 2025 20:29:15 +0200
Organization: A noiseless patient Spider
Lines: 168
Message-ID: <20250215202915.00004842@yahoo.com>
References: <vo9g74$fu8u$1@dont-email.me>
	<vo9hlo$g0to$1@dont-email.me>
	<vo9khf$ggd4$1@dont-email.me>
	<vobf3h$sefh$2@dont-email.me>
	<vobjdt$t5ka$1@dont-email.me>
	<vobkd5$t7np$1@dont-email.me>
	<20250210124911.00006b31@yahoo.com>
	<86ldu9zxkb.fsf@linuxsc.com>
	<20250214165108.00002984@yahoo.com>
	<20250214085627.815@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Feb 2025 19:29:22 +0100 (CET)
Injection-Info: dont-email.me; posting-host="8738ddbde3c74697697bcd6d7680458d";
	logging-data="131702"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18iN2POxYO0v4daFhDTWZCZvqNBM75cFOw="
Cancel-Lock: sha1:D453zTc3GA2MD1pgg20flaGzRAU=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
Bytes: 7190

On Fri, 14 Feb 2025 17:22:59 -0000 (UTC)
Kaz Kylheku <643-408-1753@kylheku.com> wrote:

> On 2025-02-14, Michael S <already5chosen@yahoo.com> wrote:
> > For starters, it looks like the designers of fgets() did not believe
> > in their own motto about files being just streams of bytes.
> 
> They obviously did, which is exactly why they painstakingly preserved
> the annoying line terminators in the returned data.
> 
> > I don't know the history, so maybe the function was defined this
> > way for portability to systems where text files have a special
> > record-based structure?
> 
> You are sliding into muddled thinking here.
> 
> > Then, everything about it feels inelegant.
> > A return value carries just 1 bit of information, success or
> > failure.  
> 
> Why would you assert a claim for which the standard library alone
> is replete with counterexamples: getchar, malloc, getenv, pow, sin?
> 
> Did you mean /the/ return value (of fgets)?
> 
> > So why did they encode this information in a baroque way instead
> > of something obvious, 0 and 1?
> 
> Because you can express this concept:
> 
>    char work_area[SIZE];
>    char *line;
> 
>    while ((line = fgets(work_area, sizeof work_area, stream)))
>    {
>       /* process line */
>    }
> 
> The work_area just provides storage for the operation: line is the
> returned line.
> 
> The loop would work even if fgets sometimes returned pointers that
> are not to the first byte of work_area. It just so happens that
> they always are.
> 
> It is meaningful to capture the returned value and work with
> it as if it were distinct from the buffer.
> 
> > Appending zero at the end also feels like a hack, but it is
> > necessary because of the main problem.  
> 
> Appending zero is necessary so that the result meets the definition
> of a C character string, without which it cannot be passed into
> string-manipulating functions like strlen.
> 
> Home-grown functions that resemble fgets, but forget to add a null
> byte sometimes, are the subjects of security CVEs.
> 
> > And the main problem is: how the user is
> > supposed to figure out how many bytes were read?  
> 
> Yes, how are they, if you take away the null byte?
> 
> > In a well-designed API this question should be answered in O(1) time.
> >  
> 
> In the context of C strings, that buys you almost nothing.
> Even if you know the length, it's going to get measured numerous
> more times.
> 
> It would be good if fgets nuked the terminating newline.
> 
> Many uses of fgets, after every operation, look for the newline
> and nuke it, before doing anything else.
> 
> There is a nice idiom for that, by the way, which avoids a
> temporary variable and an if test:
> 
>    line[strcspn(line, "\n")] = 0;
> 
> strcspn(line, "\n") calculates the length of the prefix of line
> which consists of non-newlines. That value is precisely the
> array index of the first newline, if there is one, or else
> of the terminating null, if there isn't a newline. Either
> way, you can clobber that with a null byte.
> 
> Once you see the above, you will never do this again:
> 
>    newline = strchr(line, '\n');
>    if (newline)
>      *newline = 0;
> 
> > With fgets(), it can be answered in O(N) time when input is trusted
> > to contain no zeros.  
> 
> We have decided in the C world that text does not contain zeros.
> 

Yes, for internal data.
External inputs have to be sanitized.
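
A minimal sketch of the most basic such check, assuming the byte count is
already known from somewhere other than fgets() (for example from fread()).
The helper name has_no_nul is made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function.
 * Returns 1 if none of the len bytes at buf is a zero byte, else 0.
 * The caller must supply len; strlen() cannot be used to obtain it,
 * because it would stop at the very byte we are looking for. */
int has_no_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) == NULL;
}
```

The check only works when the length comes with the data, which is the
information fgets() does not hand back.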

> This has become so pervasive that the remaining naysayers can safely
> be regarded as part of a lunatic fringe.
> 
> Software that tries to support the presence of raw nulls in text is
> actively harmful for security.
> 
> For instance, a piece of text with embedded nulls might have valid
> overall syntax which makes it immune to an injection attack.
> 
> But when it is sent to another piece of software which interprets
> the null as a terminator, the syntax is chopped in half, allowing
> it to be completed by a malicious actor.
>

I don't quite understand. In particular, I don't understand whether you
are arguing in favor of fgets() or against it.

> > When input is arbitrary, finding out the answer is
> > even harder and requires quirks.  
> 
> When input is arbitrary, don't use fgets? It's for text.
> 
> > The function foo() is more generic than fgets(). For use instead of
> > fgets() it should be accompanied by standard constant EOL_CHAR.
> >
> > I am not completely satisfied with the proposed solution. The API
> > is still less obvious than it could be. But it is much better than
> > fgets().
> 
> If last_c is '\n', you're still writing the pesky newline that
> the caller will often want to remove.
> 
> Adding a terminating null and returning a pointer to that null
> would be better.
> 

If the caller wants it, it can easily do that by itself.
OTOH, if we follow your proposal, we lose information about the
presence/absence of EOL at the end of the file. I think that for a generic
function it's better not to lose any information, even information
that is not useful for 99.99% of the callers.
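
A minimal sketch of what "do it by itself" could look like on the caller's
side, reusing the strcspn idiom quoted above but keeping the one bit of
information about the EOL. The name chomp is made up for illustration, and
the sketch assumes the buffer is already null-terminated:

```c
#include <string.h>

/* Hypothetical caller-side helper, not a standard function.
 * Truncates line at its first '\n', if there is one.
 * Returns 1 if a newline was found and removed, 0 otherwise,
 * so the caller still knows whether the line ended with EOL. */
int chomp(char *line)
{
    size_t i = strcspn(line, "\n");    /* index of first '\n' or of the null */
    int had_eol = (line[i] == '\n');
    line[i] = '\0';
    return had_eol;
}
```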

> You could then call the operation again with the returned dst
> pointer, and it would continue extending the string,
> without obliterating the last character.
> 
> I'm sure I've seen a foo-like function in software before:
> reading delimited by an arbitrary byte, with length signaling.
> 

I certainly do not pretend that I invented anything new here.
Nor do I pretend that it's the best possible.
Moreover, I'd like it to be even more mundane. I just can't figure out how
to do it without adding one more [pointer] parameter.

One obvious possibility is to return the number of characters read instead
of a pointer. Then 0 can mean EOF and negative values can mean I/O errors.
But that is also not sufficiently boring.
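
For concreteness, a rough sketch of that count-returning variant. The name
read_line_n and its signature are made up for illustration and are not the
foo() discussed earlier in the thread. The sketch assumes cap is at least 1
and fits in a long, and it stores no terminating null, since the length is
the return value:

```c
#include <stdio.h>

/* Hypothetical function, not part of any standard library.
 * Reads bytes from fp into dst until cap bytes are stored or a '\n'
 * has been stored, whichever comes first. The '\n' is kept.
 * Returns the number of bytes stored (> 0), 0 on end of file with
 * nothing read, or -1 on an I/O error (bytes read before the error
 * are not reported in this sketch). */
long read_line_n(char *dst, size_t cap, FILE *fp)
{
    size_t n = 0;

    while (n < cap) {
        int c = getc(fp);
        if (c == EOF) {
            if (ferror(fp))
                return -1;
            break;              /* plain end of file */
        }
        dst[n++] = (char)c;
        if (c == '\n')
            break;
    }
    return (long)n;
}
```

With this shape the caller gets the length in O(1), and a last line without
EOL is still distinguishable because its final byte is not '\n'.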