Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.lang.c
Subject: Re: Buffer contents well-defined after fgets() reaches EOF ?
Date: Sat, 15 Feb 2025 20:29:15 +0200
Organization: A noiseless patient Spider
Lines: 168
Message-ID: <20250215202915.00004842@yahoo.com>
References: <vo9g74$fu8u$1@dont-email.me>
<vo9hlo$g0to$1@dont-email.me>
<vo9khf$ggd4$1@dont-email.me>
<vobf3h$sefh$2@dont-email.me>
<vobjdt$t5ka$1@dont-email.me>
<vobkd5$t7np$1@dont-email.me>
<20250210124911.00006b31@yahoo.com>
<86ldu9zxkb.fsf@linuxsc.com>
<20250214165108.00002984@yahoo.com>
<20250214085627.815@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 15 Feb 2025 19:29:22 +0100 (CET)
Injection-Info: dont-email.me; posting-host="8738ddbde3c74697697bcd6d7680458d";
logging-data="131702"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18iN2POxYO0v4daFhDTWZCZvqNBM75cFOw="
Cancel-Lock: sha1:D453zTc3GA2MD1pgg20flaGzRAU=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
Bytes: 7190
On Fri, 14 Feb 2025 17:22:59 -0000 (UTC)
Kaz Kylheku <643-408-1753@kylheku.com> wrote:
> On 2025-02-14, Michael S <already5chosen@yahoo.com> wrote:
> > For starters, it looks like the designers of fgets() did not
> > believe in their own motto about files being just streams of bytes.
>
> They obviously did, which is exactly why they painstakingly preserved
> the annoying line terminators in the returned data.
>
> > I don't know the history, so maybe the function was defined this
> > way for portability with systems where text files have special
> > record-based structure?
>
> You are sliding into muddled thinking here.
>
> > Then, everything about it feels inelegant.
> > A return value carries just 1 bit of information, success or
> > failure.
>
> Why would you assert a claim for which the standard library alone
> is replete with counterexamples: getchar, malloc, getenv, pow, sin.
>
> Did you mean /the/ return value (of fgets)?
>
> > So why did they encode this information in baroque way instead of
> > something obvious, 0 and 1?
>
> Because you can express this concept:
>
>   char work_area[SIZE];
>   char *line;
>
>   while ((line = fgets(work_area, sizeof work_area, stream)))
>   {
>     /* process line */
>   }
>
> The work_area just provides storage for the operation: line is the
> returned line.
>
> The loop would work even if fgets sometimes returned pointers that
> are not to the first byte of work_area. It just so happens that
> they always are.
>
> It is meaningful to capture the returned value and work with
> it as if it were distinct from the buffer.
>
> > Appending zero at the end also feels like a hack, but it is
> > necessary because of the main problem.
>
> Appending zero is necessary so that the result meets the definition
> of a C character string, without which it cannot be passed into
> string-manipulating functions like strlen.
>
> Home-grown functions that resemble fgets, but forget to add a null
> byte sometimes, are the subjects of security CVEs.
>
> > And the main problem is: how the user is
> > supposed to figure out how many bytes were read?
>
> Yes, how are they, if you take away the null byte?
>
> > In well-designed API this question should be answered in O(1) time.
> >
>
> In the context of C strings, that buys you almost nothing.
> Even if you know the length, it's going to get measured numerous
> more times.
>
> It would be good if fgets nuked the terminating newline.
>
> Many uses of fgets, after every operation, look for the newline
> and nuke it, before doing anything else.
>
> There is a nice idiom for that, by the way, which avoids a
> temporary variable and an if test:
>
> line[strcspn(line, "\n")] = 0;
>
> strcspn(line, "\n") calculates the length of the prefix of line
> which consists of non-newlines. That value is precisely the
> array index of the first newline, if there is one, or else
> of the terminating null, if there isn't a newline. Either
> way, you can clobber that with a null.
>
> Once you see the above, you will never do this again:
>
>   newline = strchr(line, '\n');
>   if (newline)
>     *newline = 0;
>
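A minimal sketch of the strcspn idiom quoted above, wrapped in a helper
so the behavior with and without a newline is easy to check. The name
chomp() is hypothetical and not from the thread; returning the new
length is my own addition.

```c
#include <string.h>

/* Hypothetical helper: cut the string at its first newline, if any.
 * strcspn() returns the index of the first '\n', or the index of the
 * terminating null when there is no newline, so the store below is
 * always in bounds. Returns the resulting string length. */
size_t chomp(char *s)
{
    size_t n = strcspn(s, "\n");
    s[n] = '\0';
    return n;
}
```

Calling chomp(line) right after a successful fgets() leaves line without
its trailing newline, and the return value can be kept instead of
calling strlen() later.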
> > With fgets(), it can be answered in O(N) time when input is trusted
> > to contain no zeros.
>
> We have decided in the C world that text does not contain zeros.
>
Yes, for internal data.
External input has to be sanitized.
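As an illustration of what such sanitizing can look like with fgets(),
here is a rough sketch. The helper names fgets_len() and
has_embedded_nul() are hypothetical. It relies on an assumption that, as
far as I know, the C standard does not guarantee: that fgets() leaves
the bytes after its terminating null untouched. Common implementations
seem to behave that way, but treat it as an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() and report how many
 * bytes were stored, even when the input contains zero bytes.
 * ASSUMPTION: fgets() does not modify buf beyond the terminating null.
 * Returns 0 on success, -1 on EOF, error, or a too-small buffer. */
int fgets_len(char *buf, int size, FILE *fp, size_t *len)
{
    size_t i;

    if (size < 2)
        return -1;
    memset(buf, 1, (size_t)size);   /* prefill with a nonzero byte */
    if (fgets(buf, size, fp) == NULL)
        return -1;
    /* The last zero byte in the buffer is the terminator fgets() wrote;
     * any zero bytes that came from the input lie before it. */
    i = (size_t)size;
    while (i > 0 && buf[i - 1] != '\0')
        --i;
    if (i == 0)
        return -1;                  /* no terminator found: assumption broken */
    *len = i - 1;
    return 0;
}

/* Nonzero if the first len bytes of buf contain a zero byte. */
int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

A caller can reject or repair any line for which has_embedded_nul()
reports a hit before handing it to string functions. The backward scan
is O(size), which is exactly the kind of quirk I was complaining about.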
> This has become so pervasive that the remaining naysayers can safely
> be regarded as part of a lunatic fringe.
>
> Software that tries to support the presence of raw nulls in text is
> actively harmful for security.
>
> For instance, a piece of text with embedded nulls might have valid
> overall syntax which makes it immune to an injection attack.
>
> But when it is sent to another piece of software which interprets
> the null as a terminator, the syntax is chopped in half, allowing
> it to be completed by a malicious actor.
>
I don't quite understand. In particular, I don't understand whether you
are arguing in favor of fgets() or against it.
> > When input is arbitrary, finding out the answer is
> > even harder and requires quirks.
>
> When input is arbitrary, don't use fgets? It's for text.
>
> > The function foo() is more generic than fgets(). For use instead of
> > fgets() it should be accompanied by standard constant EOL_CHAR.
> >
> > I am not completely satisfied with proposed solution. The API is
> > still less obvious than it could be. But it is much better than
> > fgets().
>
> If last_c is '\n', you're still writing the pesky newline that
> the caller will often want to remove.
>
> Adding a terminating null and returning a pointer to that null
> would be better.
>
If the caller wants it, it can easily do it by itself.
OTOH, if we follow your proposal, we lose information about the
presence or absence of an EOL at the end of the file. I think that for
a generic function it's better not to lose any information, even
information that is not useful for 99.99% of the callers.
> You could then call the operation again with the returned dst
> pointer, and it would continue extending the string,
> without obliterating the last character.
>
> I'm sure I've seen a foo-like function in software before:
> reading delimited by an arbitrary byte, with length signaling.
>
I certainly do not claim that I invented anything new here.
Nor did I claim that it's the best possible.
Moreover, I'd like it to be even more mundane. I just can't figure out
how to do it without adding one more [pointer] parameter.
One obvious possibility is to return the number of characters read
instead of a pointer. Then 0 can mean EOF and negative values can mean
I/O errors. But that is also not sufficiently boring.
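
For concreteness, a rough sketch of that count-returning variant. The
name read_until() and its parameter list are hypothetical, and this is
not the foo() discussed earlier in the thread. It assumes the caller
does not want a terminating null appended, and it gives up the partial
count when an I/O error occurs mid-line.

```c
#include <stdio.h>

/* Hypothetical sketch: read bytes from fp into dst until delim has been
 * stored, size bytes have been stored, or input ends. No terminating
 * null is appended. Returns the number of bytes stored; 0 means EOF
 * with nothing read (or size == 0); -1 means an I/O error. */
long read_until(char *dst, size_t size, int delim, FILE *fp)
{
    size_t n = 0;

    while (n < size) {
        int c = getc(fp);
        if (c == EOF) {
            if (ferror(fp))
                return -1;      /* I/O error: partial count is dropped */
            break;              /* plain end of input */
        }
        dst[n++] = (char)c;
        if (c == delim)
            break;              /* delimiter is kept in the output */
    }
    return (long)n;
}
```

With delim set to '\n', the caller learns the length in O(1), and the
presence or absence of a final EOL can be read off dst[n - 1].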