Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.std.c
Subject: Re: May a string span multiple, independent objects?
Date: Thu, 08 Aug 2024 08:35:04 -0700
Organization: A noiseless patient Spider
Lines: 92
Message-ID: <86zfpngh93.fsf@linuxsc.com>
References: <20240703141500$00ed@vinc17.org> <v63sjf$28fl8$3@dont-email.me> <87zfqy6v54.fsf@bsb.me.uk> <20240704130236$a100@vinc17.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Thu, 08 Aug 2024 17:35:07 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3394822579de1ea0c2b9a1ce158ed8c5";
	logging-data="93807"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+Gq+Se57n+MJnZvUh8bnrVzj21g8S/kys="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:qGrDJjszNok6z0FYSiN3tLvTAx4=
	sha1:qVR/83yzQOUdYhUT0Kx9+03atkM=
Bytes: 4523

Vincent Lefevre <vincent-news@vinc17.net> writes:

> In article <87zfqy6v54.fsf@bsb.me.uk>,
>   Ben Bacarisse <ben@bsb.me.uk> wrote:
>
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> On 7/3/24 10:31, Vincent Lefevre wrote:
>>>
>>>> ISO C17 (and C23 draft) 7.1.1 defines a string as follows:  "A
>>>> string is a contiguous sequence of characters terminated by and
>>>> including the first null character."
>>>>
>>>> But may a string span multiple, independent objects that happen
>>>> to be contiguous in memory?
>>
>> ...
>>
>>>> For instance, is the following program valid and what does the
>>>> ISO C standard say about that?
>>>>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>>
>>>> typedef char *volatile vp;
>>>>
>>>> int main (void)
>>>> {
>>>> char a = '\0', b = '\0';
>>>
>>> a and b are not guaranteed to be contiguous.
>>>
>>>> vp p = &a, q = &b;
>>>>
>>>> printf ("%p\n", (void *) p);
>>>> printf ("%p\n", (void *) q);
>>>> if (p + 1 == q)
>>>> {
>>>
>>> That comparison is legal, and has well-defined behavior.  It will
>>> be true only if they are in fact contiguous.
>>>
>>>> a = 'x';
>>>> printf ("%zd\n", strlen (p));
>>>
>>> Because strlen() must take a pointer to 'a' (which is treated, for
>>> these purposes, as an array of char of length 1), and increment it
>>> one past the end of that array, and then dereference that pointer
>>> to check whether it points at a null character, the behavior is
>>> undefined.
>>
>> I think this is slightly misleading.  It suggests that the UB comes
>> from something strlen /must/ do, but strlen must be thought of as a
>> black box.  We can't base anything on an assumed implementation.
>
> I agree (and note that strlen is not necessarily written in C).
>
>> But our conclusion is correct because there is explicit wording
>> covering this case.  The section on "String function conventions"
>> (7.24.1) states:
>>
>>   "If an array is accessed beyond the end of an object, the
>>   behavior is undefined."
>
> Arguments of these functions are either arrays or strings, where
> a string is not defined as being an array (or a part of an array).
> So I don't see why this text, as written, would apply to strings.

Something important to understand is that the C standard is not
meant to be read as legalese or mathematicalese.  Certainly the
authors make an effort to be precise, but not always to the degree
that every sentence, considered in isolation, is entirely correct
or tells the whole story.  To avoid being led astray it helps to
remember that, and to read holistically in addition to reading
passages individually.

In any case, the question here is easily resolved by noting the
description in paragraph 1 of 7.24.1 "String function conventions",
which says in part

   The header <string.h> declares one type and several functions,
   and defines one macro useful for manipulating arrays of
   character type and other objects treated as arrays of character
   type.  [...]  Various methods are used for determining the
   lengths of the arrays, but in all cases a char * or void *
   argument points to the initial (lowest addressed) character of
   the array.

Note especially the second part of the last sentence, starting with
"but in all cases".  Arguments to functions in <string.h> always
refer to arrays, regardless of whether they might also refer to
strings.