From: Tim Rentsch
Newsgroups: comp.std.c
Subject: Re: May a string span multiple, independent objects?
Date: Thu, 08 Aug 2024 08:35:04 -0700
Message-ID: <86zfpngh93.fsf@linuxsc.com>
References: <20240703141500$00ed@vinc17.org> <87zfqy6v54.fsf@bsb.me.uk> <20240704130236$a100@vinc17.org>

Vincent Lefevre writes:

> In article <87zfqy6v54.fsf@bsb.me.uk>,
> Ben Bacarisse wrote:
>
>> James Kuyper writes:
>>
>>> On 7/3/24 10:31, Vincent Lefevre wrote:
>>>
>>>> ISO C17 (and C23 draft) 7.1.1 defines a string as follows: "A
>>>> string is a contiguous sequence of characters terminated by and
>>>> including the first null character."
>>>>
>>>> But may a string span multiple, independent objects that happen
>>>> to be contiguous in memory?
>>
>> ...
>>
>>>> For instance, is the following program valid and what does the
>>>> ISO C standard say about that?
>>>>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>>
>>>> typedef char *volatile vp;
>>>>
>>>> int main (void)
>>>> {
>>>>   char a = '\0', b = '\0';
>>>
>>> a and b are not guaranteed to be contiguous.
>>>
>>>>   vp p = &a, q = &b;
>>>>
>>>>   printf ("%p\n", (void *) p);
>>>>   printf ("%p\n", (void *) q);
>>>>   if (p + 1 == q)
>>>>     {
>>>
>>> That comparison is legal, and has well-defined behavior. It will
>>> be true only if they are in fact contiguous.
>>>
>>>>       a = 'x';
>>>>       printf ("%zd\n", strlen (p));
>>>
>>> Because strlen() must take a pointer to 'a' (which is treated, for
>>> these purposes, as an array of char of length 1), and increment it
>>> one past the end of that array, and then dereference that pointer
>>> to check whether it points to a null character, the behavior is
>>> undefined.
>>
>> I think this is slightly misleading. It suggests that the UB comes
>> from something strlen /must/ do, but strlen must be thought of as a
>> black box. We can't base anything on an assumed implementation.
>
> I agree (and note that strlen is not necessarily written in C).
>
>> But our conclusion is correct because there is explicit wording
>> covering this case. The section on "String function conventions"
>> (7.24.1) states:
>>
>>   "If an array is accessed beyond the end of an object, the
>>   behavior is undefined."
>
> Arguments of these functions are either arrays or strings, where
> a string is not defined as being an array (or a part of an array).
> So I don't see why this text, as written, would apply to strings.

Something that's important to understand is that the C standard is
not meant to be read as legalese or mathematicalese. Certainly the
authors are making an effort to be precise, but not always to the
degree that every sentence is entirely correct, or presents the
whole story, if considered just in isolation. To avoid being led
astray it helps to remember that, and to try to read holistically
in addition to reading passages individually.

In any case, the question here is easily resolved by noting the
description in paragraph 1 of 7.24.1 "String function conventions",
which says in part

    The header <string.h> declares one type and several functions,
    and defines one macro useful for manipulating arrays of
    character type and other objects treated as arrays of character
    type. [...]
    Various methods are used for determining the lengths of the
    arrays, but in all cases a char * or void * argument points to
    the initial (lowest addressed) character of the array.

Note especially the second part of the last sentence, starting with
"but in all cases". Arguments to functions in <string.h> always
refer to arrays, regardless of whether they might also refer to
strings.