Path: ...!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 07 Mar 2024 15:46:01 -0800
Organization: None to speak of
Lines: 87
Message-ID: <87frx1obba.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me>
	<urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me>
	<urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me>
	<urnbh6$3t14d$1@dont-email.me>
	<87frxcuv87.fsf@nosuchdomain.example.com>
	<urq4fe$lapm$1@dont-email.me> <urq7fd$lupv$1@dont-email.me>
	<urqrsu$q361$1@dont-email.me>
	<87o7bzrll5.fsf@nosuchdomain.example.com>
	<urquvb$qn8n$2@dont-email.me>
	<87bk7ysysj.fsf@nosuchdomain.example.com>
	<us6876$3jpc3$5@dont-email.me>
	<87y1axp9a7.fsf@nosuchdomain.example.com>
	<usdad0$18du3$3@dont-email.me> <20240307133736.732@kylheku.com>
	<87o7bpof0z.fsf@nosuchdomain.example.com>
	<20240307145137.99@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="ba4161de3c6afa3b79edb2bdfdc78ddd";
	logging-data="1388589"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX191txyRFnv8+kTJo+6U7CLA"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:1vOuYIvAQ+5gQgTd7+PMyLLDOvo=
	sha1:vbmAdreZwyfJpTSK2t4s+zvkx34=
Bytes: 4906

Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-07, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
>>>> On Mon, 04 Mar 2024 20:55:28 -0800, Keith Thompson wrote:
>>>>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>>>>> On Thu, 29 Feb 2024 14:14:52 -0800, Keith Thompson wrote:
>>>>>>> "A *string* is a contiguous sequence of characters terminated by and
>>>>>>> including the first null character."
>>>>>>
>>>>>> So how come strlen(3) does not include the null?
>>>>> 
>>>>> Because the *length of a string* is by definition "the number of bytes
>>>>> preceding the null character".
>>>>
>>>> So the “string” itself includes the null character, but its “length” does 
>>>> not?
>>>
>>> That's correct. However, its size includes it.
>>>
>>>  sizeof "abc" == 4
>>>
>>>  strlen("abc") == 3
>>>
>>> The abstract string does not include the null character;
>>> we understand "abc" to be a three character string.
>>
>> Sure, if you define "abstract string" that way.  I'll just note that C's
>> definition of the word "string" does include the terminating null
>> character, and does not talk about "abstract strings".  (A string in the
>> abstract machine clearly includes the null character, but that's a bit
>> of a stretch.)
>
> Yes; "abstract machine" is not what I mean by abstract.
>
> The concept of the abstract string lives in the semantics though.
>
> When N strings are catenated together, their abstract strings are
> juxtaposed together without any nulls in between, with only a single
> null at the end.

True both for compile-time string literal catenation and for strcat().

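But for the former, embedded null characters can slightly complicate
matters.  The value of a string literal isn't necessarily a string: if
the literal contains an embedded null character, only the part up to and
including the first null is a string.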

#include <stdio.h>
int main(void) {
    const char s[] = "abc\0def" "ghi\0";
    puts(s);
    for (size_t i = 0; i < sizeof s; i ++) {
        if (s[i] == '\0') {
            fputs("\\0", stdout);
        }
        else {
            putchar(s[i]);
        }
    }
    putchar('\n');
}

Output:
abc
abc\0defghi\0\0

> Furthermore, when a string is sent to a stream with %s or {f}puts,
> the null byte is omitted, like in the calculation of length.
>
> Clearly, there is a semantics that the part before the null byte
> is the text processing payload; what I'm calling the abstract string.

Agreed.  To be clear, I like the idea of referring to the contents of a
string excluding the terminating null character as an "abstract string".

> (With character encodings, it gets hairy. The part before the null
> may be a UTF-8 sequence, where the abstract string consists of code
> points. Which may be combining characters, so the True Scotsman's
> abstract string is the sequence of characters.)

Yes.  With UTF-8, the term "abstract string" might reasonably refer
either to the sequence of bytes preceding the terminating '\0', or to
the sequence of code points.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */