Path: ...!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Implicit String-Literal Concatenation
Date: Thu, 07 Mar 2024 15:46:01 -0800
Organization: None to speak of
Lines: 87
Message-ID: <87frx1obba.fsf@nosuchdomain.example.com>
References: <urdsob$1e8e4$7@dont-email.me> <urj1qv$2p32o$1@dont-email.me> <urk6um$33nqv$1@dont-email.me> <urlgfn$3d1ah$3@dont-email.me> <urlmo7$3eg2j$1@dont-email.me> <urn6sv$3s62i$2@dont-email.me> <urnbh6$3t14d$1@dont-email.me> <87frxcuv87.fsf@nosuchdomain.example.com> <urq4fe$lapm$1@dont-email.me> <urq7fd$lupv$1@dont-email.me> <urqrsu$q361$1@dont-email.me> <87o7bzrll5.fsf@nosuchdomain.example.com> <urquvb$qn8n$2@dont-email.me> <87bk7ysysj.fsf@nosuchdomain.example.com> <us6876$3jpc3$5@dont-email.me> <87y1axp9a7.fsf@nosuchdomain.example.com> <usdad0$18du3$3@dont-email.me> <20240307133736.732@kylheku.com> <87o7bpof0z.fsf@nosuchdomain.example.com> <20240307145137.99@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: dont-email.me; posting-host="ba4161de3c6afa3b79edb2bdfdc78ddd"; logging-data="1388589"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX191txyRFnv8+kTJo+6U7CLA"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:1vOuYIvAQ+5gQgTd7+PMyLLDOvo= sha1:vbmAdreZwyfJpTSK2t4s+zvkx34=
Bytes: 4906

Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-07, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
>>>> On Mon, 04 Mar 2024 20:55:28 -0800, Keith Thompson wrote:
>>>>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>>>>> On Thu, 29 Feb 2024 14:14:52 -0800, Keith Thompson wrote:
>>>>>>> "A *string* is a contiguous sequence of characters terminated by and
>>>>>>> including the first null character."
>>>>>>
>>>>>> So how come strlen(3) does not include the null?
>>>>>
>>>>> Because the *length of a string* is by definition "the number of bytes
>>>>> preceding the null character".
>>>>
>>>> So the “string” itself includes the null character, but its “length” does
>>>> not?
>>>
>>> That's correct. However, its size includes it.
>>>
>>>     sizeof "abc" == 4
>>>
>>>     strlen("abc") == 3
>>>
>>> The abstract string does not include the null character;
>>> we understand "abc" to be a three character string.
>>
>> Sure, if you define "abstract string" that way. I'll just note that C's
>> definition of the word "string" does include the terminating null
>> character, and does not talk about "abstract strings". (A string in the
>> abstract machine clearly includes the null character, but that's a bit
>> of a stretch.)
>
> Yes; "abstract machine" is not what I mean by abstract.
>
> The concept of the abstract string lives in the semantics though.
>
> When N strings are catenated together, their abstract strings are
> juxtaposed together without any nulls in between, with only a single
> null at the end.

True both for compile-time string literal catenation and for strcat().
But for the former, embedded null characters can slightly complicate
matters. The value of a string literal isn't necessarily a string.

    #include <stdio.h>
    int main(void) {
        const char s[] = "abc\0def" "ghi\0";
        puts(s);
        for (size_t i = 0; i < sizeof s; i ++) {
            if (s[i] == '\0') {
                fputs("\\0", stdout);
            }
            else {
                putchar(s[i]);
            }
        }
        putchar('\n');
    }

Output:

    abc
    abc\0defghi\0\0

> Furthermore, when a string is sent to a stream with %s or {f}puts,
> the null byte is omitted, like in the calculation of length.
>
> Clearly, there is a semantics that the part before the null byte
> is the text processing payload; what I'm calling the abstract string.

Agreed.
To be clear, I like the idea of referring to the contents of a string
excluding the terminating null character as an "abstract string".

> (With character encodings, it gets hairy. The part before the null
> may be a UTF-8 sequence, where the abstract string consists of code
> points. Which may be combining characters, so the True Scotsman's
> abstract string is the sequence of characters.)

Yes. With UTF-8, the term "abstract string" might reasonably refer
either to the sequence of bytes preceding the terminating '\0', or to
the sequence of code points.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */