From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.lang.c
Subject: Re: relearning C: why does an in-place change to a char* segfault?
Date: Thu, 1 Aug 2024 19:39:04 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <20240801114615.906@kylheku.com>
References: <IoGcndcJ1Zm83zb7nZ2dnZfqnPWdnZ2d@brightview.co.uk>

On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
> This program segfaults at the commented line:
>
> #include <ctype.h>
> #include <stdio.h>
>
> void uppercase_ascii(char *s) {
>     while (*s) {
>         *s = toupper(*s); // SEGFAULT
>         s++;
>     }
> }
>
> int main() {
>     char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's
image. When you try to change it, you're making your program
self-modifying.

The ISO C language standard doesn't require implementations to support
self-modifying programs; the behavior is left undefined. It could work
in some documented, reliable way, in a given implementation.

It's the same with any other constant in the program. Say you have a
malloc(1024) somewhere in the program. That 1024 number is encoded into
the program's image somehow, and in principle you could write code to
get at that number and change it to 256. Long before you got that far,
you would be in undefined behavior territory. If it worked, it could
have surprising effects. For instance, there could be another call to
malloc(1024) in the program and, surprisingly, *that* one also changes
to malloc(256).

A literal like "this is a test" is similar to that 1024, except that
it's very easy to get at it. The language defines it as an object with
an address, and to get that address all we have to do is evaluate that
expression itself. A minimal piece of code that requests the undefined
consequences of modifying a string literal is as easy as "a"[0] = 0.

> Program received signal SIGSEGV, Segmentation fault.
> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>     at inplace.c:6
> 6           *s = toupper(*s);

On Linux, the string literals of a C executable are typically placed in
a read-only data section such as .rodata, which, like the program text,
is mapped without write permission. An attempted modification is
therefore an access violation trapped by the OS and turned into a
SIGSEGV signal.

GCC used to have a -fwritable-strings option, but it has been removed
for quite some time now.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca