From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.c
Subject: Re: relearning C: why does an in-place change to a char* segfault?
Date: Thu, 01 Aug 2024 22:40:23 +0100
Organization: A noiseless patient Spider
Message-ID: <8734nnexbs.fsf@bsb.me.uk>
References: <IoGcndcJ1Zm83zb7nZ2dnZfqnPWdnZ2d@brightview.co.uk> <20240801114615.906@kylheku.com> <v8gs06$2ceis$1@dont-email.me>

Bart <bc@freeuk.com> writes:

> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>     while (*s) {
>>>         *s = toupper(*s); // SEGFAULT
>>>         s++;
>>>     }
>>> }
>>>
>>> int main() {
>>>     char* text = "this is a test";
>> The "this is a test" object is a literal. It is part of the program's
>> image.
>
> So is the text here:
>
>   char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.

Different "this".  The array generated by the string can't be modified
without UB.  The "this" that can be changed in the corrected version is a
plain, automatically allocated array of char, initialised with the values
from the string.

> I guess it depends on what is classed as the program's 'image'.

The self-modifying remark is a bit of a red herring, but altering the
value of named automatic objects can't be classed as altering the
program's image in any reasonable way.

> I'd say the image in the state it is in just after loading or just before
> execution starts (since certain fixups are needed). But some sections will
> be writable during execution, some not.
>
>> When you try to change it, you're making your program self-modifying.
>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>>>     at inplace.c:6
>>> 6           *s = toupper(*s);
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
>
> Does it really do that?

Linux does not really have much to do with it; the C implementation
decides, though the OS will influence what choices make more or less
sense.

For example, with my gcc (13.2.0) on Ubuntu the string is put into a
section called .rodata, but tcc on the same Linux box puts it in .data.
As a result the tcc-compiled program runs without any issues and outputs

  before [this is a test]
  after [THIS IS A TEST]

Some C implementations, on some Linux systems, might put strings in the
text segment, but I've not seen a system that does that for decades.
(Mind you, "Linux" refers to a huge class of systems ranging from top-end
servers to tiny embedded devices.)

-- 
Ben.