Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.c
Subject: Re: relearning C: why does an in-place change to a char* segfault?
Date: Thu, 01 Aug 2024 22:40:23 +0100
Organization: A noiseless patient Spider
Lines: 74
Message-ID: <8734nnexbs.fsf@bsb.me.uk>
References: <IoGcndcJ1Zm83zb7nZ2dnZfqnPWdnZ2d@brightview.co.uk>
	<20240801114615.906@kylheku.com> <v8gs06$2ceis$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Thu, 01 Aug 2024 23:40:24 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="af28625286034bb931e403b0aec66571";
	logging-data="2520926"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19VO2Ec2xq+E6l93n5IxFBEUwIImNyaqBk="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:pXzbr05/3OBhP7rN4QAqkuu5+vk=
	sha1:OGz9uv0Cr7RRrTcs+N9b8kIPmHM=
X-BSB-Auth: 1.591ada64e22f8d1cd674.20240801224023BST.8734nnexbs.fsf@bsb.me.uk
Bytes: 3763

Bart <bc@freeuk.com> writes:

> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>      while (*s) {
>>>          *s = toupper(*s); // SEGFAULT
>>>          s++;
>>>      }
>>> }
>>>
>>> int main() {
>>>      char* text = "this is a test";
>> The "this is a test" object is a literal. It is part of the program's
>> image.
>
> So is the text here:
>
>   char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.

Different "this".  The array generated by the string literal can't be
modified without undefined behaviour (UB).  The "this" that can be
changed in the corrected version is a plain, automatically allocated
array of char, initialised with the values from the string literal.

> I guess it depends on what is classed as the program's 'image'.

The self-modifying remark is a bit of a red herring, but altering the
value of named automatic objects can't be classed as altering the
program's image in any reasonable way at all.

> I'd say the image in the state it is in just after loading or just before
> execution starts (since certain fixups are needed). But some sections will
> be writable during execution, some not.
>
>> When you try to change it, you're making your program self-modifying.
>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>>> at inplace.c:6
>>> 6	        *s = toupper(*s);
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
>
> Does it really do that?

Linux does not really have much to do with it; the C implementation
decides, though the OS will influence what choices make more or less
sense.

For example, with my gcc (13.2.0) on Ubuntu the string is put into a
section called .rodata, but tcc on the same Linux box puts it in .data.
As a result the tcc-compiled program runs without any issues and outputs:

before [this is a test]
after  [THIS IS A TEST]

Some C implementations, on some Linux systems, might put strings in the
text segment, but I've not seen a system that does that for decades.
(Mind you, "Linux" refers to a huge class of systems ranging from top-end
servers to tiny embedded devices.)

-- 
Ben.