Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: relearning C: why does an in-place change to a char* segfault?
Date: Sat, 3 Aug 2024 00:14:09 +0200
Organization: A noiseless patient Spider
Lines: 107
Message-ID: <v8jlnk$31hqf$1@dont-email.me>
References: <IoGcndcJ1Zm83zb7nZ2dnZfqnPWdnZ2d@brightview.co.uk>
 <20240801114615.906@kylheku.com> <v8gs06$2ceis$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 03 Aug 2024 00:14:13 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2038e674c4ef04c0f72826825844e7c7";
	logging-data="3196751"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19H+DWaR1b19WHu0IfxSrfE6yNjpEcvQwI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:3YcSMnZq45BRuPxSk836D/zbfys=
Content-Language: en-GB, nb-NO
In-Reply-To: <v8gs06$2ceis$1@dont-email.me>
Bytes: 5674

On 01/08/2024 22:42, Bart wrote:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>      while (*s) {
>>>          *s = toupper(*s); // SEGFAULT
>>>          s++;
>>>      }
>>> }
>>>
>>> int main() {
>>>      char* text = "this is a test";
>>
>> The "this is a test" object is a literal. It is part of the program's 
>> image.
> 
> So is the text here:
> 
>    char text[]="this is a test";
> 
> But this can be changed without making the program self-modifying.

"this is a test" is a string literal, and is typically part of the 
program's image.  (There are some C implementations that do things 
differently, like storing such initialisation data in a compressed format.)

The array "char text[]", however, is a normal variable of type array of
char.  It is most definitely not part of the program image - it is in
RAM (statically allocated or on the stack, depending on the context) and
is initialised by copying the characters from the string literal (prior
to main(), or at each entry to its scope if it is a local variable).

The string literal initialisation data cannot be changed without 
self-modifying code or other undefined behaviour.  The variable "text" 
is just a normal array and can be changed at will.

> 
> I guess it depends on what is classed as the program's 'image'.
> 

No, it depends on understanding what the C code means and not trying to
confuse yourself and others.

> I'd say the image in the state it is in just after loading or just 
> before execution starts (since certain fixups are needed). But some 
> sections will be writable during execution, some not.
> 

That is a poor definition because you are not considering initialised 
data, and you are not clear about what you mean by "before execution 
starts".  A C program typically has an entry point that clears the 
zero-initialised program-lifetime data, initialises the initialised 
program-lifetime data by copying from a block in the program image, then 
sets up things like stdin, heap support, argc/argv, and various other 
run-time setup features.  Then it calls main().  The initialised data 
section and zero-initialised data section are certainly part of the 
state of the program at the start of the execution from C's viewpoint - 
entry to main().  They are equally certainly not part of the program image.

One reasonable definition of "program image" would be "the file on the
disk" (on general-purpose OSes) or "the binary data in flash" on typical
embedded systems.  Another might be the read-only data sections set up 
by the OS loader just before jumping to the entry point of the C 
run-time code (long before main() is called and the C code itself starts).

>> When you try to change it, you're making your program self-modifying.
> 
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a 
>>> test")
>>> at inplace.c:6
>>> 6            *s = toupper(*s);
>>
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
> 
> Does it really do that? That's the method I've used for read-only 
> strings, to put them into the code-segment (since I neglected to support 
> a dedicated read-only data section, and it's too much work now).
>

No, Linux systems don't have read-only data or string literals
interspersed with code.  They have such data in separate sections, for
better cache efficiency and to allow different section attributes
(read-only data can't be executed).

> But I don't like it since the code section is also executable; you could 
> inadvertently execute code within a string (which might happen to 
> contain machine code for other purposes).
> 

That's why code and read-only data are rarely interspersed.

> The dangers are small, but there must be reasons why a dedicated
> section is normally used. gcc on Windows creates up to 19 sections, so
> it would be odd for literal strings to share with code.
> 
>