Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: relearning C: why does an in-place change to a char* segfault?
Date: Fri, 2 Aug 2024 11:36:36 +0100
Organization: A noiseless patient Spider
Lines: 68
Message-ID: <v8icrj$2paum$1@dont-email.me>
References: <IoGcndcJ1Zm83zb7nZ2dnZfqnPWdnZ2d@brightview.co.uk>
 <20240801114615.906@kylheku.com> <v8gs06$2ceis$1@dont-email.me>
 <20240801172148.200@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 02 Aug 2024 12:36:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="685cfd46d568320119a7f0cfbd8429d5";
	logging-data="2927574"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX198odbOVD0HyjqbInxeBRpn"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:N2R8yqFqvGMzsvU5O9qUiQ6x7IM=
In-Reply-To: <20240801172148.200@kylheku.com>
Content-Language: en-GB
Bytes: 3700

On 02/08/2024 01:37, Kaz Kylheku wrote:
> On 2024-08-01, Bart <bc@freeuk.com> wrote:
>> On 01/08/2024 20:39, Kaz Kylheku wrote:
>>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>>       while (*s) {
>>>>           *s = toupper(*s); // SEGFAULT
>>>>           s++;
>>>>       }
>>>> }
>>>>
>>>> int main() {
>>>>       char* text = "this is a test";
>>>
>>> The "this is a test" object is a literal. It is part of the program's image.
>>
>> So is the text here:
>>
>>     char text[]="this is a test";
>>
>> But this can be changed without making the program self-modifying.
> 
> The array which is initialized by the literal is what can be
> changed.
> 
> In this situation, the literal is just initializer syntax,
> not required to be an object with an address.

I don't spot the 'int main() {' part of your example; my version of it 
was meant to be static. (My A, B examples explicitly used 'static'.)
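
For reference, a minimal sketch of the static-array case, assuming the uppercase_ascii function from upthread. The name demo_static_array is made up for illustration, and the unsigned char cast is an addition that was not in the original code. The array is initialised from the literal but is a separate, writable object, so the in-place change is fine:

```c
#include <ctype.h>

/* Uppercase a writable, NUL-terminated buffer in place. */
void uppercase_ascii(char *s) {
    for (; *s; s++) {
        /* toupper() expects an unsigned char value (or EOF), hence the cast. */
        *s = (char)toupper((unsigned char)*s);
    }
}

/* Hypothetical helper: modifies a static array and returns a pointer to it. */
const char *demo_static_array(void) {
    /* Static storage, initialised from the literal, but writable. */
    static char text[] = "this is a test";
    uppercase_ascii(text);
    return text;
}
```

Passing a 'char *text = "this is a test";' pointer to uppercase_ascii instead would attempt to modify the literal itself, which is undefined behaviour and is what typically segfaults.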



>> I guess it depends on what is classed as the program's 'image'.
>>
>> I'd say the image in the state it is in just after loading or just
>> before execution starts (since certain fixups are needed). But some
>> sections will be writable during execution, some not.
> 
> Programs can self-modify in ways designed into the run time.
> The toaster has certain internal receptacles that can take
> certain forks, according to some rules, which do not affect
> the user operating the toaster according to the manual.
> 
>> The dangers are small, but there must be reasons why a dedicated
>> section is normally used. gcc on Windows creates up to 19 sections, so
>> it would be odd for literal strings to share with code.
> 
> One reason is that PC-relative addressing can be used by code to
> find its literals. Since that usually has a limited range, it helps
> to keep the literals with the code. Combining sections also reduces
> size. The addressing is also relocatable, which is useful in shared
> libs.

You must be talking about ARM then, with its limited address 
displacement (I think a 12-bit offset, so about +/- 4KB on 32-bit ARM).

On x64, PC-relative uses a 32-bit offset so the range is +/- 2GB; enough 
to have string literals located in their own read-only section of memory.
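
To put a number on that, here is a rough sketch of the range check, assuming the displacement is a signed two's-complement field measured from the address of the next instruction. fits_signed_disp is a hypothetical name, not something from a real assembler or linker:

```c
#include <stdint.h>

/* Returns 1 if 'delta' (target address minus next-instruction address)
   fits in a signed displacement field of 'bits' bits, else 0.
   Assumes 1 <= bits <= 63. */
int fits_signed_disp(int64_t delta, unsigned bits) {
    int64_t lo = -((int64_t)1 << (bits - 1));      /* e.g. -2^31 for 32 bits  */
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;   /* e.g. 2^31 - 1 for 32 bits */
    return delta >= lo && delta <= hi;
}
```

With bits = 32 this accepts anything from -2GB to just under +2GB, so a literal in a separate read-only section stays reachable as long as the whole image is smaller than that.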

I'm sure you can do that on ARM too; I can think of several ways (and 
there are loads more registers to play with and keep as bases to tables 
of such data). But I don't know what real code does.