Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Wed, 2 Apr 2025 00:43:39 -0500
Organization: A noiseless patient Spider
Lines: 317
Message-ID: <vsiit3$12k13$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
 <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
 <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
 <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
 <vseojq$112f7$1@dont-email.me>
 <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
 <vshf6a$3smcv$1@dont-email.me>
 <21397906a7a77c2d43191fdaab98a3c9@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 02 Apr 2025 07:45:08 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c11654d5afe4b7ef2b18d1e91dc487ad";
	logging-data="1134627"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18lyMloEE/a7gq1iEhb6y32du8FFGWoCiU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:hPRh0+y7jto9Lyf30VgtWJBM07s=
Content-Language: en-US
In-Reply-To: <21397906a7a77c2d43191fdaab98a3c9@www.novabbs.org>
Bytes: 12281

On 4/1/2025 6:21 PM, MitchAlsup1 wrote:
> On Tue, 1 Apr 2025 19:34:10 +0000, BGB wrote:
> 
>> On 3/31/2025 3:52 PM, MitchAlsup1 wrote:
>>> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
> ---------------------
>>>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>>>> instances of a given loaded binary within a shared address space.
>>>
>>> As long as the relative distance is the same, it does.
>>>
>>
>> Can't happen within a shared address space.
>>
>> Say, if you load a single copy of a binary at 0x24680000.
>> Process A and B can't use the same mapping in the same address space,
>> with PC-rel globals, as then they would each see the other's globals.
> 
> Say I load a copy of the binary text at 0x24680000 and its data at
> 0x35900000 for a distance of 0x11280000 into the address space of
> a process.
> 
> Then I load another copy at 0x44680000 and its data at 0x55900000
> into the address space of a different process.
> 
> PC-rel addressing works in both cases--because the distance (-rel)
> remains the same,
> 
> and the MMU can translate the code to the same physical, and map
> each area of data individually.
> 
> Different virtual addresses, same code physical address, different
> data virtual and physical addresses.
> 
>> You can't do a duplicate mapping at another address, as this both wastes
>> VAS, and also any Abs64 base-relocs or similar would differ.
> 
> A 64-bit VAS is a wasteable address space, whereas a 48-bit VAS is not.
> 

OK.
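Here is a small C sketch of the arithmetic behind the quoted PC-rel argument. The struct and function names are illustrative and not taken from any real loader. It uses the quoted placements: text at 0x24680000 with data at 0x35900000, and text at 0x44680000 with data at 0x55900000. The PC-relative displacement is identical for both placements. An absolute pointer to the same datum differs, which is why Abs64 base-relocs could not be shared between the two mappings.

```c
#include <assert.h>
#include <stdint.h>

/* One placement of a binary in some process's virtual address space
   (illustrative struct). */
struct mapping {
    uint64_t text_base;
    uint64_t data_base;
};

/* Displacement a PC-relative access at text offset `pc_off` must encode
   to reach data offset `data_off`. It depends only on the text-to-data
   distance, not on where the pair was placed. */
static int64_t pcrel_disp(const struct mapping *m, uint64_t pc_off, uint64_t data_off)
{
    uint64_t pc = m->text_base + pc_off;
    uint64_t target = m->data_base + data_off;
    return (int64_t)(target - pc);
}

/* An absolute (Abs64-style) pointer to the same datum does depend on the
   placement, so each mapping would need its own base-reloc fixup. */
static uint64_t abs_ptr(const struct mapping *m, uint64_t data_off)
{
    return m->data_base + data_off;
}
```

With the quoted numbers, `pcrel_disp()` at offset 0 yields 0x11280000 for either mapping, while `abs_ptr()` differs by 0x20000000.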

PE/COFF had defined Abs64 relocs, but I am using a 48-bit VAS.

It would not have made sense to define separate Abs48 relocs, but much of 
the time we can just assume the HOBs (high-order bits) are zero.

The exception is function pointers: the base-reloc handling detects 
pointers into ".text" and does some special secret-sauce magic with the 
HOBs to make sure they are correctly tagged.


Binaries are generally not fully PIE, though; instead they are 
base-relocated (more like EXE/DLL handling in Windows). Most things 
within the core proper are either PC-rel or GBR-rel, so there are 
usually relatively few base-relocations.
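Below is a minimal sketch of applying base-relocations in this style. It assumes a flattened list of offsets to 64-bit slots. Real PE/COFF packs these into per-page blocks of type/offset entries and uses IMAGE_REL_BASED_DIR64 for 64-bit slots. The 48-bit check reflects the assumption that the HOBs are zero. The special tagging of function pointers into ".text" is deliberately not modeled, because its details are not given. Function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply DIR64-style base relocations to a loaded image. Each listed
   offset holds a 64-bit address computed for `preferred_base`; after
   loading at `actual_base`, every such slot is shifted by the delta.
   Assumes a 48-bit VAS, so the result's high 16 bits must be zero.
   Tagging of pointers into .text is NOT modeled here. */
static void apply_base_relocs(uint8_t *image, const uint32_t *offsets, size_t count,
                              uint64_t preferred_base, uint64_t actual_base)
{
    uint64_t delta = actual_base - preferred_base;  /* wraps mod 2^64 if negative */
    for (size_t i = 0; i < count; i++) {
        uint64_t v;
        memcpy(&v, image + offsets[i], sizeof v);   /* alignment-safe read */
        v += delta;
        assert((v >> 48) == 0 && "address outside 48-bit VAS");
        memcpy(image + offsets[i], &v, sizeof v);   /* write back the fixup */
    }
}
```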

DLL calls, however, are essentially absolute-addressed, so mapping 
instances at different virtual addresses would make DLL handling messy 
(in the absence of a GOT or similar).
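For contrast with absolute-addressed DLL calls, a GOT-style scheme routes each call through a per-instance table slot. Only the table contents then differ between instances. This is a hypothetical C sketch: the table layout and names are assumptions and do not follow any particular ABI.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_t)(int);

/* Hypothetical per-instance import table (GOT-like). Code refers to a
   slot index rather than an absolute address; the loader fills the
   slots separately for each instance. */
struct import_table {
    fn_t slots[4];
};

/* Call site: an indirect call through the table, instead of a direct
   call to a link-time-fixed absolute address. */
static int call_import(const struct import_table *got, size_t slot, int arg)
{
    assert(slot < sizeof got->slots / sizeof got->slots[0]);
    assert(got->slots[slot] != NULL);
    return got->slots[slot](arg);
}

/* Two stand-in targets, as if resolved differently for two instances. */
static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }
```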


>> You also can't CoW the data/bss sections, as this is no longer a shared
>> address space.
> 
> You are trying to "get at" something here, but I can't see it (yet).
> 

A shared address space assumes that all processes have the same page 
tables, address mappings, and TLB contents (though ACL checking can still 
differ, as the ACL/KRR mechanism does not depend on having separate 
contents in the page tables or TLB*).

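By definition, CoW can't be used under this constraint, since CoW needs 
different processes to see different physical pages at the same virtual 
address.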

Multiple VASs, on the other hand, add new problems (both hassles and 
potential performance costs), so it is better to delay that if possible.


*: A smaller 4-entry fully associative cache is used for ACL checks, so 
it is more of a "what access does the current task have to this 
particular ACL?" check. Admittedly, making use of it in the OS is still 
partly TODO.
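Here is a hypothetical model of the 4-entry fully associative ACL cache from the footnote. Only the entry count comes from the text. The entry fields, the meaning of the access bits, round-robin replacement, and flushing on task switch are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACL_WAYS 4   /* entry count from the text; everything else assumed */

struct acl_entry {
    bool     valid;
    uint16_t acl_id;
    uint8_t  access;   /* assumed: bit0=read, bit1=write, bit2=execute */
};

struct acl_cache {
    struct acl_entry way[ACL_WAYS];
    unsigned next_victim;   /* round-robin replacement pointer (assumed) */
};

/* Fully associative lookup: compare the ACL id against every way. */
static bool acl_lookup(const struct acl_cache *c, uint16_t acl_id, uint8_t *access)
{
    for (unsigned i = 0; i < ACL_WAYS; i++) {
        if (c->way[i].valid && c->way[i].acl_id == acl_id) {
            *access = c->way[i].access;
            return true;
        }
    }
    return false;
}

/* On a miss, the OS supplies the current task's rights for this ACL;
   insert them, evicting in round-robin order. */
static void acl_fill(struct acl_cache *c, uint16_t acl_id, uint8_t access)
{
    struct acl_entry *e = &c->way[c->next_victim];
    e->valid = true;
    e->acl_id = acl_id;
    e->access = access;
    c->next_victim = (c->next_victim + 1) % ACL_WAYS;
}

/* Entries describe the current task, so a task switch invalidates all
   of them (assumed policy). */
static void acl_flush(struct acl_cache *c)
{
    for (unsigned i = 0; i < ACL_WAYS; i++)
        c->way[i].valid = false;
    c->next_victim = 0;
}
```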


>>
>> So, alternative is to use GBR to access globals, with the data/bss
>> sections allocated independently of the binary.
>>
>> This way, multiple processes can share the same mapping at the same
>> address for any executable code and constant data, with only the data
>> sections needing to be allocated.
>>
>>
>> Does mean though that one needs to save/restore the global pointer, and
>> there is a ritual for reloading it.
>>
>> EXEs generally assume they are index 0, so:
>>    MOV.Q (GBR, 0), Rt
>>    MOV.Q (Rt, 0), GBR
>> Or, in RV terms:
>>    LD    X6, 0(X3)
>>    LD    X3, Disp33(X6)
>> Or, RV64G:
>>    LD    X6, 0(X3)
>>    LUI   X5, DispHi
>>    ADD   X5, X5, X6
>>    LD    X3, DispLo(X5)
>>
>>
>> For DLLs, the index is fixed up with a base-reloc (for each loaded
>> DLL), so basically the same idea. Typically a Disp33 is used here to
>> allow for a potentially large/unknown number of loaded DLLs. Thus far,
>> a global numbering scheme is used.
>>
>> Where, (GBR+0) gives the address of a table of global pointers for every
>> loaded binary (can be assumed read-only from userland).
>>
>>
>> Generally, this is needed if:
>>    Function may be called from outside of the current binary and:
>>      Accesses global variables;
>>      And/or, calls local functions.
> 
> I just use 32-bit or 64-bit displacement constants. Does not matter
> how control arrived at this subroutine, it accesses its data as the
> linker resolved addresses--without wasting a register.
> 

GBR (GP on RISC-V) is specifically designated as the global pointer, 
though. It is not so register-starved that reclaiming it as a GPR would 
make sense.
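The RV64G variant of the reload ritual quoted above has to split the displacement between LUI and the load's 12-bit signed offset. Here is a sketch of that split with hypothetical function names. It uses the same rounding as the assembler's %hi/%lo: the low part is sign-extended, so 0x800 is added before taking the upper 20 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit displacement into the LUI immediate (20 bits) and the
   sign-extended 12-bit load offset, as in:
     LUI X5, DispHi ; ADD X5, X5, X6 ; LD X3, DispLo(X5)
   Done in unsigned arithmetic to avoid implementation-defined shifts. */
static void split_hi_lo(int32_t disp, uint32_t *hi20, int32_t *lo12)
{
    uint32_t u = (uint32_t)disp;
    *hi20 = (u + 0x800u) >> 12;                     /* always < 2^20 */
    *lo12 = (int32_t)((u & 0xFFFu) ^ 0x800u) - 0x800; /* in [-2048, 2047] */
}

/* What RV64 hardware computes: LUI sign-extends (hi20 << 12) from bit 31,
   then the load adds the sign-extended 12-bit offset. */
static int64_t rebuild(uint32_t hi20, int32_t lo12)
{
    uint32_t upper = hi20 << 12;
    int64_t sext = (upper & 0x80000000u) ? (int64_t)upper - 0x100000000LL
                                         : (int64_t)upper;
    return sext + lo12;
}
```

One RV64 corner case falls out of this. Displacements from 0x7FFFF800 to 0x7FFFFFFF round up to a LUI immediate of 0x80000, which LUI sign-extends, so this pair cannot reach them.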

But yes, one does need to care how control can arrive at a given function.
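Here is a C model of the global-pointer reload ritual quoted earlier, with hypothetical names. Word 0 of every binary's data area points at the shared table of global pointers, and a binary's fixed index (0 for the EXE, fixed-up indices for DLLs) selects its own entry. The predicate encodes the quoted rule for when a function needs the reload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word;

/* Equivalent of:
     MOV.Q (GBR, 0), Rt           ; Rt  = table of global pointers
     MOV.Q (Rt, my_index*8), GBR  ; GBR = this binary's data area */
static word *reload_gbr(const word *gbr, size_t my_index)
{
    const word *table = (const word *)gbr[0];
    return (word *)table[my_index];
}

/* Quoted rule: reload only if the function may be called from outside
   the current binary AND it touches globals or calls local functions. */
static int needs_gbr_reload(int externally_callable, int uses_globals, int calls_local)
{
    return externally_callable && (uses_globals || calls_local);
}
```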



>>
>> Though, still generally lower average-case overhead than the strategy
>> typically used by FDPIC, which would handle this reload process on the
>> caller side...
>>    SD    X3, Disp(SP)
>>    LD    X3, 8(X18)
>>    LD    X6, 0(X18)
>>    JALR  X1, 0(X6)
>>    LD    X3, Disp(SP)
> 
> This is just::
> 
>      CALX    [IP,,#GOT[funct_num]-.]
> 
> In the 32-bit linking mode this is a 2 word instruction, in the 64-bit
> linking mode it is a 3 word instruction.
> ----------------

OK.

Neither BJX nor RISC-V have special instructions to deal with FDPIC call 
semantics.
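For comparison, the FDPIC caller-side sequence quoted above can be modeled in C. All names are hypothetical. `current_gp` stands in for X3, and `struct fdesc` stands in for the function descriptor addressed by X18, with the entry point at offset 0 and the callee's GP at offset 8 on a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for X3, the global pointer register. */
static uintptr_t current_gp;

/* Function descriptor: what an FDPIC function pointer points at. */
struct fdesc {
    int (*entry)(int);   /* offset 0: LD X6, 0(X18) */
    uintptr_t gp;        /* offset 8: LD X3, 8(X18) */
};

static int fdpic_call(const struct fdesc *fd, int arg)
{
    uintptr_t saved = current_gp;  /* SD X3, Disp(SP) */
    current_gp = fd->gp;           /* LD X3, 8(X18)   */
    int r = fd->entry(arg);        /* LD X6, 0(X18); JALR X1, 0(X6) */
    current_gp = saved;            /* LD X3, Disp(SP) */
    return r;
}

/* Example callee: reads "its" global through the current GP. */
static int read_global_plus(int x)
{
    return *(const int *)current_gp + x;
}
```

All of the save/load/restore work sits at every call site, which is the average-case overhead being contrasted with the callee-side reload.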


>>
>> Though, execl() effectively replaces the current process.
>>
>> IMHO, a "CreateProcess()" style abstraction makes more sense than
>> fork+exec.
> 
> You are 40 years late on that.
> 

I am just doing it the Windows (or Cygwin) way...
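On a POSIX host, the closest standard analogue to a CreateProcess()-style call is posix_spawn(). It creates a process running a given program in a single call instead of fork() followed by exec(). Here is a minimal sketch; the wrapper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Create a process running `path` with `argv` in one call (no separate
   fork() and exec() in the caller), wait for it, and return its exit
   status, or -1 on failure or abnormal termination. */
static int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, assuming /bin/sh exists, `run_and_wait("/bin/sh", (char *[]){"sh", "-c", "exit 3", NULL})` should return 3.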

========== REMAINDER OF ARTICLE TRUNCATED ==========