Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 14:34:10 -0500
Organization: A noiseless patient Spider
Lines: 292
Message-ID: <vshf6a$3smcv$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
 <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
 <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
 <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
 <vseojq$112f7$1@dont-email.me>
 <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 01 Apr 2025 21:35:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="daaf9b4a0d2b7384daa5332985ebaedc";
	logging-data="4086175"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/XONmAPPGWTARvcYCExGVa73+Y4N1F4k8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:L6UaBd5+vssI6ownRHgetqyFu4Q=
Content-Language: en-US
In-Reply-To: <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>

On 3/31/2025 3:52 PM, MitchAlsup1 wrote:
> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
> 
>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
> -------------
>>>> Another option being if it could be a feature of a Load/Store Multiple.
>>>>
>>>> Say, LDM/STM:
>>>>    6b Hi (Upper bound of register to save)
>>>>    6b Lo (Lower bound of registers to save)
>>>>    1b LR (Flag to save Link Register)
>>>>    1b GP (Flag to save Global Pointer)
>>>>    1b SK (Flag to generate a canary)
>>>
>>> ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
>>> are implicit.
>>>
>>>> Likely (STM):
>>>>    Pushes LR first (if bit set);
>>>>    Pushes GP second (if bit set);
>>>>    Pushes registers in range (if Hi>=Lo);
>>>>    Pushes stack canary (if bit set).
>>>
>>> EXIT uses its 3rd flag used when doing longjump() and THROW()
>>> so as to pop the call-stack but not actually RET from the stack
>>> walker.
>>>
>>
>> OK.
>>
>> I guess one could debate whether an LDM could treat the Load-LR as "Load
>> LR" or "Load address and Branch", and/or have separate flags (Load LR vs
>> Load PC, with Load PC meaning to branch).
>>
>>
>> Other ABIs may not have as much reason to save/restore the Global
>> Pointer all the time. But, in my case, it is being used as the primary
>> way of accessing globals, and each binary image has its own address
>> range here.
> 
> I use constants to access globals.
> These comes in 32-bit and 64-bit flavors.
> 

Typically 16-bit in my case; most globals are within a 16-bit displacement of the Global Pointer.


>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>> instances of a given loaded binary within a shared address space.
> 
> As long as the relative distance is the same, it does.
> 

Can't happen within a shared address space.

Say, if you load a single copy of a binary at 0x24680000.
Process A and B can't use the same mapping in the same address space, 
with PC-rel globals, as then they would each see the other's globals.

You can't do a duplicate mapping at another address, as this both wastes 
VAS, and also any Abs64 base-relocs or similar would differ.

You also can't CoW the data/bss sections, as then it would no longer be a
shared address space.


So, the alternative is to use GBR to access globals, with the data/bss
sections allocated independently of the binary.

This way, multiple processes can share the same mapping at the same 
address for any executable code and constant data, with only the data 
sections needing to be allocated.


This does mean, though, that one needs to save/restore the global pointer,
and there is a ritual for reloading it.

EXE's generally assume they are index 0, so:
   MOV.Q (GBR, 0), Rt
   MOV.Q (Rt, 0), GBR
Or, in RV terms:
   LD    X6, 0(X3)
   LD    X3, Disp33(X6)
Or, RV64G:
   LD    X6, 0(X3)
   LUI   X5, DispHi
   ADD   X5, X5, X6
   LD    X3, DispLo(X5)


For DLL's, the index is fixed up with a base-reloc (for each loaded 
DLL), so basically the same idea. Typically a Disp33 is used here to 
allow for a potentially large/unknown number of loaded DLL's. Thus far, 
a global numbering scheme is used.

Where, (GBR+0) gives the address of a table of global pointers for every 
loaded binary (can be assumed read-only from userland).
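
As a rough C model of the reload ritual above (a sketch only: the names DataSection, reload_gbr and bump_counter are made up for illustration, and the real thing is the two-load sequence shown earlier, not a function call):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: slot 0 of every data section holds the address of
   the per-process table of global pointers, mirroring (GBR+0) above. */
typedef struct DataSection {
    struct DataSection **gp_table;  /* slot 0: per-process table */
    int64_t counter;                /* stand-in for an ordinary global */
} DataSection;

/* Models the two dependent loads: first fetch the table address from
   (GBR+0), then fetch this image's own global pointer by its index. */
static DataSection *reload_gbr(DataSection *gbr, size_t image_index)
{
    DataSection **table = gbr->gp_table;  /* first load  */
    return table[image_index];            /* second load */
}

/* Shared code belonging to image `self_index`. It may be entered with
   some other image's GBR, so it reloads before touching its globals. */
static int64_t bump_counter(DataSection *incoming_gbr, size_t self_index)
{
    DataSection *gbr = reload_gbr(incoming_gbr, self_index);
    return ++gbr->counter;
}
```

Two process instances would each get their own table and their own data sections, while bump_counter itself exists once at one address. Calling it with process A's GBR only ever touches A's copy of the counter, which is the property PC-rel globals can't give in a shared address space.
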


Generally, this is needed if:
   Function may be called from outside of the current binary and:
     Accesses global variables;
     And/or, calls local functions.


Though, still generally lower average-case overhead than the strategy 
typically used by FDPIC, which would handle this reload process on the 
caller side...
   SD    X3, Disp(SP)
   LD    X3, 8(X18)
   LD    X6, 0(X18)
   JALR  X1, 0(X6)
   LD    X3, Disp(SP)

With generally every function pointer existing as a pair: the actual
function pointer and its associated global pointer.
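
A rough C model of that caller-side sequence, for comparison (a sketch under assumptions: FuncDesc, ModuleData, current_gp and call_via_descriptor are illustrative names, and the global pointer is passed as an explicit argument here, whereas a real FDPIC ABI would carry it in a dedicated register):

```c
#include <stdint.h>

typedef struct ModuleData { int64_t calls; } ModuleData;  /* hypothetical */

/* The callee gets its global pointer explicitly in this model. */
typedef int64_t (*EntryFn)(ModuleData *gp, int64_t arg);

/* A "function pointer" is really a pointer to this two-word pair. */
typedef struct FuncDesc {
    EntryFn     entry;  /* word 0: code address   (LD X6, 0(X18)) */
    ModuleData *gp;     /* word 1: global pointer (LD X3, 8(X18)) */
} FuncDesc;

static ModuleData *current_gp;  /* stand-in for the GP register (X3) */

static int64_t call_via_descriptor(const FuncDesc *fd, int64_t arg)
{
    ModuleData *saved = current_gp;            /* SD X3, Disp(SP)   */
    current_gp = fd->gp;                       /* LD X3, 8(X18)     */
    int64_t r = fd->entry(current_gp, arg);    /* LD X6 ; JALR      */
    current_gp = saved;                        /* LD X3, Disp(SP)   */
    return r;
}

/* Example callee: touches only the module data it was handed. */
static int64_t count_and_add(ModuleData *gp, int64_t arg)
{
    gp->calls++;
    return arg + gp->calls;
}
```

The save/load/restore is paid at every indirect call site, whether or not the callee lives in another binary, which is where the higher average-case overhead comes from.
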

Though, caller side handling does arguably avoid the need to perform 
relocs for the table index.

Though, seemingly no one wants to add FDPIC for RV64G, seeing it mostly 
as a 32-bit microcontroller thing.


For normal PIE though, absent CoW, it is necessary to load a new copy of 
the binary each time a new process instance is created.


>> Vs, say, for PIE ELF binaries where it is needed to load a new copy for
>> each process instance because of this (well, excluding an FDPIC style
>> ABI, but seemingly still no one seems to have bothered adding FDPIC
>> support in GCC or friends for RV64 based targets, ...).
>>
>> Well, granted, because Linux and similar tend to load every new process
>> into its own address space and/or use CoW.
> 
> CoW and execl()
> 

Though, execl() effectively replaces the current process.

IMHO, a "CreateProcess()" style abstraction makes more sense than fork+exec.

Though, one tricky way to handle it is:
   vfork: Effectively spawns a thread in the same address space as the
          caller, with a provisional PID, and semi-copied stack;
   exec: Creates a new process copying the PID and file-descriptors;
     Internally uses CreateProcess;
     Temporary thread disappears once exec is called.

True "fork()" is more of an issue though...

True "fork()" semantics are not possible on single-address-space or
NoMMU systems, nor are they fully emulated in things like Cygwin, IIRC.

Though, the usual alternative is to give "fork()" callers "vfork()"
semantics, and things will probably explode if the child does anything
other than call exec or similar.
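
For reference, the one pattern that stays safe under "vfork()" semantics looks roughly like this on a POSIX-style system (a minimal sketch; run_shell_command is a made-up helper name, and it assumes /bin/sh exists):

```c
#define _DEFAULT_SOURCE  /* glibc: keep vfork() declared under strict -std modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `sh -c command` and returns its exit status, or -1 on failure. */
static int run_shell_command(const char *command)
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it is borrowing the parent's address space, so it does
           nothing except exec. If exec fails, it leaves through _exit()
           and never returns from this function. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);
    }
    /* Parent: resumes once the child has exec'ed or exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

So, run_shell_command("exit 7") should come back as 7. Anything in the child beyond exec/_exit (touching locals, returning, calling into stdio) is where it blows up.
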


> --------------
>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>> with an ISA with a larger number of registers, well, unless one uses a
>>>> 64 or 96 bit LDM/STM encoding (possible). Merit though would be not
>>>> needing multiple LDM's / STM's to deal with a discontinuous register
>>>> range.
>>>
>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>
>>
>> Discontinuous register ranges:
>> Because pretty much no ABI's put all of the callee save registers in a
>> contiguous range.
>>
>> Granted, I guess if someone were designing an ISA and ABI clean, they
========== REMAINDER OF ARTICLE TRUNCATED ==========