Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 14:34:10 -0500
Organization: A noiseless patient Spider
Lines: 292
Message-ID: <vshf6a$3smcv$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
<4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
<vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
<e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
<vseojq$112f7$1@dont-email.me>
<62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 01 Apr 2025 21:35:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="daaf9b4a0d2b7384daa5332985ebaedc";
logging-data="4086175"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/XONmAPPGWTARvcYCExGVa73+Y4N1F4k8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:L6UaBd5+vssI6ownRHgetqyFu4Q=
Content-Language: en-US
In-Reply-To: <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
On 3/31/2025 3:52 PM, MitchAlsup1 wrote:
> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>
>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
> -------------
>>>> Another option being if it could be a feature of a Load/Store Multiple.
>>>>
>>>> Say, LDM/STM:
>>>> 6b Hi (Upper bound of register to save)
>>>> 6b Lo (Lower bound of registers to save)
>>>> 1b LR (Flag to save Link Register)
>>>> 1b GP (Flag to save Global Pointer)
>>>> 1b SK (Flag to generate a canary)
>>>
>>> ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
>>> are implicit.
>>>
>>>> Likely (STM):
>>>> Pushes LR first (if bit set);
>>>> Pushes GP second (if bit set);
>>>> Pushes registers in range (if Hi>=Lo);
>>>> Pushes stack canary (if bit set).
>>>
>>> EXIT uses its 3rd flag used when doing longjump() and THROW()
>>> so as to pop the call-stack but not actually RET from the stack
>>> walker.
>>>
>>
>> OK.
>>
>> I guess one could debate whether an LDM could treat the Load-LR as "Load
>> LR" or "Load address and Branch", and/or have separate flags (Load LR vs
>> Load PC, with Load PC meaning to branch).
>>
>>
>> Other ABIs may not have as much reason to save/restore the Global
>> Pointer all the time. But, in my case, it is being used as the primary
>> way of accessing globals, and each binary image has its own address
>> range here.
>
> I use constants to access globals.
> These comes in 32-bit and 64-bit flavors.
>
Typically 16-bit in my case: most globals are within a 16-bit
displacement range of the Global Pointer.

>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>> instances of a given loaded binary within a shared address space.
>
> As long as the relative distance is the same, it does.
>
That can't happen within a shared address space.

Say you load a single copy of a binary at 0x24680000. Processes A and B
can't both use the same mapping in the same address space with PC-rel
globals, as they would then each see the other's globals.

You can't do a duplicate mapping at another address, as this both wastes
VAS and means that any Abs64 base-relocs or similar would differ.

You also can't CoW the data/bss sections, as it would then no longer be
a shared address space.

So, the alternative is to use GBR to access globals, with the data/bss
sections allocated independently of the binary.

This way, multiple processes can share the same mapping at the same
address for any executable code and constant data, with only the data
sections needing to be allocated per process.
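
As a rough illustration of the scheme above (a sketch only, not the actual loader): the "code" below takes its data base as an explicit argument, standing in for GBR, so two instances can run the same function against separately allocated data. The names image_data, bump_counter and instance_create are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical writable globals (.data/.bss) of one binary image. */
typedef struct {
    long counter;
    char name[16];
} image_data;

/* Initial .data contents; shared read-only, like the code itself. */
static const image_data image_data_init = { 0, "demo" };

/* Stand-in for code compiled to reach globals through the global
   pointer: the data base arrives as an argument instead of being
   found PC-relative. */
long bump_counter(image_data *gbr)
{
    return ++gbr->counter;
}

/* Creating a process instance allocates only a fresh data section;
   the code and constants are not copied. */
image_data *instance_create(void)
{
    image_data *d = malloc(sizeof *d);
    if (d != NULL)
        memcpy(d, &image_data_init, sizeof *d);
    return d;
}
```

Calling bump_counter() on two separately created instances leaves each with its own counter, even though both go through the same function at the same address.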

It does mean, though, that one needs to save/restore the global pointer,
and there is a ritual for reloading it.

EXE's generally assume they are index 0, so:
  MOV.Q (GBR, 0), Rt
  MOV.Q (Rt, 0), GBR

Or, in RV terms:
  LD X6, 0(X3)
  LD X3, Disp33(X6)

Or, RV64G:
  LD X6, 0(X3)
  LUI X5, DispHi
  ADD X5, X5, X6
  LD X3, DispLo(X5)

For DLL's, the index is fixed up with a base-reloc (for each loaded
DLL), so it is basically the same idea. Typically a Disp33 is used here
to allow for a potentially large/unknown number of loaded DLL's. Thus
far, a global numbering scheme is used.

Here, (GBR+0) gives the address of a table of global pointers for every
loaded binary (which can be assumed read-only from userland).
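
The reload ritual above can be modeled in C roughly as follows. This is only a sketch: the struct layout and the names image_globals and reload_gbr are assumptions for illustration, and the index is treated as a plain array index (so the displacement would be index*8 with 64-bit pointers, which is also an assumption).

```c
#include <stddef.h>

/* Hypothetical per-image data section; slot 0 mirrors (GBR+0). */
typedef struct image_globals {
    struct image_globals *const *gbr_table; /* table of global pointers */
    int value;                              /* some global of the image */
} image_globals;

/* Given whatever global pointer is currently live, fetch the one that
   belongs to the image with the given table index. */
image_globals *reload_gbr(const image_globals *gbr, size_t index)
{
    image_globals *const *table = gbr->gbr_table; /* MOV.Q (GBR, 0), Rt */
    return table[index];                          /* MOV.Q (Rt, Disp), GBR */
}
```

An EXE would use index 0; a DLL would use whatever index the loader patched in via the base-reloc.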

Generally, this reload is needed if:
  The function may be called from outside of the current binary, and:
    Accesses global variables;
    And/or calls local functions.

Though, this still generally has lower average-case overhead than the
strategy typically used by FDPIC, which would handle this reload process
on the caller side...
  SD X3, Disp(SP)
  LD X3, 8(X18)
  LD X6, 0(X18)
  JALR X1, 0(X6)
  LD X3, Disp(SP)

With generally every function pointer existing as a pair: the actual
function pointer and its associated global pointer.
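
For comparison, the FDPIC-style caller-side sequence above might be modeled in C as below. This is a sketch under assumptions: func_desc, call_via_descriptor and read_first_global are made-up names, and the callee is handed its global pointer as an argument to stand in for X3.

```c
/* Hypothetical function descriptor: a code address paired with the
   global pointer of the image that owns the code. */
typedef struct {
    long (*entry)(void *gp);  /* offset 0: actual function pointer */
    void *gp;                 /* offset 8: callee's global pointer */
} func_desc;

/* Example callee: reads the first global of its own image. */
long read_first_global(void *gp)
{
    return *(long *)gp;
}

/* Caller-side handling: save our GP, switch to the callee's GP,
   call, then restore. */
long call_via_descriptor(const func_desc *fd, void **current_gp)
{
    void *saved = *current_gp;          /* SD X3, Disp(SP) */
    *current_gp = fd->gp;               /* LD X3, 8(X18)   */
    long r = fd->entry(*current_gp);    /* LD X6, 0(X18); JALR X1, 0(X6) */
    *current_gp = saved;                /* LD X3, Disp(SP) */
    return r;
}
```

The save/switch/restore happens on every indirect call here, whether or not the callee needed it, which is the overhead being compared against.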

Though, caller-side handling does arguably avoid the need to perform
relocs for the table index.

Though, seemingly no one wants to add FDPIC for RV64G, seeing it mostly
as a 32-bit microcontroller thing.

For normal PIE though, absent CoW, it is necessary to load a new copy of
the binary each time a new process instance is created.

>> Vs, say, for PIE ELF binaries where it is needed to load a new copy for
>> each process instance because of this (well, excluding an FDPIC style
>> ABI, but seemingly still no one seems to have bothered adding FDPIC
>> support in GCC or friends for RV64 based targets, ...).
>>
>> Well, granted, because Linux and similar tend to load every new process
>> into its own address space and/or use CoW.
>
> CoW and execl()
>
Though, execl() effectively replaces the current process.

IMHO, a "CreateProcess()" style abstraction makes more sense than
fork+exec.

Though, one tricky way to handle it is:
  vfork: Effectively spawns a thread in the same address space as the
    caller, with a provisional PID and semi-copied stack;
  exec: Creates a new process, copying the PID and file-descriptors;
    Internally uses CreateProcess;
    Temporary thread disappears once exec is called.

True "fork()" is more of an issue though...

The true "fork()" semantics are not possible on single-address-space or
NoMMU systems. Nor are they fully emulated in things like Cygwin, IIRC.

Though, the usual alternative is to give fork() "vfork()" semantics, and
things will probably explode if the child does anything other than call
exec or similar.

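
For reference, POSIX already has something in the "CreateProcess()" style in posix_spawn(), which creates the child and loads the new image in one call. A minimal sketch follows; spawn_and_wait is a made-up helper name, and this is the stock POSIX interface, not the vfork/exec arrangement described above.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn the program at 'path' with 'argv', wait for it, and return its
   exit status, or -1 on failure or abnormal termination. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* No file actions or attributes: the child inherits descriptors. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Since the caller never runs as a half-copied child, this style does not depend on having fork() semantics available.
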
> --------------
>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>> with an ISA with a larger number of registers, well, unless one uses a
>>>> 64 or 96 bit LDM/STM encoding (possible). Merit though would be not
>>>> needing multiple LDM's / STM's to deal with a discontinuous register
>>>> range.
>>>
>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>
>>
>> Discontinuous register ranges:
>> Because pretty much no ABI's put all of the callee save registers in a
>> contiguous range.
>>
>> Granted, I guess if someone were designing an ISA and ABI clean, they
========== REMAINDER OF ARTICLE TRUNCATED ==========