Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 23:04:52 -0500
Organization: A noiseless patient Spider
Lines: 467
Message-ID: <vsid3u$sput$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
 <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
 <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
 <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
 <vseojq$112f7$1@dont-email.me>
 <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
 <vsfrqk$28q7o$1@dont-email.me> <vshlfk$32p7$1@dont-email.me>
 <vshoop$3sqf$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 02 Apr 2025 06:06:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c11654d5afe4b7ef2b18d1e91dc487ad";
	logging-data="944093"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/7XkJQC4901RRNQWSS5/2q0iL6wcHKseg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:15OmI51u47tLjgJpHoeSweyFYdE=
Content-Language: en-US
In-Reply-To: <vshoop$3sqf$2@dont-email.me>
Bytes: 20858

On 4/1/2025 5:19 PM, Robert Finch wrote:
> On 2025-04-01 5:21 p.m., BGB wrote:
>> On 3/31/2025 11:58 PM, Robert Finch wrote:
>>> On 2025-03-31 4:52 p.m., MitchAlsup1 wrote:
>>>> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>>>>
>>>>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
>>>> -------------
>>>>>>> Another option would be to make it a feature of a Load/Store
>>>>>>> Multiple.
>>>>>>>
>>>>>>> Say, LDM/STM:
>>>>>>>    6b Hi (Upper bound of registers to save)
>>>>>>>    6b Lo (Lower bound of registers to save)
>>>>>>>    1b LR (Flag to save Link Register)
>>>>>>>    1b GP (Flag to save Global Pointer)
>>>>>>>    1b SK (Flag to generate a canary)
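The field list above can be sketched as a pack/unpack pair in C. The post gives only the field widths, so the bit positions (Hi in bits 0-5, Lo in 6-11, then LR, GP, SK) and the names `ldm_stm_fields`, `pack_ldm_stm` and `unpack_ldm_stm` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of the 15-bit LDM/STM control field. */
typedef struct {
    unsigned hi;   /* upper bound of register range, 0..63 */
    unsigned lo;   /* lower bound of register range, 0..63 */
    bool lr;       /* also save the Link Register */
    bool gp;       /* also save the Global Pointer */
    bool sk;       /* also generate a stack canary */
} ldm_stm_fields;

/* Assumed layout: bits 0-5 Hi, 6-11 Lo, 12 LR, 13 GP, 14 SK. */
static uint32_t pack_ldm_stm(ldm_stm_fields f)
{
    assert(f.hi < 64 && f.lo < 64);
    return (uint32_t)f.hi
         | ((uint32_t)f.lo << 6)
         | ((uint32_t)f.lr << 12)
         | ((uint32_t)f.gp << 13)
         | ((uint32_t)f.sk << 14);
}

static ldm_stm_fields unpack_ldm_stm(uint32_t bits)
{
    ldm_stm_fields f;
    f.hi = bits & 0x3F;
    f.lo = (bits >> 6) & 0x3F;
    f.lr = (bits >> 12) & 1;
    f.gp = (bits >> 13) & 1;
    f.sk = (bits >> 14) & 1;
    return f;
}
```

The whole control field fits in 15 bits, leaving the rest of a 32-bit word for opcode and base register.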
>>>
>>> Q+3 uses a bitmap of register selection with four more bits selecting 
>>> overlapping groups. It can work with up to 17 registers.
>>>
>>
>> OK.
>>
>> If I did LDM/STM style ops, not sure which strategy I would take.
>>
>> The possibility of using a 96-bit encoding with an Imm64 holding a
>> bitmask of all the registers makes some sense...
>>
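A sketch of how an LDM/STM driven by a 64-bit register mask (the Imm64 idea above) could behave. It assumes a 64-register file and lowest-numbered-register-first memory order; `stm_mask` and `ldm_mask` are hypothetical names. A discontiguous set such as R9 plus R18..R20 is still one operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store every register whose bit is set in `mask` to consecutive slots at
   `dst`, lowest register number first. Returns the number of slots used. */
static size_t stm_mask(const uint64_t regs[64], uint64_t mask, uint64_t *dst)
{
    assert(regs != NULL && dst != NULL);
    size_t n = 0;
    for (unsigned r = 0; r < 64; r++)
        if (mask & ((uint64_t)1 << r))
            dst[n++] = regs[r];
    return n;
}

/* Inverse of stm_mask: reload the selected registers in the same order. */
static void ldm_mask(uint64_t regs[64], uint64_t mask, const uint64_t *src)
{
    assert(regs != NULL && src != NULL);
    size_t n = 0;
    for (unsigned r = 0; r < 64; r++)
        if (mask & ((uint64_t)1 << r))
            regs[r] = src[n++];
}
```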
>>
>>>>>>
>>>>>> ENTER and EXIT have 2 of those flags--but also note that the use of
>>>>>> SP and CSP is implicit.
>>>>>>
>>>>>>> Likely (STM):
>>>>>>>    Pushes LR first (if bit set);
>>>>>>>    Pushes GP second (if bit set);
>>>>>>>    Pushes registers in range (if Hi>=Lo);
>>>>>>>    Pushes stack canary (if bit set).
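The STM order listed above, as a small C model of a full-descending stack. Assumptions not in the post: registers within Lo..Hi are pushed lowest-numbered first, LR and GP are modelled as separate fields rather than register numbers, and the canary value is simply passed in. `cpu_state` and `stm` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t regs[64];   /* general registers R0..R63 */
    uint64_t lr;         /* link register (modelled separately here) */
    uint64_t gp;         /* global pointer (modelled separately here) */
    uint64_t *sp;        /* stack pointer, full-descending stack */
} cpu_state;

static void push(cpu_state *c, uint64_t v) { *--c->sp = v; }

/* LR first, then GP, then R[lo..hi] (only if hi >= lo), then the canary. */
static void stm(cpu_state *c, unsigned hi, unsigned lo,
                bool save_lr, bool save_gp, bool canary, uint64_t canary_val)
{
    assert(hi < 64 && lo < 64);
    if (save_lr) push(c, c->lr);
    if (save_gp) push(c, c->gp);
    if (hi >= lo)
        for (unsigned r = lo; r <= hi; r++)
            push(c, c->regs[r]);
    if (canary) push(c, canary_val);
}
```

With this order the canary ends up at the lowest address, directly above whatever locals the function allocates next.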
>>>>>>
>>>>>> EXIT uses its 3rd flag when doing longjmp() and THROW()
>>>>>> so as to pop the call-stack but not actually RET from the stack
>>>>>> walker.
>>>>>>
>>>>>
>>>>> OK.
>>>>>
>>>>> I guess one could debate whether an LDM could treat the Load-LR as
>>>>> "Load LR" or "Load address and Branch", and/or have separate flags
>>>>> (Load LR vs Load PC, with Load PC meaning to branch).
>>>>>
>>>>>
>>>>> Other ABIs may not have as much reason to save/restore the Global
>>>>> Pointer all the time. But, in my case, it is being used as the primary
>>>>> way of accessing globals, and each binary image has its own address
>>>>> range here.
>>>>
>>>> I use constants to access globals.
>>>> These come in 32-bit and 64-bit flavors.
>>>>
>>>>> PC-Rel is not used, as PC-Rel doesn't allow for multiple process
>>>>> instances of a given loaded binary within a shared address space.
>>>>
>>>> As long as the relative distance is the same, it does.
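A minimal C analogy for the GP-based scheme (not BGB's actual codegen): `gp` stands in for the Global Pointer register, and `COUNTER_OFFSET`, `DATA_SIZE` and `bump_counter` are hypothetical. Because the code reaches its global only through `gp`, two instances can share one copy of the code while keeping separate globals. With PC-relative addressing the data address is tied to where the code sits, so each instance needs its code and data placed at the same relative distance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical offset of one global inside an image's data area. */
enum { COUNTER_OFFSET = 16, DATA_SIZE = 64 };

/* Shared "code": the global is reached only as [gp + offset], so the same
   function serves any instance whose gp points at its own data area. */
static uint64_t bump_counter(uint8_t *gp)
{
    uint64_t v;
    assert((size_t)COUNTER_OFFSET + sizeof v <= (size_t)DATA_SIZE);
    memcpy(&v, gp + COUNTER_OFFSET, sizeof v);   /* load [gp + offset] */
    v++;
    memcpy(gp + COUNTER_OFFSET, &v, sizeof v);   /* store [gp + offset] */
    return v;
}
```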
>>>>
>>>>> Vs, say, PIE ELF binaries, where a new copy must be loaded for each
>>>>> process instance because of this (well, excluding an FDPIC-style ABI,
>>>>> but still no one seems to have bothered adding FDPIC support in GCC
>>>>> or friends for RV64-based targets, ...).
>>>>>
>>>>> Well, granted, this is because Linux and similar tend to load every
>>>>> new process into its own address space and/or use CoW.
>>>>
>>>> CoW and execl()
>>>>
>>>> --------------
>>>>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>>>>> with an ISA with a larger number of registers, well, unless one uses
>>>>>>> a 64- or 96-bit LDM/STM encoding (possible). The merit, though, would
>>>>>>> be not needing multiple LDMs/STMs to deal with a discontinuous
>>>>>>> register range.
>>>>>>
>>>>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>>>>
>>>>>
>>>>> Discontinuous register ranges:
>>>>> Because pretty much no ABIs put all of the callee-save registers in a
>>>>> contiguous range.
>>>>>
>>>>> Granted, I guess if someone were designing an ISA and ABI clean, they
>>>>> could make all of the argument registers and callee save registers
>>>>> contiguous.
>>>>>
>>>>> Say:
>>>>>    R0..R3: Special
>>>>>    R4..R15: Scratch
>>>>>    R16..R31: Argument
>>>>>    R32..R63: Callee Save
>>>>> ....
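Why a contiguous layout matters for range-based LDM/STM can be checked by counting how many Lo..Hi ranges a register set needs. For comparison, in the standard RISC-V calling convention the callee-saved s0-s11 are x8-x9 and x18-x27, which takes two ranges, while R32..R63 in the layout above takes one. `count_ranges` and `range_mask` are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Count maximal runs of consecutive set bits: each run needs its own
   Lo..Hi pair in a range-based LDM/STM. */
static unsigned count_ranges(uint64_t mask)
{
    unsigned runs = 0;
    for (unsigned r = 0; r < 64; r++) {
        int set  = (int)((mask >> r) & 1);
        int prev = r > 0 ? (int)((mask >> (r - 1)) & 1) : 0;
        if (set && !prev)
            runs++;   /* a new run starts at register r */
    }
    return runs;
}

/* Build a mask with registers lo..hi (inclusive) selected. */
static uint64_t range_mask(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 64);
    uint64_t m = 0;
    for (unsigned r = lo; r <= hi; r++)
        m |= (uint64_t)1 << r;
    return m;
}
```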
>>>>>
>>>>> But, invariably, someone will want "compressed" instructions with a
>>>>> subset of the registers, and one can't just have these only having
>>>>> access to argument registers.
>>>>
>>>> Brian had little trouble using the My 66000 ABI, which does have
>>>> contiguous register groupings.
>>>>
>>>>>>> Well, also excluding the possibility where the LDM/STM is essentially
>>>>>>> just a function call (say, if beyond a certain number of registers
>>>>>>> are to be saved/restored, the compiler generates a call to a
>>>>>>> save/restore sequence, which it also generates as-needed). Granted,
>>>>>>> this is basically the strategy used by BGBCC. If multiple functions
>>>>>>> happen to save/restore the same combination of registers, they get
>>>>>>> to reuse the prior function's save/restore sequence (generally
>>>>>>> folded off to before the function in question).
>>>>>>
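A sketch of the reuse bookkeeping described above, not BGBCC's actual code. Helpers are keyed by the exact register set. A cutoff (`HELPER_MIN_REGS`, value assumed) reflects the point made further down that a call is not worth it for 3 or 4 registers. The label format and table size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_HELPERS 64
#define HELPER_MIN_REGS 8   /* assumed cutoff; below it, saves stay inline */

/* One emitted save/restore sequence, keyed by the register set it covers. */
typedef struct {
    uint64_t mask;
    char label[32];          /* hypothetical label for the emitted sequence */
} helper;

typedef struct {
    helper entries[MAX_HELPERS];
    size_t count;
} helper_cache;

static unsigned popcount64(uint64_t m)
{
    unsigned n = 0;
    while (m) { m &= m - 1; n++; }    /* clear the lowest set bit each step */
    return n;
}

/* Returns the label of a helper for `mask`, reusing one already emitted for
   the same set. Returns NULL when the set is too small to be worth a call
   (or the table is full), meaning: emit inline saves instead. */
static const char *get_save_helper(helper_cache *c, uint64_t mask)
{
    assert(c != NULL);
    if (popcount64(mask) < HELPER_MIN_REGS)
        return NULL;
    for (size_t i = 0; i < c->count; i++)
        if (c->entries[i].mask == mask)
            return c->entries[i].label;          /* reuse prior sequence */
    if (c->count == MAX_HELPERS)
        return NULL;
    helper *h = &c->entries[c->count++];
    h->mask = mask;
    snprintf(h->label, sizeof h->label, "__save_%016llx",
             (unsigned long long)mask);
    return h->label;
}
```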
>>>>>> Calling a subroutine to perform epilogues adds to the number of
>>>>>> branches a program executes. Having an instruction like EXIT means
>>>>>> that when you know you need to exit, you EXIT; you don't branch to the
>>>>>> exit point. This saves instructions.
>>>>>>
>>>>>
>>>>> Prolog needs a call, but epilog can just be a branch, since there is no
>>>>> need to return into the function that is returning.
>>>>
>>>> Yes, but this means My 66000 executes 3 fewer transfers of control
>>>> per subroutine than you do. And taken branches add latency.
>>>>
>>>>> It needs a lower limit, though, as it is not worth using a call/branch
>>>>> to save/restore 3 or 4 registers...
>>>>>
>>>>> But, say, 20 registers, it is more worthwhile.
>>>>
>>>> ENTER saves as few as 1 or as many as 32 registers and remains a single
>>>> instruction. Same for EXIT, and EXIT also performs the RET when LDing
>>>> R0.
>>>>
>>>>>
>>>>>>> Granted, the folding strategy can still do canary values, but doing
>>>>>>> so in the reused portions would limit the range of unique canary
>>>>>>> values (well, unless the canary magic is XOR'ed with SP or
>>>>>>> something...).
>>>>>>>
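The "XOR'ed with SP" idea floated above, as a minimal sketch. It is one possible scheme, not something the thread settles on. `canary_secret` is a placeholder constant that a real system would seed randomly at startup, and `make_canary`/`canary_ok` are hypothetical names. Since XOR with a fixed secret is a bijection, frames at different SPs always get different canaries, even when they share one folded prolog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-process secret. Placeholder value; a real system would seed this
   randomly at startup. */
static uint64_t canary_secret = 0x9E3779B97F4A7C15ull;

/* Canary for one frame: the secret XOR'ed with that frame's SP, so a shared
   prolog sequence still writes a different value for each frame address. */
static uint64_t make_canary(uintptr_t sp)
{
    assert(sp != 0);
    return canary_secret ^ (uint64_t)sp;
}

/* Epilog check: recompute from the same SP and compare with the stored slot. */
static bool canary_ok(uintptr_t sp, uint64_t stored)
{
    return stored == make_canary(sp);
}
```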
>>>> Canary values are in addition to ENTER and EXIT, not part of them,
>>>> IMHO.
>>>
>>> In Q+3 there are push and pop multiple instructions. I did not want 
>>> to add load and store multiple on top of that. They work great for 
>>> ISRs, but not so great for task switching code. I have the 
>>> instructions pushing or popping up to 17 registers in a group. Groups 
>>> of registers overlap by eight. The instructions can handle all 96 
>>> registers in the machine. ENTER and EXIT are also present.
>>>
>>> It is looking like the context switch code for the OS will take about 
>>> 3000 clock cycles to run. Not wanting to disable interrupts for that 
>>> long, I put a spinlock on the system’s task control block array. But 
>>> I think I have run into an issue. It is the timer ISR that switches 
>>> tasks. Since it is an ISR it pushes a subset of registers that it 
>>> uses and restores them at exit. But when exiting and switching tasks 
>>> it spinlocks on the task control block array. I am not sure this is a
>>> good thing, as the timer IRQ is fairly high priority: if something
>>> else locked the TCB array, it would deadlock. I guess the context
>>> switching could be deferred until the app requests some other 
========== REMAINDER OF ARTICLE TRUNCATED ==========
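One common way around the timer-ISR deadlock described above is for the ISR to try the lock once instead of spinning, and defer the switch if the lock is held. Here is a minimal C11 sketch, assuming a single-core system where the lock holder can only run after the ISR returns. The names `tcb_lock`, `switch_pending`, `tcb_release` and the rest are hypothetical, and `do_task_switch` is a stub that only counts calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock guarding the task control block (TCB) array. */
static atomic_flag tcb_lock = ATOMIC_FLAG_INIT;
/* Set when the timer ISR wanted to switch tasks but found the lock held. */
static atomic_bool switch_pending = false;
/* Stand-in for the real context switch; it only counts calls here. */
static unsigned switch_count = 0;
static void do_task_switch(void) { switch_count++; }

static bool tcb_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&tcb_lock, memory_order_acquire);
}

static void tcb_unlock(void)
{
    atomic_flag_clear_explicit(&tcb_lock, memory_order_release);
}

/* Timer ISR path: never spin. If the TCB array is busy, note that a switch
   is owed and return to the interrupted code. Returns true if it switched. */
static bool timer_isr_try_switch(void)
{
    if (!tcb_trylock()) {
        atomic_store(&switch_pending, true);
        return false;
    }
    do_task_switch();
    atomic_store(&switch_pending, false);
    tcb_unlock();
    return true;
}

/* Task-level release: drop the lock, then perform any switch that was
   deferred while it was held. */
static void tcb_release(void)
{
    tcb_unlock();
    if (atomic_exchange(&switch_pending, false))
        (void)timer_isr_try_switch();
}
```

In a real kernel the deferred switch in `tcb_release` would have to run with interrupts handled appropriately. On a multi-core system, spinning briefly in the ISR can be acceptable when the holder is on another core.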