<vshlfk$32p7$1@dont-email.me>


Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 16:21:30 -0500
Organization: A noiseless patient Spider
Lines: 243
Message-ID: <vshlfk$32p7$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
 <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
 <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
 <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
 <vseojq$112f7$1@dont-email.me>
 <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
 <vsfrqk$28q7o$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 01 Apr 2025 23:23:01 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="daaf9b4a0d2b7384daa5332985ebaedc";
	logging-data="101159"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18LPstWtbSSEx0wS/sZvyQ5ztuq4rXuv5w="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:zkxCrryda2RqLeVyynk3vSpzjOQ=
Content-Language: en-US
In-Reply-To: <vsfrqk$28q7o$1@dont-email.me>

On 3/31/2025 11:58 PM, Robert Finch wrote:
> On 2025-03-31 4:52 p.m., MitchAlsup1 wrote:
>> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>>
>>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
>> -------------
>>>>> Another option being if it could be a feature of a Load/Store
>>>>> Multiple.
>>>>>
>>>>> Say, LDM/STM:
>>>>>    6b Hi (Upper bound of registers to save)
>>>>>    6b Lo (Lower bound of registers to save)
>>>>>    1b LR (Flag to save Link Register)
>>>>>    1b GP (Flag to save Global Pointer)
>>>>>    1b SK (Flag to generate a canary)
> 
> Q+3 uses a bitmap of register selection with four more bits selecting 
> overlapping groups. It can work with up to 17 registers.
> 

OK.

If I did LDM/STM style ops, not sure which strategy I would take.

The possibility of using a 96-bit encoding with an Imm64 holding a 
bit-mask of all the registers makes some sense...
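
As a rough sketch of how the two encodings relate (not something BGBCC
or any existing ISA does), the C below packs the Hi/Lo/LR/GP/SK fields
from the quoted proposal into a 15-bit field, and expands a Lo..Hi range
into the 64-bit mask that an Imm64 form would carry. The bit positions
and the names stm_pack, stm_range_to_mask and stm_mask_count are
assumptions made up for illustration.

```c
#include <stdint.h>

/* Hypothetical field layout (an assumption, not a real encoding):
 *   bits 0..5  Lo, bits 6..11 Hi, bit 12 LR, bit 13 GP, bit 14 SK */
static uint32_t stm_pack(unsigned lo, unsigned hi, int lr, int gp, int sk)
{
    return (uint32_t)(lo & 63u)
         | ((uint32_t)(hi & 63u) << 6)
         | ((uint32_t)(lr != 0) << 12)
         | ((uint32_t)(gp != 0) << 13)
         | ((uint32_t)(sk != 0) << 14);
}

/* Expand a range-form register list into a one-bit-per-register mask.
 * Hi < Lo means "no registers in range", matching the quoted rule. */
static uint64_t stm_range_to_mask(unsigned lo, unsigned hi)
{
    lo &= 63u;
    hi &= 63u;
    if (hi < lo)
        return 0;
    unsigned n = hi - lo + 1;                      /* 1..64 registers */
    uint64_t ones = (n == 64) ? ~0ull : ((1ull << n) - 1);
    return ones << lo;
}

/* Number of registers a mask-form op would save or restore. */
static int stm_mask_count(uint64_t mask)
{
    int count = 0;
    while (mask) {
        mask &= mask - 1;                          /* clear lowest set bit */
        count++;
    }
    return count;
}
```

For example, saving R8..R14 plus R24..R31 needs two range-form ops, but
stm_range_to_mask(8, 14) | stm_range_to_mask(24, 31) gives one mask with
15 bits set, which a single mask-form op could handle.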


>>>>
>>>> ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
>>>> are implicit.
>>>>
>>>>> Likely (STM):
>>>>>    Pushes LR first (if bit set);
>>>>>    Pushes GP second (if bit set);
>>>>>    Pushes registers in range (if Hi>=Lo);
>>>>>    Pushes stack canary (if bit set).
>>>>
>>>> EXIT uses its 3rd flag when doing longjmp() and THROW()
>>>> so as to pop the call-stack but not actually RET from the stack
>>>> walker.
>>>>
>>>
>>> OK.
>>>
>>> I guess one could debate whether an LDM could treat the Load-LR as "Load
>>> LR" or "Load address and Branch", and/or have separate flags (Load LR vs
>>> Load PC, with Load PC meaning to branch).
>>>
>>>
>>> Other ABIs may not have as much reason to save/restore the Global
>>> Pointer all the time. But, in my case, it is being used as the primary
>>> way of accessing globals, and each binary image has its own address
>>> range here.
>>
>> I use constants to access globals.
>> These come in 32-bit and 64-bit flavors.
>>
>>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>>> instances of a given loaded binary within a shared address space.
>>
>> As long as the relative distance is the same, it does.
>>
>>> Vs, say, for PIE ELF binaries where it is needed to load a new copy for
>>> each process instance because of this (well, excluding an FDPIC style
>>> ABI, but seemingly still no one seems to have bothered adding FDPIC
>>> support in GCC or friends for RV64 based targets, ...).
>>>
>>> Well, granted, because Linux and similar tend to load every new process
>>> into its own address space and/or use CoW.
>>
>> CoW and execl()
>>
>> --------------
>>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>>> with an ISA with a larger number of registers, well, unless one uses a
>>>>> 64 or 96 bit LDM/STM encoding (possible). Merit though would be not
>>>>> needing multiple LDM's / STM's to deal with a discontinuous register
>>>>> range.
>>>>
>>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>>
>>>
>>> Discontinuous register ranges:
>>> Because pretty much no ABI's put all of the callee save registers in a
>>> contiguous range.
>>>
>>> Granted, I guess if someone were designing an ISA and ABI clean, they
>>> could make all of the argument registers and callee save registers
>>> contiguous.
>>>
>>> Say:
>>>    R0..R3: Special
>>>    R4..R15: Scratch
>>>    R16..R31: Argument
>>>    R32..R63: Callee Save
>>> ....
>>>
>>> But, invariably, someone will want "compressed" instructions with a
>>> subset of the registers, and one can't just have these only having
>>> access to argument registers.
>>
>> Brian had little trouble using My 66000 ABI which does have contiguous
>> register groupings.
>>
>>>>> Well, also excluding the possibility where the LDM/STM is essentially
>>>>> just a function call (say, if beyond a certain number of registers
>>>>> are to be saved/restored, the compiler generates a call to a
>>>>> save/restore sequence, which is also generated as needed). Granted,
>>>>> this is basically the strategy used by BGBCC. If multiple functions
>>>>> happen to save/restore the same combination of registers, they get
>>>>> to reuse the prior function's save/restore sequence (generally
>>>>> folded off to before the function in question).
>>>>
>>>> Calling a subroutine to perform epilogues is adding to the number of
>>>> branches a program executes. Having an instruction like EXIT means
>>>> when you know you need to exit, you EXIT you don't branch to the exit
>>>> point. Saving instructions.
>>>>
>>>
>>> Prolog needs a call, but epilog can just be a branch, since no need to
>>> return back into the function that is returning.
>>
>> Yes, but this means My 66000 executes 3 fewer transfers of control
>> per subroutine than you do. And taken branches add latency.
>>
>>> Needs to have a lower limit though, as it is not worth it to use a
>>> call/branch to save/restore 3 or 4 registers...
>>>
>>> But, say, 20 registers, it is more worthwhile.
>>
>> ENTER saves as few as 1 or as many as 32 and remains that 1 single
>> instruction. Same for EXIT and exit also performs the RET when LDing
>> R0.
>>
>>>
>>>>> Granted, the folding strategy can still do canary values, but doing so
>>>>> in the reused portions would limit the range of unique canary values
>>>>> (well, unless the canary magic is XOR'ed with SP or something...).
>>>>>
>> Canary values are in addition to ENTER and EXIT, not part of them,
>> IMHO.
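
To make the "XOR'ed with SP" idea from the quoted text concrete, here
is a minimal sketch in C. It assumes a shared save/restore sequence with
one baked-in constant, and mixes in the stack pointer so that each frame
still gets a different stored value. CANARY_MAGIC, canary_make and
canary_check are hypothetical names, and the constant is arbitrary.

```c
#include <stdint.h>

/* Hypothetical constant baked into one shared save/restore sequence. */
#define CANARY_MAGIC 0x5A5AC3C3A5A53C3Cull

/* Value the prolog would store: the constant mixed with the frame's SP. */
static uint64_t canary_make(uint64_t magic, uint64_t sp)
{
    return magic ^ sp;
}

/* Epilog check: recompute from the same SP and compare.
 * Returns 1 if the stored canary is intact, 0 if it was overwritten. */
static int canary_check(uint64_t stored, uint64_t magic, uint64_t sp)
{
    return stored == (magic ^ sp);
}
```

Two frames that reuse the same sequence at different stack depths store
different values, so the reuse no longer limits the range of canary
values to one per sequence. It is only as unpredictable as SP and the
constant are, though.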
> 
> In Q+3 there are push and pop multiple instructions. I did not want to 
> add load and store multiple on top of that. They work great for ISRs, 
> but not so great for task switching code. I have the instructions 
> pushing or popping up to 17 registers in a group. Groups of registers 
> overlap by eight. The instructions can handle all 96 registers in the 
> machine. ENTER and EXIT are also present.
> 
> It is looking like the context switch code for the OS will take about 
> 3000 clock cycles to run. Not wanting to disable interrupts for that 
> long, I put a spinlock on the system’s task control block array. But I 
> think I have run into an issue. It is the timer ISR that switches tasks. 
> Since it is an ISR it pushes a subset of registers that it uses and 
> restores them at exit. But when exiting and switching tasks it spinlocks 
> on the task control block array. I am not sure this is a good thing, as 
> the timer IRQ is fairly high priority. If something else locked the TCB 
> array it would deadlock. I guess the context switching could be deferred 
> until the app requests some other operating system function. But then 
> the issue is what if the app gets stuck in an infinite loop, not calling 
> the OS? I suppose I could make an OS heartbeat function call a 
> requirement of apps. If the app does not do a heartbeat within a 
> reasonable time, it could be terminated.
> 
> Q+3 progresses rapidly. A lot of the stuff in earlier versions was 
> removed. The pared down version is a 32-bit machine. Expecting some 
> headaches because of the use of condition registers and branch registers.
> 

OK.
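
One common way around the ISR-spins-on-a-lock problem described in the
quoted text is to have the ISR only try the lock, and defer the switch
if the lock is already held. The sketch below uses C11 atomics and is
an assumption about how this could look, not how the Q+3 OS does it.
All of the names (tcb_lock, switch_pending, timer_isr_begin_switch,
tcb_unlock_take_pending) are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag tcb_lock = ATOMIC_FLAG_INIT;  /* guards the TCB array */
static atomic_bool switch_pending;               /* static storage: starts false */

/* Try once; never spin. Returns true if the lock was acquired. */
static bool tcb_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&tcb_lock, memory_order_acquire);
}

static void tcb_unlock(void)
{
    atomic_flag_clear_explicit(&tcb_lock, memory_order_release);
}

/* Called from the timer ISR. Returns true if the ISR now holds the lock
 * and may switch tasks (it must call tcb_unlock() afterwards). Returns
 * false if the lock was busy, in which case the switch is only recorded
 * as pending and the ISR returns to the interrupted task. */
static bool timer_isr_begin_switch(void)
{
    if (tcb_trylock())
        return true;
    atomic_store(&switch_pending, true);
    return false;
}

/* Called by non-ISR code when it is done with the TCB array. Returns
 * true if a deferred switch was requested while the lock was held, so
 * the caller should invoke the scheduler now. */
static bool tcb_unlock_take_pending(void)
{
    tcb_unlock();
    return atomic_exchange(&switch_pending, false);
}
```

If the ISR loses the race and sets the pending flag just after the
holder has checked it, the switch is picked up at the next unlock or
retried at the next timer tick, so the cost is a delayed switch rather
than a deadlock.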
========== REMAINDER OF ARTICLE TRUNCATED ==========