Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Robert Finch <robfi680@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 00:58:58 -0400
Organization: A noiseless patient Spider
Lines: 161
Message-ID: <vsfrqk$28q7o$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me> <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org> <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me> <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org> <vseojq$112f7$1@dont-email.me> <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 01 Apr 2025 06:59:01 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="aaac892bcab3ba4fd41d3fe421492be1"; logging-data="2386168"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18DtJzqzTUXfSeIk/6B7AQqs6t/pwpnMYU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:U1CzSqzci+85k6sXp4NQcKllnGI=
Content-Language: en-US
In-Reply-To: <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
Bytes: 8134

On 2025-03-31 4:52 p.m., MitchAlsup1 wrote:
> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>
>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
> -------------
>>>> Another option being if it could be a feature of a Load/Store Multiple.
>>>>
>>>> Say, LDM/STM:
>>>> 6b Hi (Upper bound of register to save)
>>>> 6b Lo (Lower bound of registers to save)
>>>> 1b LR (Flag to save Link Register)
>>>> 1b GP (Flag to save Global Pointer)
>>>> 1b SK (Flag to generate a canary)

Q+3 uses a bitmap for register selection, with four more bits selecting
overlapping groups. It can work with up to 17 registers.

>>>
>>> ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
>>> are implicit.
>>>
>>>> Likely (STM):
>>>> Pushes LR first (if bit set);
>>>> Pushes GP second (if bit set);
>>>> Pushes registers in range (if Hi>=Lo);
>>>> Pushes stack canary (if bit set).
>>>
>>> EXIT uses its 3rd flag used when doing longjump() and THROW()
>>> so as to pop the call-stack but not actually RET from the stack
>>> walker.
>>>
>>
>> OK.
>>
>> I guess one could debate whether an LDM could treat the Load-LR as "Load
>> LR" or "Load address and Branch", and/or have separate flags (Load LR vs
>> Load PC, with Load PC meaning to branch).
>>
>>
>> Other ABIs may not have as much reason to save/restore the Global
>> Pointer all the time. But, in my case, it is being used as the primary
>> way of accessing globals, and each binary image has its own address
>> range here.
>
> I use constants to access globals.
> These comes in 32-bit and 64-bit flavors.
>
>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>> instances of a given loaded binary within a shared address space.
>
> As long as the relative distance is the same, it does.
>
>> Vs, say, for PIE ELF binaries where it is needed to load a new copy for
>> each process instance because of this (well, excluding an FDPIC style
>> ABI, but seemingly still no one seems to have bothered adding FDPIC
>> support in GCC or friends for RV64 based targets, ...).
>>
>> Well, granted, because Linux and similar tend to load every new process
>> into its own address space and/or use CoW.
>
> CoW and execl()
>
> --------------
>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>> with an ISA with a larger number of registers, well, unless one uses a
>>>> 64 or 96 bit LDM/STM encoding (possible). Merit though would be not
>>>> needing multiple LDM's / STM's to deal with a discontinuous register
>>>> range.
>>>
>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>
>>
>> Discontinuous register ranges:
>> Because pretty much no ABI's put all of the callee save registers in a
>> contiguous range.
>>
>> Granted, I guess if someone were designing an ISA and ABI clean, they
>> could make all of the argument registers and callee save registers
>> contiguous.
>>
>> Say:
>> R0..R3: Special
>> R4..R15: Scratch
>> R16..R31: Argument
>> R32..R63: Callee Save
>> ....
>>
>> But, invariably, someone will want "compressed" instructions with a
>> subset of the registers, and one can't just have these only having
>> access to argument registers.
>
> Brian had little trouble using My 66000 ABI which does have contiguous
> register groupings.
>
>>>> Well, also excluding the possibility where the LDM/STM is essentially
>>>> just a function call (say, if beyond certain number of registers are to
>>>> be saved/restored, the compiler generates a call to a save/restore
>>>> sequence, which is also generates as-needed). Granted, this is basically
>>>> the strategy used by BGBCC. If multiple functions happen to save/restore
>>>> the same combination of registers, they get to reuse the prior
>>>> function's save/restore sequence (generally folded off to before the
>>>> function in question).
>>>
>>> Calling a subroutine to perform epilogues is adding to the number of
>>> branches a program executes. Having an instruction like EXIT means
>>> when you know you need to exit, you EXIT you don't branch to the exit
>>> point. Saving instructions.
>>>
>>
>> Prolog needs a call, but epilog can just be a branch, since no need to
>> return back into the function that is returning.
>
> Yes, but this means My 66000 executes 3 fewer transfers of control
> per subroutine than you do. And taken branches add latency.
>
>> Needs to have a lower limit though, as it is not worth it to use a
>> call/branch to save/restore 3 or 4 registers...
>>
>> But, say, 20 registers, it is more worthwhile.
>
> ENTER saves as few as 1 or as many as 32 and remains that 1 single
> instruction. Same for EXIT and exit also performs the RET when LDing
> R0.
>
>>
>>>> Granted, the folding strategy can still do canary values, but doing so
>>>> in the reused portions would limit the range of unique canary values
>>>> (well, unless the canary magic is XOR'ed with SP or something...).
>>>>
> Canary values are in addition to ENTER and EXIT not part of them
> IMHO.

In Q+3 there are push and pop multiple instructions. I did not want to
add load and store multiple on top of that. Push and pop multiple work
great for ISRs, but not so great for task switching code. The
instructions push or pop up to 17 registers in a group. Groups of
registers overlap by eight. The instructions can handle all 96 registers
in the machine. ENTER and EXIT are also present.

It is looking like the context switch code for the OS will take about
3000 clock cycles to run. Not wanting to disable interrupts for that
long, I put a spinlock on the system’s task control block (TCB) array.
But I think I have run into an issue. It is the timer ISR that switches
tasks. Since it is an ISR, it pushes the subset of registers that it
uses and restores them at exit. But when exiting and switching tasks it
spins on the lock for the TCB array. I am not sure this is a good thing,
as the timer IRQ is fairly high priority. If something else had the TCB
array locked when the timer interrupt fired, it would deadlock.

I guess the context switching could be deferred until the app requests
some other operating system function. But then the issue is: what if the
app gets stuck in an infinite loop and never calls the OS? I suppose I
could make an OS heartbeat function call a requirement of apps. If the
app does not do a heartbeat within a reasonable time, it could be
terminated.

Q+3 progresses rapidly. A lot of the stuff in earlier versions was
removed. The pared-down version is a 32-bit machine. I am expecting some
headaches because of the use of condition registers and branch
registers.
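
One possible shape for the deferral idea, as a minimal sketch only: the timer ISR tries the TCB lock once instead of spinning, and if the lock is busy it records that a switch is owed so that whoever releases the lock can honor it. This assumes C11 atomics are available; every name here (tcb_lock, resched_pending, timer_isr_tick, tcb_unlock_and_resched) is hypothetical and none of it is taken from the Q+3 OS. The "task switch" is reduced to a round-robin index so the control flow is visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_TASKS 4                      /* hypothetical task count */

/* All names below are illustrative assumptions, not Q+3 OS code. */
static atomic_flag tcb_lock = ATOMIC_FLAG_INIT;  /* guards the TCB array    */
static atomic_bool resched_pending;              /* a switch was postponed  */
static int current_task;                         /* stand-in for TCB state  */

/* Try to take the lock once; never spins. */
static bool tcb_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&tcb_lock,
                                              memory_order_acquire);
}

static void tcb_unlock(void)
{
    atomic_flag_clear_explicit(&tcb_lock, memory_order_release);
}

/* Placeholder for the real (roughly 3000-cycle) context switch.
 * Must be called with tcb_lock held. */
static void switch_task_locked(void)
{
    current_task = (current_task + 1) % NUM_TASKS;
    assert(current_task >= 0 && current_task < NUM_TASKS);
}

/* Timer ISR body. Returns true if a switch happened on this tick,
 * false if it was postponed because the lock was busy. */
bool timer_isr_tick(void)
{
    if (!tcb_trylock()) {
        atomic_store(&resched_pending, true);    /* owe a switch */
        return false;
    }
    switch_task_locked();
    atomic_store(&resched_pending, false);
    tcb_unlock();
    return true;
}

/* Non-ISR code that held the lock releases it through this function,
 * which performs any switch the ISR had to postpone. */
void tcb_unlock_and_resched(void)
{
    tcb_unlock();
    if (atomic_exchange(&resched_pending, false)) {
        if (tcb_trylock()) {
            switch_task_locked();
            tcb_unlock();
        } else {
            atomic_store(&resched_pending, true); /* still owed */
        }
    }
}
```

With this arrangement the ISR never waits on the lock, so an interrupted lock holder always gets to run again and release it. The postponed switch happens at unlock time rather than at the next OS call, which may reduce the need for a mandatory heartbeat, though a task that spins while holding the lock would still need some other remedy.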