Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 16:21:30 -0500
Organization: A noiseless patient Spider
Lines: 243
Message-ID: <vshlfk$32p7$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
 <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
 <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
 <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
 <vseojq$112f7$1@dont-email.me>
 <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
 <vsfrqk$28q7o$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 01 Apr 2025 23:23:01 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="daaf9b4a0d2b7384daa5332985ebaedc";
 logging-data="101159"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX18LPstWtbSSEx0wS/sZvyQ5ztuq4rXuv5w="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:zkxCrryda2RqLeVyynk3vSpzjOQ=
Content-Language: en-US
In-Reply-To: <vsfrqk$28q7o$1@dont-email.me>

On 3/31/2025 11:58 PM, Robert Finch wrote:
> On 2025-03-31 4:52 p.m., MitchAlsup1 wrote:
>> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>>
>>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
>> -------------
>>>>> Another option being if it could be a feature of a Load/Store
>>>>> Multiple.
>>>>>
>>>>> Say, LDM/STM:
>>>>>   6b Hi (Upper bound of register to save)
>>>>>   6b Lo (Lower bound of registers to save)
>>>>>   1b LR (Flag to save Link Register)
>>>>>   1b GP (Flag to save Global Pointer)
>>>>>   1b SK (Flag to generate a canary)
>
> Q+3 uses a bitmap of register selection with four more bits selecting
> overlapping groups. It can work with up to 17 registers.
>

OK.

If I did LDM/STM style ops, not sure which strategy I would take.
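
As a rough sketch of the range-based LDM/STM control field quoted above (6b Hi, 6b Lo, plus the LR, GP and SK flags), the 15 bits could be packed as shown below. The bit positions, the `ldm_fields` struct, the function names, and the "one stack slot per saved item" count are all assumptions made for illustration; nothing in the thread fixes a layout. An empty range is assumed whenever Hi < Lo, following the "if Hi>=Lo" condition in the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (assumed, not from the thread):
 *   bits 0..5  : Lo  (lower bound of register range)
 *   bits 6..11 : Hi  (upper bound of register range)
 *   bit  12    : LR  (save/restore Link Register)
 *   bit  13    : GP  (save/restore Global Pointer)
 *   bit  14    : SK  (generate/check a stack canary)
 */
typedef struct {
    unsigned hi, lo;      /* inclusive register range, each 0..63 */
    unsigned lr, gp, sk;  /* flags, 0 or 1 */
} ldm_fields;

/* Pack the five fields into a 15-bit control word. */
static uint16_t ldm_pack(ldm_fields f)
{
    assert(f.hi < 64 && f.lo < 64);
    return (uint16_t)((f.lo & 63u) |
                      ((f.hi & 63u) << 6) |
                      ((f.lr & 1u) << 12) |
                      ((f.gp & 1u) << 13) |
                      ((f.sk & 1u) << 14));
}

/* Recover the fields from a packed control word. */
static ldm_fields ldm_unpack(uint16_t w)
{
    ldm_fields f;
    f.lo = w & 63u;
    f.hi = (w >> 6) & 63u;
    f.lr = (w >> 12) & 1u;
    f.gp = (w >> 13) & 1u;
    f.sk = (w >> 14) & 1u;
    return f;
}

/* Number of stack slots an STM with these fields would write,
 * assuming one slot each for LR, GP, the canary, and every
 * register in [Lo, Hi]; Hi < Lo means no range registers. */
static unsigned ldm_slot_count(ldm_fields f)
{
    unsigned n = (f.lr & 1u) + (f.gp & 1u) + (f.sk & 1u);
    if (f.hi >= f.lo)
        n += f.hi - f.lo + 1u;
    return n;
}
```

With this assumed layout, saving R8..R31 plus LR, GP and a canary packs to 0x77C8 and would occupy 27 slots. The range form fits in 15 bits but can only describe one contiguous run of registers.
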
The possibility of using a 96-bit encoding with an Imm64 holding a
bit-mask of all the registers makes some sense...

>>>>
>>>> ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
>>>> are implicit.
>>>>
>>>>> Likely (STM):
>>>>>   Pushes LR first (if bit set);
>>>>>   Pushes GP second (if bit set);
>>>>>   Pushes registers in range (if Hi>=Lo);
>>>>>   Pushes stack canary (if bit set).
>>>>
>>>> EXIT uses its 3rd flag used when doing longjump() and THROW()
>>>> so as to pop the call-stack but not actually RET from the stack
>>>> walker.
>>>>
>>>
>>> OK.
>>>
>>> I guess one could debate whether an LDM could treat the Load-LR as "Load
>>> LR" or "Load address and Branch", and/or have separate flags (Load LR vs
>>> Load PC, with Load PC meaning to branch).
>>>
>>>
>>> Other ABIs may not have as much reason to save/restore the Global
>>> Pointer all the time. But, in my case, it is being used as the primary
>>> way of accessing globals, and each binary image has its own address
>>> range here.
>>
>> I use constants to access globals.
>> These comes in 32-bit and 64-bit flavors.
>>
>>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>>> instances of a given loaded binary within a shared address space.
>>
>> As long as the relative distance is the same, it does.
>>
>>> Vs, say, for PIE ELF binaries where it is needed to load a new copy for
>>> each process instance because of this (well, excluding an FDPIC style
>>> ABI, but seemingly still no one seems to have bothered adding FDPIC
>>> support in GCC or friends for RV64 based targets, ...).
>>>
>>> Well, granted, because Linux and similar tend to load every new process
>>> into its own address space and/or use CoW.
>>
>> CoW and execl()
>>
>> --------------
>>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>>> with an ISA with a larger number of registers, well, unless one uses a
>>>>> 64 or 96 bit LDM/STM encoding (possible).
>>>>> Merit though would be not
>>>>> needing multiple LDM's / STM's to deal with a discontinuous register
>>>>> range.
>>>>
>>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>>
>>>
>>> Discontinuous register ranges:
>>> Because pretty much no ABI's put all of the callee save registers in a
>>> contiguous range.
>>>
>>> Granted, I guess if someone were designing an ISA and ABI clean, they
>>> could make all of the argument registers and callee save registers
>>> contiguous.
>>>
>>> Say:
>>>   R0..R3: Special
>>>   R4..R15: Scratch
>>>   R16..R31: Argument
>>>   R32..R63: Callee Save
>>> ....
>>>
>>> But, invariably, someone will want "compressed" instructions with a
>>> subset of the registers, and one can't just have these only having
>>> access to argument registers.
>>
>> Brian had little trouble using My 66000 ABI which does have contiguous
>> register groupings.
>>
>>>>> Well, also excluding the possibility where the LDM/STM is essentially
>>>>> just a function call (say, if beyond certain number of registers
>>>>> are to be saved/restored, the compiler generates a call to a
>>>>> save/restore sequence, which is also generates as-needed). Granted,
>>>>> this is basically the strategy used by BGBCC. If multiple functions
>>>>> happen to save/restore the same combination of registers, they get
>>>>> to reuse the prior function's save/restore sequence (generally
>>>>> folded off to before the function in question).
>>>>
>>>> Calling a subroutine to perform epilogues is adding to the number of
>>>> branches a program executes. Having an instruction like EXIT means
>>>> when you know you need to exit, you EXIT you don't branch to the exit
>>>> point. Saving instructions.
>>>>
>>>
>>> Prolog needs a call, but epilog can just be a branch, since no need to
>>> return back into the function that is returning.
>>
>> Yes, but this means My 66000 executes 3 fewer transfers of control
>> per subroutine than you do. And taken branches add latency.
>>
>>> Needs to have a lower limit though, as it is not worth it to use a
>>> call/branch to save/restore 3 or 4 registers...
>>>
>>> But, say, 20 registers, it is more worthwhile.
>>
>> ENTER saves as few as 1 or as many as 32 and remains that 1 single
>> instruction. Same for EXIT and exit also performs the RET when LDing
>> R0.
>>
>>>
>>>>> Granted, the folding strategy can still do canary values, but doing so
>>>>> in the reused portions would limit the range of unique canary values
>>>>> (well, unless the canary magic is XOR'ed with SP or something...).
>>>>>
>> Canary values are in addition to ENTER and EXIT not part of them
>> IMHO.
>
> In Q+3 there are push and pop multiple instructions. I did not want to
> add load and store multiple on top of that. They work great for ISRs,
> but not so great for task switching code. I have the instructions
> pushing or popping up to 17 registers in a group. Groups of registers
> overlap by eight. The instructions can handle all 96 registers in the
> machine. ENTER and EXIT are also present.
>
> It is looking like the context switch code for the OS will take about
> 3000 clock cycles to run. Not wanting to disable interrupts for that
> long, I put a spinlock on the system’s task control block array. But I
> think I have run into an issue. It is the timer ISR that switches tasks.
> Since it is an ISR it pushes a subset of registers that it uses and
> restores them at exit. But when exiting and switching tasks it spinlocks
> on the task control block array. I am not sure this is a good thing. As
> the timer IRQ is fairly high priority. If something else locked the TCB
> array it would deadlock. I guess the context switching could be deferred
> until the app requests some other operating system function. But then
> the issue is what if the app gets stuck in an infinite loop, not calling
> the OS? I suppose I could make an OS heartbeat function call a
> requirement of apps.
> If the app does not do a heartbeat within a
> reasonable time, it could be terminated.
>
> Q+3 progresses rapidly. A lot of the stuff in earlier versions was
> removed. The pared down version is a 32-bit machine. Expecting some
> headaches because of the use of condition registers and branch registers.
>

OK.

========== REMAINDER OF ARTICLE TRUNCATED ==========