Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Constant Stack Canaries
Date: Tue, 1 Apr 2025 23:04:52 -0500
Organization: A noiseless patient Spider
Lines: 467
Message-ID: <vsid3u$sput$1@dont-email.me>
References: <vsbcnl$1d4m5$1@dont-email.me> <vsc058$20pih$1@dont-email.me>
 <4cf60b5fd8b785feb07a67a823cc349d@www.novabbs.org>
 <vseeen$l4ig$1@dont-email.me> <vseiq9$qndj$1@dont-email.me>
 <e05e9d429f71944bbfe74c3f54b79a03@www.novabbs.org>
 <vseojq$112f7$1@dont-email.me>
 <62b5c4a25d917c5bab64a815189de826@www.novabbs.org>
 <vsfrqk$28q7o$1@dont-email.me> <vshlfk$32p7$1@dont-email.me>
 <vshoop$3sqf$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 02 Apr 2025 06:06:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="c11654d5afe4b7ef2b18d1e91dc487ad";
 logging-data="944093"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1/7XkJQC4901RRNQWSS5/2q0iL6wcHKseg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:15OmI51u47tLjgJpHoeSweyFYdE=
Content-Language: en-US
In-Reply-To: <vshoop$3sqf$2@dont-email.me>
Bytes: 20858

On 4/1/2025 5:19 PM, Robert Finch wrote:
> On 2025-04-01 5:21 p.m., BGB wrote:
>> On 3/31/2025 11:58 PM, Robert Finch wrote:
>>> On 2025-03-31 4:52 p.m., MitchAlsup1 wrote:
>>>> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>>>>
>>>>> On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
>>>> -------------
>>>>>>> Another option being if it could be a feature of a Load/Store
>>>>>>> Multiple.
>>>>>>>
>>>>>>> Say, LDM/STM:
>>>>>>>   6b Hi (Upper bound of register to save)
>>>>>>>   6b Lo (Lower bound of registers to save)
>>>>>>>   1b LR (Flag to save Link Register)
>>>>>>>   1b GP (Flag to save Global Pointer)
>>>>>>>   1b SK (Flag to generate a canary)
>>>
>>> Q+3 uses a bitmap of register selection with four more bits selecting
>>> overlapping groups. It can work with up to 17 registers.
>>>
>>
>> OK.
>>
>> If I did LDM/STM style ops, not sure which strategy I would take.
>>
>> The possibility of using a 96-bit encoding with an Imm64 holding a
>> bit-mask of all the registers makes some sense...
>>
>>
>>>>>>
>>>>>> ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
>>>>>> are implicit.
>>>>>>
>>>>>>> Likely (STM):
>>>>>>>   Pushes LR first (if bit set);
>>>>>>>   Pushes GP second (if bit set);
>>>>>>>   Pushes registers in range (if Hi>=Lo);
>>>>>>>   Pushes stack canary (if bit set).
>>>>>>
>>>>>> EXIT uses its 3rd flag used when doing longjump() and THROW()
>>>>>> so as to pop the call-stack but not actually RET from the stack
>>>>>> walker.
>>>>>>
>>>>>
>>>>> OK.
>>>>>
>>>>> I guess one could debate whether an LDM could treat the Load-LR as
>>>>> "Load LR" or "Load address and Branch", and/or have separate flags
>>>>> (Load LR vs Load PC, with Load PC meaning to branch).
>>>>>
>>>>>
>>>>> Other ABIs may not have as much reason to save/restore the Global
>>>>> Pointer all the time. But, in my case, it is being used as the primary
>>>>> way of accessing globals, and each binary image has its own address
>>>>> range here.
>>>>
>>>> I use constants to access globals.
>>>> These comes in 32-bit and 64-bit flavors.
>>>>
>>>>> PC-Rel not being used as PC-Rel doesn't allow for multiple process
>>>>> instances of a given loaded binary within a shared address space.
>>>>
>>>> As long as the relative distance is the same, it does.
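
The LDM/STM field list and the "Likely (STM)" push order quoted above can be
made concrete with a small model. This is only a sketch under assumptions:
the bit positions in pack()/unpack(), the names StmFields and push_order,
and the ascending order within the Lo..Hi range are all hypothetical; the
quoted text gives only the field widths and the LR, GP, range, canary
sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StmFields:
    """The five fields from the quoted proposal (15 bits in total)."""
    hi: int   # 6 bits: upper bound of the register range
    lo: int   # 6 bits: lower bound of the register range
    lr: bool  # 1 bit: save Link Register
    gp: bool  # 1 bit: save Global Pointer
    sk: bool  # 1 bit: generate a stack canary


def pack(f: StmFields) -> int:
    """Pack the fields into a 15-bit value (bit layout is an assumption)."""
    if not (0 <= f.hi < 64 and 0 <= f.lo < 64):
        raise ValueError("Hi and Lo must each fit in 6 bits")
    return (f.hi << 9) | (f.lo << 3) | (int(f.lr) << 2) | (int(f.gp) << 1) | int(f.sk)


def unpack(word: int) -> StmFields:
    """Inverse of pack()."""
    return StmFields(
        hi=(word >> 9) & 0x3F,
        lo=(word >> 3) & 0x3F,
        lr=bool((word >> 2) & 1),
        gp=bool((word >> 1) & 1),
        sk=bool(word & 1),
    )


def push_order(f: StmFields) -> list[str]:
    """Return what an STM would push, in the order given in the quoted list."""
    order = []
    if f.lr:
        order.append("LR")
    if f.gp:
        order.append("GP")
    if f.hi >= f.lo:
        # Ascending order inside the range is an assumption.
        order.extend(f"R{n}" for n in range(f.lo, f.hi + 1))
    if f.sk:
        order.append("CANARY")
    return order
```

With Hi < Lo the range is empty, so an encoding such as Hi=0, Lo=1 would save
only LR and/or GP (and the canary, if requested).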
>>>>
>>>>> Vs, say, for PIE ELF binaries where it is needed to load a new copy
>>>>> for each process instance because of this (well, excluding an FDPIC
>>>>> style ABI, but seemingly still no one seems to have bothered adding
>>>>> FDPIC support in GCC or friends for RV64 based targets, ...).
>>>>>
>>>>> Well, granted, because Linux and similar tend to load every new
>>>>> process into its own address space and/or use CoW.
>>>>
>>>> CoW and execl()
>>>>
>>>> --------------
>>>>>>> Other ISAs use a flag bit for each register, but this is less viable
>>>>>>> with an ISA with a larger number of registers, well, unless one
>>>>>>> uses a 64 or 96 bit LDM/STM encoding (possible). Merit though would
>>>>>>> be not needing multiple LDM's / STM's to deal with a discontinuous
>>>>>>> register range.
>>>>>>
>>>>>> To quote Trevor Smith:: "Why would anyone want to do that" ??
>>>>>>
>>>>>
>>>>> Discontinuous register ranges:
>>>>> Because pretty much no ABI's put all of the callee save registers in a
>>>>> contiguous range.
>>>>>
>>>>> Granted, I guess if someone were designing an ISA and ABI clean, they
>>>>> could make all of the argument registers and callee save registers
>>>>> contiguous.
>>>>>
>>>>> Say:
>>>>>   R0..R3: Special
>>>>>   R4..R15: Scratch
>>>>>   R16..R31: Argument
>>>>>   R32..R63: Callee Save
>>>>>   ....
>>>>>
>>>>> But, invariably, someone will want "compressed" instructions with a
>>>>> subset of the registers, and one can't just have these only having
>>>>> access to argument registers.
>>>>
>>>> Brian had little trouble using My 66000 ABI which does have contiguous
>>>> register groupings.
>>>>
>>>>>>> Well, also excluding the possibility where the LDM/STM is
>>>>>>> essentially just a function call (say, if beyond certain number of
>>>>>>> registers are to be saved/restored, the compiler generates a call to
>>>>>>> a save/restore sequence, which is also generates as-needed).
>>>>>>> Granted, this is basically the strategy used by BGBCC. If multiple
>>>>>>> functions happen to save/restore the same combination of registers,
>>>>>>> they get to reuse the prior function's save/restore sequence
>>>>>>> (generally folded off to before the function in question).
>>>>>>
>>>>>> Calling a subroutine to perform epilogues is adding to the number of
>>>>>> branches a program executes. Having an instruction like EXIT means
>>>>>> when you know you need to exit, you EXIT you don't branch to the exit
>>>>>> point. Saving instructions.
>>>>>>
>>>>>
>>>>> Prolog needs a call, but epilog can just be a branch, since no need to
>>>>> return back into the function that is returning.
>>>>
>>>> Yes, but this means My 66000 executes 3 fewer transfers of control
>>>> per subroutine than you do. And taken branches add latency.
>>>>
>>>>> Needs to have a lower limit though, as it is not worth it to use a
>>>>> call/branch to save/restore 3 or 4 registers...
>>>>>
>>>>> But, say, 20 registers, it is more worthwhile.
>>>>
>>>> ENTER saves as few as 1 or as many as 32 and remains that 1 single
>>>> instruction. Same for EXIT and exit also performs the RET when LDing
>>>> R0.
>>>>
>>>>>
>>>>>>> Granted, the folding strategy can still do canary values, but
>>>>>>> doing so in the reused portions would limit the range of unique
>>>>>>> canary values (well, unless the canary magic is XOR'ed with SP or
>>>>>>> something...).
>>>>>>>
>>>> Canary values are in addition to ENTER and EXIT not part of them
>>>> IMHO.
>>>
>>> In Q+3 there are push and pop multiple instructions. I did not want
>>> to add load and store multiple on top of that. They work great for
>>> ISRs, but not so great for task switching code. I have the
>>> instructions pushing or popping up to 17 registers in a group. Groups
>>> of registers overlap by eight. The instructions can handle all 96
>>> registers in the machine. ENTER and EXIT are also present.
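
The "canary magic XOR'ed with SP" remark quoted above can be illustrated
with a few lines. This is a sketch, not anything taken from BGBCC: the
constant SHARED_MAGIC, the 64-bit width, and the function names are
assumptions made for illustration. The idea is that a save/restore sequence
shared by several functions carries only one magic constant, but mixing in
the stack pointer makes the value actually stored differ from frame to
frame.

```python
MASK64 = (1 << 64) - 1

# Hypothetical constant baked into one shared save/restore sequence.
SHARED_MAGIC = 0x5A5AC3C3DEADBEEF


def canary_for_frame(sp: int, magic: int = SHARED_MAGIC) -> int:
    """Value the shared prolog would store: the magic XOR'ed with SP."""
    return (magic ^ sp) & MASK64


def canary_ok(stored: int, sp: int, magic: int = SHARED_MAGIC) -> bool:
    """Check the shared epilog would perform before restoring registers."""
    return stored == canary_for_frame(sp, magic)
```

Two frames at different stack addresses get different stored values even
though they share the same magic, so a canary copied from one frame does not
validate in another. Note that this only varies the value per frame; anyone
who knows both the magic and SP can still recompute it.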
>>>
>>> It is looking like the context switch code for the OS will take about
>>> 3000 clock cycles to run. Not wanting to disable interrupts for that
>>> long, I put a spinlock on the system’s task control block array. But
>>> I think I have run into an issue. It is the timer ISR that switches
>>> tasks. Since it is an ISR it pushes a subset of registers that it
>>> uses and restores them at exit. But when exiting and switching tasks
>>> it spinlocks on the task control block array. I am not sure this is a
>>> good thing. As the timer IRQ is fairly high priority. If something
>>> else locked the TCB array it would deadlock. I guess the context
>>> switching could be deferred until the app requests some other

========== REMAINDER OF ARTICLE TRUNCATED ==========