From: "Paul A. Clayton" <paaronclayton@gmail.com>
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Sat, 20 Apr 2024 15:15:04 -0400
Message-ID: <v038qm$bmtm$1@dont-email.me>
In-Reply-To: <f40fa64b4d719b47fb3ab79ca334ebc3@www.novabbs.org>

On 3/8/24 11:17 PM, MitchAlsup1 wrote:
> Paul A. Clayton wrote:
> [snip]
>> Register windows were intended to avoid save/restore overhead by
>> retaining values in registers with renaming. A stack cache is
>> meant to reduce the overhead of loads and stores to the stack --
>> not just those preserving and restoring registers. A
>> direct-mapped stack cache is not entirely insane. A partial
>> stack frame cache might cache up to 256 bytes (e.g.) of each
>> frame, with alternating frames indexing with inverted bits (to
>> reduce interference) -- one could even reserve a chunk (e.g., 64
>> bytes) of a frame that is never overlapped, by limiting the
>> cached offsets to be smaller than the cache size.
>>
>> Such might be more useful than register windows, but that does
>> not mean that it is actually a good option.
>
> If it is such a good option why has it not reached production ??

Being (possibly) more useful than register windows is not the same
as providing a net benefit when the entire system is considered.

One obvious issue with a small stack cache is utilization. While
generic data caches also have utilization issues (no single size
is ideal for all workloads) and the stack cache would be small
(and potentially highly prefetchable), the spilling and filling
overhead when entering and exiting stack frames could be much
greater than the savings from simple addressing (and permission
checks) if few accesses are made to the cached part of the stack
frame between frame spills and fills.

A latency-optimized partial-frame stack cache would also benefit
from the higher-utilization regions of stack frames (those with
longish active periods) having specific sizes, so compiler-based
optimization would be a factor. Depending on
microarchitecture-specific compiler optimization for good
performance is generally avoided; this is related to software
distribution formats.
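A rough sketch of the indexing such a cache might use (the
256-byte capacity comes from my example above; the 16-byte line
size, the use of call-depth parity to select inversion, and all
names are invented for illustration):

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical direct-mapped partial-frame stack cache parameters. */
#define STACK_CACHE_BYTES 256u
#define LINE_BYTES         16u
#define NUM_LINES  (STACK_CACHE_BYTES / LINE_BYTES)
#define INDEX_MASK (NUM_LINES - 1u)

/* Map a frame-relative offset to a cache line. Odd-depth frames
   invert their index bits so a caller and its callee tend not to
   collide on the same lines. Offsets at or beyond the cached
   window simply go to the ordinary data cache. */
static inline bool stack_cache_index(uint32_t frame_offset,
                                     uint32_t call_depth,
                                     uint32_t *line)
{
    if (frame_offset >= STACK_CACHE_BYTES)
        return false;               /* beyond the cached window */
    uint32_t idx = (frame_offset / LINE_BYTES) & INDEX_MASK;
    if (call_depth & 1u)
        idx = ~idx & INDEX_MASK;    /* inverted index bits for
                                       alternating frames */
    *line = idx;
    return true;
}

(Index inversion only decorrelates adjacent frames; frames two
levels apart still collide, which is part of why utilization and
spill/fill overhead matter as noted above.)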
If aliasing were not avoided by architectural contract -- which
would be difficult for any existing ISA -- then handling aliases
(the same stack addresses reached through ordinary memory
accesses) would also introduce overhead.

(For higher utilization, one might want to avoid caching the
registers saved at function entry, assuming these are colder and
less latency-sensitive than other values in the frame. Since the
amount of the frame used by saved registers would vary, a
hardware-friendly fixed uncached chunk would either waste capacity
on cold saved registers (when more registers are saved than the
chunk covers) or leave some potentially warmer values uncached [in
the stack cache]. Updating the stack pointer to hide the saved
registers would address this but would presumably introduce other
issues.)

Another factor that would reduce the attractiveness of specialized
caches is the use of out-of-order execution. OoOE helps hide
latency, so any latency benefit is less important.

Not all optimization opportunities are implemented even when they
do not conflict excessively with other features. Part of this is
the complexity and risk of adding new features.

>> On 3/6/24 3:00 PM, MitchAlsup1 wrote:
>>> Paul A. Clayton wrote:
>>>> An L2 register set that can only be accessed for one operand
>>>> might be somewhat similar to LD-OP.
>>>
>>> In high speed designs, there are at least 2 cycles of delay
>>> from AGEN to the L2 and 2 cycles of delay back. Even zero
>>> cycle access sees at least 4 cycles of latency, 5 if you
>>> count AGEN.

There seems to have been confusion. I wrote "L2 _register_ set".
Being able to access a larger register name space for one operand
might be useful when value reuse often has moderate temporal
locality. Such an L2 register set is even more complicated than
load-op in terms of compiler optimization.

Renaming a larger name space of (L2) registers would also
introduce issues. I suspect something more like a Load-Store Queue
would be used rather than a register alias table. The benefits
from specialization (e.g., smaller tags, since the L2 register
name space is smaller than general memory) would conflict with the
utilization benefit of having only a single, general LSQ.

Physical placement would also involve tradeoffs of latency (and
access energy) relative to the L1 data cache. Giving prime real
estate to an L2 register file would increase L1 latency (and
access energy). Dynamic scheduling would also be made a little
more complicated by adding another latency consideration, and
using banking rather than multiporting -- which becomes more
reasonable at larger capacities -- would add more latency
variability.

It does seem *to me* that there should be a benefit from a storage
region of intermediate capacity with simpler addressing than
general memory.

>> Presumably this is related to the storage technology used as
>> well as the capacity.
>
> Purely wire delay due to the size of the L2 cache.

Wire delay due to physical size is related to storage technology
as well as capacity. E.g., DRAM can be denser than SRAM and thus
have lower latency at larger sizes even when the array access
itself is slower. Single-ported register storage technology would
(I ass_me) be even less dense than SRAM, such that there would be
some capacity above which latency would be better with SRAM even
though the register storage would be faster at the array level. Of
course, latency is not the only consideration for storage.
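To make that concrete, here is a toy first-order model (every
constant below is invented purely to illustrate the crossovers,
not a measured value): total latency is an array-access term plus
a wire-delay term that grows with the linear dimension of the
array, i.e., with sqrt(capacity / density).

#include <math.h>
#include <stdio.h>

/* latency = array access time + wire delay proportional to the
   array's side length, i.e., sqrt(capacity / relative density). */
static double latency_ns(double kib, double rel_density,
                         double array_ns, double wire_coeff)
{
    return array_ns + wire_coeff * sqrt(kib / rel_density);
}

int main(void)
{
    for (double kib = 1; kib <= 16384; kib *= 4) {
        /* Register-file cells: fastest array, least dense. */
        double rf   = latency_ns(kib, 0.25, 0.3, 0.1);
        /* SRAM: slower array, denser. */
        double sram = latency_ns(kib, 1.0,  0.5, 0.1);
        /* DRAM: slowest array, densest. */
        double dram = latency_ns(kib, 4.0,  2.0, 0.1);
        printf("%7.0f KiB: RF %5.2f ns, SRAM %5.2f ns, "
               "DRAM %5.2f ns\n", kib, rf, sram, dram);
    }
    return 0;
}

With these made-up constants the register-file cells win below
roughly 4 KiB, SRAM wins in the middle, and DRAM wins above
roughly 1 MiB -- the same shape of argument as above, whatever the
real numbers are.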