From: "Paul A. Clayton"
Newsgroups: comp.arch
Subject: Re: Arguments for a sane ISA 6-years later
Date: Sat, 27 Jul 2024 21:01:59 -0400

On 7/25/24 6:07 PM, MitchAlsup1 wrote:
> On Thu, 25 Jul 2024 20:09:06 +0000, BGB wrote:
>
>> On 7/24/2024 3:37 PM, MitchAlsup1 wrote:
[snip]
>>> D) exception and interrupt control transfer should take no more
>>> ..than 1 cache line read followed by 4 cache line reads to the
>>> ..same page in DRAM/L3/L2 that are dependent on the first cache
>>> ..line read. Control transfer back to the suspended thread should
>>> ..be no longer than the control transfer to the exception handler.
[snip]
>> A fast, but more expensive, option would be to have multiple
>> copies of the register file which is then bank-switched on an
>> interrupt.
>
> Under My 66000 a low end implementation can choose the write back
> cache version, while the GBOoO implementation can choose the bank
> switcher. In both cases, the same model is presented to executing SW.

I do not know at what port count a "3D register file" (temporal
banking where extra storage "hides" under the wires) makes sense.
I suspect that for the 3-read, 1-write register file of a low-end
My 66000 implementation the overhead would be too great unless
lower-overhead context switching were extremely important.

Another technique for reducing storage overhead in highly ported
register files is to use checkpoint registers that connect to the
highly ported cells. (The paper that proposed this, which I am not
certain I can find again, did not swap the values; it allowed only
push and pop, and only at a depth of one. IIRC, this was proposed for
speculatively dead values, so that more storage would be available for
in-use values. For short interrupts this might be usable, with a save
to cache/memory if the context is still alive when a different context
is scheduled to run.)

I do not know if a ring buffer could be designed that might allow
coarse-grained barrel-like processing (or even a race track memory
for slower, arbitrary context switching), but that seems unlikely to
be useful (except possibly in some rather special-purpose processor).

In general it seems that one would only want contexts to be "cached"
in registers (even with clever storage cost reductions) if switches
are frequent or the latency is especially critical. A small core,
something like the CDC 6600 Peripheral Processor, might justify
multithreading at finer granularity than cache-based context swapping
allows.

I also *feel* that reduced contexts could have some utility. Some
threads have low ILP and do not benefit as much from extra register
state; even moderately coarse-grained hardware multithreading might
bring performance benefits, and reduced contexts could reduce the
switch overhead. Of course, this implies two (slightly) different
ISAs.
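
To make the one-deep checkpoint idea above concrete, here is a toy behavioral model in Python. It is only a sketch under my own assumptions, not the scheme from the paper I mentioned: the class name CheckpointedRegFile, the methods push, pop, and spill, and the list standing in for cache/memory are all hypothetical. It models what software would observe, not ports, wires, or cell area.

```python
class CheckpointedRegFile:
    """Toy model: each register has one highly ported primary cell plus
    one checkpoint cell that is reachable only through push/pop."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs   # highly ported cells (live context)
        self.shadow = None           # checkpoint cells; None means empty
        self.memory = []             # hypothetical stand-in for cache/memory

    def push(self):
        """Interrupt entry: copy the live values into the checkpoint cells."""
        if self.shadow is not None:
            # Depth is one, so a second push without a pop or spill fails.
            raise RuntimeError("checkpoint already occupied")
        self.shadow = list(self.regs)

    def pop(self):
        """Interrupt return: restore the checkpointed values."""
        if self.shadow is None:
            raise RuntimeError("no checkpoint to pop")
        self.regs = self.shadow
        self.shadow = None

    def spill(self):
        """Save a still-live checkpoint to memory so that a different
        context can be scheduled (assumed policy, not from the paper)."""
        if self.shadow is not None:
            self.memory.append(self.shadow)
            self.shadow = None


def demo():
    rf = CheckpointedRegFile(num_regs=4)
    rf.regs[:] = [10, 11, 12, 13]   # interrupted thread's state
    rf.push()                       # short interrupt arrives
    rf.regs[:] = [0, 0, 99, 0]      # handler overwrites registers
    rf.pop()                        # handler returns
    return rf.regs
```

Calling demo() walks through the short-interrupt case: the handler overwrites the live cells and the pop brings the interrupted thread's values back without touching memory. The spill path covers the other case, where the checkpointed context is still alive when a different context is scheduled.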