Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: "Paul A. Clayton" <paaronclayton@gmail.com>
Newsgroups: comp.arch
Subject: Re: Microarchitectural support for counting
Date: Mon, 6 Jan 2025 11:33:05 -0500
Organization: A noiseless patient Spider
Lines: 75
Message-ID: <vljjbs$29b71$1@dont-email.me>
References: <2024Oct3.160055@mips.complang.tuwien.ac.at> <vdmrk6$3rksr$1@dont-email.me> <LyELO.69485$2nv5.62232@fx39.iad> <TdWLO.282116$FzW1.158190@fx14.iad> <963a276fd8d43e4212477cefae7f6e46@www.novabbs.org> <8IcMO.249144$v8v2.147178@fx18.iad> <vkhgkn$2g9gm$1@dont-email.me> <7bffca4c284d329c60d8e93c7382c30f@www.novabbs.org> <vl1nr6$2diq8$1@dont-email.me> <b8760a721775ebc2e1f232c2edae4be9@www.novabbs.org> <6CBdP.20426$nlJ1.2963@fx41.iad> <l3VdP.349682$0O61.241301@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 07 Jan 2025 17:05:17 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9d65aefa36e1ba0a69163b3b04be1762"; logging-data="2403553"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19o5P89OfMbqY3FD6m429ofLn3N3+BKilw="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.0
Cancel-Lock: sha1:sc0HYL2UYueCxGFDjjYgHs2XNnE=
In-Reply-To: <l3VdP.349682$0O61.241301@fx15.iad>
Bytes: 5216

On 1/3/25 12:24 PM, Scott Lurndal wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
[snip]
>> For MMIO device registers I think having an explicit SPCB instruction
>> might be better than putting a "no-speculate" flag on the PTE for the
>> device register address as that flag would be difficult to propagate
>> backwards from address translate to all the parts of the core that
>> we might have to sync with.
>
> MMIO accesses are, by definition, non-cachable, which is typically
> designated in either a translation table entry or associated
> attribute registers (MTRR, MAIR). Non-cacheable accesses
> are not speculatively executed, which provides the
> correct semantics for device registers which have side effects
> on read accesses.

It is not clear to me that Memory-Mapped I/O requires non-cacheable accesses. Some addresses within I/O device address areas do not have access side effects. I would **GUESS** that most I/O addresses do not have read side effects.

(One obvious exception would be implicit buffers where a read "pops" a value from a queue, allowing the next value to be accessed at the same address. _Theoretically_ one could buffer such reads outside of the I/O device such that old values would not be lost and incorrect speculation could be rolled back; this might be a form of versioned memory. Along similar lines, values could be prefetched and cached as long as all modifiers of the values use cache coherency. There may well be other cases of read side effects.)

In general writes require hidden buffering for speculation, but write side effects can affect later reads. One possibility would be a write that changes which buffer is accessed at a given address. Such a write followed by a read of such a buffer address must have the read presented after the write, so caching the read address would be problematic.

One weak type of write side effect would be similar to releasing a lock, where with a weaker memory order one needs to ensure that previous writes are visible before the "lock is released". E.g., one might update a command buffer on an I/O device with multiple writes and lastly update an I/O device pointer to indicate that the buffer was added to. The ordering required for this is weaker than sequential consistency.

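The command-buffer case can be sketched in C11 atomics. This is only an illustration under stated assumptions: `fake_device`, `submit`, and `consume_sum` are hypothetical names, and ordinary memory stands in for device registers so the code can run anywhere. Real MMIO would go through volatile accesses to a mapped region plus whatever barrier the architecture defines for device memory. The point shown is that only the final pointer update needs release ordering; the buffer writes before it are unordered among themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device model: a command ring plus a tail pointer.
   Ordinary memory stands in for device registers (an assumption). */
enum { RING_SLOTS = 8 };

struct fake_device {
    uint32_t ring[RING_SLOTS];  /* command buffer written by the CPU */
    _Atomic uint32_t tail;      /* number of commands published so far */
};

/* Write n commands, then publish them with one release store. */
static void submit(struct fake_device *dev, const uint32_t *cmds, uint32_t n)
{
    uint32_t tail = atomic_load_explicit(&dev->tail, memory_order_relaxed);
    assert(n <= RING_SLOTS);
    for (uint32_t i = 0; i < n; i++)
        dev->ring[(tail + i) % RING_SLOTS] = cmds[i];  /* no mutual ordering needed */
    /* Release: every ring write above becomes visible before the new tail. */
    atomic_store_explicit(&dev->tail, tail + n, memory_order_release);
}

/* "Device" side: the acquire load of tail pairs with the release store,
   so every slot below tail is safe to read. Returns the sum of the
   commands from head up to tail. */
static uint32_t consume_sum(struct fake_device *dev, uint32_t head)
{
    uint32_t tail = atomic_load_explicit(&dev->tail, memory_order_acquire);
    uint32_t sum = 0;
    for (uint32_t i = head; i != tail; i++)
        sum += dev->ring[i % RING_SLOTS];
    return sum;
}
```

Neither function uses a sequentially consistent operation; the release/acquire pair on `tail` is the only ordering constraint in the sketch.
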
If certain kinds of side effects are limited to a single device, then accesses to different devices may allow greater flexibility in ordering. (This seems conceptually similar to cache coherence vs. consistency where "single I/O device" corresponds to single address. Cache coherence provides strict consistency for a single address.)

I seem to recall that StrongARM exploited a distinction between "bufferable" and "cacheable" marked in PTEs to select the cache to which an access would be allocated. This presumably means that the two terms had different consistency/coherence constraints.

I am very skeptical that an extremely complex system with best possible performance would be worthwhile. However, I suspect that some relaxation of ordering and cacheability would be practical and worthwhile.

I do very much object to defining memory-mapped I/O as a concept to require non-cacheability, even if existing software (and hardware) and development mindset make any relaxation impractical.

Since x86 allowed a different kind of consistency for non-temporal stores, it may not be absurd for a new architecture to present a more complex interface, presumably with the option not to deal with that complexity. Of course, the most likely result would be hardware having to support the complexity with no actual benefit from use.
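
For reference, the x86 non-temporal case looks roughly like the following from software. This is a hedged sketch, not a recommendation: `payload`, `ready`, and `publish` are hypothetical names, and the non-x86 fallback is there only so the code compiles elsewhere. Non-temporal stores are weakly ordered relative to other stores, so an `SFENCE` is placed before the flag that announces the data; software that never uses such stores never sees the extra rule.

```c
#include <assert.h>
#include <stdatomic.h>
#if defined(__SSE2__)
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* _mm_stream_si32 */
#endif

enum { NT_WORDS = 16 };

static int payload[NT_WORDS];  /* data written with non-temporal stores */
static atomic_int ready;       /* flag that announces the payload */

/* Fill the payload, fence, then set the flag. */
static void publish(const int *src)
{
    assert(src != 0);
    for (int i = 0; i < NT_WORDS; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&payload[i], src[i]);  /* weakly ordered store */
#else
        payload[i] = src[i];                   /* fallback: ordinary store */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();  /* order the streaming stores before the flag store */
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
}
```
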