From: Robert Finch
Newsgroups: comp.arch
Subject: Re: Tonight's tradeoff
Date: Thu, 7 Mar 2024 08:25:53 -0500

On 2024-03-07 1:39 a.m., BGB wrote:
> On 3/6/2024 7:28 PM, MitchAlsup1 wrote:
>> BGB wrote:
>>
>>> On 3/6/2024 8:42 AM, Robert Finch wrote:
>>>>
>>
>>
>>> In my case, access is figured out on cache-line fetch, and is precooked:
>>>    NR, NW, NX, NU, NC: NoRead/NoWrite/NoExecute/NoUser/NoCache.
>>> Though, some combinations of these flags are special.
>>
>> Is there a reason these flags (other than user) are inverted ??
>> {{And even noUser can be changed into Super.}}
>>
>
> Historical quirk...
>
> Off-hand, I don't remember why it is this way.
> Seems this was one of the parts I designed, but as for why the bits were
> logically inverted, dunno.
>
> In terms of the main page flags, they are also inverted. But, in terms
> of VUGID and ACL checks, they are not inverted.
>
>
>> In addition, I think you will want to be able to specify which level of
>> cache {L1, L2, LLC} this line is stored at, prefetched to, and pushed
>> out to.
>>
>
> Possibly, but not really a thing ATM.
> It mostly affects the L1 cache, and (indirectly) the newer
> set-associative V$ thing.
>
>
>> My 66000 is using ASID instead of something like Super/Global because I
>> don't want to have to flush the TLB on a hypervisor context switch --
>> where one GuestOS Super/Global is not the same as another GuestOS's.
>> When a GuestOS is accessing one of its user applications, AGEN
>> automagically uses the application ASID instead of the GuestOS ASID.
>> {Similar for HV accessing GuestOS -- while switching from 1-level
>> translation to 2-level.}
>>
>
> This is why I have "ASID Groups"...
>
> If normal processes are in ASID Groups 00..1F, and VMs are in groups
> 38..3F, then global pages in the normal process groups will not be
> visible in the VM groups (avoiding the need for a TLB flush).
>
> But, yeah, I had debated whether or not to have global pages at all.
>
>
>>
>>
>>> The L1 cache only hits if the current mode matches the mode that was
>>> in effect at the time the cache-line was fetched, and if KRR has not
>>> changed (as determined by a hash value), ...
>>
>> s/mode/ASID/
>>
>
> Both will affect hit/miss in my case:
>   User/Supervisor/ISR;
>   What KRR contains;
>   Which ISA mode is running;
>   ASID;
>   ...
> All these may cause the L1 caches to miss.
>
>
>>>> For my system the ACL is not part of the PTE, it is part of the
>>>> software managed page information, along with share counts. I do not
>>>> see the ACL for a page being different depending on the page table.
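
Just to make the above a bit more concrete, the "software managed page
information" is basically one small record per physical page, kept off
to the side of the page tables. A rough C sketch only -- the field
names, widths, page size, and memory size below are placeholders, not
the actual layout:

#include <stdint.h>

#define PAGE_SHIFT   16                      /* assumed 64 KiB pages  */
#define PHYS_MEM_MB  512                     /* assumed amount of RAM */
#define NUM_PAGES    ((PHYS_MEM_MB * 1024u * 1024u) >> PAGE_SHIFT)

/* One PMT (page management table) entry per page of physical memory. */
typedef struct {
    uint16_t acl;          /* access-control id for the page       */
    uint16_t share_count;  /* number of address spaces mapping it  */
    uint32_t flags;        /* whatever else the OS wants to track  */
} pmt_entry_t;

/* Indexed by physical page number, so it can live in its own block
   RAMs and be read out in parallel with the access itself. */
static pmt_entry_t pmt[NUM_PAGES];

static inline pmt_entry_t *pmt_lookup(uint64_t paddr)
{
    /* assumes paddr falls within installed physical memory */
    return &pmt[paddr >> PAGE_SHIFT];
}

Since there is exactly one copy of this per page of memory, there is
nothing to keep coherent across the page tables.
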
>>>>
>>
>>> In my case, ACL handling is done via a combination of a keyring
>>> register (KRR) and a small fully-associative cache (4 entries at
>>> present, 6 could be better in theory; luckily each entry is
>>> comparably small).
>>
>>> The ACLID is tied to the TLBE, so the intersection of the ACLID and
>>> KRR entry is used to figure out access in the ACL cache (or is
>>> ignored/disabled if the low 16 bits of KRR are 0).
>>
>>
>>>> I have dedicated some of the block RAMs for the page management
>>>> information, so they may be read out in parallel with a memory
>>>> access. So I shifted the block RAM usage from the TLB to the PMT.
>>>> This makes the TLB smaller. It also reduces the memory usage. The
>>>> page management information only needs one copy for each page of
>>>> memory. If the information were in the TLBEs / PTEs there would be
>>>> multiple copies of the information in the page tables. How do you
>>>> keep things coherent if there are multiple copies in page tables?
>>>>
>>
>>
>>> The access ID for pages is kept in sync with the memory address,
>>> since both are uploaded to the TLB at the same time.
>>
>>> However, as for ACL checks themselves, these are handled with a
>>> separate cache. So, say, changing the access for an ACLID, and
>>> flushing the corresponding entry from the ACL cache, will
>>> automatically apply to any pages previously loaded into the TLB.
>>
>>> There was also the older VUGID system, which used traditional
>>> Unix-style permissions. If I were designing it now, I would likely
>>> design things around using exclusively ACL checking, which
>>> (ironically) also needs fewer bits to encode.
>>
>>
>>
>>> Generally, software TLB miss handling is used in my case.
>>
>>> There is no automatic way to keep the TLB in sync with the page table
>>> (if a page table entry is modified).
>>
>> My 66000 has a coherent TLB.
>>
>>> The usual thing is that if the current page table is updated, then
>>> one needs to forge a special dummy entry, and then upload this entry
>>> to the TLB multiple times (via the LDTLB instruction) to knock the
>>> prior contents out of the TLB (or use the INVTLB instruction, but
>>> this currently invalidates the entire TLB; which is a bad situation
>>> for a software-managed TLB...).
>>
>> See how much easier a coherent TLB is ??
>>
>
> Possible, but generally only the kernel is going to be updating the
> page tables, and the kernel can know that it needs to invoke a special
> ritual whenever updating the page table, to avoid stale page-table
> entries being used...
>
>
> Meanwhile, as with coherent caches, a coherent TLB would require some
> sort of "spooky action at a distance" (somehow, the TLB needs to know
> that memory corresponding to a particular part of the page table was
> updated).
>
> This is possibly even harder to implement than something like TSO would
> be (since, at least with TSO, there is a more obvious correlation
> between writing to a cache line and needing to have every other copy at
> the same address first written back to main memory).
>
>
> It is easier, from the hardware-design front, to throw up one's hands
> and be like "Yeah, the OS can deal with it somehow...".
>
> Never mind that apparently my coherence model is even weaker than the
> RISC-V model, as they are like "well, there is a FENCE instruction",
> and I am left without any good idea of how to deal with it other than
> "trap and let the trap handler sort it out..." (presumably by flushing
> the L1 caches...).
>
> Granted, this is a crap solution...
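
FWIW, the trap handler itself does not have to be much; something along
the lines of the sketch below would do it. Just a sketch -- the
__flush_l1_dcache() / __invalidate_l1_icache() hooks and the trap-frame
layout are stand-ins for whatever the core actually provides, and the
4-byte instruction step is an assumption:

#include <stdint.h>

/* Placeholder cache-control hooks: stand-ins for whatever mechanism the
   core really has (cache-op instructions, CSRs, MMIO knobs, ...). */
extern void __flush_l1_dcache(void);       /* write back + invalidate L1 D$ */
extern void __invalidate_l1_icache(void);  /* drop stale lines from L1 I$   */

struct trapframe {
    uint64_t pc;   /* address of the trapped instruction (layout made up) */
    /* ... saved registers ... */
};

/* Emulate an unimplemented FENCE.I / Zifencei: make prior stores visible
   to instruction fetch by flushing the L1 caches, then resume past the
   trapped instruction. */
void trap_unimplemented_fencei(struct trapframe *tf)
{
    __flush_l1_dcache();
    __invalidate_l1_icache();
    tf->pc += 4;   /* assumed 4-byte encoding */
}

Slow, obviously, but it only has to be correct, not fast, and it keeps
the logic out of the cache hierarchy entirely.
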
>
> Granted, it appears that "Trap and flush the L1 caches" is still a valid
> implementation strategy for "Zifencei".
>
>
>>> Generally, the assumption is that all pages in a mapping will have
>>> the same ACLID (generally corresponding to the "owner" of the mapping).
>>
>> An unsupported assumption if one wants to keep TLB flushes minimized.
>>
>
> Possible, but this is more for the OS to care about.
> The hardware doesn't care either way.

========== REMAINDER OF ARTICLE TRUNCATED ==========