From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Stacks, was Segments
Date: Sat, 8 Feb 2025 22:19:47 +0000

On Fri, 7 Feb 2025 20:32:39 +0000, Scott Lurndal wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
>>On Fri, 7 Feb 2025 13:57:51 +0000, Scott Lurndal wrote:
>>
>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>>On Thu, 6 Feb 2025 20:06:31 +0000, Stephen Fuld wrote:
>>>>
>>>>> On 2/6/2025 10:51 AM, EricP wrote:
>>>>>> MitchAlsup1 wrote:
>>>>>>> On Thu, 6 Feb 2025 16:41:45 +0000, EricP wrote:
>>>>>>>-------------------
>>>>>> Not sure how this would work with device IO and DMA.
>>>>>> Say a secure kernel that owns a disk drive with secrets that even the HV
>>>>>> is not authorized to see (so HV operators don't need Top Secret
>>>>>> clearance).
>>>>>> The Hypervisor has to pass to a hardware device DMA access to a memory
>>>>>> frame that it has no access to itself. How does one block the HV from
>>>>>> setting the IOMMU to DMA the device's secrets into its own memory?
>>>>>>
>>>>>> Hmmm... something like: once a secure HV passes a physical frame address
>>>>>> to a secure kernel then it cannot take it back, it can only ask that
>>>>>> kernel for it back. Which means that the HV loses control of any
>>>>>> core or IOMMU PTEs that map that frame until it is handed back.
>>>>>>
>>>>>> That would seem to imply that once an HV gives memory to a secure
>>>>>> guest kernel that it can only page that guest with its permission.
>>>>>> Hmmm...
>>>>>
>>>>> I am a little confused here. When you talk about IOMMU addresses, are
>>>>> you talking about memory addresses or disk addresses?
>>>>
>>>>The I/O MMU does not see the device commands containing the sector on
>>>>the disk to be accessed. Mostly, CPUs write directly to the CRs
>>>>of the device to start a command, bypassing the I/O MMU, as raw data.
>>>
>>> That is indeed the case. The IOMMU is on the inbound path
>>> from the PCIe controller to the internal bus/mesh structure.
>>>
>>> Note that there is a translation on the outbound path from
>>> the host address space to the PCIe memory space - this is
>>> often 1:1, but need not be so. This translation happens
>>> in the PCIe controller when creating a TLP that contains
>>> an address before sending the TLP to the endpoint. Take
>>
>>Is there any reason this cannot happen in the core MMU ??
>
> How do you map the translation table to the device?

A device is configured by setting its BAR[s] to an addressable page.
Accesses to this page are Rd and Wt of the device's control registers.
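As a concrete illustration of that register path, here is a minimal
user-space sketch (mine, not part of the original post) of touching a
device's BAR page once an address has been assigned to it. The sysfs
path, BDF, and register offsets are placeholders, not any particular
device.

/* Illustrative only: map BAR0 of a (hypothetical) device and perform
 * Rd/Wt of two made-up control registers through it. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REG_CMD    0x00   /* hypothetical command register offset */
#define REG_STATUS 0x04   /* hypothetical status register offset  */

int main(void)
{
    /* BAR0 as exposed by Linux; the BDF here is a placeholder. */
    int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Map the 4 KiB BAR page; the loads/stores below become MMI/O
     * reads and writes routed to the device's control registers.  */
    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    regs[REG_CMD / 4] = 0x1;                                  /* Wt: start */
    printf("status = 0x%x\n", (unsigned)regs[REG_STATUS / 4]); /* Rd       */

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}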
Physical addresses matching the BAR aperture are routed to the device.
The HyperVisor maintains a PTE to map guest physical addresses within an
aperture to the page matching the device's BAR. Thus, the HV MMU maps a
guest OS physical address into a universal MMI/O address.

A long time before accessing the device, the HyperVisor sets up a device
control block, places it in a table indexed by segment:bus:device, and
stores the table address in a control register of the I/O MMU {HostBridge}.
This control block contains several context pointers, an interrupt table
pointer, and four event coordinators--one each for DMA, page faults,
errors, and interrupts. The EC provides an index into the root pointers.

Guest OS uses the virtual device address in code; the Guest OS MMU maps it
to the aperture maintained by the HyperVisor; HV then maps the GPA to
MMI/O:device_address. Using said translations, Guest OS writes commands to
the function:register of the addressed device. The path from core virtual
address to device control register address does not pass through the
I/O MMU.

When the device responds with a DMA request, it uses a device virtual
address (not a virtual device address). Said request is routed to the top
of the PCIe tree, where the I/O MMU uses ECAM to identify the MMU tables
for this device and, once identified, translates* the device virtual
address into a universal address (almost invariably targeting DRAM).
Once translated and checked, the command is allowed to proceed.

(*) assuming ATS was not used.

When the device responds with an interrupt request, the I/O MMU uses ECAM
(again) to find the associated interrupt table, and then translates the
device interrupt address into a universal MMI/O write to the attached
interrupt table. Said universal MMI/O write knocks on the door of the
interrupt table's service port, where the interrupt message is logged into
the table. When the priority of the table increases, the service port
broadcasts the new priority vector of this table to all cores. Should a
core monitoring this table see a higher priority interrupt pending than
the one it is currently running, the core begins interrupt negotiation.

When a device responds with a page fault, the device control block
identifies the level of the software stack to handle this exception, and
the I/O MMU sends a suitable interrupt to that level of the interrupt
table.

When a device responds with a device error, the device control block
identifies the level and ISR to deal with this device problem, and the
I/O MMU sends a suitable interrupt to that level of the interrupt table.

So, the I/O MMU responds to and guides all requests coming up the PCIe
tree--not just DMA.

--------------------------------------------------------

> How do you map the translation table to the device?

The HostBridge has a configuration register that points at the I/O MMU
ROOT table, which is used to map segment:bus:device to an Originating
Context. The Originating Context contains a snapshot of the software
stack managing the application. This is where the ROOT pointers, ASIDs,
priorities, and levels are stored; in addition, there is an interrupt
table pointer virtual address, ...

A tree is used to map ECAM to the device control block, and other than
not starting at a page boundary and not ending on a page boundary, it is
essentially identical to the standard page mapping tree. The final level
of said tree points at the device control block--a cache line of data
where the I/O MMU gets the data it needs for that particular device.
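To make the shape of that per-device state easier to see, here is a rough
C sketch of a device control block and its ECAM-indexed table. Every field
name, width, and the index packing are my guesses for illustration; this
is not the actual My 66000 layout.

#include <stdint.h>

/* Illustrative only: one possible layout of the per-device control block
 * described above -- a cache line the I/O MMU fetches for each device.  */
enum ec_kind { EC_DMA = 0, EC_PAGE_FAULT, EC_ERROR, EC_INTERRUPT };

struct event_coordinator {
    uint8_t level;       /* software-stack level that handles this event */
    uint8_t root_index;  /* selects one of the context root pointers     */
};

struct device_control_block {        /* fits in a 64-byte cache line      */
    uint64_t context_root[4];        /* translation roots, one per level  */
    uint64_t interrupt_table_va;     /* interrupt table pointer (virtual) */
    uint32_t asid;                   /* address-space identifier          */
    uint16_t priority;
    uint16_t flags;
    struct event_coordinator ec[4];  /* DMA, page fault, error, interrupt */
};

/* The HostBridge control register points at a table of these blocks,
 * indexed by the requester's segment:bus:device identity (ECAM-style).
 * The bit packing below is invented for the sketch.                      */
static struct device_control_block *
dcb_lookup(struct device_control_block *table,
           unsigned seg, unsigned bus, unsigned dev)
{
    unsigned index = (seg << 13) | (bus << 5) | dev;
    return &table[index];
}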
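Continuing that sketch (and reusing its types), here is hedged pseudo-C
for the claim that the I/O MMU guides all upstream requests, not just DMA.
The helper functions are stubs standing in for table walks and bus
operations; this is my reading of the description, not actual hardware.

#include <stdint.h>
#include <stdio.h>

enum upstream_kind { REQ_DMA, REQ_INTERRUPT, REQ_PAGE_FAULT, REQ_ERROR };

struct upstream_request {
    enum upstream_kind kind;
    unsigned seg, bus, dev;   /* requester identity from the PCIe tree  */
    uint64_t device_va;       /* device virtual address, when relevant  */
};

/* Stubs standing in for the real table walk and interrupt-table write. */
static uint64_t walk_tables(uint64_t root, uint64_t dva)
{ return root + (dva & 0xFFFULL); }            /* pretend translation   */
static void forward_dma(uint64_t universal_addr)
{ printf("DMA proceeds to universal address 0x%llx\n",
         (unsigned long long)universal_addr); }
static void post_interrupt(uint64_t table_va, unsigned level)
{ printf("interrupt logged in table 0x%llx at level %u\n",
         (unsigned long long)table_va, level); }

void iommu_handle_upstream(struct device_control_block *dcbs,
                           struct upstream_request *req)
{
    /* ECAM identity -> device control block, as described above. */
    struct device_control_block *dcb =
        dcb_lookup(dcbs, req->seg, req->bus, req->dev);

    switch (req->kind) {
    case REQ_DMA:
        /* Translate the device virtual address through this device's
         * tables, check it, then let the access proceed (no ATS here). */
        forward_dma(walk_tables(
            dcb->context_root[dcb->ec[EC_DMA].root_index], req->device_va));
        break;
    case REQ_INTERRUPT:
        /* Becomes a universal MMI/O write that logs a message into the
         * interrupt table attached to this device's context.           */
        post_interrupt(dcb->interrupt_table_va, dcb->priority);
        break;
    case REQ_PAGE_FAULT:
        /* Control block names the software-stack level that owns it.   */
        post_interrupt(dcb->interrupt_table_va, dcb->ec[EC_PAGE_FAULT].level);
        break;
    case REQ_ERROR:
        post_interrupt(dcb->interrupt_table_va, dcb->ec[EC_ERROR].level);
        break;
    }
}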
> Why would you wish to have the CPU translating I/O virtual
> addresses?

This is pure mischaracterization on your part. You always want the MMU
closest to the access to perform the translation. I suspect you read
"virtual device address" and "device virtual address" interchangeably--
they are entirely different things used in different places.

> The IOMMU tables are per device, and they
> can be configured to map the minimum amount of the address
> space (even updated per-I/O if desired) required to support
> the completion of an inbound DMA from the device.

This still leaves the door open for a parity error to allow one
application's DMA to damage another application's process memory, since
commands to a single device share a translation table and both
translations are valid at the same instant. One can essentially eliminate
this with dead pages between different application mappings--preventing
DMA from walking into a wrong VAS.

>>
>>Guest OS uses a virtual device address given to it from HV.
>>HV sets up the 2nd nesting of translation to translate this
>>to "what HostBridge needs" to route commands to device control
>>registers. The handoff can be done by spoofing config space
>>or having HV simply hand Guest OS a list of devices it can
>>discover/configure/use.
>
> The IOMMU only is involved in DMA transactions _initiated_ by
> the device, not by the CPUs. They're two completely different
> concepts.

If the I/O MMU does not participate in interrupts, page faults, and
errors, who does ?? The requests coming up from the device are still
virtual and need mapping and routing.

>>
>>> an AHCI controller, for example, where the only device
>>> BAR is 32-bits; if a host wants to map the AHCI controller
>>> at a 64-bit address, the controller needs to map that 64-bit
>>> address window into a 32-bit 3DW TLP to be sent to the endpoint
>>> function.
>>
>>This is one of the reasons My 66000 architecture has a unique
>>MMI/O address space--you can set up a 32-bit BAR to put a
>>page of control registers in 32-bit address space without

========== REMAINDER OF ARTICLE TRUNCATED ==========