Path: ...!eternal-september.org!feeder3.eternal-september.org!i2pn.org!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Stacks, was Segments
Date: Sat, 8 Feb 2025 22:19:47 +0000
Organization: Rocksolid Light
Message-ID: <4a1eb52bbccf3c20554ac8016bb7f97e@www.novabbs.org>
References: <0IboP.166940$V9s2.82811@fx34.iad> <876e9cf6b21da15dc541756be2e24049@www.novabbs.org> <710d0f743fb3204b909114db4429633d@www.novabbs.org> <7y7pP.2$JzO2.1@fx15.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
logging-data="3333536"; mail-complaints-to="usenet@i2pn2.org";
posting-account="o5SwNDfMfYu6Mv4wwLiW6e/jbA93UAdzFodw5PEa6eU";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: cb29269328a20fe5719ed6a1c397e21f651bda71
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$oD99LlbAFq.gDhfiTeSNHeAEMiKAMaEwgG9fIs/KmmvcNCUAbScyW
Bytes: 11959
Lines: 235
On Fri, 7 Feb 2025 20:32:39 +0000, Scott Lurndal wrote:
> mitchalsup@aol.com (MitchAlsup1) writes:
>>On Fri, 7 Feb 2025 13:57:51 +0000, Scott Lurndal wrote:
>>
>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>>On Thu, 6 Feb 2025 20:06:31 +0000, Stephen Fuld wrote:
>>>>
>>>>> On 2/6/2025 10:51 AM, EricP wrote:
>>>>>> MitchAlsup1 wrote:
>>>>>>> On Thu, 6 Feb 2025 16:41:45 +0000, EricP wrote:
>>>>>>>-------------------
>>>>>> Not sure how this would work with device IO and DMA.
>>>>>> Say a secure kernel that owns a disk drive with secrets that even the HV
>>>>>> is not authorized to see (so HV operators don't need Top Secret
>>>>>> clearance).
>>>>>> The Hypervisor has to pass to a hardware device DMA access to a memory
>>>>>> frame that it has no access to itself. How does one block the HV from
>>>>>> setting the IOMMU to DMA the device's secrets into its own memory?
>>>>>>
>>>>>> Hmmm... something like: once a secure HV passes a physical frame address
>>>>>> to a secure kernel then it cannot take it back, it can only ask that
>>>>>> kernel for it back. Which means that the HV loses control of any
>>>>>> core or IOMMU PTE's that map that frame until it is handed back.
>>>>>>
>>>>>> That would seem to imply that once an HV gives memory to a secure
>>>>>> guest kernel that it can only page that guest with its permission.
>>>>>> Hmmm...
>>>>>
>>>>> I am a little confused here. When you talk about IOMMU addresses, are
>>>>> you talking about memory addresses or disk addresses?
>>>>
>>>>The I/O MMU does not see the device commands containing the sector on
>>>>the disk to be accessed. Mostly, CPUs write directly to the CRs
>>>>of the device to start a command, bypassing the I/O MMU as raw data.
>>>
>>> That is indeed the case. The IOMMU is on the inbound path
>>> from the PCIe controller to the internal bus/mesh structure.
>>>
>>> Note that there is a translation on the outbound path from
>>> the host address space to the PCIe memory space - this is
>>> often 1:1, but need not be so. This translation happens
>>> in the PCIe controller when creating a TLP that contains
>>> an address before sending the TLP to the endpoint. Take
>>
>>Is there any reason this cannot happen in the core MMU ??
>
> How do you map the translation table to the device?
The device is configured by setting its BAR[s] to an addressable
page. Accesses to this page are Rd and Wt of the device's control
registers, and physical addresses falling within the BAR aperture
are routed to the device.
The HyperVisor maintains a PTE to map guest physical addresses
within an aperture to the page matching the device's BAR.
Thus, the HV MMU maps a guest OS physical address into a universal
MMI/O address.
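
A minimal C sketch of that two-stage idea, with invented names and a
single 4 KB aperture (the real PTE formats are nothing like this):

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define PAGE_SIZE  4096ull
  #define PAGE_MASK (PAGE_SIZE - 1)

  typedef struct {
      uint64_t gpa_base;   /* start of aperture in guest physical space */
      uint64_t mmio_base;  /* device BAR value: page of control registers */
      bool     valid;
  } hv_aperture_pte;

  /* HV second-stage translation: guest physical -> universal MMI/O */
  static bool hv_translate_gpa(const hv_aperture_pte *pte,
                               uint64_t gpa, uint64_t *out)
  {
      if (!pte->valid || (gpa & ~PAGE_MASK) != pte->gpa_base)
          return false;                      /* not this aperture */
      *out = pte->mmio_base | (gpa & PAGE_MASK);
      return true;
  }

  int main(void)
  {
      /* HV hands the guest an aperture at GPA 0x8000_0000 that really
       * points at the device's BAR page at MMI/O 0xF000_1000. */
      hv_aperture_pte pte = { 0x80000000ull, 0xF0001000ull, true };

      uint64_t mmio;
      if (hv_translate_gpa(&pte, 0x80000010ull, &mmio))
          printf("guest store to GPA 0x80000010 -> MMI/O 0x%llx\n",
                 (unsigned long long)mmio);
      return 0;
  }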
Long before accessing the device, the HyperVisor sets up
a device control block, places it in a table indexed
by segment:bus:device, and stores the table address in a control
register of the I/O MMU {HostBridge}. This control block
contains several context pointers, an interrupt table
pointer, and four event coordinators--one each for DMA, page
faults, errors, and interrupts. The EC provides an index
into the root pointers.
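
Something like the following, in C (field names, widths, and the
number of context pointers are invented for illustration; only the
one-cache-line size and the four event coordinators are as described):

  #include <stdint.h>

  enum event_kind { EV_DMA = 0, EV_PAGE_FAULT, EV_ERROR, EV_INTERRUPT };

  /* one cache line of per-device state, found by indexing a table
   * with segment:bus:device */
  typedef struct device_control_block {
      uint64_t context_ptr[3];   /* context pointers                    */
      uint64_t interrupt_table;  /* virtual address of interrupt table  */
      uint8_t  event_coord[4];   /* index into root pointers, one per   */
                                 /* event kind in enum event_kind       */
      uint8_t  pad[64 - 3*8 - 8 - 4];
  } device_control_block;

  _Static_assert(sizeof(device_control_block) == 64,
                 "control block must fit one cache line");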
The Guest OS uses the virtual device address in code; the Guest OS
MMU maps it to the aperture maintained by the HyperVisor, and the HV
then maps the GPA to MMI/O:device_address. Using said trans-
lations, the Guest OS writes commands to the function:register
of the addressed device.
The path from core virtual address to device control register
address does not pass through the I/O MMU.
When a device responds with a DMA request, it uses a device virtual
address (not a virtual device address). Said request is routed
to the top of the PCIe tree, where the I/O MMU uses ECAM to identify
the MMU tables for this device and, once identified, translates*
the device virtual address into a universal address (almost
invariably targeting DRAM). Once translated and checked, the
command is allowed to proceed. (*) assuming ATS was not used.
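
In toy C form (single-level table and invented names--the real walk
is multi-level and keyed through the control block above):

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define IOVA_PAGE  4096ull
  #define N_ENTRIES  16

  typedef struct {
      uint64_t pa;               /* universal (system) page address */
      bool     valid, writable;
  } io_pte;

  typedef struct {
      io_pte table[N_ENTRIES];   /* toy per-device translation table */
  } device_ctx;

  /* toy "ECAM" lookup: one device, keyed by bus/device number */
  static device_ctx dev0;
  static device_ctx *lookup_dcb(uint8_t bus, uint8_t dev)
  {
      return (bus == 0 && dev == 3) ? &dev0 : NULL;
  }

  /* translate device virtual address; false => fault goes upstream */
  static bool iommu_translate(device_ctx *ctx, uint64_t dev_va,
                              bool is_write, uint64_t *out_pa)
  {
      uint64_t idx = dev_va / IOVA_PAGE;
      if (idx >= N_ENTRIES || !ctx->table[idx].valid)
          return false;
      if (is_write && !ctx->table[idx].writable)
          return false;
      *out_pa = ctx->table[idx].pa | (dev_va % IOVA_PAGE);
      return true;
  }

  int main(void)
  {
      /* map device page 2 to DRAM page 0x12345000, read/write */
      dev0.table[2] = (io_pte){ 0x12345000ull, true, true };

      uint64_t pa;
      device_ctx *ctx = lookup_dcb(0, 3);
      if (ctx && iommu_translate(ctx, 2*IOVA_PAGE + 0x40, true, &pa))
          printf("DMA write routed to 0x%llx\n", (unsigned long long)pa);
      return 0;
  }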
When a device responds with an Interrupt request, the I/O MMU uses
ECAM (again) to find the associated interrupt table,
and then translates the device interrupt address into a
universal MMI/O write to the attached interrupt table.
Said universal MMI/O write knocks on the door of the interrupt
table service port, where the interrupt message is logged
into the table. And when the priority of the table increases,
the service port broadcasts the new priority vector of this
table to all cores.
Should a core monitoring this table see a higher priority
interrupt pending than it is currently running, the core
begins interrupt negotiation.
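
A toy model of the service-port behavior (table format and priority
encoding invented for illustration):

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define N_PRIO 64

  typedef struct {
      uint64_t pending[N_PRIO];  /* one pending bitmap per priority    */
      int      highest;          /* highest priority currently pending */
  } interrupt_table;

  static void broadcast_priority(int prio)
  {
      /* stand-in for the hardware broadcast to every core */
      printf("broadcast: table priority is now %d\n", prio);
  }

  /* service-port side: called when the translated MMI/O write arrives */
  static void log_interrupt(interrupt_table *t, int prio, unsigned vector)
  {
      t->pending[prio] |= 1ull << (vector & 63);
      if (prio > t->highest) {       /* priority of the table increased */
          t->highest = prio;
          broadcast_priority(prio);
      }
  }

  /* core side: compare broadcast priority with what the core runs now */
  static bool should_negotiate(int broadcast_prio, int core_prio)
  {
      return broadcast_prio > core_prio;
  }

  int main(void)
  {
      interrupt_table t = { .highest = -1 };
      log_interrupt(&t, 17, 5);      /* device message at priority 17 */
      if (should_negotiate(t.highest, 10))
          printf("core at priority 10 begins interrupt negotiation\n");
      return 0;
  }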
When a device responds with a page fault, the device control
block identifies the level of the software stack to handle
this exception, and the I/O MMU sends a suitable interrupt
to that level of the interrupt table.
When a device responds with a device error, the device
control block identifies the level and ISR to deal with
this device problem, and the I/O MMU sends a suitable
interrupt to that level of the interrupt table.
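
Both cases amount to the control block naming a level and the I/O MMU
writing into that level's slice of the interrupt table; roughly
(levels and names invented):

  #include <stdint.h>
  #include <stdio.h>

  enum event_kind  { EV_DMA, EV_PAGE_FAULT, EV_ERROR, EV_INTERRUPT };
  enum stack_level { LVL_APPLICATION, LVL_GUEST_OS,
                     LVL_HYPERVISOR,  LVL_SECURE };

  typedef struct {
      enum stack_level handler_level[4];  /* per event kind, from DCB */
  } dcb_events;

  /* stand-in for the MMI/O write into that level's interrupt table */
  static void send_interrupt(enum stack_level lvl, unsigned vector)
  {
      printf("interrupt vector %u delivered to level %d\n",
             vector, (int)lvl);
  }

  static void report_event(const dcb_events *e, enum event_kind k,
                           unsigned vec)
  {
      send_interrupt(e->handler_level[k], vec);
  }

  int main(void)
  {
      /* e.g. page faults go to the guest OS, device errors to the HV */
      dcb_events e = { { LVL_APPLICATION, LVL_GUEST_OS,
                         LVL_HYPERVISOR,  LVL_GUEST_OS } };
      report_event(&e, EV_PAGE_FAULT, 42);
      report_event(&e, EV_ERROR, 7);
      return 0;
  }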
So, the I/O MMU responds and guides all requests coming
up the PCIe tree--not just DMA.
--------------------------------------------------------
> How do you map the translation table to the device?
The HostBridge has a configuration register that points at
the I/O MMU ROOT table, which is used to map segment:bus:device
to an Originating Context. The Originating Context
contains a snapshot of the software stack managing the
application. This is where the ROOT pointers, ASIDs,
priorities, and levels are stored; in addition,
there is an interrupt table pointer virtual address, ...
A tree is used to map ECAM to the device control block; other
than not starting at a page boundary and not ending
on a page boundary, it is essentially identical to the standard
page mapping tree. The final level of said tree points at
the device control block--a cache line of data from which the
I/O MMU gets the data it needs for that particular device.
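
Sketching the walk in C (the split of the segment:bus:device bits
across levels is invented here):

  #include <stdint.h>
  #include <stddef.h>
  #include <stdio.h>

  typedef struct dcb { uint64_t line[8]; } dcb;   /* one cache line */

  typedef struct node {
      struct node *child[256];    /* one level of the tree */
  } node;

  static dcb *walk_bdf(node *root, uint16_t segment,
                       uint8_t bus, uint8_t device)
  {
      node *l1, *l2, *l3;
      if (!(l1 = root->child[segment >> 8]))  return NULL;
      if (!(l2 = l1->child[segment & 0xff]))  return NULL;
      if (!(l3 = l2->child[bus]))             return NULL;
      /* last level points directly at control blocks, not at nodes */
      return (dcb *)l3->child[device & 0x1f];
  }

  int main(void)
  {
      static node root, a, b, c;
      static dcb  my_dcb;
      root.child[0] = &a;                  /* segment 0x00xx */
      a.child[0]    = &b;                  /* segment 0x0000 */
      b.child[3]    = &c;                  /* bus 3          */
      c.child[5]    = (node *)&my_dcb;     /* device 5       */

      printf("dcb for 0000:03:05 at %p\n",
             (void *)walk_bdf(&root, 0, 3, 5));
      return 0;
  }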
> Why
> would you wish to have the CPU translating I/O virtual
> addresses?
This is pure mischaracterization on your part. You always
want the MMU closest to the access to perform the trans-
lation. I suspect you read virtual device address and
device virtual address interchangeably--they are entirely
different things used in different places.
> The IOMMU tables are per device, and they
> can be configured to map the minimum amount of the address
> space (even updated per-I/O if desired) required to support
> the completion of an inbound DMA from the device.
This still leaves the door open for a parity error to
allow one application's DMA to damage another application's
process memory, since commands to a single device share
a translation table and both translations are valid at
the same instant. One can essentially eliminate this
with dead pages between different application mappings--
preventing DMA from walking into a wrong VAS.
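
The allocation policy is simple enough to sketch (allocator and
sizes invented): leave one unmapped page after each application's
DMA window so a corrupted address that runs past the window faults
in the I/O MMU instead of landing in the neighbor's memory.

  #include <stdint.h>
  #include <stdio.h>

  #define IOVA_PAGE 4096ull

  static uint64_t next_iova = IOVA_PAGE;   /* never hand out page 0 */

  /* reserve `pages` mapped pages plus one dead page after them */
  static uint64_t alloc_window(uint64_t pages)
  {
      uint64_t base = next_iova;
      next_iova += (pages + 1) * IOVA_PAGE;  /* +1 keeps the guard unmapped */
      return base;
  }

  int main(void)
  {
      uint64_t app_a = alloc_window(4);    /* app A: 4 pages + dead page */
      uint64_t app_b = alloc_window(8);    /* app B starts past the guard */
      printf("A: [%#llx..%#llx)  guard at %#llx  B starts at %#llx\n",
             (unsigned long long)app_a,
             (unsigned long long)(app_a + 4*IOVA_PAGE),
             (unsigned long long)(app_a + 4*IOVA_PAGE),
             (unsigned long long)app_b);
      return 0;
  }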
>>
>>Guest OS uses a virtual device address given to it from HV.
>>HV sets up the 2nd nesting of translation to translate this
>>to "what HostBridge needs" to route commands to device control
>>registers. The handoff can be done by spoofing config space
>>of having HV simply hand Guest OS a list of devices it can
>>discover/configure/use.
>
> The IOMMU only is involved in DMA transactions _initiated_ by
> the device, not by the CPUs. They're two completely different
> concepts.
If the I/O MMU does not participate in interrupts, page faults,
and errors, who does ?? The requests coming up from the device
are still virtual and need mapping and routing.
>>
>>> an AHCI controller, for example, where the only device
>>> BAR is 32-bits; if a host wants to map the AHCI controller
>>> at a 64-bit address, the controller needs to map that 64-bit
>>> address window into a 32-bit 3DW TLP to be sent to the endpoint
>>> function.
>>
>>This is one of the reasons My 66000 architecture has a unique
>>MMI/O address space--you can set up a 32-bit BAR to put a
>>page of control registers in 32-bit address space without
========== REMAINDER OF ARTICLE TRUNCATED ==========