<2025Feb2.184458@mips.complang.tuwien.ac.at>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Sun, 02 Feb 2025 17:44:58 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 50
Message-ID: <2025Feb2.184458@mips.complang.tuwien.ac.at>
References: <5lNnP.1313925$2xE6.991023@fx18.iad>
Injection-Date: Sun, 02 Feb 2025 19:08:50 +0100 (CET)
Injection-Info: dont-email.me; posting-host="f0d8ba63a863aedce302b0b45b92c591";
	logging-data="829314"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+6x+zNSQeBw3bPJXDmJExi"
Cancel-Lock: sha1:/EBrxWPxdXPAeutKSQlJ9hZz3qs=
X-newsreader: xrn 10.11
Bytes: 3234

EricP <ThatWouldBeTelling@thevillage.com> writes:
>The incremental cost is in a sequencer in the AGU for handling cache
>line and possibly virtual page straddles, and a small byte shifter to
>left shift the high order bytes. The AGU sequencer needs to know if the
>line straddles a page boundary, if not then increment the 6-bit physical
>line number within the 4 kB physical frame number, if yes then increment
>virtual page number and TLB lookup again and access the first line.
>(Slightly more if multiple page sizes are supported, but same idea.)
>For a load AGU merges the low and high fragments and forwards.
....
>The hardware cost appears trivial, especially within an OoO core.
>So there doesn't appear to be any reason to not handle this.
>Am I missing something?
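Here is a minimal model of the straddle handling in the quoted description. It assumes 64-byte lines, 4 KiB pages, little-endian byte order, and a flat byte array standing in for memory. Because nothing is translated, a page straddle is only detected; the second TLB lookup is not modeled. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters: 64-byte cache lines, 4 KiB pages. */
#define LINE_SIZE 64u
#define PAGE_SIZE 4096u

/* Does an access of `size` bytes at `addr` cross a cache-line boundary? */
static int crosses_line(uint64_t addr, unsigned size) {
    return (addr % LINE_SIZE) + size > LINE_SIZE;
}

/* Does it cross a page boundary (which in hardware needs a second TLB lookup)? */
static int crosses_page(uint64_t addr, unsigned size) {
    return (addr % PAGE_SIZE) + size > PAGE_SIZE;
}

/* Little-endian load of 1..8 bytes from `mem`, done as at most two
   line fragments whose bytes are merged afterwards. In this flat model
   the next line's address is simply addr's line + LINE_SIZE; real
   hardware would translate it separately if crosses_page() is true. */
static uint64_t split_load(const uint8_t *mem, uint64_t addr, unsigned size) {
    assert(size >= 1 && size <= 8);
    unsigned offset = (unsigned)(addr % LINE_SIZE);
    /* Low fragment: the bytes that fit in the first line. */
    unsigned low_bytes = crosses_line(addr, size) ? LINE_SIZE - offset : size;
    uint64_t value = 0;
    for (unsigned i = 0; i < low_bytes; i++)
        value |= (uint64_t)mem[addr + i] << (8 * i);
    /* High fragment: remaining bytes from the start of the next line,
       shifted left past the low-order bytes already collected. */
    uint64_t next_line = addr - offset + LINE_SIZE;
    for (unsigned i = low_bytes; i < size; i++)
        value |= (uint64_t)mem[next_line + (i - low_bytes)] << (8 * i);
    return value;
}
```

For example, a 4-byte load at address 62 takes two bytes from the line starting at 0 and two from the line starting at 64.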

The OS must also be able to keep both pages in physical memory until
the access is complete, or there will be no progress.  This should not be a
problem these days, but the 48 pages or so potentially needed by a single
VAX instruction complicated the OS.

Yes, the hardware is not hard, and there is software that benefits; as a
result, modern architectures (including RISC-V) now support unaligned
accesses (except for atomic accesses).
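In C, the usual portable way to write an unaligned load is to `memcpy` into a local variable. Dereferencing a misaligned `uint32_t` pointer is undefined behavior. Below is a minimal sketch; the function name `load_u32` is hypothetical. Where the compiler knows that unaligned loads are supported and fast, it typically emits a single load for the `memcpy`. Whether it does so for a given target depends on the compiler's tuning options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in host byte order from byte offset `off` of a
   buffer of `len` bytes; `off` need not be a multiple of 4. */
static uint32_t load_u32(const unsigned char *buf, size_t len, size_t off) {
    assert(off <= len && len - off >= sizeof(uint32_t)); /* stay in bounds */
    uint32_t v;
    memcpy(&v, buf + off, sizeof v); /* well-defined at any alignment */
    return v;
}
```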

>https://old.chipsandcheese.com/2025/01/26/inside-sifives-p550-microarchitecture/
....
>This terrible unaligned access behavior is atypical even for low power
>cores. Arm's Cortex A75 only takes 15 cycles in the worst case of
>dependent accesses that are both misaligned.
>
>Digging deeper with performance counters reveals executing each unaligned
>load instruction results in ~505 executed instructions.

This is similar to what I measured on a U74 core from SiFive
<2024May14.073553@mips.complang.tuwien.ac.at>, so they probably use
the same solution.

>P550 almost
>certainly doesn’t have hardware support for unaligned accesses.
>Rather, it’s likely raising a fault and letting an operating system
>handler emulate it in software."

The architecture guarantees that unaligned accesses work, so an OS is
not obliged to provide such emulation and might not have it.  Another
option would be to trap into some kind of firmware-supplied fixup code,
along the lines of Alpha's PALcode.
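The core of such a software fixup (OS or firmware) can be sketched as follows. The handler rebuilds the value from byte loads, which can never be misaligned. It then writes the destination register and skips the trapping instruction. Everything here is hypothetical and heavily simplified: the `trap_frame` layout, the function name, a flat little-endian `mem` array in place of the program's address space, and zero extension only. A real handler would also fetch and decode the faulting instruction, check permissions, and sign-extend where required. This sketch does not model the trap entry and exit or the decoding, nor their cost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified register state saved on trap entry. */
struct trap_frame {
    uint64_t regs[32];
    uint64_t pc;
};

/* Emulate a misaligned, zero-extended little-endian load of `size`
   bytes (1..8) at `addr` into register `rd`, then advance past the
   trapping instruction of length `insn_len` (2 or 4 bytes on RISC-V). */
static void emulate_misaligned_load(struct trap_frame *tf, const uint8_t *mem,
                                    uint64_t addr, unsigned size, unsigned rd,
                                    unsigned insn_len) {
    assert(size >= 1 && size <= 8 && rd < 32);
    uint64_t value = 0;
    for (unsigned i = 0; i < size; i++)       /* byte loads are always aligned */
        value |= (uint64_t)mem[addr + i] << (8 * i);
    if (rd != 0)                              /* x0 is hardwired to zero */
        tf->regs[rd] = value;
    tf->pc += insn_len;                       /* resume after the emulated load */
}
```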

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>