Article <lsp0tqFs7aoU1@mid.individual.net>

Path: ...!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Jonathan Thornburg <jonathan@gold.bkis-orchard.net>
Newsgroups: comp.arch
Subject: unaligned load/store (was: Re: Keeping other stuff with addresses)
Date: 21 Dec 2024 23:22:35 GMT
Lines: 56
Message-ID: <lsp0tqFs7aoU1@mid.individual.net>
References: <memo.20241128153105.12904U@jgd.cix.co.uk> <jwvcyi87lva.fsf-monnier+comp.arch@gnu.org> <vini47$sgi$1@gal.iecc.com> <jwvldww6253.fsf-monnier+comp.arch@gnu.org> <vio4ge$1eka$1@gal.iecc.com> <jwvmshc49i0.fsf-monnier+comp.arch@gnu.org> <9534a1cd1364f2127a1951cc85002f29@www.novabbs.org>
X-Trace: individual.net qF3M70RCKalo2vd/qq+kHwRJk04fCiFUDP6Ke8UtZyqCI29sX2/5bbXtU1
X-Orig-Path: not-for-mail
Cancel-Lock: sha1:rxtLttYWedgaJlrRgIMDd5GgXkY= sha256:l0Gl24MPTaUuJaSkJNyZCndG6FkWDafafqsabW3a9aA=
Bytes: 3927

MitchAlsup1 <mitchalsup@aol.com> wrote:
> FORTRAN COMMON blocks require misaligned accesses to double precision
> data.  R E Q U I R E in that it is neither optional nor wise to emulate
> with exceptions.  It is just barely tolerable using LD/ST Left/Right
> instructions out of the compiler.
> 
> I, personally, went through enough PAIN with misalignment, that over
> time my mood swung from "aligned only" to "completely misaligned"::
> a) because there is no performant* SW workaround
> b) it is SO easy to fix in HW.
> c) once fixed in HW, any SW burden is so small as to be barely
> ..measurable.

I'm not so sure (b) is true.  Some cases are moderately easy to handle
in hardware (e.g., misaligned loads that stay within a single L1 D-cache
line), but some cases are harder (e.g., misaligned writes that cross L1
D-cache line boundaries) and might need a microcode trap (awkward if the
design wasn't otherwise using microcode).  And some cases are even harder
(e.g., misaligned writes crossing L1 D-cache line boundaries where the
two lines are owned by different CPUs in a cache-coherent multiprocessor)
and might need a millicode trap.  And some cases may require going all the
way up to the OS (e.g., misaligned writes that cross virtual-memory-page
boundaries where one page is ok but the other is non-resident).
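
To make those cases concrete, here is a minimal C sketch that sorts an
access into the classes above using only its address and size.  The
64-byte line and 4096-byte page are assumed values for illustration, and
the names (classify_access, access_class) are hypothetical.  Real
hardware would also need the coherence state and page residency, which
this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry, for illustration only; real values vary by implementation. */
#define CACHE_LINE_BYTES 64u
#define PAGE_BYTES       4096u

typedef enum {
    ACCESS_ALIGNED,       /* naturally aligned */
    ACCESS_WITHIN_LINE,   /* misaligned, but inside a single cache line */
    ACCESS_CROSSES_LINE,  /* spans two cache lines within one page */
    ACCESS_CROSSES_PAGE   /* spans two pages (and therefore two lines) */
} access_class;

/* Classify an access of `size` bytes starting at `addr`.
 * `size` must be a power of two no larger than one cache line.
 * Wraparound at the very top of the address space is not handled. */
static access_class classify_access(uint64_t addr, size_t size)
{
    assert(size > 0 && (size & (size - 1)) == 0 && size <= CACHE_LINE_BYTES);

    uint64_t last = addr + size - 1;  /* address of the final byte touched */

    if (addr / PAGE_BYTES != last / PAGE_BYTES)
        return ACCESS_CROSSES_PAGE;
    if (addr / CACHE_LINE_BYTES != last / CACHE_LINE_BYTES)
        return ACCESS_CROSSES_LINE;
    if (addr % size != 0)
        return ACCESS_WITHIN_LINE;
    return ACCESS_ALIGNED;
}
```

With these assumed sizes, an 8-byte access at 0x1004 stays within one
line, one at 0x103c crosses a line boundary, and one at 0x1ffc crosses a
page boundary.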

So, allowing this in the architecture has several costs:
* extra hardware implementation effort to make sure the "hardware" cases
  don't cost an extra gate delay or two on some critical path
* extra complexity and debugging time in hardware and in system software
  (think about writing and *debugging* and *verifying* microcode/millicode
  trap handlers for all those messy write-crossing-cache/page-boundary
  cases, especially their interactions with multiprocessor cache coherency)
* this extra effort means a longer design time and/or greater design cost,
  and hence (so long as the state-of-the-art of competing systems is still
  steadily improving with time) a net worse price/performance relative to
  competing systems

And, because of the traps and their overheads (which will likely differ
significantly across different implementations of the same architecture,
e.g., different multiprocessor cache-coherency protocols), any code that
actually *uses* unaligned accesses -- especially unaligned writes -- isn't
performance-portable unless the actual dynamic frequency of unaligned
operations is very low.
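
For reference, the usual way to write an unaligned access portably at
the source level in C is a fixed-size memcpy, sketched below with
hypothetical helper names.  Compilers commonly lower this to a single
load or store on targets that permit unaligned access, and to a
multi-instruction sequence on targets that do not.  This makes the
source portable, but the run-time cost still depends on how a given
implementation handles the access.

```c
#include <string.h>

/* Load a double from a possibly misaligned address.
 * Copying through memcpy avoids the undefined behavior of
 * dereferencing a misaligned `double *`. */
static double load_double_unaligned(const void *p)
{
    double v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a double to a possibly misaligned address. */
static void store_double_unaligned(void *p, double v)
{
    memcpy(p, &v, sizeof v);
}
```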

So yes, allowing unaligned access does help "dusty deck" Fortran code...
but it comes at a significant cost.
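
For anyone who hasn't met the COMMON-block problem: members of a COMMON
block occupy consecutive storage with no padding, so a DOUBLE PRECISION
following a single default INTEGER lands on a 4-byte boundary.  A small
C sketch of that offset arithmetic, assuming a 4-byte INTEGER and an
8-byte DOUBLE PRECISION (the helper name common_offset is hypothetical):

```c
#include <stddef.h>

/* Fortran declaration being modeled (an illustrative example):
 *       INTEGER I
 *       DOUBLE PRECISION D
 *       COMMON /BLK/ I, D
 * Assumed sizes: 4-byte default INTEGER, 8-byte DOUBLE PRECISION.
 */
enum { FORTRAN_INTEGER_BYTES = 4, FORTRAN_DOUBLE_BYTES = 8 };

/* Byte offset of member number `n` (0-based) when members are laid
 * end to end with no padding, as in a COMMON block storage sequence. */
static size_t common_offset(const size_t *sizes, size_t n)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++)
        off += sizes[i];
    return off;
}
```

Here D sits at byte offset 4, so it is misaligned for an 8-byte access
even when the block itself starts on an 8-byte boundary.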

-- 
-- "Jonathan Thornburg [remove -color to reply]" <jt.bhbkis@gmail-pink.com>
   on the west coast of Canada
   "the stock market can remain irrational a lot longer than you can
   remain solvent" or (probably the correct original wording) "markets
   can remain irrational a lot longer than you and I can remain solvent"
         -- A. Gary Shilling (often misattributed to John Maynard Keynes)