From: Jonathan Thornburg
Newsgroups: comp.arch
Subject: unaligned load/store (was: Re: Keeping other stuff with addresses)
Date: 21 Dec 2024 23:22:35 GMT

MitchAlsup1 wrote:
> FORTRAN COMMON blocks require misaligned accesses to double precision
> data.
> R E Q U I R E in that it is neither optional nor wise to emulate with
> exceptions. It is just barely tolerable using LD/ST Left/Right
> instructions out of the compiler.
>
> I, personally, went through enough PAIN with misalignment, that over
> time my mood swung from "aligned only" to "completely misaligned"::
> a) because there is no performant* SW workaround
> b) it is SO easy to fix in HW.
> c) once fixed in HW, any SW burden is so small as to be barely
>    measurable.

I'm not so sure (b) is true. Some cases are moderately easy to handle
in hardware (e.g., misaligned loads that stay within a single L1 D-cache
line), but some cases are harder (e.g., misaligned writes that cross L1
D-cache line boundaries) and might need a microcode trap (awkward if the
design wasn't otherwise using microcode). And some cases are even harder
(e.g., misaligned writes crossing L1 D-cache line boundaries where the
two lines are owned by different CPUs in a cache-coherent
multiprocessor) and might need a millicode trap. And some cases may
require going all the way up to the OS (e.g., misaligned writes that
cross virtual-memory-page boundaries where one page is ok but the other
is non-resident).
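To make the address-based part of that case split concrete, here is a minimal C sketch (not from the original post) that classifies a single access by its starting address and size. The 64-byte line size, the 4 KiB page size, and the names `classify_access` and `access_class` are hypothetical assumptions chosen for illustration; real cache and page geometries differ across implementations. The sketch also cannot capture the coherence case (the two lines owned by different CPUs), since ownership is not visible from the address alone.

```c
#include <stdint.h>

/* Hypothetical geometry, assumed for illustration only:
   64-byte L1 D-cache lines and 4 KiB virtual-memory pages. */
#define LINE_SIZE 64u
#define PAGE_SIZE 4096u

typedef enum {
    ACCESS_ALIGNED,      /* address is a multiple of the access size       */
    ACCESS_WITHIN_LINE,  /* misaligned, but every byte is in one line      */
    ACCESS_CROSSES_LINE, /* spans two cache lines within the same page     */
    ACCESS_CROSSES_PAGE  /* spans two pages (and therefore two lines too)  */
} access_class;

/* Classify an access of `size` bytes starting at `addr`.
   Assumes `size` is a power of two no larger than LINE_SIZE. */
static access_class classify_access(uint64_t addr, uint64_t size)
{
    uint64_t last = addr + size - 1;  /* address of the final byte touched */

    if ((addr & (size - 1)) == 0)
        return ACCESS_ALIGNED;
    /* Check pages first: LINE_SIZE divides PAGE_SIZE, so a page
       crossing is always a line crossing as well. */
    if (addr / PAGE_SIZE != last / PAGE_SIZE)
        return ACCESS_CROSSES_PAGE;
    if (addr / LINE_SIZE != last / LINE_SIZE)
        return ACCESS_CROSSES_LINE;
    return ACCESS_WITHIN_LINE;
}
```

With these assumed sizes, an 8-byte (double precision) access at 0x1000 is aligned, one at 0x1004 stays inside a single line, one at 0x103C straddles a line boundary, and one at 0x1FFC straddles a page boundary.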
So, allowing this in the architecture has several costs:
* extra hardware implementation effort to make sure the "hardware" cases
  don't cost an extra gate delay or two on some critical path
* extra complexity and debugging time in hardware and in system software
  (think about writing and *debugging* and *verifying* microcode/millicode
  trap handlers for all those messy write-crossing-cache/page-boundary
  cases, especially their interactions with multiprocessor cache
  coherency)
* this extra effort means a longer design time and/or greater design
  cost, and hence (so long as the state-of-the-art of competing systems
  is still steadily improving with time) that means a net lower
  price/performance relative to competing systems

And, because of the traps and their overheads (which will likely differ
significantly across different implementations of the same architecture,
e.g., different multiprocessor cache-coherency protocols), any code that
actually *uses* unaligned accesses -- especially unaligned writes --
isn't performance-portable unless the actual dynamic frequency of
unaligned operations is very low.

So yes, allowing unaligned access does help "dusty deck" Fortran code...
but it comes at a significant cost.

-- 
-- "Jonathan Thornburg [remove -color to reply]"
   on the west coast of Canada
   "the stock market can remain irrational a lot longer than you can
    remain solvent"
   or (probably the correct original wording)
   "markets can remain irrational a lot longer than you and I can
    remain solvent"
   -- A. Gary Shilling (often misattributed to John Maynard Keynes)