Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Tue, 04 Feb 2025 18:16:31 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 95
Message-ID: <2025Feb4.191631@mips.complang.tuwien.ac.at>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me> <2025Feb3.075550@mips.complang.tuwien.ac.at> <wi7oP.2208275$FOb4.591154@fx15.iad>
Injection-Date: Tue, 04 Feb 2025 19:52:49 +0100 (CET)
Injection-Info: dont-email.me; posting-host="e8dfb2a2d69ab1b641717bf652b25dd8"; logging-data="2084307"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18jB94l6TuZ4GlnSKvbIhN6"
Cancel-Lock: sha1:4gmSWaW3ZfWN4fx2m/6qCPaQB3A=
X-newsreader: xrn 10.11
Bytes: 5496

EricP <ThatWouldBeTelling@thevillage.com> writes:
>Anton Ertl wrote:
>> There are lots of potentially unaligned loads and stores.  There are
>> very few actually unaligned loads and stores: On Linux-Alpha every
>> unaligned access is logged by default, and the number of
>> unaligned-access entries in the logs of our machines was relatively
>> small (on average a few per day).  So trapping actual unaligned
>> accesses was faster than replacing potential unaligned accesses with
>> code sequences that synthesize the unaligned access from aligned
>> accesses.
>>
>> Of course, if the cost of unaligned accesses is that high, you will
>> avoid them in cases like block copies where cheap unaligned accesses
>> would otherwise be beneficial.
>>
>> - anton
>
>That is fine for code that is being actively maintained and backward
>data structure compatibility is not required (like those inside a kernel).

That is the experience on Linux-Alpha, which ran user-level code which
had, for the most part, already been ported to, e.g., SPARC with
trapping on actual unaligned access.  These days, with basically all
available hardware of the last decade supporting unaligned accesses,
the experience might be different.

>However for x86 there was a few billion lines of legacy code that likely
>assumed 2-byte alignment, or followed the fp64 aligned to 32-bits advice,

That's not advice, that's the Intel IA-32 ABI.  If you lay out your
structures differently, they will not work with the libraries.

>and a C language that mandates structs be laid out in memory exactly as
>specified (no automatic struct optimization).

The C language mandates that the order of the fields is as specified,
and that the same sequence of field types leads to the same layout,
but otherwise does not mandate a layout.  In particular, competent
ABIs (i.e., not Intel's IA-32 ABI) mandate layouts that result in
natural alignment of basic types.

>Also I seem to recall some
>amount of squawking about SIMD when it required naturally aligned buffers.

SSE does not require natural alignment wrt. basic types, but the
load-and-op instructions require 16-byte alignment.  That's another
idiocy on Intel's part.  If you have

  for (i=0; i<n; i++)
    a[i] = b[i] + c[i];

that's easy to vectorize if you have support for basic-type-aligned or
unaligned accesses.  But a, b, and c may all have different start
addresses mod 16, so you cannot use Intel's 16-byte-aligned memory
accesses for vectorizing that.  Fortunately, they were not completely
stupid and included unaligned-load and unaligned-store instructions,
so if you use those, and forget about the load-and-operate
instructions, SSE is useable.

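As a rough illustration of the loop above, here is a minimal sketch using only the unaligned SSE load and store intrinsics.  It assumes float elements (the post does not say which type a, b, and c have), and the function name add_floats is made up for this example.  On compilers that do not define __SSE__ it falls back to the plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: a[i] = b[i] + c[i] for i in [0, n).
   Assumes float elements; a, b and c may start at any address that is
   valid for float, i.e. they need not agree mod 16. */
void add_floats(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && c != NULL);
#if defined(__SSE__)
    /* Four floats per iteration, using unaligned loads and stores only,
       so no 16-byte alignment is required of any of the three arrays. */
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_add_ps(vb, vc));
    }
#endif
    /* Scalar loop for the remaining elements (or for everything when
       SSE is not available at compile time). */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Calling it with pointers such as a+1, b+2 and c+3 into larger float arrays gives three different start addresses mod 16, which is the case where the 16-byte-aligned accesses cannot be used for all three operands at once.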
AMD has added a flag that turns off this Intel stupidity (if the flag
is set, all SSE memory accesses support unaligned accesses), but Intel
is stubborn and does not support this flag to this day; and they are
the manufacturer that sells CPUs without AVX/AVX2 to this day (unlike
AMD, which has supported AVX2 on all CPUs they sell for a long time).

>As SIMD no longer requires alignment, presumably code no longer does so.

Yes, if you use AVX/AVX2, you don't encounter this particular Intel
stupidity.

>Also in going from 32 to 64 bits, data structures that contain pointers
>now could find those 8-byte pointers aligned on 4-byte boundaries.

What you write does not make sense.  RAM data structures are laid out
according to the ABI, which is different for different architectures,
and typically requires natural alignment for basic data types; no
unaligned accesses from backwards compatibility here.

Wire or on-disk data structures are laid out according to the
specification of the protocol or file system, which may include basic
data types that are not aligned according to natural alignment (e.g.,
because there is a prefix on the wire); these do not contain pointers,
and even if they contain some kind of reference (e.g., block numbers
or inode numbers), the sizes are fixed across architectures.

>While the Linux kernel may not use many misaligned values,
>I'd guess there is a lot of application code that does.

The reports about unaligned accesses in the logs were associated with
user-level code (I dimly remember gs occurring in the log), not kernel
code.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>