From: BGB
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Fri, 14 Feb 2025 15:14:11 -0600
Organization: A noiseless patient Spider
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <2025Feb3.075550@mips.complang.tuwien.ac.at>

On 2/13/2025 1:09 PM, Marcus wrote:
> On 2025-02-03, Anton Ertl wrote:
>> BGB writes:
>>> On 2/2/2025 10:45 AM, EricP wrote:
>>>> Digging deeper with performance counters reveals executing each
>>>> unaligned load instruction results in ~505 executed instructions.
>>>> P550 almost certainly doesn’t have hardware support for unaligned
>>>> accesses. Rather, it’s likely raising a fault and letting an
>>>> operating system handler emulate it in software."
>>>>
>>>
>>> An emulation fault, or something similarly nasty...
>>>
>>>
>>> At that point, even turning any potentially unaligned load or store into
>>> a runtime call is likely to be a lot cheaper.
>>
>> There are lots of potentially unaligned loads and stores.  There are
>> very few actually unaligned loads and stores: On Linux-Alpha every
>> unaligned access is logged by default, and the number of
>> unaligned-access entries in the logs of our machines was relatively
>> small (on average a few per day).
>> So trapping actual unaligned
>> accesses was faster than replacing potential unaligned accesses with
>> code sequences that synthesize the unaligned access from aligned
>> accesses.
>
> If you compile regular C/C++ code that does not intentionally do any
> nasty stuff, you will typically have zero unaligned loads or stores.
>
> My machine still does not support unaligned accesses in hardware (it's
> on the todo list), and it can run an awful lot of software without
> problems.
>
> The problem arises when the programmer *deliberately* does unaligned
> loads and stores in order to improve performance. Or rather, if the
> programmer knows that the hardware supports unaligned loads and stores,
> he/she can use that to write faster code in some special cases.
>

Pretty much.


This is partly why I am in favor of potentially adding explicit keywords
for some of these cases, or to reiterate:
   __aligned:
     Inform the compiler that a pointer is aligned.
     May use a faster version if appropriate:
       if a faster aligned-only variant of an instruction exists
       on an otherwise unaligned-safe target.
   __unaligned:
     Inform the compiler that an access is unaligned.
     May use a runtime call or similar if necessary
       on an aligned-only target.
     May do nothing on an unaligned-safe target.
   None:
     Do whatever is the default.
     Presumably, assume aligned by default,
       unless the target is known to be unaligned-safe.

And/or, an attribute, which seems to be the new style:
   __attribute__((unaligned))  //GCC-ism
   [[unaligned]]  //probably if the C standard people did it...

Most pointers will remain unqualified, but most of them will never be
used for an unaligned access, so this is fine.

For cases where it is needed, a keyword could make sense (probably
alongside volatile and the usual mess of per-target ifdefs that usually
also needs to exist with this sort of code).

Meanwhile, function wrappers built on manual byte-shifts or memcpy are a
particularly poor solution (they depend too much on compiler magic).
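For reference, the two wrapper styles in question usually look something like the sketch below. The function names (load_u32_memcpy, load_u32le_bytes) are made up for illustration and do not come from any standard or library. Whether either one turns into a single load instruction is up to the optimizer, which is the "compiler magic" being complained about.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names, for illustration only. */

/* memcpy style: well-defined for any alignment. On an unaligned-safe
   target an optimizing compiler will often (but is not required to)
   fold the fixed-size memcpy into one load. */
static inline uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Byte-shift style: assembles a little-endian 32-bit value one byte
   at a time, so the result does not depend on host endianness. */
static inline uint32_t load_u32le_bytes(const void *p)
{
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Both are safe on an aligned-only target, but at low optimization levels (or with a weaker compiler) they may stay as a library call or four separate byte loads.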
It would be nice if there were a "commonly accepted" or "standard"
option, so that one could just use it rather than needing a mess of
ifdefs (or falling back to "just do it with raw bytes" and accepting a
potentially significant performance penalty).

>
>> Of course, if the cost of unaligned accesses is that high, you will
>> avoid them in cases like block copies where cheap unaligned accesses
>> would otherwise be beneficial.
>>
>> - anton
>
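As a rough illustration of the "mess of ifdefs" mentioned earlier in this post, a per-compiler unaligned load tends to end up looking like the sketch below. The names get_u32 and struct unaligned_u32 are hypothetical. The MSVC branch assumes its existing __unaligned pointer qualifier, the GCC/Clang branch assumes the packed and may_alias type attributes, and everything else falls back to memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names (get_u32, unaligned_u32), for illustration only. */
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM) || defined(_M_ARM64))
/* MSVC: the __unaligned qualifier marks the pointee as possibly
   misaligned (assumed here to apply to x64 and ARM targets). */
static inline uint32_t get_u32(const void *p)
{
    return *(const uint32_t __unaligned *)p;
}
#elif defined(__GNUC__) || defined(__clang__)
/* GCC/Clang: a packed struct has alignment 1, so a load through it
   must be emitted in an unaligned-safe way; may_alias keeps the
   access legal regardless of the buffer's declared type. */
struct unaligned_u32 { uint32_t v; } __attribute__((packed, may_alias));
static inline uint32_t get_u32(const void *p)
{
    return ((const struct unaligned_u32 *)p)->v;
}
#else
/* Anything else: fall back to memcpy and hope the compiler folds it. */
static inline uint32_t get_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
#endif
```

A standard qualifier or attribute would collapse all three branches into one declaration.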