From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Mon, 17 Feb 2025 18:34:03 +0100
Organization: A noiseless patient Spider
Message-ID: <vovrud$18bcu$1@dont-email.me>
References: <5lNnP.1313925$2xE6.991023@fx18.iad>
 <vnosj6$t5o0$1@dont-email.me>
 <2025Feb3.075550@mips.complang.tuwien.ac.at>
 <volg1m$31ca1$1@dont-email.me>
 <vov01l$1398i$1@dont-email.me>
 <d8176619cbf9e66afa8911c67425c029@www.novabbs.org>
In-Reply-To: <d8176619cbf9e66afa8911c67425c029@www.novabbs.org>

MitchAlsup1 wrote:
> On Mon, 17 Feb 2025 9:37:57 +0000, Terje Mathisen wrote:
>
>> Marcus wrote:
>>> On 2025-02-03, Anton Ertl wrote:
>>>> BGB <cr88192@gmail.com> writes:
>>>>> On 2/2/2025 10:45 AM, EricP wrote:
>>>>>> Digging deeper with performance counters reveals executing each
>>>>>> unaligned load instruction results in ~505 executed instructions.
>>>>>> P550 almost certainly doesn't have hardware support for unaligned
>>>>>> accesses. Rather, it's likely raising a fault and letting an
>>>>>> operating system handler emulate it in software."
>>>>>>
>>>>>
>>>>> An emulation fault, or something similarly nasty...
>>>>>
>>>>>
>>>>> At that point, even turning any potentially unaligned load or store
>>>>> into a runtime call is likely to be a lot cheaper.
>>>>
>>>> There are lots of potentially unaligned loads and stores.  There are
>>>> very few actually unaligned loads and stores: On Linux-Alpha every
>>>> unaligned access is logged by default, and the number of
>>>> unaligned-access entries in the logs of our machines was relatively
>>>> small (on average a few per day).  So trapping actual unaligned
>>>> accesses was faster than replacing potential unaligned accesses with
>>>> code sequences that synthesize the unaligned access from aligned
>>>> accesses.
>>>
>>> If you compile regular C/C++ code that does not intentionally do any
>>> nasty stuff, you will typically have zero unaligned loads stores.
>>>
>>> My machine still does not support unaligned accesses in hardware (it's
>>> on the todo list), and it can run an awful lot of software without
>>> problems.
>>>
>>> The problem arises when the programmer *deliberately* does unaligned
>>> loads and stores in order to improve performance. Or rather, if the
>>> programmer knows that the hardware supports unaligned loads and stores,
>>> he/she can use that to write faster code in some special cases.
>>
>> No, the real problem is when a compiler want to auto-vectorize any code
>> working with 1/2/4/8 byte items: All of a sudden the alignment
>> requirement went from the item stride to the vector register stride
>> (16/32/64 bytes).
>
> If you provide misaligned access to SIMD registers, why not provide
> misaligned access to all memory references !?!
>
> I made this argument several times in my career.
>
>> The only way this can work is to have the compiler control _all_
>> allocations to make sure they are properly aligned, including code in
>> libraries, or the compiler will be forced to use vector load/store
>> operations which do allow unaligned access.
>
> Either the entire environment has to be "air tight" or the HW
> provides misaligned access at low cost. {{Good luck on the air
> tight thing...}}

This is just one of many details where we've agreed for a decade or two
(three?). Some of them you persuaded me you were right, I don't remember
any obvious examples of the opposite, but most we figured out
independently. :-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
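
For readers following the thread: the "code sequences that synthesize the
unaligned access from aligned accesses" mentioned in the quoted text can be
sketched roughly as below. This is an illustration only, not code from any of
the posters. The function names are made up for this example, the synthesized
version assumes a little-endian machine, and it takes a word array plus a byte
offset so that every load it performs is a genuinely aligned 32-bit load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable unaligned 32-bit load. Compilers commonly
 * turn this memcpy into one plain load when the target handles misaligned
 * accesses in hardware. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: build the 32-bit value starting at byte offset
 * byte_off inside an aligned word array, using only aligned word loads.
 * ASSUMPTION: little-endian byte order. The caller must make sure that
 * words[byte_off / 4 + 1] exists whenever byte_off is not a multiple of 4. */
static uint32_t load_u32_synth(const uint32_t *words, size_t byte_off)
{
    assert(words != NULL);

    size_t   i     = byte_off / 4;                /* aligned word index */
    unsigned shift = (unsigned)(byte_off % 4) * 8; /* bit offset in word */
    uint32_t lo    = words[i];

    if (shift == 0)
        return lo;              /* already aligned: a single load suffices */

    uint32_t hi = words[i + 1]; /* second aligned load */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The synthesized version costs two loads, two shifts, and an OR on every
potentially unaligned access, which is the overhead being weighed in the
quoted text against trapping only on the accesses that are actually
unaligned.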