<vovrud$18bcu$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Mon, 17 Feb 2025 18:34:03 +0100
Organization: A noiseless patient Spider
Lines: 87
Message-ID: <vovrud$18bcu$1@dont-email.me>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me>
 <2025Feb3.075550@mips.complang.tuwien.ac.at> <volg1m$31ca1$1@dont-email.me>
 <vov01l$1398i$1@dont-email.me>
 <d8176619cbf9e66afa8911c67425c029@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Mon, 17 Feb 2025 18:34:06 +0100 (CET)
Injection-Info: dont-email.me; posting-host="1c108bac77b9627d5953229442e9448a";
	logging-data="1322398"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+NjDywabg1IzXYyQzP6e9moSTtN6idbqaXQBnWlgz5VQ=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101
 Firefox/128.0 SeaMonkey/2.53.20
Cancel-Lock: sha1:uyge+5yr9zfKM4A+pmrX54tAbWY=
In-Reply-To: <d8176619cbf9e66afa8911c67425c029@www.novabbs.org>
Bytes: 4650

MitchAlsup1 wrote:
> On Mon, 17 Feb 2025 9:37:57 +0000, Terje Mathisen wrote:
>
>> Marcus wrote:
>>> On 2025-02-03, Anton Ertl wrote:
>>>> BGB <cr88192@gmail.com> writes:
>>>>> On 2/2/2025 10:45 AM, EricP wrote:
>>>>>> Digging deeper with performance counters reveals executing each
>>>>>> unaligned load instruction results in ~505 executed instructions.
>>>>>> P550 almost certainly doesn't have hardware support for unaligned
>>>>>> accesses. Rather, it's likely raising a fault and letting an
>>>>>> operating system handler emulate it in software."
>>>>>>
>>>>>
>>>>> An emulation fault, or something similarly nasty...
>>>>>
>>>>>
>>>>> At that point, even turning any potentially unaligned load or store
>>>>> into a runtime call is likely to be a lot cheaper.
>>>>
>>>> There are lots of potentially unaligned loads and stores.  There are
>>>> very few actually unaligned loads and stores: On Linux-Alpha every
>>>> unaligned access is logged by default, and the number of
>>>> unaligned-access entries in the logs of our machines was relatively
>>>> small (on average a few per day).  So trapping actual unaligned
>>>> accesses was faster than replacing potential unaligned accesses with
>>>> code sequences that synthesize the unaligned access from aligned
>>>> accesses.
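
For reference, a minimal sketch of what "synthesize the unaligned
access from aligned accesses" can look like in C. The helper name
load32_from_aligned is hypothetical, and the sketch assumes the words
hold little-endian data, so byte 0 of the buffer is the low byte of
words[0]. Every potentially unaligned site pays for two loads, two
shifts, an OR and a test, which is the overhead being weighed against
a rare trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the 32-bit little-endian value that starts
   at byte offset `off` of a buffer of `nwords` aligned 32-bit words,
   using only aligned word loads. */
static uint32_t load32_from_aligned(const uint32_t *words, size_t nwords,
                                    size_t off)
{
    assert(off + 4 <= nwords * 4);              /* stay inside the buffer */

    size_t   idx   = off / 4;                   /* word holding the first byte */
    unsigned shift = (unsigned)(off % 4) * 8;   /* bit offset inside that word */
    uint32_t lo    = words[idx];

    if (shift == 0)
        return lo;                              /* aligned; also avoids a shift by 32 */

    uint32_t hi = words[idx + 1];               /* second aligned load */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With words = {0x44332211, 0x88776655, 0xCCBBAA99}, a call with off = 1
combines the upper three bytes of the first word with the low byte of
the second.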
>>>
>>> If you compile regular C/C++ code that does not intentionally do any
>>> nasty stuff, you will typically have zero unaligned loads or stores.
>>>
>>> My machine still does not support unaligned accesses in hardware (it's
>>> on the todo list), and it can run an awful lot of software without
>>> problems.
>>>
>>> The problem arises when the programmer *deliberately* does unaligned
>>> loads and stores in order to improve performance. Or rather, if the
>>> programmer knows that the hardware supports unaligned loads and stores,
>>> he/she can use that to write faster code in some special cases.
>>
>> No, the real problem is when a compiler wants to auto-vectorize any code
>> working with 1/2/4/8 byte items: All of a sudden the alignment
>> requirement went from the item stride to the vector register stride
>> (16/32/64 bytes).
>
> If you provide misaligned access to SIMD registers, why not provide
> misaligned access to all memory references !?!
>=20
> I made this argument several times in my career.
>
>> The only way this can work is to have the compiler control _all_
>> allocations to make sure they are properly aligned, including code in
>> libraries, or the compiler will be forced to use vector load/store
>> operations which do allow unaligned access.
>
> Either the entire environment has to be "air tight" or the HW
> provides misaligned access at low cost. {{Good luck on the air
> tight thing...}}
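
To make the vector-stride point concrete, here is a rough sketch in
portable C, not any particular compiler's output. VEC_BYTES, vec_t,
is_vec_aligned, vec_load_any and sum_bytes are all hypothetical names
for illustration. vec_t stands in for a 16-byte SIMD register, and the
memcpy-based load plays the role of a vector load that tolerates
misalignment, which is the fallback needed when the allocation is not
under the compiler's control.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16u   /* assumed vector width; 32 or 64 on wider SIMD */

/* Stand-in for a SIMD register (hypothetical type for this sketch). */
typedef struct { uint8_t lane[VEC_BYTES]; } vec_t;

/* True if p sits on a vector-width boundary. Byte data only needs
   1-byte alignment, so nothing in the language guarantees this. */
static int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_BYTES) == 0;
}

/* Alignment-agnostic load: memcpy makes no assumption about p. */
static vec_t vec_load_any(const uint8_t *p)
{
    vec_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sum n bytes one vector-sized chunk at a time, then a scalar tail.
   Works for any p because the chunk load tolerates misalignment. */
static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + VEC_BYTES <= n; i += VEC_BYTES) {
        vec_t v = vec_load_any(p + i);
        for (size_t k = 0; k < VEC_BYTES; k++)
            sum += v.lane[k];
    }
    for (; i < n; i++)          /* leftover bytes */
        sum += p[i];
    return sum;
}
```

A caller that passes buf + 4 from a 16-byte-aligned buffer gets a
pointer that is perfectly valid for byte items yet fails
is_vec_aligned, so a loop built on aligned-only vector loads would
fault on it while this one does not.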

This is just one of many details where we've agreed for a decade or two
(three?). On some of them you persuaded me that you were right; I don't
remember any obvious examples of the opposite, but most we figured out
independently. :-)

Terje


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"