From: antispam@fricas.org (Waldek Hebisch)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Wed, 5 Feb 2025 23:36:53 -0000 (UTC)
Organization: To protect and to server
Message-ID: <vo0smj$1okml$1@paganini.bofh.team>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me> <2025Feb3.075550@mips.complang.tuwien.ac.at> <wi7oP.2208275$FOb4.591154@fx15.iad>

EricP <ThatWouldBeTelling@thevillage.com> wrote:
> 
> While the Linux kernel may not use many misaligned values,
> I'd guess there is a lot of application code that does.

I guess that much of that is simply there "by accident": without
alignment checks in hardware, misalignment can happen and nobody
notices, because the only symptom is a small performance problem.
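As an illustration of how such misalignment arises unnoticed, here is a
minimal C sketch assuming a hypothetical record format: a 1-byte tag
followed by a 4-byte little-endian length, so the length field sits at
offset 1.  Casting 'rec + 1' to 'const uint32_t *' and dereferencing it
would be an unaligned access (and undefined behavior in C) that usually
"just works" on x86; reading byte by byte avoids the question.  The names
'read_u32_le' and 'record_length' are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: rec[0] = tag, rec[1..4] = little-endian length.
 * The length starts at offset 1, so it is not 4-byte aligned whenever
 * the record itself starts on an aligned address. */

/* Assemble a 32-bit little-endian value from single-byte loads;
 * byte loads have no alignment requirement. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

static uint32_t record_length(const unsigned char *rec)
{
    assert(rec != 0);
    /* Offset 1: a direct 32-bit load here would be misaligned. */
    return read_u32_le(rec + 1);
}
```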

I worked on a low-level program and fairly recently got a
bunch of alignment errors.  On AMD64 they were due to SSE
instructions used by 'memcpy'; on 32-bit ARM, due to the use of
double-precision floating point in 'memcpy'.  It took some time to
find them, simply because most things worked even without alignment
and the offending cases were hard to trigger.

My personal feeling is that the best machine would have aligned
access, with checks, by default, but also special instructions
for unaligned access.  That way code that does not need
unaligned access gets extra error checking, while code that
uses unaligned access pays a modest, essentially unavoidable
penalty.
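A rough software model of that split, purely as an illustration: the
default load checks alignment and reports failure where the proposed
hardware would trap, while a separate load accepts any address.  Both
function names are hypothetical, the 4-byte size is an arbitrary choice,
the pointer-to-'uintptr_t' check assumes a flat address space (true on
mainstream platforms, but implementation-defined in C), and 'memcpy'
stands in for whatever unaligned access sequence the compiler picks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Default load: refuses a misaligned address.  Returning false models
 * the alignment trap of the proposed machine. */
static bool load_u32_checked(const unsigned char *p, uint32_t *out)
{
    if (((uintptr_t)p & 3u) != 0)
        return false;              /* would trap: address not 4-byte aligned */
    memcpy(out, p, sizeof *out);   /* aligned: typically a single plain load */
    return true;
}

/* Explicit unaligned load: accepts any address, possibly at some cost. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);       /* compiler chooses a safe access sequence */
    return v;
}
```

Code that never asks for 'load_u32_unaligned' gets the error checking for
free; code that needs it says so explicitly at the call site.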

Of course, once an architecture officially supports unaligned
access, there will be binaries depending on it, and backward
compatibility will prevent a change to require alignment.

Concerning SIMD: the trouble here is increasing vector length and,
consequently, increasing alignment requirements.  A lot of SIMD
code is memory-bound, and the current way of doing misaligned
access leads to worse performance.  So there is really no good way
to solve this.  In principle, a set of buffers of 2 cache lines
each plus appropriate shifters could give optimal throughput,
but it would probably lead to increased latency.
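To make the buffer-and-shifter idea concrete, here is a word-sized
software analogue in C (an assumption for illustration, not a hardware
design): a misaligned 64-bit load is assembled from the two aligned
words that contain it, with shifts doing the job of the shifter.
Memory is modeled as an array of 64-bit words in little-endian byte
order, and the function name is hypothetical.  Hardware would do the
same across whole cache lines; the second access plus the merge is the
extra work that would show up as latency.

```c
#include <stddef.h>
#include <stdint.h>

/* Load the 8 bytes starting at byte offset 'off' from memory modeled as
 * aligned little-endian 64-bit words.  The caller must ensure that
 * words[off / 8 + 1] exists whenever off is not a multiple of 8. */
static uint64_t load_u64_misaligned(const uint64_t *words, size_t off)
{
    size_t idx = off / 8;
    unsigned shift = (unsigned)(off % 8) * 8;   /* bit offset inside the word */
    uint64_t lo = words[idx];
    if (shift == 0)
        return lo;                  /* aligned: one word, no merge needed */
    uint64_t hi = words[idx + 1];
    /* Drop the bytes below 'off' from the first word and fill the top
     * from the second word (shift != 0, so 64 - shift is in 1..56). */
    return (lo >> shift) | (hi << (64 - shift));
}
```

When 'off' is a multiple of 8 only one word is touched, which mirrors
why aligned access stays the cheap case.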

-- 
                              Waldek Hebisch