Path: ...!feeds.phibee-telecom.net!2.eu.feeder.erje.net!feeder.erje.net!newsfeed.bofh.team!paganini.bofh.team!not-for-mail
From: antispam@fricas.org (Waldek Hebisch)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Thu, 6 Feb 2025 15:58:07 -0000 (UTC)
Organization: To protect and to server
Message-ID: <vo2m6d$20glj$1@paganini.bofh.team>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me> <2025Feb3.075550@mips.complang.tuwien.ac.at> <wi7oP.2208275$FOb4.591154@fx15.iad> <vo0smj$1okml$1@paganini.bofh.team> <2025Feb6.122124@mips.complang.tuwien.ac.at>
Injection-Date: Thu, 6 Feb 2025 15:58:07 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="2114227"; posting-host="WwiNTD3IIceGeoS5hCc4+A.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
User-Agent: tin/2.6.2-20221225 ("Pittyvaich") (Linux/6.1.0-9-amd64 (x86_64))
X-Notice: Filtered by postfilter v. 0.9.3
Bytes: 3582
Lines: 53

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> antispam@fricas.org (Waldek Hebisch) writes:
>>Concerning SIMD: trouble here is increasing vector length and
>>consequently increasing alignment requirements.
> 
> That is not a necessary consequence, on the contrary: alignment
> requirements based on SIMD granularity is hardware designer lazyness,
> but means that SIMD cannot be used for many of the applications where
> SIMD without that limitation can be used.
> 
> If you want to have alignment checks, then a SIMD instruction should
> check for element alignment, not for SIMD alignment.
> 
> But the computer architecture trend is clear: General-purpose
> computers do not have alignment restrictions; all that had them have
> been discontinued; the last one that had them was SPARC.

The trend is clear, but the question is whether it is a good trend.
You wrote about lazy hardware designers, but there are many
more lazy programmers.  There are situations where unaligned
access is needed, but a significant proportion of unaligned
accesses are not needed at all.  At best such unaligned
accesses lead to a small performance loss, but they may also
be latent bugs.  There are cases where unaligned accesses
are better than aligned ones; for those the architecture
should have appropriate instructions.
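
To make the distinction between intended and accidental unaligned
access concrete, here is a minimal C sketch.  The helper names
is_aligned and load_u32_unaligned are made up for illustration and
are not from any library.  The idea is that an access which is meant
to be unaligned is written explicitly, while everything else can be
checked:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is a multiple of align.
   align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: load 32 bits from a possibly unaligned address.
   Dereferencing a misaligned uint32_t pointer is undefined behaviour
   in C; memcpy states the intent, and compilers typically turn it
   into a single load on machines that permit unaligned access. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A debug build could then put assert(is_aligned(ptr, sizeof *ptr)) in
front of ordinary accesses, so that an unintended misalignment shows
up as a failure instead of a silent slowdown.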

>>A lot of SIMD
>>code is memory-bound and current way of doing misaligned
>>access leads to worse performance.  So really no good way
>>to solve this.  In principle set of buffers for 2 cache lines
>>each and appropriate shifters could give optimal troughput,
>>but probably would lead to increased latency.
> 
> AFAIK that's what current microarchitectures do, and in many cases
> with small penalties for unaligned accesses; see
> https://www.complang.tuwien.ac.at/anton/unaligned-stores/

You call doubling the store time a 'small penalty'.  For me, in a
performance-critical loop 10% matters, and it is worth
aligning things to avoid such a loss.  And what you present
does not look like what I wrote above: AFAICS what Intel
does works within a single cache line, and there is a penalty when
crossing lines (with buffers holding 2 cache lines each there would be
no penalty for line crossing).
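
As a rough illustration of why line crossing gets worse with wider
vectors, the following C sketch counts how many start offsets within
a line make an access straddle a line boundary.  The 64-byte line
size is an assumption (common on current x86, but not universal),
and the names CACHE_LINE, crosses_line and crossing_offsets are
made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not a property of every machine. */
#define CACHE_LINE 64u

/* Nonzero if an access of `size` bytes starting at `addr` touches
   two different cache lines. */
static int crosses_line(uintptr_t addr, size_t size)
{
    return size != 0 &&
           (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* Number of start offsets within one line (0 .. CACHE_LINE-1) at
   which an access of `size` bytes crosses into the next line. */
static unsigned crossing_offsets(size_t size)
{
    unsigned n = 0;
    for (uintptr_t off = 0; off < CACHE_LINE; off++)
        n += crosses_line(off, size) ? 1u : 0u;
    return n;
}
```

Under that assumption an 8-byte access crosses at 7 of the 64
offsets, a 32-byte access at 31, and a 64-byte access at every
offset except 0.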

For me loads are much more important.  First, there are more of
them.  Second, stores can be buffered, so the latency of the store itself
is of little importance (the latency from store to load matters).
For loads, extra logic in the load path increases latency, and that
may limit program speed.

-- 
                              Waldek Hebisch