From: antispam@fricas.org (Waldek Hebisch)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Wed, 5 Feb 2025 23:36:53 -0000 (UTC)
Organization: To protect and to server
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <2025Feb3.075550@mips.complang.tuwien.ac.at>

EricP wrote:
>
> While the Linux kernel may not use many misaligned values,
> I'd guess there is a lot of application code that does.

I guess that much of that is simply "by accident": without alignment
checks in hardware, misalignment can happen and nobody notices that
there is a small performance problem.

I worked on a low-level program and fairly recently I got a bunch
of alignment errors. On AMD64 they were due to SSE instructions
used by 'memcpy', on 32-bit ARM due to the use of double-precision
floating point in 'memcpy'. It took some time to find them: most
things worked even without alignment, and the offending cases were
hard to trigger.

My personal feeling is that the best machine would have aligned
access with checks by default, but also special instructions for
unaligned access. That way code that does not need unaligned access
gets extra error checking, while code that uses unaligned access
pays a modest, essentially unavoidable penalty. Of course, once an
architecture officially supports unaligned access, there will be
binaries depending on it, and backward compatibility will prevent
a later change to require alignment.

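To make the "checked by default, explicit opt-in for unaligned" idea
concrete, here is a rough software analogue in C. It is only a sketch:
the function names are made up for illustration, the check is an
assert() rather than a hardware trap, and it covers only 32-bit loads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is suitably aligned for uint32_t. */
static int is_aligned_u32(const void *p)
{
    return (uintptr_t)p % _Alignof(uint32_t) == 0;
}

/* Default path: the caller promises alignment and we check it, so an
   accidental misaligned pointer aborts instead of silently working. */
static uint32_t load_u32_aligned(const void *p)
{
    assert(is_aligned_u32(p));
    return *(const uint32_t *)p;
}

/* Explicit opt-in for data that may be unaligned. memcpy is defined
   for any address; the compiler chooses how to implement the copy. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Code that never expects misaligned data calls load_u32_aligned() and
gets the extra check; code that parses packed data calls
load_u32_unaligned() and pays whatever the target charges for it.
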
Concerning SIMD: the trouble here is increasing vector length and
consequently increasing alignment requirements. A lot of SIMD code
is memory-bound, and the current way of doing misaligned access leads
to worse performance. So there is really no good way to solve this.
In principle, a set of buffers holding 2 cache lines each, plus
appropriate shifters, could give optimal throughput, but would
probably lead to increased latency.

-- 
                              Waldek Hebisch