Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Mon, 3 Feb 2025 21:57:37 +0000
Organization: Rocksolid Light
Message-ID: <2e1543197e5189248018eb80e5543331@www.novabbs.org>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me>
 <2025Feb3.075550@mips.complang.tuwien.ac.at> <wi7oP.2208275$FOb4.591154@fx15.iad>
 <vnr64m$1e7sb$1@dont-email.me> <vnrd49$1f52h$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org; logging-data="2555019"; mail-complaints-to="usenet@i2pn2.org"; posting-account="o5SwNDfMfYu6Mv4wwLiW6e/jbA93UAdzFodw5PEa6eU";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$Dclve.yz4zkz.JZuIuA2mORqBrky31P3e4/wddn6KrBarSgLgRF0e
X-Rslight-Posting-User: cb29269328a20fe5719ed6a1c397e21f651bda71
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 3706
Lines: 58

On Mon, 3 Feb 2025 21:40:21 +0000, BGB wrote:

> On 2/3/2025 1:41 PM, Thomas Koenig wrote:
>> EricP <ThatWouldBeTelling@thevillage.com> wrote:
>>
>>> That is fine for code that is being actively maintained and backward
>>> data structure compatibility is not required (like those inside a
>>> kernel).
>>>
>>> However for x86 there was a few billion lines of legacy code that likely
>>> assumed 2-byte alignment, or followed the fp64 aligned to 32-bits
>>> advice,
>>> and a C language that mandates structs be laid out in memory exactly as
>>> specified (no automatic struct optimization). Also I seem to recall some
>>> amount of squawking about SIMD when it required naturally aligned
>>> buffers.
>>> As SIMD no longer requires alignment, presumably code no longer does so.
>>
>> Looking at Intel's optimization manual, they state in
>> "15.6 DATA ALIGNMENT FOR INTEL® AVX"
>>
>> "Assembly/Compiler Coding Rule 65. (H impact, M generality) Align
>> data to 32-byte boundary when possible. Prefer store alignment
>> over load alignment."
>>
>> and further down, about AVX-512,
>>
>> "18.23.1 Align Data to 64 Bytes"
>>
>> "Aligning data to vector length is recommended. For best results,
>> when using Intel AVX-512 instructions, align data to 64 bytes.
>>
>> When doing a 64-byte Intel AVX-512 unaligned load/store, every
>> load/store is a cache-line split, since the cache-line is 64
>> bytes. This is double the cache line split rate of Intel AVX2
>> code that uses 32-byte registers. A high cache-line split rate in
>> memory-intensive code can cause poor performance."
>>
>> This sounds reasonable, and good advice if you want to go
>> down SIMD lane.
>>
>
> This is, ironically, a place where SIMD via ganged registers has an
> advantage over SIMD via large monolithic registers.

Ironically^2, vVM allows each implementation to decide how many SIMD
registers it has and how wide they are. LBI/O might have 8 128-bit
flip-flops, while GBOoO might have 32 512-bit flip-flops, all running
the same binary, and all running that same binary as fast as any binary
that machine could run. In addition, HW looks at the loop index (and
possibly predication) to create masks on the lanes of execution.

SW ASCII should describe the calculations to be performed. The compiler
should produce a vVM loop for those calculations in loops. Any
implementation should run that vVM loop as fast as it can.

See, no change to ISA and you still get 98% of the SIMD you know and
love, and the number and width of the registers is an implementation
variable!
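
To make the lane-mask idea concrete, here is a toy software model of it.
It is only a sketch under stated assumptions: the names lane_mask and
run_loop are hypothetical, the lane counts (2 and 8) assume 64-bit
elements in 128-bit and 512-bit registers, and nothing here claims to
describe how any actual implementation builds its masks. It shows one
loop description producing the same result at two widths, with the mask
derived from the loop index and the trip count.

```python
# Toy model: one loop description, executed at different lane counts.
# All names are hypothetical; real hardware would do each pass in parallel.

def lane_mask(index, count, lanes):
    """One boolean per lane: True if that lane holds a live iteration."""
    return [index + lane < count for lane in range(lanes)]

def run_loop(a, b, lanes):
    """Compute a[i] + b[i] for every i, taking 'lanes' iterations per pass."""
    count = len(a)
    out = [0] * count
    passes = 0
    for index in range(0, count, lanes):
        mask = lane_mask(index, count, lanes)
        for lane, live in enumerate(mask):
            if live:  # masked-off lanes do no work and touch no memory
                out[index + lane] = a[index + lane] + b[index + lane]
        passes += 1
    return out, passes

a = list(range(10))
b = [10 * x for x in a]
narrow, narrow_passes = run_loop(a, b, lanes=2)  # assumed small machine
wide, wide_passes = run_loop(a, b, lanes=8)      # assumed large machine
assert narrow == wide  # same "binary", same answer
print(narrow_passes, wide_passes)
```

With 10 elements the narrow model takes 5 passes and the wide one takes
2, and the last wide pass runs with only two lanes enabled. The loop
body never mentions the width, which is the property being claimed.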