Article <2e1543197e5189248018eb80e5543331@www.novabbs.org>

Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Mon, 3 Feb 2025 21:57:37 +0000
Organization: Rocksolid Light
Message-ID: <2e1543197e5189248018eb80e5543331@www.novabbs.org>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me> <2025Feb3.075550@mips.complang.tuwien.ac.at> <wi7oP.2208275$FOb4.591154@fx15.iad> <vnr64m$1e7sb$1@dont-email.me> <vnrd49$1f52h$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="2555019"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="o5SwNDfMfYu6Mv4wwLiW6e/jbA93UAdzFodw5PEa6eU";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$Dclve.yz4zkz.JZuIuA2mORqBrky31P3e4/wddn6KrBarSgLgRF0e
X-Rslight-Posting-User: cb29269328a20fe5719ed6a1c397e21f651bda71
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 3706
Lines: 58

On Mon, 3 Feb 2025 21:40:21 +0000, BGB wrote:

> On 2/3/2025 1:41 PM, Thomas Koenig wrote:
>> EricP <ThatWouldBeTelling@thevillage.com> schrieb:
>>
>>> That is fine for code that is being actively maintained and backward
>>> data structure compatibility is not required (like those inside a
>>> kernel).
>>>
>>> However for x86 there was a few billion lines of legacy code that likely
>>> assumed 2-byte alignment, or followed the fp64 aligned to 32-bits advice,
>>> and a C language that mandates structs be laid out in memory exactly as
>>> specified (no automatic struct optimization). Also I seem to recall some
>>> amount of squawking about SIMD when it required naturally aligned buffers.
>>> As SIMD no longer requires alignment, presumably code no longer does so.
>>
>> Looking at Intel's optimization manual, they state in
>> "15.6 DATA ALIGNMENT FOR INTEL® AVX"
>>
>> "Assembly/Compiler Coding Rule 65. (H impact, M generality) Align
>> data to 32-byte boundary when possible. Prefer store alignment
>> over load alignment."
>>
>> and further down, about AVX-512,
>>
>> "18.23.1 Align Data to 64 Bytes"
>>
>> "Aligning data to vector length is recommended. For best results,
>> when using Intel AVX-512 instructions, align data to 64 bytes.
>>
>> When doing a 64-byte Intel AVX-512 unaligned load/store, every
>> load/store is a cache-line split, since the cache-line is 64
>> bytes. This is double the cache line split rate of Intel AVX2
>> code that uses 32-byte registers. A high cache-line split rate in
>> memory-intensive code can cause poor performance."
>>
>> This sounds reasonable, and good advice if you want to go
>> down SIMD lane.
>>
>
> This is, ironically, a place where SIMD via ganged registers has an
> advantage over SIMD via large monolithic registers.

Ironically^2, vVM allows each implementation to decide how many SIMD
registers there are and how wide they are. LBI/O might have 8 128-bit
flip-flops, while GBOoO might have 32 512-bit flip-flops. All run
the same binary, and each runs that binary as fast as any binary
that machine could run; in addition, HW looks at the loop index
(and possibly predication) to create masks on the lanes of execution.

SW ASCII should describe the calculations to be performed.
The compiler should produce a vVM loop for those calculations
that sit in loops.
Any implementation should run that vVM loop as fast as it can.

See, no change to ISA and you still get 98% of the SIMD you know and love,
and the number and width of the registers is an implementation variable!
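
To make the lane-mask point concrete, here is a rough C sketch. It is not
vVM code and not how any real implementation is built; it only simulates
the idea that one loop description can be executed in passes of whatever
lane count the hardware has, with a mask derived from the loop index
covering the tail. The names lane_mask, saxpy_scalar and saxpy_lanes are
made up for illustration, and the 64-lane limit comes from the sketch
holding the mask in a uint64_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mask of active lanes for the pass starting at
   loop index i, for a loop of n elements on a machine with `lanes`
   lanes. Assumes i < n and 1 <= lanes <= 64. */
static uint64_t lane_mask(size_t i, size_t n, unsigned lanes)
{
    size_t remaining = n - i;
    if (remaining >= lanes)   /* full pass: every lane is active */
        return lanes == 64 ? ~UINT64_C(0) : (UINT64_C(1) << lanes) - 1;
    return (UINT64_C(1) << remaining) - 1;   /* tail: low lanes only */
}

/* The loop as the source text describes it. */
static void saxpy_scalar(float *y, const float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Simulated execution of the same loop, `lanes` elements per pass.
   Returns the number of passes taken. The inner loop stands in for
   lanes that hardware would run in parallel. */
static unsigned saxpy_lanes(float *y, const float *x, float a,
                            size_t n, unsigned lanes)
{
    unsigned passes = 0;
    for (size_t i = 0; i < n; i += lanes, passes++) {
        uint64_t mask = lane_mask(i, n, lanes);
        for (unsigned l = 0; l < lanes; l++)
            if ((mask >> l) & 1)          /* masked-off lanes do nothing */
                y[i + l] += a * x[i + l];
    }
    return passes;
}
```

Calling saxpy_lanes with lanes = 4 (think 128-bit registers of floats) and
with lanes = 16 (512-bit) gives the same y[] as saxpy_scalar; only the
number of passes changes, which is the part left to the implementation.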