<vo0e0r$2h20b$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Wed, 5 Feb 2025 20:26:18 +0100
Organization: A noiseless patient Spider
Lines: 80
Message-ID: <vo0e0r$2h20b$1@dont-email.me>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me>
 <2025Feb3.075550@mips.complang.tuwien.ac.at>
 <wi7oP.2208275$FOb4.591154@fx15.iad>
 <2025Feb4.191631@mips.complang.tuwien.ac.at> <vo061a$2fiql$1@dont-email.me>
 <2025Feb5.184830@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 05 Feb 2025 20:26:20 +0100 (CET)
Injection-Info: dont-email.me; posting-host="c737d1c68c78a1658e9668338878ae32";
	logging-data="2656267"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19daCm5KjlV15pp8TvFWhqfqCudCpQiqPwS/WfgOoq8Ow=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101
 Firefox/128.0 SeaMonkey/2.53.20
Cancel-Lock: sha1:jDczULl0TjdAhTe5daOnJhdORe0=
In-Reply-To: <2025Feb5.184830@mips.complang.tuwien.ac.at>
Bytes: 4149

Anton Ertl wrote:
> Terje Mathisen <terje.mathisen@tmsw.no> writes:
>>          for k in 0..li {
>>              let sum = lock & keylocks[k];
>>              if sum == 0 {
>>                  part1 += 1;
>>              }
>>          }
> 
> Does Rust only have this roundabout way to express this sequentially?
> In Forth I would express that scalarly as
> 
> ( part1 ) li 0 do
>    keylocks i th @ lock and 0= - loop
> 
> ["-" because 0= produces all-bits-set (-1) for true]
> 
> or in C as
> 
> for (k=0; k<li; k++)
>    part1 += (lock & keylocks[k])==0;

I could have written it as
   part1 += ((lock & keylocks[k]) == 0) as u32;

I.e., just like C, except that all casts have to be explicit: here the 
boolean result of the '==' test needs to be converted into a u32.
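
For comparison, here is a minimal sketch of the variants side by side. 
The u32 element type, the function names, and taking the loop bound 
from keylocks.len() instead of a separate li are assumptions made for 
illustration; the iterator form is an extra alternative that was not 
part of the original code.

```rust
/// Branching form: counts the keys that share no set bits with `lock`.
fn count_fits_branch(lock: u32, keylocks: &[u32]) -> u32 {
    let mut part1 = 0;
    for k in 0..keylocks.len() {
        let sum = lock & keylocks[k];
        if sum == 0 {
            part1 += 1;
        }
    }
    part1
}

/// C-like form: the bool from `==` must be cast explicitly to u32.
fn count_fits_cast(lock: u32, keylocks: &[u32]) -> u32 {
    let mut part1 = 0u32;
    for k in 0..keylocks.len() {
        part1 += ((lock & keylocks[k]) == 0) as u32;
    }
    part1
}

/// Iterator form (hypothetical alternative): filter, then count.
fn count_fits_iter(lock: u32, keylocks: &[u32]) -> usize {
    keylocks.iter().filter(|&&key| (lock & key) == 0).count()
}
```

All three should return the same count for the same input; the 
iterator form returns a usize rather than a u32.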

> 
> which I find much easier to follow.  I also expected 0..li to include
> li (based on, I guess, the use of .. in Pascal and its descendants), but
> the net tells me that it does not (starting with 0 was the hint that
> made me check my expectations).

:-)

It is similar to "for (k=0;k<li;k++) {}", so an exclusive right limit 
feels natural.
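
A small sketch of the two range forms, in case it helps: 0..li stops 
before li, while 0..=li includes it. The helper names are made up for 
illustration.

```rust
/// Indices visited by `for k in 0..li`: the right limit is excluded.
fn exclusive_indices(li: usize) -> Vec<usize> {
    (0..li).collect()
}

/// Indices visited by `for k in 0..=li`: the right limit is included.
fn inclusive_indices(li: usize) -> Vec<usize> {
    (0..=li).collect()
}
```

With li = 3 the first visits 0, 1, 2 and the second visits 0, 1, 2, 3.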

> 
>> Telling the rust compiler to target my AVX2-capable laptop CPU (an Intel
>> i7)
> 
> I find it deplorable that even knowledgeable people use marketing
> labels like "i7" which do not tell anything technical (and very little
> non-technical) rather than specifying the full model number (e.g, Core
> i7-1270P) or the design (e.g., Alder Lake).  But in the present case
> "AVX2-capable CPU" is enough information.
> 
>> I got code that simply amazed me: The compiler unrolled the inner
>> loop by 32, ANDing 4 x 8 keys by 8 copies of the current lock into 4 AVX
>> registers (vpand), then comparing with a zeroed register (vpcmpeqd)
>> (generating -1/0 results) before subtracting (vpsubd) those from 4
>> accumulators.
> 
> If you have ever learned about vectorization, it's easy to see that
> the inner loop can be vectorized.  And obviously auto-vectorization
> has worked in this case, not particularly amazing to me.

I have some (30 years?) experience with auto-vectorization, and I have 
usually been (very?) disappointed. As I wrote, this was the best I have 
ever seen, and the resulting code actually performed extremely close to 
the theoretical speed of light, i.e. 3 clock cycles for each group of 
3 AVX instructions.
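
To make the scheme concrete, here is a scalar emulation of what the 
vectorized loop does. This is not the compiler's output and uses no 
intrinsics; it only mimics the lane-wise and/compare/subtract pattern 
with plain arrays. The u32 element type, the names, and the scalar tail 
loop for leftover keys are assumptions.

```rust
const LANES: usize = 8; // 32-bit lanes in one 256-bit AVX2 register
const REGS: usize = 4; // independent accumulators, as in the 32-way unroll

/// Counts keys with no bits in common with `lock`, 4 x 8 lanes at a time.
fn count_fits_lanes(lock: u32, keylocks: &[u32]) -> u32 {
    let mut acc = [[0u32; LANES]; REGS];
    let mut chunks = keylocks.chunks_exact(LANES * REGS);
    for chunk in &mut chunks {
        for r in 0..REGS {
            for l in 0..LANES {
                // vpand: AND the key with the broadcast lock
                let anded = lock & chunk[r * LANES + l];
                // vpcmpeqd against zero: all-ones (-1) if equal, else 0
                let mask = if anded == 0 { u32::MAX } else { 0 };
                // vpsubd: subtracting -1 adds 1 to the accumulator lane
                acc[r][l] = acc[r][l].wrapping_sub(mask);
            }
        }
    }
    // Horizontal sum of all accumulator lanes
    let mut total: u32 = acc.iter().flatten().sum();
    // Scalar tail for the keys that did not fill a whole 32-key chunk
    for &key in chunks.remainder() {
        total += ((lock & key) == 0) as u32;
    }
    total
}
```

Using several independent accumulators keeps the subtract chains from 
depending on each other, which is presumably why the compiler chose 
four of them.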

[snip]

> clang is somewhat better:
> 
> For the avx2 case, 70 lines and 250 bytes.
> For the x86-64-v4 case, 111 lines and 435 bytes.

Rustc sits on top of the same LLVM infrastructure as clang; even with 
that 32-way unroll the code was quite compact. I did not count, but 
your 70 lines seem to be in the ballpark.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"