Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Physfitfreak <physfitfreak@gmail.com>
Newsgroups: comp.os.linux.advocacy
Subject: Re: Challenge For The "Expert" Tyrone
Date: Tue, 28 Jan 2025 10:43:16 -0600
Organization: individual
Lines: 127
Message-ID: <vnb1f4$1tgcb$1@dont-email.me>
References: <pan$5f575$bd36f4a3$95d6d52e$b096d18c@linux.rocks>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Jan 2025 17:43:17 +0100 (CET)
Injection-Info: dont-email.me; posting-host="3169d1e6b884ff8d692b9d0d24d92663";
	logging-data="2015627"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19fzB6yvRDZ+rXLPlPxmyBqT/2Fh3qPxl8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:e5WpQ7EiMjhhqYrTsq9ocF3jOI0=
Content-Language: en-US, fa-IR
In-Reply-To: <pan$5f575$bd36f4a3$95d6d52e$b096d18c@linux.rocks>
Bytes: 4823

On 1/28/25 10:14 AM, Farley Flud wrote:
> Poor tired, exhausted Tyrone.  He must have spent days of futile
> searching in an attempt to find a copy somewhere of my absolutely
> perfect AVX-512 assembly code.
> 
> (Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!)
> 
> Of course, all of his efforts were in total vain, because no such
> copy exists anywhere, except right here on C.O.L.A.
> 
> Poor tired, exhausted Tyrone (not to mention poor, dumb bastard).
> 
> (Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!)
> 
> Well, I have a challenge for the "expert" Tyrone.
> 
> I have ever so slightly modified my absolutely perfect AVX-512 code
> so that it no longer will execute.  Instead it will crash horribly.
> 
> The ever-so-slightly modified code follows.
> 
> Let's allow the "expert" Tyrone to discover and clearly report
> the fault.
> 
> Anyone want to take bets?
> 
> Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!
> 
> I recommend that Tyrone invest his extensive and exhaustive search
> time in a search for his own stupidity.
> 
> Ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha!
> 
> 
> ============================================
> Begin AVX-512 NASM Assembly (Modified)
> ============================================
> 
> BITS 64
> 
> segment .text
> 	global _start
> 
> _start:
> 	mov r8, data_in
> 	mov r9, data_out
> 	mov rbx, qword [stride]
> 	xor rdx, rdx
> 	mov rax, qword [N]
> 	div rbx 	; rax = quotient, rdx = remainder
> load:
> 	vmovdqa32 zmm1, zword [r8]
> 	vmovdqa32 zword [r9], zmm1
> 	add r8, 64 ; increment data pointers
> 	add r9, 64
> 	dec rax
> 	jnz load
> 	xor r11, r11 	; load mask, i.e. only rdx left over to load
> 	mov r10, -1
> 	mov rcx, rdx
> 	shld r11, r10, cl
> 	kmovq k1, r11;
> 	vmovdqa32 zmm1{k1}{z}, zword [r8]
> 	vmovdqa32 zword [r9], zmm1
> exit:	
> 	xor edi,edi
> 	mov eax,60
> 	syscall
> 
> segment .data
> align 64
> N:		dq 37 	;set length of block and stride
> stride:		dq 16
> data_in:	dd 16 dup (0xefbeadde) ;dummy data
> 		dd 16 dup (0xfecaafde)
> 		dd 5 dup (0xefbeadde)
> 
> segment .bss
> alignb 64
> data_out:	resd 37
> 
> ========================================
> End AVX-512 NASM Assembly (Modified)
> ========================================
> 
> 
> 
> 


No Farley, not after the advent of DeepSeek.

If I, who don't know a _hoot_ about assembly language, can find the 
answer within about 10 seconds of your post, then any "Tyrone" can too.

Output from DeepSeek:


1. Incorrect Handling of Remainder:

     The code calculates the quotient and remainder of N / stride using 
the div instruction. The quotient (rax) is used to determine how many 
full 64-byte blocks to copy, and the remainder (rdx) is used to handle 
the remaining elements.

     However, the code assumes that the remainder (rdx) will always be 
less than or equal to 16 (the stride). If N is not a multiple of stride, 
the remainder could be larger than 16, which would lead to incorrect 
behavior when creating the mask for the final partial load.

2. Mask Calculation:

     The mask calculation for the final partial load is incorrect. The 
shld instruction is used to create a mask, but it doesn't correctly 
handle the case where the remainder is zero. If the remainder is zero, 
the mask should be zero, but the current code will still attempt to load 
and store data, which could lead to accessing out-of-bounds memory.

3. Alignment Issues:

     The data_in and data_out arrays are aligned to 64 bytes, which is 
correct for AVX-512 operations. However, the code does not ensure that 
the final partial load is also aligned, which could lead to performance 
issues or crashes on some architectures.
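
For anyone who wants to check the three points by hand, the arithmetic can be redone outside the assembler. What follows is only a rough sketch in Python, not part of Farley's post or of the DeepSeek output. The helper names (div_u64, shld64, tail_mask, data_offsets) are made up for illustration. It assumes the .data segment starts on a 64-byte boundary, that "dq" occupies 8 bytes and "dd" 4 bytes, and it models only the register arithmetic, not the AVX-512 loads and stores themselves.

```python
MASK64 = (1 << 64) - 1  # all ones in a 64-bit register


def div_u64(n, stride):
    """Model of `xor rdx, rdx` / `div rbx`: returns (rax, rdx) = (quotient, remainder)."""
    return n // stride, n % stride


def shld64(dst, src, count):
    """Model of `shld dst, src, cl` on 64-bit registers.

    dst is shifted left by count bits; the vacated low bits are filled
    from the high bits of src. The CPU masks the count to 6 bits.
    """
    count &= 63
    if count == 0:
        return dst
    return ((dst << count) | (src >> (64 - count))) & MASK64


def tail_mask(remainder):
    """Mask built by the posted code: r11 = 0, r10 = -1, shld r11, r10, cl."""
    return shld64(0, MASK64, remainder)


def data_offsets():
    """Byte offsets of the .data labels, counted from the `align 64` boundary."""
    n_off = 0                     # N: dq (8 bytes), sits right after align 64
    stride_off = n_off + 8        # stride: dq (8 bytes)
    data_in_off = stride_off + 8  # data_in starts after both quadwords
    return {"N": n_off, "stride": stride_off, "data_in": data_in_off}


quotient, remainder = div_u64(37, 16)    # values of N and stride in the post
mask = tail_mask(remainder)              # one mask bit per leftover dword
offsets = data_offsets()

# Byte range touched by the final unmasked 64-byte store to data_out,
# compared with the 37 dwords reserved by `resd 37`.
final_store_end = quotient * 64 + 64
reserved_bytes = 37 * 4
```

With the values in the post this gives a quotient of 2 and a remainder of 5, and a mask of 0x1f. A remainder from an unsigned division is always smaller than the divisor, and a remainder of 0 produces a mask of 0. The offsets put data_in 16 bytes past the aligned boundary, because "align 64" comes before N rather than before data_in, and vmovdqa32 requires a 64-byte-aligned address for a zword memory operand. The last store, which carries no {k1} mask, reaches byte 192 of data_out while only 148 bytes are reserved.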