<181e05acefef6fd8$206981$891815$802601b3@news.usenetexpress.com>


From: Farley Flud <fflud@gnu.rocks>
Subject: Re: GNU/Linux Greatness: AVX 512 Assembly
Newsgroups: comp.os.linux.advocacy
References: <181df652f994e0cb$34540$2484$802601b3@news.usenetexpress.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Lines: 98
Path: ...!eternal-september.org!feeder3.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!news.usenetexpress.com!not-for-mail
Date: Sat, 25 Jan 2025 19:19:06 +0000
Nntp-Posting-Date: Sat, 25 Jan 2025 19:19:06 +0000
X-Received-Bytes: 4577
Organization: UsenetExpress - www.usenetexpress.com
X-Complaints-To: abuse@usenetexpress.com
Message-Id: <181e05acefef6fd8$206981$891815$802601b3@news.usenetexpress.com>
Bytes: 4989

On Sat, 25 Jan 2025 14:37:47 +0000, Farley Flud wrote:

> 
> Feast thine bloodshot, jaundiced eyeballs on absolutely perfect
> AVX-512 assembly code:
> 

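I cannot resist posting the NASM listing of the assembled code.
(Strictly speaking it is not PIC as written: the bracketed operands
are absolute relocations. PIC would need "default rel" or
"[rel data_in]" addressing.)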

Feast thine jaundiced eyeballs below.

Note the data in the hex column, which reads "DEADBEEF...".

That is a common marker pattern, but I also added "DEAFCAFE...".

These allow me to easily discern where things are.
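
The source writes the constant as 0xefbeadde, yet the hex column reads
DE AD BE EF. That is because x86 stores a dword least-significant byte
first. A minimal C sketch of that byte order follows; store_le32 is a
hypothetical helper name used only for illustration, not something from
the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: lay out a 32-bit value the way x86 stores it,
   least-significant byte at the lowest address. */
void store_le32(uint32_t value, unsigned char out[4])
{
    assert(out != 0);
    for (int i = 0; i < 4; i++) {
        /* Byte i of the value lands at address offset i. */
        out[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
    }
}
```

Calling store_le32(0xefbeadde, buf) leaves buf holding DE AD BE EF, and
0xfecaafde gives DE AF CA FE, matching lines 37 and 38 of the listing.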

Also note the "90<rep 30h>" at line 36.  NASM's align pads with byte
0x90, which is the NOP instruction, even in a data section.  I could
change that padding to zeros with "align 64, db 0", but here it does
no harm.
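
The 30h repeat count follows from the offsets in the listing: the
stride variable ends at offset 0x10 and the next 64-byte boundary is
0x40. A small C sketch of that arithmetic, where align_padding is a
name made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed to advance
   `offset` to the next multiple of `alignment`.
   Assumes `alignment` is a nonzero power of two. */
uint64_t align_padding(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* The outer mask turns a full `alignment` into 0 when the
       offset is already aligned. */
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}
```

With offset 0x10 and alignment 64 the result is 0x30, the count shown
at line 36. An offset that is already aligned, such as 0x40, needs no
padding.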

Assembly language is the ultimate (and only) language.  Anyone who
does not embrace assembly is a phony and a fraud and deserves to
be ostracized, if not worse.

=================================================

     1                                  BITS 64
     2                                  
     3                                  segment .text
     4                                  	global _start
     5                                  
     6                                  _start:
     7 00000000 49B8-                   	mov r8, data_in
     7 00000002 [4000000000000000] 
     8 0000000A 49B9-                   	mov r9, data_out
     8 0000000C [0000000000000000] 
     9 00000014 488B1C25[08000000]      	mov rbx, qword [stride]
    10 0000001C 4831D2                  	xor rdx, rdx
    11 0000001F 488B0425[00000000]      	mov rax, qword [N]
    12 00000027 48F7F3                  	div rbx ; rax = quotient, rdx = remainder
    13                                  load:
    14 0000002A 62D17D486F08            	vmovdqa32 zmm1, zword [r8]
    15 00000030 62D17D487F09            	vmovdqa32 zword [r9], zmm1
    16 00000036 4983C040                	add r8, 64 ; increment data pointers
    17 0000003A 4983C140                	add r9, 64
    18 0000003E 48FFC8                  	dec rax
    19 00000041 75E7                    	jnz load
    20 00000043 4D31DB                  	xor r11, r11 ; load mask, i.e. only rdx to load and process
    21 00000046 49C7C2FFFFFFFF          	mov r10, -1
    22 0000004D 4889D1                  	mov rcx, rdx
    23 00000050 4D0FA5D3                	shld r11, r10, cl  
    24 00000054 C4C1FB92CB              	kmovq k1, r11;
    25 00000059 62D17DC96F08            	vmovdqa32 zmm1{k1}{z}, zword [r8]
    26 0000005F 62D17D487F09            	vmovdqa32 zword [r9], zmm1
    27                                  exit:	
    28 00000065 31FF                    	xor edi,edi
    29 00000067 B83C000000              	mov eax,60
    30 0000006C 0F05                    	syscall
    31                                  
    32                                  segment .data
    33                                  align 64
    34 00000000 2500000000000000        N:			dq 37
    35 00000008 1000000000000000        stride:		dq 16
    36 00000010 90<rep 30h>             align 64
    37 00000040 DEADBEEFDEADBEEFDE-     data_in:	dd 16 dup (0xefbeadde)
    37 00000049 ADBEEFDEADBEEFDEAD-
    37 00000052 BEEFDEADBEEFDEADBE-
    37 0000005B EFDEADBEEFDEADBEEF-
    37 00000064 DEADBEEFDEADBEEFDE-
    37 0000006D ADBEEFDEADBEEFDEAD-
    37 00000076 BEEFDEADBEEFDEADBE-
    37 0000007F EF                 
    38 00000080 DEAFCAFEDEAFCAFEDE-     			dd 16 dup (0xfecaafde)
    38 00000089 AFCAFEDEAFCAFEDEAF-
    38 00000092 CAFEDEAFCAFEDEAFCA-
    38 0000009B FEDEAFCAFEDEAFCAFE-
    38 000000A4 DEAFCAFEDEAFCAFEDE-
    38 000000AD AFCAFEDEAFCAFEDEAF-
    38 000000B6 CAFEDEAFCAFEDEAFCA-
    38 000000BF FE                 
    39 000000C0 DEADBEEFDEADBEEFDE-     			dd 5 dup (0xefbeadde)
    39 000000C9 ADBEEFDEADBEEFDEAD-
    39 000000D2 BEEF               
    40                                  
    41                                  segment .bss
    42                                  alignb 64
    43 00000000 <res 94h>               data_out:	resd 37

=====================================================================
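
For readers who want to check the arithmetic, here is a hedged C model
of the div on lines 9-12 and the mask setup on lines 20-24. The struct
and function names are hypothetical. It only mirrors the integer math,
not the AVX-512 moves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the listing's block count and tail mask. */
typedef struct {
    uint64_t full_blocks; /* rax after div: whole blocks of `stride` dwords */
    uint64_t tail;        /* rdx after div: leftover dwords */
    uint64_t mask;        /* value that kmovq places in k1 */
} copy_plan;

copy_plan plan_copy(uint64_t n, uint64_t stride)
{
    /* stride <= 64 keeps the shift below well defined in C. */
    assert(stride != 0 && stride <= 64);
    copy_plan p;
    p.full_blocks = n / stride;
    p.tail = n % stride;
    /* shld r11, r10, cl with r11 = 0 and r10 = all ones shifts
       `tail` one-bits into the low end of r11. */
    p.mask = (UINT64_C(1) << p.tail) - 1;
    return p;
}
```

For N = 37 and stride = 16 this gives 2 full blocks, a tail of 5, and a
mask of 0x1F. Two things the model makes easy to see: the final store
on line 26 is unmasked, so it writes 64 bytes at offset 128 of data_out,
past the 94h bytes reserved on line 43; and if N were smaller than
stride, the copy loop would still run, because rax is only tested after
the first pass.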





-- 
Gentoo: The Fastest GNU/Linux Hands Down