From: Farley Flud <fflud@gnu.rocks>
Subject: Re: GNU/Linux Greatness: AVX 512 Assembly
Newsgroups: comp.os.linux.advocacy
References: <181df652f994e0cb$34540$2484$802601b3@news.usenetexpress.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Lines: 98
Path: ...!eternal-september.org!feeder3.eternal-september.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!news.usenetexpress.com!not-for-mail
Date: Sat, 25 Jan 2025 19:19:06 +0000
Nntp-Posting-Date: Sat, 25 Jan 2025 19:19:06 +0000
X-Received-Bytes: 4577
Organization: UsenetExpress - www.usenetexpress.com
X-Complaints-To: abuse@usenetexpress.com
Message-Id: <181e05acefef6fd8$206981$891815$802601b3@news.usenetexpress.com>
Bytes: 4989

On Sat, 25 Jan 2025 14:37:47 +0000, Farley Flud wrote:

>
> Feast thine bloodshot, jaundiced eyeballs on absolutely perfect
> AVX-512 assembly code:
>

I cannot resist giving the NASM dump of the assembled code
(in PIC form of course). Feast thine jaundiced eyeballs below.

Note the data in hexadecimal which reads "DEADBEEF..."
That is a common device but I also added "DEAFCAFE..."
These allow me to easily discern where things are.

Also note the "90" at line 36. NASM pads alignment with
byte "90" which is the NOP instruction. I should change that
padding to all zeros but here it does no harm.

Assembly language is the ultimate (and only) language.
Anyone who does not embrace assembly is a phony and a fraud
and deserves to be ostracized, if not worse.
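For readers who want to check the two observations above without assembling anything, here is a minimal Python sketch. It is not part of the original post. The helper names stored_bytes and align_padding are hypothetical, chosen for illustration. It shows why the source constants 0xefbeadde and 0xfecaafde appear in the dump as "DEADBEEF" and "DEAFCAFE" (x86 stores dwords little-endian), and why the alignment at line 36 of the listing is 30h bytes long.

```python
import struct


def stored_bytes(dword: int) -> str:
    """Hex of a 32-bit value in x86 memory order (little-endian)."""
    return struct.pack("<I", dword).hex().upper()


def align_padding(offset: int, alignment: int) -> int:
    """Bytes needed to advance `offset` to the next multiple of `alignment`."""
    return -offset % alignment


# The constants as written in the source, then as they appear in the dump.
print(stored_bytes(0xEFBEADDE))      # DEADBEEF
print(stored_bytes(0xFECAAFDE))      # DEAFCAFE

# N and stride occupy offsets 0x00..0x0F, so the next free offset is 0x10.
print(hex(align_padding(0x10, 64)))  # 0x30
```

The 0x30 result matches the "90<rep 30h>" entry in the listing: 48 NOP bytes between the two quadwords and data_in at offset 40h.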
=================================================
     1                                  BITS 64
     2
     3                                  segment .text
     4                                  global _start
     5
     6                                  _start:
     7 00000000 49B8-                       mov r8, data_in
     7 00000002 [4000000000000000]
     8 0000000A 49B9-                       mov r9, data_out
     8 0000000C [0000000000000000]
     9 00000014 488B1C25[08000000]          mov rbx, qword [stride]
    10 0000001C 4831D2                      xor rdx, rdx
    11 0000001F 488B0425[00000000]          mov rax, qword [N]
    12 00000027 48F7F3                      div rbx         ; rax = quotient, rdx = remainder
    13                                  load:
    14 0000002A 62D17D486F08                vmovdqa32 zmm1, zword [r8]
    15 00000030 62D17D487F09                vmovdqa32 zword [r9], zmm1
    16 00000036 4983C040                    add r8, 64      ; increment data pointers
    17 0000003A 4983C140                    add r9, 64
    18 0000003E 48FFC8                      dec rax
    19 00000041 75E7                        jnz load
    20 00000043 4D31DB                      xor r11, r11    ; load mask, i.e. only rdx to load and process
    21 00000046 49C7C2FFFFFFFF              mov r10, -1
    22 0000004D 4889D1                      mov rcx, rdx
    23 00000050 4D0FA5D3                    shld r11, r10, cl
    24 00000054 C4C1FB92CB                  kmovq k1, r11;
    25 00000059 62D17DC96F08                vmovdqa32 zmm1{k1}{z}, zword [r8]
    26 0000005F 62D17D487F09                vmovdqa32 zword [r9], zmm1
    27                                  exit:
    28 00000065 31FF                        xor edi,edi
    29 00000067 B83C000000                  mov eax,60
    30 0000006C 0F05                        syscall
    31
    32                                  segment .data
    33                                  align 64
    34 00000000 2500000000000000        N:       dq 37
    35 00000008 1000000000000000        stride:  dq 16
    36 00000010 90<rep 30h>             align 64
    37 00000040 DEADBEEFDEADBEEFDE-     data_in: dd 16 dup (0xefbeadde)
    37 00000049 ADBEEFDEADBEEFDEAD-
    37 00000052 BEEFDEADBEEFDEADBE-
    37 0000005B EFDEADBEEFDEADBEEF-
    37 00000064 DEADBEEFDEADBEEFDE-
    37 0000006D ADBEEFDEADBEEFDEAD-
    37 00000076 BEEFDEADBEEFDEADBE-
    37 0000007F EF
    38 00000080 DEAFCAFEDEAFCAFEDE-              dd 16 dup (0xfecaafde)
    38 00000089 AFCAFEDEAFCAFEDEAF-
    38 00000092 CAFEDEAFCAFEDEAFCA-
    38 0000009B FEDEAFCAFEDEAFCAFE-
    38 000000A4 DEAFCAFEDEAFCAFEDE-
    38 000000AD AFCAFEDEAFCAFEDEAF-
    38 000000B6 CAFEDEAFCAFEDEAFCA-
    38 000000BF FE
    39 000000C0 DEADBEEFDEADBEEFDE-              dd 5 dup (0xefbeadde)
    39 000000C9 ADBEEFDEADBEEFDEAD-
    39 000000D2 BEEF
    40
    41                                  segment .bss
    42                                  alignb 64
    43 00000000 <res 94h>               data_out: resd 37
=====================================================================

-- 
Gentoo: The Fastest GNU/Linux Hands Down
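
The arithmetic behind lines 9 to 26 of the listing can be followed in a small Python model. This is a sketch, not part of the original post. The names shld64 and plan_copy are hypothetical, and the model assumes N is at least one full stride, since the load loop in the listing always runs its body once before testing rax. It reproduces the div step (quotient = full 64-byte blocks, remainder = leftover dwords), the shld trick that turns the remainder into a mask with that many low bits set, and the number of bytes the stores cover.

```python
MASK64 = (1 << 64) - 1


def shld64(dst: int, src: int, count: int) -> int:
    """Model of 64-bit SHLD: shift dst left, filling from the top bits of src."""
    count &= 63                      # the CPU masks the count to 6 bits
    if count == 0:
        return dst                   # a zero count leaves dst unchanged
    return ((dst << count) | (src >> (64 - count))) & MASK64


def plan_copy(n: int, stride: int = 16, block_bytes: int = 64):
    """Return (full_blocks, tail, k1_mask, bytes_stored) for n dwords.

    Assumes n >= stride, matching the do-while shape of the `load` loop.
    """
    full_blocks, tail = divmod(n, stride)       # div rbx: rax, rdx
    k1_mask = shld64(0, MASK64, tail)           # xor r11,r11 / mov r10,-1 / shld
    bytes_stored = (full_blocks + 1) * block_bytes  # loop stores + final store
    return full_blocks, tail, k1_mask, bytes_stored


blocks, tail, k1, stored = plan_copy(37)
print(blocks, tail, bin(k1), stored)   # 2 5 0b11111 192
print(37 * 4)                          # 148 bytes reserved by `resd 37`
```

With N = 37 the model gives two full blocks, a tail of 5 dwords and k1 = 0b11111, which is what the masked load at line 25 needs. If the listing is read correctly, the final store at line 26 carries no mask, so the three stores together cover 192 bytes while data_out reserves only 94h = 148. The zeroing mask on the load keeps the extra lanes at zero, but a masked store would confine the write to the reserved area.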