Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Mon, 03 Feb 2025 08:34:13 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 94
Message-ID: <2025Feb3.093413@mips.complang.tuwien.ac.at>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me> <2025Feb3.075550@mips.complang.tuwien.ac.at> <vnptl6$15pgm$1@dont-email.me>
Injection-Date: Mon, 03 Feb 2025 10:08:37 +0100 (CET)
Injection-Info: dont-email.me; posting-host="108bde89ec56b4d2f11c428d43289119"; logging-data="1258214"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18eDmRbXLMgNL+tyy/bBWXV"
Cancel-Lock: sha1:xxPEnSPkUGycXpF0ITPi+xsz/10=
X-newsreader: xrn 10.11
Bytes: 4161

BGB <cr88192@gmail.com> writes:
>On 2/3/2025 12:55 AM, Anton Ertl wrote:
>Rather, have something like an explicit "__unaligned" keyword or
>similar, and then use the runtime call for these pointers.

There are people who think that it is ok to compile *p to anything if
p is not aligned, even on architectures that support unaligned
accesses.  At least one of those people recommended the use of
memcpy(..., ..., sizeof(...)).
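
As a minimal sketch of the memcpy idiom mentioned above (only the basic
idea is from the thread), the following C fragment pairs an unaligned
load with a store counterpart and round-trips a value through a
deliberately misaligned address.  The names uload_long, ustore_long and
roundtrip_misaligned are made up for illustration, and the sketch
assumes sizeof(long) > 1, so that an offset of one byte really is
misaligned.

```c
#include <assert.h>
#include <string.h>

/* Load a long from an address of unknown alignment.  Copying the bytes
   with memcpy avoids dereferencing a misaligned long pointer, which is
   undefined behaviour in C. */
long uload_long(const void *p)
{
    long x;
    memcpy(&x, p, sizeof x);
    return x;
}

/* Store counterpart of uload_long (hypothetical name). */
void ustore_long(void *p, long x)
{
    memcpy(p, &x, sizeof x);
}

/* Write value at an address one byte past an aligned long and read it
   back.  Returns 1 if the value survives the round trip, 0 otherwise. */
int roundtrip_misaligned(long value)
{
    long storage[2] = {0, 0};                        /* aligned backing store */
    unsigned char *p = (unsigned char *)storage + 1; /* misaligned for long */

    assert(sizeof(long) > 1);   /* assumption stated in the lead-in */
    ustore_long(p, value);
    return uload_long(p) == value;
}
```

A caller would write uload_long(ptr) wherever *(long *)ptr would
otherwise be used on a pointer of unknown alignment.  Which
instructions that turns into is up to the compiler.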

Let's see what gcc produces on rv64gc (where unaligned accesses are
guaranteed to work):

[fedora-starfive:/tmp:111378] cat x.c
#include <string.h>

long uload(long *p)
{
  long x;
  memcpy(&x,p,sizeof(long));
  return x;
}
[fedora-starfive:/tmp:111379] gcc -O -S x.c
[fedora-starfive:/tmp:111380] cat x.s
        .file "x.c"
        .option nopic
        .text
        .align 1
        .globl uload
        .type uload, @function
uload:
        addi sp,sp,-16
        lbu t1,0(a0)
        lbu a7,1(a0)
        lbu a6,2(a0)
        lbu a1,3(a0)
        lbu a2,4(a0)
        lbu a3,5(a0)
        lbu a4,6(a0)
        lbu a5,7(a0)
        sb t1,8(sp)
        sb a7,9(sp)
        sb a6,10(sp)
        sb a1,11(sp)
        sb a2,12(sp)
        sb a3,13(sp)
        sb a4,14(sp)
        sb a5,15(sp)
        ld a0,8(sp)
        addi sp,sp,16
        jr ra
        .size uload, .-uload
        .ident "GCC: (GNU) 10.3.1 20210422 (Red Hat 10.3.1-1)"
        .section .note.GNU-stack,"",@progbits

Oh boy.  Godbolt tells me that gcc-14.2.0 still does it the same way,
whereas clang 9.0.0 and following produce

[fedora-starfive:/tmp:111383] clang -O -S x.c
[fedora-starfive:/tmp:111384] cat x.s
        .text
        .attribute 4, 16
        .attribute 5, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0"
        .file "x.c"
        .globl uload                    # -- Begin function uload
        .p2align 1
        .type uload,@function
uload:                                  # @uload
        .cfi_startproc
# %bb.0:
        ld a0, 0(a0)
        ret
.Lfunc_end0:
        .size uload, .Lfunc_end0-uload
        .cfi_endproc
                                        # -- End function
        .ident "clang version 11.0.0 (Fedora 11.0.0-2.0.riscv64.fc33)"
        .section ".note.GNU-stack","",@progbits
        .addrsig

If that is frequently used for unaligned p, this will be slow on the
U74 and P550.  Maybe SiFive should get around to implementing
unaligned accesses more efficiently.

>Though "memcpy()" is usually a "simple to fix up" scenario.

General memcpy where both operands may be unaligned in different ways
is not particularly simple.  This also shows up in the fact that Intel
and AMD have failed to make REP MOVSB faster than software approaches
for many cases when I last looked.  Supposedly Intel has had another
go at it, I should measure it again.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>