From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The freedom that give us C++ and C
Date: Sun, 17 Jan 2016 16:48:21 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2016Jan17.174821@mips.complang.tuwien.ac.at>
References: <2016Jan7.170220@mips.complang.tuwien.ac.at>
 <77bce982-05c1-46ce-97f4-a30e6310ff58@googlegroups.com>
 <2016Jan14.151802@mips.complang.tuwien.ac.at>
 <917536f1-ce38-4fe6-a393-30fe79d2ee9b@googlegroups.com>
 <2016Jan16.191355@mips.complang.tuwien.ac.at>
 <8b05fc5a-0356-4c4c-89d3-0980be96614f@googlegroups.com>

already5chosen@yahoo.com writes:
>On Saturday, January 16, 2016 at 9:37:01 PM UTC+2, Anton Ertl wrote:
>> As a
>> result, the following check protecting against a buffer overflow
>> attack in the Linux kernel would be "optimized" away if Linux did not
>> disable this "optimization" (taken from
>> ):
>>
>> int vsnprintf(char *buf, size_t size, ...)
>> {
>>   char *end;
>>   /* Reject out-of-range values early.
>>      Large positive sizes are used for
>>      unknown buffer sizes. */
>>   if (WARN_ON_ONCE((int) size < 0))
>>     return 0;
>>   end = buf + size;
>>   /* Make sure end is always >= buf */
>>   if (end < buf) { ... }
>>   ...
>> }
>> Back to your idea of writing clear code, and letting the optimizer
>> make it fast. The problem with "optimizations" is that it can also
>> compile clear code differently than intended; do you consider the
>> check in vsnprintf() above unclear?
>
>Yes, I consider it less clear than
>if (WARN_ON_ONCE(size >= INT_MAX))
>  return 0;

Ok, that's almost the same as the "(int) size < 0" check (it gives a
different result on INT_MAX and probably compiles to bigger code), but
the check that would be "optimized" away is the "end < buf" check.

>> Now the new-gcc way to do
>> it is to write memcpy(), so instead of
>>
>> *p = ~*q;
>>
>> you write
>>
>> long x;
>> memcpy(&x,q, sizeof(long));
>> x = ~x;
>> memcpy(p,&x, sizeof(long));
>>
>> which may be the clearest way to you (to me it's cluttered), and has
>> the side "benefit" of making all C compilers that don't recognize this
>> gcc idiom look really bad by providing a huge performance penalty.
>>
>
>I accept the first half of your argument, but not the second. It's not "gcc idiom", it's C idiom. Compilers that don't recognize this idiom in 2016 (or 2006) not "look bad", they "are bad".

I played around with this a bit to see how various gcc versions fare
with this kind of code. The input is:

#include <string.h>

void foo(char *p, long l)
{
  memcpy(p,&l,sizeof(long));
}

First, AMD64: Interestingly, the result seems to depend more on the
headers than on the compiler. On Debian 4.0 all available gcc versions
(from gcc-3.3 to gcc-4.1) produce

0000000000000000 <foo>:
   0:   48 89 74 24 f8          mov    %rsi,0xfffffffffffffff8(%rsp)
   5:   48 89 37                mov    %rsi,(%rdi)
   8:   c3                      retq

while on Debian 5.0 all gcc versions (from gcc-3.4 to gcc-4.3) produce

0000000000000000 <foo>:
   0:   48 89 37                mov    %rsi,(%rdi)
   3:   c3                      retq

Apparently something in the .h files differs between the two releases
and causes this. So, at least on this architecture, with the right
header files, the result is optimal.

Next, PowerPC (32 bit), Debian 5.0: All gcc versions, from 2.95 to 4.3,
produce

   0:   94 21 ff f0     stwu    r1,-16(r1)
   4:   90 83 00 00     stw     r4,0(r3)
   8:   38 21 00 10     addi    r1,r1,16
   c:   4e 80 00 20     blr

By contrast, compiling

void bar(char *p, long l)
{
  *(long *)p = l;
}

produces

   0:   90 83 00 00     stw     r4,0(r3)
   4:   4e 80 00 20     blr

on all of these gcc versions.
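For anyone who wants to repeat the comparison on another compiler or
target, here is a minimal sketch of a test file. The function names,
the file name t.c, and the -O2 flag mentioned in the comment are
assumptions for illustration; they are not necessarily the setup behind
the listings above.

```c
/* t.c (hypothetical file name).  One possible way to inspect the
   generated code, assuming gcc and objdump are installed:
       gcc -O2 -c t.c && objdump -d t.o
   The optimization level is an assumption, not taken from the post. */
#include <string.h>

/* Store l at p through memcpy(); defined for any alignment of p. */
void store_memcpy(char *p, long l)
{
  memcpy(p, &l, sizeof(long));
}

/* Store l at p through a cast; only valid if p is suitably aligned
   for long and the access respects the aliasing rules. */
void store_cast(char *p, long l)
{
  *(long *)p = l;
}

/* Bitwise-not copy, memcpy() variant. */
void not_memcpy(long *p, const long *q)
{
  long x;
  memcpy(&x, q, sizeof(long));
  x = ~x;
  memcpy(p, &x, sizeof(long));
}

/* Bitwise-not copy, direct variant. */
void not_direct(long *p, const long *q)
{
  *p = ~*q;
}
```

Each pair computes the same result when the pointers are properly
aligned and typed, so any difference in the disassembly is purely a
matter of how well the compiler sees through memcpy().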
So following the gcc maintainer's idea of how to code produces more
code even when using gcc itself. Note that gcc-4.3 is from 2008, after
your 2006 deadline; it is also the newest gcc I have on that machine.

Just so you don't think I cherry-picked the example, here are the
results for my original example:

Source code:

#include <string.h>

void foo(long *p, long *q)
{
  long x;
  memcpy(&x,q, sizeof(long));
  x = ~x;
  memcpy(p,&x, sizeof(long));
}

void bar(long *p, long *q)
{
  *p = ~*q;
}

Object code (note that the compiler swaps foo and bar):

00000000 <bar>:
   0:   80 04 00 00     lwz     r0,0(r4)
   4:   7c 00 00 f8     not     r0,r0
   8:   90 03 00 00     stw     r0,0(r3)
   c:   4e 80 00 20     blr

00000010 <foo>:
  10:   94 21 ff e0     stwu    r1,-32(r1)
  14:   80 04 00 00     lwz     r0,0(r4)
  18:   7c 00 00 f8     not     r0,r0
  1c:   90 03 00 00     stw     r0,0(r3)
  20:   38 21 00 20     addi    r1,r1,32
  24:   4e 80 00 20     blr

>BTW, I think, that on architectures that support arbitrary-aligned accesses, gcc just threatens that it can miscompile the code above in the future rather than does it right now.

Yes, for the simple one-word copy the auto-vectorizer probably will not
trigger, but there are at least two cases where the auto-vectorizer
actually miscompiled existing code (reported as gcc bugs and closed as
invalid).

In general, gcc threatens to miscompile all code that contains
undefined behaviour, i.e., most real applications.

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html