From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: else ladders practice
Date: Fri, 22 Nov 2024 16:21:55 +0100
Organization: A noiseless patient Spider
Message-ID: <vhq7ik$17oi0$1@dont-email.me>
References: <3deb64c5b0ee344acd9fbaea1002baf7302c1e8f@i2pn2.org> <vgd3ro$2pvl4$1@paganini.bofh.team> <vgdc4q$1ikja$1@dont-email.me> <vgdt36$2r682$2@paganini.bofh.team> <vge8un$1o57r$3@dont-email.me> <vgpi5h$6s5t$1@paganini.bofh.team> <vgtsli$1690f$1@dont-email.me> <vhgr1v$2ovnd$1@paganini.bofh.team> <vhic66$1thk0$1@dont-email.me> <vhins8$1vuvp$1@dont-email.me> <vhj7nc$2svjh$1@paganini.bofh.team> <vhje8l$2412p$1@dont-email.me> <WGl%O.42744$LlWc.33050@fx42.iad> <vhkr9e$4bje$1@dont-email.me> <vhptmn$3mlgf$1@paganini.bofh.team> <20241122161905.000018b9@yahoo.com>
In-Reply-To: <20241122161905.000018b9@yahoo.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 22/11/2024 15:19, Michael S wrote:
> On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
> antispam@fricas.org (Waldek Hebisch) wrote:
>
>> Bart <bc@freeuk.com> wrote:
>>>
>>> Sure. That's when you run a production build. I can even do that
>>> myself on some programs (the ones where my C transpiler still
>>> works) and pass it through gcc-O3. Then it might run 30% faster.
>>
>> On a fast machine running Dhrystone 2.2a I get:
>>
>> tcc-0.9.28rc    20000000
>> gcc-12.2 -O     64184852
>> gcc-12.2 -O2    83194672
>> clang-14 -O     83194672
>> clang-14 -O2    85763288
>>
>> so with -O2 this is more than 4 times faster. Dhrystone correlates
>> reasonably with the runtime of tight compute-intensive programs.
>> Compilers started to cheat on the original Dhrystone, so there are
>> bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
>> to make cheating harder, so I think it is still a reasonable
>> benchmark. Actually, the difference may be much bigger; for example,
>> in image processing both clang and gcc can use vector instructions,
>> which may give an additional speedup on the order of 8-16.
>>
>> The 30% above means that you are much better than tcc, or your
>> program is badly behaving (I have programs that make intensive use
>> of memory; there the effect of optimization would be smaller, but
>> still of order 2).
>>
>
> gcc -O is not what Bart was talking about. It is quite similar to -O1.

"Similar" in this particular case being a synonym for "identical" :-)

> Try gcc -O0.
> With regard to speedup, I had run only one or two benchmarks with tcc
> and my results were close to those of Bart: gcc -O0 is very similar
> to tcc in the speed of the exe, but compiles several times slower;
> the gcc -O2 exe is about 2.5 times faster.

(Note that "gcc -O0" is still a vastly more powerful compiler than tcc in many ways.)

> I'd guess I could construct a case where gcc successfully vectorised
> some floating-point loop calculation and showed a 10x speed-up vs tcc
> on modern Zen5 hardware. But that would not be typical.
>

The effect you get from optimisation depends very much on the code in question, the exact compiler flags, and also on the processor you are using. Fairly obviously, if your code spends a lot of time in system calls, waiting for external events (files, networks, etc.), or calling code in other separately compiled libraries, then optimisation of your code will make almost no difference.
Something that does a lot of calculations and data manipulation, on the other hand, can be much faster. Even then, however, it depends on what you are doing. Beyond a simple "-O3", flags like "-march=native" and "-ffast-math" (if you have floating point calculations, and you are sure this does not affect the correctness of the code!) can make a huge difference by allowing more re-arrangements, vector/SIMD processing, using more instructions on newer processors, and having a more accurate model of scheduling.

And the type of processor is also very important. x86 processors are tuned to running crappy code, since a lot of the time they are used to run old binaries made by old tools, or binaries made by people who don't know how to use their tools well. So they have features like extremely local data caches to hide the cost of using the stack for local variables instead of registers. And often it doesn't matter if you do one instruction or a dozen instructions, because you are waiting for memory anyway. If you are looking at microcontrollers, on the other hand, optimisation can make a huge difference for a lot of real-world code.

There is also another substantial difference in code efficiency that is missed in these sorts of pretend benchmarks. When efficiency really matters, top-shelf compilers give you features and extensions to help. You can use intrinsics, or vector extensions, or pragmas, or attributes, or "builtins", to give the compiler more information and work with it to give more opportunities for optimisation. Many of these are not portable (or of limited portability), and getting top speed from your code is not an easy job, but you certainly have possibilities with a tool like gcc or clang that you can never have with tcc or other tiny compilers.