From: Bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: else ladders practice
Date: Fri, 22 Nov 2024 15:00:51 +0000
Message-ID: <vhq6b4$17hkq$1@dont-email.me>
In-Reply-To: <vhptmn$3mlgf$1@paganini.bofh.team>

On 22/11/2024 12:33, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:
>>
>> Sure. That's when you run a production build. I can even do that myself
>> on some programs (the ones where my C transpiler still works) and pass
>> it through gcc-O3. Then it might run 30% faster.
>
> On a fast machine running Dhrystone 2.2a I get:
>
> tcc-0.9.28rc   20000000
> gcc-12.2 -O    64184852
> gcc-12.2 -O2   83194672
> clang-14 -O    83194672
> clang-14 -O2   85763288
>
> so with -O2 this is more than 4 times faster. Dhrystone correlated
> reasonably with runtime of tight compute-intensive programs.
> Compilers started to cheat on the original Dhrystone, so there are
> bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
> to make cheating harder, so I think it is still a reasonable
> benchmark. Actually, the difference may be much bigger; for example,
> in image processing both clang and gcc can use vector instructions,
> which may give an additional speedup of order 8-16.
>
> 30% above means that you are much better than tcc or your program
> is badly behaving (I have programs that make intensive use of
> memory, where the effect of optimization would be smaller, but still
> of order 2).

The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3
can do a lot of aggressive optimisation when everything is contained
within one short module and most of the runtime is spent in clear
bottlenecks.

Real apps, like say my compilers, are different. They tend to use
globals more, and program flow is more dispersed, so the bottlenecks
are harder to pin down.

But, OK, here's the first sizeable benchmark that I thought of (I
can't find a reliable Dhrystone one; perhaps you can post a link).
It's called Deltablue.c, copied to db.c below for convenience. I've
no idea what it does, but the last figure shown is the runtime, so
smaller is better:

c:\cx>cc -r db
Compiling db.c to db.(run)
DeltaBlue C <S:> 1000x 0.517ms

c:\cx>tcc -run db.c
DeltaBlue C <S:> 1000x 0.546ms

c:\cx>gcc db.c && a
DeltaBlue C <S:> 1000x 0.502ms

c:\cx>gcc -O3 db.c && a
DeltaBlue C <S:> 1000x 0.314ms

So here gcc is 64% faster than my product. However my 'cc' doesn't yet
have the register allocator of the older 'mcc' compiler (which simply
keeps some locals in registers). That gives this result:

c:\cx>mcc -o3 db && db
Compiling db.c to db.exe
DeltaBlue C <S:> 1000x 0.439ms

So, 40% faster, for a benchmark. Now, for a more practical test.
First I will create an optimised version of my compiler by transpiling
it to C:

c:\mx6>mc -opt mm -out:mmgcc
M6 Compiling mm.m---------- to mmgcc.exe
W:Invoking C compiler: gcc -m64 -O3 -ommgcc.exe mmgcc.c -s

Now I run my normal compiler, self-hosted, on a test program 'fann4.m':

c:\mx6>tm mm \mx\big\fann4 -ext
Compiling \mx\big\fann4.m to \mx\big\fann4.exe
TM: 0.99

Now the gcc-optimised version:

c:\mx6>tm mmgcc \mx\big\fann4 -ext
Compiling \mx\big\fann4.m to \mx\big\fann4.exe
TM: 0.78

So it's 27% faster. Note that fann4.m is 740Kloc, so this represents a
compilation speed of just under a million lines per second.

Some other stats:

c:\mx6>dir mm.exe mmgcc.exe
22/11/2024  14:43        393,216 mm.exe
22/11/2024  14:37        651,776 mmgcc.exe

So my product has a smaller EXE too. For more typical inputs, the
differences are narrower:

c:\mx6>copy mm.m bb.m

c:\mx6>tm mm bb
Compiling bb.m to bb.exe
TM: 0.09

c:\mx6>tm mmgcc bb -ext
Compiling bb.m to bb.exe
TM: 0.08

gcc-O3 is 12% faster, saving 10ms of compile time.

Curious about how tcc would fare? Let's try it:

c:\mx6>mc -tcc mm -out:mmtcc
M6 Compiling mm.m---------- to mmtcc.exe
W:Invoking C compiler: tcc -ommtcc.exe mmtcc.c
c:\windows\system32\user32.dll -luser32
c:\windows\system32\kernel32.dll -fdollars-in-identifiers

c:\mx6>tm mmtcc bb
Compiling bb.m to bb.exe
TM: 0.11

So a tcc-compiled M compiler would take 0.03 seconds longer to build
my 35Kloc compiler than a gcc-O3-compiled one; about 37% slower.

One more point: when gcc builds my compiler, it can use whole-program
optimisation, because the input is a single source file. That gives it
an extra edge compared with compiling individual modules.