

Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: else ladders practice
Date: Fri, 22 Nov 2024 15:00:51 +0000
Organization: A noiseless patient Spider
Lines: 129
Message-ID: <vhq6b4$17hkq$1@dont-email.me>
References: <3deb64c5b0ee344acd9fbaea1002baf7302c1e8f@i2pn2.org>
 <vgd3ro$2pvl4$1@paganini.bofh.team> <vgdc4q$1ikja$1@dont-email.me>
 <vgdt36$2r682$2@paganini.bofh.team> <vge8un$1o57r$3@dont-email.me>
 <vgpi5h$6s5t$1@paganini.bofh.team> <vgtsli$1690f$1@dont-email.me>
 <vhgr1v$2ovnd$1@paganini.bofh.team> <vhic66$1thk0$1@dont-email.me>
 <vhins8$1vuvp$1@dont-email.me> <vhj7nc$2svjh$1@paganini.bofh.team>
 <vhje8l$2412p$1@dont-email.me> <WGl%O.42744$LlWc.33050@fx42.iad>
 <vhkr9e$4bje$1@dont-email.me> <vhptmn$3mlgf$1@paganini.bofh.team>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 22 Nov 2024 16:00:52 +0100 (CET)
Injection-Info: dont-email.me; posting-host="373db3823bc9f838b9ab0e3fdedd4a11";
	logging-data="1296026"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX197E3quwljsycRoxggcLvJz"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:wvbSciwUzqEymU4Pk8z0LYS34S0=
In-Reply-To: <vhptmn$3mlgf$1@paganini.bofh.team>
Content-Language: en-GB
Bytes: 5929

On 22/11/2024 12:33, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:
>>
>> Sure. That's when you run a production build. I can even do that myself
>> on some programs (the ones where my C transpiler still works) and pass
>> it through gcc-O3. Then it might run 30% faster.
> 
> On a fast machine running Dhrystone 2.2a I get:
> 
> tcc-0.9.28rc    20000000
> gcc-12.2 -O     64184852
> gcc-12.2 -O2    83194672
> clang-14 -O     83194672
> clang-14 -O2    85763288
> 
> so with -O2 this is more than 4 times faster.  Dhrystone correlates
> reasonably with the runtime of tight compute-intensive programs.
> Compilers started to cheat on the original Dhrystone, so there are
> bigger benchmarks like SPEC INT.  But Dhrystone 2 has modifications
> to make cheating harder, so I think it is still a reasonable
> benchmark.  Actually, the difference may be much bigger; for example,
> in image processing both clang and gcc can use vector instructions,
> which may give an additional speedup of order 8-16.
> 
> The 30% above means that you are much better than tcc or your program
> is badly behaved (I have programs that make intensive use of
> memory; there the effect of optimization would be smaller, but still
> of order 2).

The 30% applies to my typical programs, not benchmarks. Sure, gcc -O3 
can do a lot of aggressive optimisations when everything is contained 
within one short module and most runtime is spent in clear bottlenecks.

Real apps, like say my compilers, are different. They tend to use 
globals more, and program flow is more dispersed. The bottlenecks are 
harder to pin down.

But, OK, here's the first sizeable benchmark that I thought of (I can't 
find a reliable Dhrystone one; perhaps you can post a link).

It's called Deltablue.c, copied to db.c below for convenience. I've no 
idea what it does, but the last figure shown is the runtime, so smaller 
is better:

   c:\cx>cc -r db
   Compiling db.c to db.(run)
   DeltaBlue       C       <S:>    1000x   0.517ms

   c:\cx>tcc -run db.c
   DeltaBlue       C       <S:>    1000x   0.546ms

   c:\cx>gcc db.c && a
   DeltaBlue       C       <S:>    1000x   0.502ms

   c:\cx>gcc -O3 db.c && a
   DeltaBlue       C       <S:>    1000x   0.314ms

So here gcc is 64% faster than my product. However, my 'cc' doesn't yet 
have the register allocator of the older 'mcc' compiler (which simply 
keeps some locals in registers). Using 'mcc' gives this result:

   c:\cx>mcc -o3 db && db
   Compiling db.c to db.exe
   DeltaBlue       C       <S:>    1000x   0.439ms

So gcc -O3 is now only 40% faster, on a benchmark.

Now, for a more practical test. First I will create an optimised version 
of my compiler via transpiling to C:

   c:\mx6>mc -opt mm -out:mmgcc
   M6 Compiling mm.m---------- to mmgcc.exe
   W:Invoking C compiler: gcc -m64  -O3 -ommgcc.exe mmgcc.c -s

Now I run my normal compiler, self-hosted, on a test program 'fann4.m':

   c:\mx6>tm mm \mx\big\fann4 -ext
   Compiling \mx\big\fann4.m to \mx\big\fann4.exe
   TM: 0.99

Now the gcc-optimised version:

   c:\mx6>tm mmgcc \mx\big\fann4 -ext
   Compiling \mx\big\fann4.m to \mx\big\fann4.exe
   TM: 0.78

So it's 27% faster. Note that fann4.m is 740Kloc, so this represents a 
compilation speed of just under a million lines per second.

Some other stats:

   c:\mx6>dir mm.exe mmgcc.exe
   22/11/2024  14:43           393,216 mm.exe
   22/11/2024  14:37           651,776 mmgcc.exe

So my product has a smaller EXE too. For more typical inputs, the 
differences are narrower:

   c:\mx6>copy mm.m bb.m

   c:\mx6>tm mm bb
   Compiling bb.m to bb.exe
   TM: 0.09

   c:\mx6>tm mmgcc bb -ext
   Compiling bb.m to bb.exe
   TM: 0.08

gcc-O3 is 12% faster, saving 10ms in compile-time. Curious about how tcc 
would fare? Let's try it:

   c:\mx6>mc -tcc mm -out:mmtcc
   M6 Compiling mm.m---------- to mmtcc.exe
   W:Invoking C compiler: tcc  -ommtcc.exe mmtcc.c c:\windows\system32\user32.dll -luser32 c:\windows\system32\kernel32.dll -fdollars-in-identifiers

   c:\mx6>tm mmtcc bb
   Compiling bb.m to bb.exe
   TM: 0.11

Yeah, a tcc-compiled M compiler would take 0.03 seconds longer to build 
my 35Kloc compiler than a gcc-O3-compiled one; about 37% slower.

One more point: when gcc builds my compiler, it can use whole-program 
optimisation because the input is one source file. So that gives it an 
extra edge compared with compiling individual modules.