Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: else ladders practice
Date: Sat, 23 Nov 2024 14:17:36 +0000
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <vhso61$1o2of$1@dont-email.me>
References: <3deb64c5b0ee344acd9fbaea1002baf7302c1e8f@i2pn2.org>
 <vgdt36$2r682$2@paganini.bofh.team> <vge8un$1o57r$3@dont-email.me>
 <vgpi5h$6s5t$1@paganini.bofh.team> <vgtsli$1690f$1@dont-email.me>
 <vhgr1v$2ovnd$1@paganini.bofh.team> <vhic66$1thk0$1@dont-email.me>
 <vhins8$1vuvp$1@dont-email.me> <vhj7nc$2svjh$1@paganini.bofh.team>
 <vhje8l$2412p$1@dont-email.me> <WGl%O.42744$LlWc.33050@fx42.iad>
 <vhkr9e$4bje$1@dont-email.me> <vhptmn$3mlgf$1@paganini.bofh.team>
 <vhq6b4$17hkq$1@dont-email.me> <vhqm3l$3ntp7$1@paganini.bofh.team>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Nov 2024 15:17:38 +0100 (CET)
Injection-Info: dont-email.me; posting-host="a58c04c916580acacf6def0185ed8157";
	logging-data="1837839"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/JJh32PojrULEbI3vGW56n"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:bn6SbHioDm1pE2onSNUF+3ZWgEs=
In-Reply-To: <vhqm3l$3ntp7$1@paganini.bofh.team>
Content-Language: en-GB
Bytes: 4487

On 22/11/2024 19:29, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:

> clang -O3 -march=native       126112us
> clang -O3                     222136us
> clang -O                      225855us
> gcc -O3 -march=native          82809us
> gcc -O3                       114365us
> gcc -O                        287786us
> tcc                           757347us

You've omitted -O0 for gcc and clang. The run-time at -O0 probably won't
be too far from tcc's, but compilation time for larger programs will be
significantly longer (e.g. 10 times or more).

Using gcc or clang at -O0 instead of tcc is then not a worthwhile
trade-off unless you are running gcc for other reasons (e.g. for deeper
analysis, or to compile less portable code that has only been tested on
or written for gcc/clang; or just an irrational hatred of simple tools).

> 
> There is some irregularity in timings, but this shows that
> factor of order 9 is possible.

That's an extreme case, for one small program with one obvious 
bottleneck where it spends 99% of its time, and with little use of 
memory either.

For simply written programs, the difference between optimised and
unoptimised code is more like 2:1. For more complicated C code that
makes heavy use of macros that can expand to lots of nested function
calls, it might be 4:1, since such code tends to rely on the optimiser
to inline some of those calls.
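
To make the macro case concrete, here is a minimal sketch. All the
names in it (Point, PointVec, SUM_XY and so on) are hypothetical and
invented for illustration; they are not from any codebase mentioned in
this thread. Each use of SUM_XY expands to five small function calls,
which an optimising compiler will normally inline away, while a
compiler that does no inlining would emit every call as written.

```c
#include <stddef.h>

/* Hypothetical accessor layer: every macro hides a small function call. */
typedef struct { int x, y; } Point;
typedef struct { Point *items; size_t len; } PointVec;

static inline Point *vec_at(PointVec *v, size_t i) { return &v->items[i]; }
static inline int point_x(const Point *p) { return p->x; }
static inline int point_y(const Point *p) { return p->y; }
static inline int add_int(int a, int b) { return a + b; }

#define VEC_AT(v, i)  vec_at((v), (i))
#define PX(p)         point_x(p)
#define PY(p)         point_y(p)

/* Expands to add_int(point_x(vec_at(..)), point_y(vec_at(..))): 5 calls. */
#define SUM_XY(v, i)  add_int(PX(VEC_AT((v), (i))), PY(VEC_AT((v), (i))))

/* Sums x + y over all points; 5 calls per iteration unless inlined. */
long sum_all(PointVec *v)
{
    long total = 0;
    for (size_t i = 0; i < v->len; i++)
        total += SUM_XY(v, i);
    return total;
}
```

With inlining, the loop body reduces to two loads and an add; without
it, call overhead dominates, which is the kind of code where a ratio
larger than 2:1 would be expected.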

Again, that would be code written to take advantage of specific compilers.

But that is still computationally intensive code working on small 
amounts of memory.

I have a text editor written in my scripting language. I can translate 
its interpreter to C and compile with both gcc-O3 and tcc.

Then, yes, you will notice twice as much latency with the tcc 
interpreter compared with gcc-O3, when doing things like 
deleting/inserting lines at the beginning of a 1000000-line text file.

But typically, the text files will be 1000 times smaller; you will 
notice no difference at all.

I'm not saying no optimisation is needed, ever; I'm saying that the NEED
for optimisation is far smaller than most people seem to think.

Here are some timings for that interpreter, when used to run a script to 
compute fib(38) the long way:

Interp   Built with   Time (secs)   Notes

qc       tcc          9.0           qc is the C transpiled version
qq       mm           5.0           -fn; qq is the original M version

qc       gcc-O3       4.0
qq       mm           1.2           -asm

(My interpreter doesn't bother with faster switch-based or
computed-goto-based dispatchers. The choice is between a slower
function-table-based one, and an accelerated threaded-code version
using inline ASM.

These are selected with the -fn/-asm options. The -asm version is not
JIT; it is still interpreting one bytecode at a time).
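
For readers unfamiliar with the terms, here is a minimal sketch of the
difference between function-table and switch-based dispatch. This is
NOT the interpreter discussed above: the three-opcode bytecode, the VM
struct and the run_table/run_switch names are all made up for
illustration, and there is no stack bounds checking.

```c
/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

typedef struct {
    const int *pc;      /* next bytecode word to fetch */
    int stack[64];
    int sp;             /* number of values currently on the stack */
    int running;
} VM;

/* --- Function-table dispatch: one indirect call per bytecode --- */
typedef void (*Handler)(VM *);

static void do_push(VM *vm) { vm->stack[vm->sp++] = *vm->pc++; }
static void do_add(VM *vm)  { vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; }
static void do_halt(VM *vm) { vm->running = 0; }

static const Handler handlers[OP_COUNT] = {
    [OP_PUSH] = do_push, [OP_ADD] = do_add, [OP_HALT] = do_halt
};

int run_table(const int *code)
{
    VM vm = { .pc = code, .sp = 0, .running = 1 };
    while (vm.running) {
        int op = *vm.pc++;      /* fetch opcode */
        handlers[op](&vm);      /* indirect call; state lives in memory */
    }
    return vm.stack[vm.sp - 1];
}

/* --- Switch dispatch: all handlers are cases inside one function --- */
int run_switch(const int *code)
{
    int stack[64];
    int sp = 0;
    const int *pc = code;       /* pc and sp are locals, so may sit in registers */
    for (;;) {
        switch (*pc++) {
        case OP_PUSH: stack[sp++] = *pc++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

Both functions run the same program; for the bytecode
{ OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_HALT } each returns 42. The
function-table form pays for a call and return on every bytecode and
keeps its state in a struct, which is one reason it is usually the
slower of the two.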

So the fastest version here doesn't use compiler optimisation, and it's
over 3 times the speed of the gcc-O3 build (1.2 vs 4.0 secs). My
unoptimised HLL code is also only 25% slower than gcc-O3 (5.0 vs 4.0
secs).

That is for this one test, but it is also one that is popular for
language benchmarks.
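
For reference, computing fib(38) "the long way" normally means the
naive doubly recursive definition. A C sketch of that benchmark is
below; the base case (fib(1) = fib(2) = 1) and the time_fib helper are
assumptions of this sketch, and the script actually timed above may
differ in detail.

```c
#include <time.h>

/* Naive doubly recursive Fibonacci, assuming fib(1) = fib(2) = 1. */
long long fib(int n)
{
    if (n < 3)
        return 1;
    return fib(n - 1) + fib(n - 2);
}

/* Runs fib(n), stores the result in *out, returns CPU seconds used. */
double time_fib(int n, long long *out)
{
    clock_t start = clock();
    *out = fib(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fib(38, &result) leaves 39088169 in result. Nearly all of
the work is function call and return overhead, which is why this test
is sensitive to how an interpreter dispatches and calls.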