Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: Baby X is bor nagain
Date: Thu, 13 Jun 2024 14:46:32 +0100
Organization: A noiseless patient Spider
Lines: 94
Message-ID: <v4et7o$29ejt$1@dont-email.me>
References: <v494f9$von8$1@dont-email.me>
 <v49seg$14cva$1@raubtier-asyl.eternal-september.org>
 <v49t6f$14i1o$1@dont-email.me>
 <v4bcbj$1gqlo$1@raubtier-asyl.eternal-september.org>
 <v4bh56$1hibd$1@dont-email.me> <v4c0mg$1kjmk$1@dont-email.me>
 <v4c8s4$1lki1$4@dont-email.me> <20240613002933.000075c5@yahoo.com>
 <v4emki$28d1b$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 13 Jun 2024 15:46:32 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="67a186d3dfa54250815f91d60b152bd4";
	logging-data="2407037"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19OF5eRia3tKUfvvFdq4v4Q"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:TSLEHoifray85UovFN2xYM1qeJY=
Content-Language: en-GB
In-Reply-To: <v4emki$28d1b$1@dont-email.me>
Bytes: 5284

On 13/06/2024 12:53, David Brown wrote:
> On 12/06/2024 23:29, Michael S wrote:
>> On Wed, 12 Jun 2024 15:46:44 +0200
>> David Brown <david.brown@hesbynett.no> wrote:
>>
>>>
>>> I also don't imagine that string literals would be much faster for
>>> compilation, at least for file sizes that I think make sense.
>>
>> Just shows how little do you know about internals of typical compiler.
>> Which, by itself, is o.k. What is not o.k. is that with your level of
>> knowledge you have a nerve to argue vs bart that obviously knows a lot
>> more.
>>
> 
> I know more than most C programmers about how certain C compilers work, 
> and what works well with them, and what is relevant for them - though I 
> certainly don't claim to know everything.  Obviously Bart knows vastly 
> more about how /his/ compiler works.  He also tends to do testing with 
> several small and odd C compilers, which can give interesting results 
> even though they are of little practical relevance for real-world C 
> development work.
> 
> Testing a 1 MB file of random data, gcc -O2 took less than a second to 
> compile it.  One megabyte is about the biggest size I would think makes 
> sense to embed directly in C code unless you are doing something very 
> niche - usually if you need that much data, you'd be better off with 
> separate files and standardised packaging systems like zip files, 
> installer setup.exe builds, or that kind of thing.

Here are some tests embedding a 1.1 MB binary on my machine:

                      Numbers        One string    (times in seconds)

   gcc 14.1 -O0       3.2 (0.2)      0.4 (0.2)
   tcc                0.4 (0.03)     0.07 (0.03)

Using 'One string' makes gcc as fast as Tiny C working with 'Numbers'!

The figures in brackets are the build times for hello.c, to better 
appreciate the differences.

Including those overheads, 'One string' makes gcc 8 times as fast as 
with 'Numbers'. Excluding those overheads, it is 15 times as fast 
(3.0 vs 0.2).
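
For anyone unsure what the two columns mean, here is a minimal sketch of the two forms. The eight bytes and the names as_numbers, as_string and same_payload are made up for illustration; the actual test files are not shown in this post.

```c
#include <stddef.h>
#include <string.h>

/* 'Numbers': one integer constant per byte. */
static const unsigned char as_numbers[] = {
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
};

/* 'One string': the same eight bytes as a single string literal.
   The literal adds a terminating NUL, so the payload length is
   sizeof as_string - 1. */
static const unsigned char as_string[] =
    "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A";

/* Returns 1 if both arrays carry the same payload, 0 otherwise. */
int same_payload(void)
{
    size_t n = sizeof as_numbers;
    return n == sizeof as_string - 1
        && memcmp(as_numbers, as_string, n) == 0;
}
```

The compiler sees eight separate constant expressions in the first form and a single token in the second, which is presumably where most of the difference in build time comes from.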

For comparison, here is the timing for my non-C compiler using direct 
embedding:

   mm                 0.05 (0.03)

The extra time compared with 'hello' is 20ms; for tcc it was 370ms 
('Numbers') and 40ms ('One string'), and for gcc 3000ms and 200ms.
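
The subtraction behind those figures can be written out as a small sketch. The struct and function names are hypothetical; only the millisecond values come from the table above.

```c
/* Timings in milliseconds, as read from the table in this post. */
struct timing {
    int numbers_ms;  /* build time with the 'Numbers' form    */
    int string_ms;   /* build time with the 'One string' form */
    int hello_ms;    /* build time for hello.c (the baseline) */
};

/* Extra time over the hello.c baseline for each form. */
int extra_numbers(struct timing t) { return t.numbers_ms - t.hello_ms; }
int extra_string(struct timing t)  { return t.string_ms - t.hello_ms; }

/* Speedup of 'One string' over 'Numbers' with the baseline removed. */
double net_speedup(struct timing t)
{
    return (double)extra_numbers(t) / extra_string(t);
}
```

With gcc as {3200, 400, 200} this gives 3000ms and 200ms extra, a net factor of 15; with tcc as {400, 70, 30} it gives 370ms and 40ms.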

> Using string literals, the compile time was shorter, but when you are 
> already below a second, it's all just irrelevant noise.

My machine is slower than yours. In any case, it's not just about one 
machine and one program. You're choosing to spend 10 times as long to do 
a task, using resources that could be used for other processes, and 
using extra power.

But if you are creating a tool for N other people to use who may be 
running it M times a day on data of size X, you can't just dismiss these 
considerations. You don't know how far people will push the operating 
limits of your tool.

> Each individual string is up to 2048 bytes, which can be concatenated to 
> a maximum of 65K in total.
> 
> I see other links giving different values, but I expect the MS ones to 
> be authoritative.  It is possible that newer versions of their C 
> compiler have removed the limit, just as for their C++ compiler, but it 
> was missing from that webpage.
> 
> (And I noticed also someone saying that MSVC is 70x faster at using 
> string literals compared to lists of integers for array initialisation.)

That doesn't sound unreasonable.

Note that it is not necessary to use one giant string; you can chop it 
up into smaller strings, say with one line's worth of values per string, 
and still get most of the benefits. It's just a tiny bit more fiddly to 
generate the strings.
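
A rough sketch of such a generator follows. The names bytes_to_literal and emit_array and the 64-bytes-per-line cap are my own choices for illustration, not taken from any existing tool. It uses three-digit octal escapes, since a hex escape would swallow any hex digit that follows it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write n bytes from src into dst as one quoted C
   string literal using three-digit octal escapes. dst must have room
   for 4*n + 3 chars. Returns the number of chars written, excluding
   the terminating NUL. */
size_t bytes_to_literal(char *dst, const unsigned char *src, size_t n)
{
    char *p = dst;
    *p++ = '"';
    for (size_t i = 0; i < n; i++)
        p += sprintf(p, "\\%03o", (unsigned)src[i]);
    *p++ = '"';
    *p = '\0';
    return (size_t)(p - dst);
}

/* Hypothetical helper: emit data as an array definition with one short
   literal per line; the compiler concatenates adjacent literals. */
void emit_array(FILE *out, const char *name,
                const unsigned char *data, size_t n, size_t per_line)
{
    char line[4 * 64 + 3];

    if (per_line == 0 || per_line > 64)
        per_line = 64;              /* keep each chunk within line[] */
    fprintf(out, "static const unsigned char %s[] =\n", name);
    if (n == 0)
        fprintf(out, "    \"\"\n"); /* empty payload still needs a literal */
    for (size_t i = 0; i < n; i += per_line) {
        size_t chunk = n - i < per_line ? n - i : per_line;
        bytes_to_literal(line, data + i, chunk);
        fprintf(out, "    %s\n", line);
    }
    fprintf(out, "    ;\n");
}
```

Note that the generated array is one byte longer than the data because of the terminating NUL, so the consumer should use sizeof minus one (or a separately emitted length).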

Within my compiler, each single number takes a 64-byte record to 
represent. So 1MB of data takes 64MB, while a 1MB string takes one 
64-byte record plus the 1MB of the string data.
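
As a back-of-envelope sketch of that arithmetic (the 64-byte record size is the figure given above; the function names are made up, and this ignores whatever else a compiler allocates):

```c
#include <stddef.h>

enum { RECORD_BYTES = 64 };  /* per-record size quoted in this post */

/* 'Numbers': one record for every byte of data. */
size_t footprint_as_numbers(size_t n)
{
    return n * RECORD_BYTES;
}

/* 'One string': a single record plus the string data itself. */
size_t footprint_as_string(size_t n)
{
    return RECORD_BYTES + n;
}
```

For n = 1048576 (1MB) the first form comes to 67108864 bytes (64MB) and the second to 1048640 bytes.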

Then there are the various type analysis and other passes that have to 
be done a million times rather than once. I'd imagine that compilers 
like gcc do a lot more.