

From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sun, 26 May 2024 19:01:21 +0100
Organization: A noiseless patient Spider
Lines: 154
Message-ID: <v2vtdi$3gnl6$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
 <v2v7ni$3d70v$1@dont-email.me> <20240526161832.000012a6@yahoo.com>
 <v2vka0$3f4a2$1@dont-email.me> <20240526193549.000031a8@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 May 2024 20:01:23 +0200 (CEST)
User-Agent: Mozilla Thunderbird
In-Reply-To: <20240526193549.000031a8@yahoo.com>
Content-Language: en-GB
Bytes: 6701

On 26/05/2024 17:35, Michael S wrote:
> On Sun, 26 May 2024 16:25:51 +0100
> bart <bc@freeuk.com> wrote:
> 
>> On 26/05/2024 14:18, Michael S wrote:

>> Are you talking about a 5MB array initialised like this:
>>
>> unsigned char data[] = {
>>      45,
>>      67,
>>      17,
>>      ...            // 5M-3 more rows
>> };
>>
> 
> Yes.
> 
>> The timing for 120M entries was challenging as it exceeded physical
>> memory. However that test I can also do with C compilers. Results for
>> 120 million lines of data are:
>>
>>     DMC          -    Out-of-memory
>>
>>     Tiny C       -    Silently stopped after 13 seconds (I thought
>> it had finished, but it hadn't)
>>
>>     lccwin32     -    Insufficient memory
>>
>>     gcc 10.x.x   -    Out of memory after 80 seconds
>>
>>     mcc          -    (My product) Memory failure after 27 seconds
>>
>>     Clang        -    (Crashed after 5 minutes)
>>
>>     MM         144s   (Compiler for my language)
>>
>> So the compiler for my language did quite well, considering!
>>
> 
> That's an interesting test as well, but I don't want to run it on my HW
> right now. Maybe at night.
> 
>>
>> Back to the 5MB test:
>>
>>     Tiny C     1.7s    2.9MB/sec (Tcc doesn't use any IR)
>>
>>     mcc        3.7s    1.3MB/sec (my product; uses intermediate ASM)
> 
> Faster than new MSVC, but slower than old MSVC.

My mcc is never going to be fast, because it compiles via ASM, which 
produces a text file several times larger than the C source (so the 
line "123," in C ends up as "    db    123" in the ASM file).

However I've looked at a possible way of speeding this up in general; 
see below.


>>
>>     DMC        --      --        (Out of memory; 32-bit compiler)
>>
>>     lccwin32   3.9s    1.3MB/sec
>>
>>     gcc 10.x  10.6s    0.5MB/sec
>>
>>     clang      7.4s    0.7MB/sec (to object file only)
>>
>>     MM         1.4s    3.6MB/sec (compiler for my language)
>>
>>     MM         0.7s    7.1MB/sec (MM optimised via C and gcc-O3)
>>
> 
> That's quite impressive.
> Does it generate object files or go directly to exe?

All produce EXE files, via linkers if necessary, except Clang (its 
hefty LLVM installation comes with neither standard C headers nor a 
linker; it depends on MS tools, but never manages to sync with them).

My MM product directly generates EXE files with no intermediate OBJ files.

> Even if later, it's still impressive.


So, it's more impressive if it first generates an OBJ file then invokes 
a linker? I'd have thought that eliminating that pointless intermediate 
step would be more impressive!

Anyway, I thought of a way of speeding up the initialisation of 
byte-arrays: instead of parsing each value into its own AST node, 
successive numeric values are parsed directly into a special 
data-string object (similar to normal strings, and identical to the 
data-strings used for embedded data).

Then there is only one AST node containing one 'string' value, instead 
of 5M or 120M nodes.
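A rough sketch of that fast path in C (the type and function names 
here are purely illustrative, not taken from my actual compiler):

```c
#include <stdlib.h>
#include <ctype.h>

/* One growable buffer holds the whole {45, 67, 17, ...} list as raw
   bytes, so the initialiser becomes a single data-string instead of
   one AST node per element. Illustrative names, not a real compiler. */
typedef struct {
    unsigned char *data;
    size_t len, cap;
} DataString;

/* Returns 1 on success. Bails out with 0 as soon as anything other
   than a plain decimal constant in 0..255 appears, so the caller can
   fall back to normal expression parsing (the tricky part: it may
   need to backtrack over values already consumed). */
int parse_byte_list(const char *src, DataString *ds) {
    const char *p = src;
    ds->len = 0;
    ds->cap = 16;
    ds->data = malloc(ds->cap);
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (!isdigit((unsigned char)*p)) return 0;  /* not a constant */
        unsigned v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (unsigned)(*p++ - '0');
        if (v > 255) return 0;                      /* not a byte */
        if (ds->len == ds->cap)
            ds->data = realloc(ds->data, ds->cap *= 2);
        ds->data[ds->len++] = (unsigned char)v;
        while (isspace((unsigned char)*p)) p++;
        if (*p == ',') { p++; continue; }
        return *p == '\0';       /* 1 only if the list ended cleanly */
    }
}
```

The point is that there is no per-element tokenising, AST allocation 
or type-checking; each value costs a few adds and one store.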

This produced a timing, for 5M lines, of 0.34s (0.28s optimised), a 
throughput of 15-18MB/sec.

When I applied this to the 120M line data (which is a 0.6GB source 
file), it finished in 6.5 seconds (5.5 optimised), or 18-21MB/sec. 
Previously that took 144 seconds.

However I can't keep that experimental code: if it turns out that not 
all values are constant expressions, it has to revert to normal 
processing, which is tricky to do (it may already have read 1M numbers 
and would need to backtrack). This was just to see how fast it could be.

Processing 120MB as binary rather than text is still faster; that works 
at up to 110MB/sec with an optimised compiler.


>> As a reminder, when using my version of 'embed' in my language,
>> embedding a 120MB binary file took 1.3 seconds, about 90MB/second.
>>
>>
>>> But both are much faster than compiling through text. Even "slow"
>>> 40MB/3 is 6-7 times faster than the fastest of compilers in my
>>> tests.
>>
>> Do you have a C compiler that supports #embed?
>>
> 
> No, I just blindly believe the paper.

Funny that no one else has access to an implementation! Those figures 
have been around for a while.

> But it probably will be available in clang this year and in gcc
> around the start of next year. At least I hope so.
> 
>> It's generally understood that processing text is slow, if
>> representing byte-at-a-time data. If byte arrays could be represented
>> as sequences of i64 constants, it would improve matters. That could
>> be done in C, but awkwardly, by aliasing a byte-array with an
>> i64-array.
>>
> 
> I don't think that conversion from text to binary is a significant
> bottleneck here.

That's not quite what I meant. That conversion is the lexical part of 
processing source code, and it can be very fast.

It is the parsing, especially the construction of a list of 5M or 120M 
AST nodes (each containing one expression), plus the subsequent 
type-checking and code generation, that takes the time.
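Incidentally, the i64 aliasing mentioned in the quote above can be 
done in C with a union, with the caveat that the packed constants 
below assume a little-endian target; a rough sketch:

```c
#include <stdint.h>

/* Express byte data as 64-bit constants so the compiler sees 8x
   fewer initialisers. The word values below encode bytes 0x00..0x0F
   assuming a little-endian target; on a big-endian machine the byte
   order within each word would be reversed. Illustrative only. */
static const union {
    uint64_t words[2];
    unsigned char bytes[16];
} blob = { .words = {
    0x0706050403020100ULL,   /* bytes 0..7  (little-endian) */
    0x0F0E0D0C0B0A0908ULL,   /* bytes 8..15 (little-endian) */
} };
```

Whether that actually parses faster depends on the compiler: it cuts 
the number of AST nodes by 8, but each token is longer to lex.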

However your benchmark looks intriguing and I'll have a closer look later.