Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sun, 26 May 2024 19:01:21 +0100
Organization: A noiseless patient Spider
Lines: 154
Message-ID: <v2vtdi$3gnl6$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
<00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
<v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
<f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
<v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
<87y18047jk.fsf@nosuchdomain.example.com>
<87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
<87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
<v2v7ni$3d70v$1@dont-email.me> <20240526161832.000012a6@yahoo.com>
<v2vka0$3f4a2$1@dont-email.me> <20240526193549.000031a8@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 May 2024 20:01:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8e9185d4e0820a2f5b79db78e2103a30";
logging-data="3694246"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19wRvIA/Sq7Xx/6x7m4WQvE"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:PMgautufRr38wgGJDKObXpJl9AE=
In-Reply-To: <20240526193549.000031a8@yahoo.com>
Content-Language: en-GB
Bytes: 6701
On 26/05/2024 17:35, Michael S wrote:
> On Sun, 26 May 2024 16:25:51 +0100
> bart <bc@freeuk.com> wrote:
>
>> On 26/05/2024 14:18, Michael S wrote:
>> Are you talking about a 5MB array initialised like this:
>>
>> unsigned char data[] = {
>> 45,
>> 67,
>> 17,
>> ... // 5M-3 more rows
>> };
>>
>
> Yes.
>
>> The timing for 120M entries was challenging as it exceeded physical
>> memory. However that test I can also do with C compilers. Results for
>> 120 million lines of data are:
>>
>> DMC - Out-of-memory
>>
>> Tiny C - Silently stopped after 13 seconds (I thought it
>> had finished, but it hadn't)
>>
>> lccwin32 - Insufficient memory
>>
>> gcc 10.x.x - Out of memory after 80 seconds
>>
>> mcc - (My product) Memory failure after 27 seconds
>>
>> Clang - (Crashed after 5 minutes)
>>
>> MM 144s (Compiler for my language)
>>
>> So the compiler for my language did quite well, considering!
>>
>
> That's an interesting test as well, but I don't want to run it on my HW
> right now. May be, at night.
>
>>
>> Back to the 5MB test:
>>
>> Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)
>>
>> mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)
>
> Faster than new MSVC, but slower than old MSVC.
My mcc is never going to be fast, because it goes via ASM, which itself
is a text file several times larger than the C source (so the line "123,"
in C ends up as "    db 123" in the ASM file).
However, I've looked at a possible way of speeding this up in general;
see below.
>>
>> DMC -- -- (Out of memory; 32-bit compiler)
>>
>> lccwin32 3.9s 1.3MB/sec
>>
>> gcc 10.x 10.6s 0.5MB/sec
>>
>> clang 7.4s 0.7MB/sec (to object file only)
>>
>> MM 1.4s 3.6MB/sec (compiler for my language)
>>
>> MM 0.7s 7.1MB/sec (MM optimised via C and gcc-O3)
>>
>
> That's quite impressive.
> Does it generate object files or go directly to an exe?
All of them produce EXE files, via a linker if necessary, except Clang
(its hefty LLVM installation comes with neither standard C headers nor a
linker; it depends on MS tools, but never manages to stay in sync with
them).
My MM product generates EXE files directly, with no intermediate OBJ files.
> Even if later, it's still impressive.
So, it's more impressive if it first generates an OBJ file then invokes
a linker? I'd have thought that eliminating that pointless intermediate
step would be more impressive!
Anyway, I thought of a way of speeding up the initialisation of byte
arrays: instead of parsing each value into its own AST node, parse
successive numeric values directly into a special data-string object
(similar to normal strings, and identical to the data-strings used for
embedded data).
Then there is only one AST node containing one 'string' value, instead
of 5M or 120M nodes.
This produced a timing, for 5M lines, of 0.34s (0.28s optimised), a
throughput of 15-18MB/sec.
When I applied this to the 120M line data (which is a 0.6GB source
file), it finished in 6.5 seconds (5.5 optimised), or 18-21MB/sec.
Previously that took 144 seconds.
However, I can't keep that experimental code: if it turns out that not
all the values are constant expressions, it has to revert to normal
processing, which is tricky to do (it may already have read 1M numbers
and need to backtrack). This was just to see how fast it could be.
Processing 120MB as binary rather than text is still faster; that works
at up to 110MB/sec with an optimised compiler.
>> As a reminder, when using my version of 'embed' in my language,
>> embedding a 120MB binary file took 1.3 seconds, about 90MB/second.
>>
>>
>>> But both are much faster than compiling through text. Even "slow"
>>> 40MB/3 is 6-7 times faster than the fastest of compilers in my
>>> tests.
>>
>> Do you have a C compiler that supports #embed?
>>
>
> No, I just blindly believe the paper.
Funny that no one else has access to an implementation! Those figures
have been around for a while.
> But it probably would be available in clang this year and in gcc around
> start of the next year. At least I hope so.
>
>> It's generally understood that processing text is slow when
>> representing data a byte at a time. If byte arrays could be
>> represented as sequences of i64 constants, it would improve matters.
>> That could be done in C, but awkwardly, by aliasing a byte array
>> with an i64 array.
>>
>
> I don't think that conversion from text to binary is a significant
> bottleneck here.
That's not quite what I meant. That conversion is the lexical part of
processing source code; it can be very fast.
It is the parsing (especially constructing a list of 5M or 120M AST
nodes, each containing one expression) and the subsequent type-checking
and code generation that take the time.
However your benchmark looks intriguing and I'll have a closer look later.