From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sun, 26 May 2024 19:01:21 +0100
Message-ID: <v2vtdi$3gnl6$1@dont-email.me>
In-Reply-To: <20240526193549.000031a8@yahoo.com>

On 26/05/2024 17:35, Michael S wrote:
> On Sun, 26 May 2024 16:25:51 +0100
> bart <bc@freeuk.com> wrote:
>
>> On 26/05/2024 14:18, Michael S wrote:
>> Are you talking about a 5MB array initialised like this:
>>
>> unsigned char data[] = {
>> 45,
>> 67,
>> 17,
>> ... // 5M-3 more rows
>> };
>>
>
> Yes.
>
>> The timing for 120M entries was challenging, as it exceeded physical
>> memory. However, that test I can also do with C compilers.
>> Results for 120 million lines of data are:
>>
>> DMC        -- Out of memory
>> Tiny C     -- Silently stopped after 13 seconds (I thought it
>>               had finished, but no)
>> lccwin32   -- Insufficient memory
>> gcc 10.x.x -- Out of memory after 80 seconds
>> mcc        -- Memory failure after 27 seconds (my product)
>> Clang      -- Crashed after 5 minutes
>> MM         -- 144s (compiler for my language)
>>
>> So the compiler for my language did quite well, considering!
>>
>
> That's an interesting test as well, but I don't want to run it on my HW
> right now. Maybe at night.
>
>> Back to the 5MB test:
>>
>> Tiny C     1.7s    2.9MB/sec  (Tcc doesn't use any IR)
>> mcc        3.7s    1.3MB/sec  (my product; uses intermediate ASM)
>
> Faster than new MSVC, but slower than old MSVC.

My mcc is never going to be fast, because it uses ASM, which itself
generates a text file several times larger than the C (so the line
"123," in C ends up as "    db 123" in the ASM file). However, I've
looked at a possible way of speeding this up in general; see below.

>> DMC        --      --         (out of memory; 32-bit compiler)
>> lccwin32   3.9s    1.3MB/sec
>> gcc 10.x   10.6s   0.5MB/sec
>> clang      7.4s    0.7MB/sec  (to object file only)
>> MM         1.4s    3.6MB/sec  (compiler for my language)
>> MM         0.7s    7.1MB/sec  (MM optimised via C and gcc -O3)
>
> That's quite impressive.
> Does it generate object files or go directly to exe?

All produce EXE files, via linkers if necessary, except Clang (its
hefty LLVM installation doesn't come with standard C headers, nor a
linker; it depends on MS tools, but never manages to sync with them).
My MM product directly generates EXE files with no intermediate OBJ
files.

> Even if later, it's still impressive.

So, it's more impressive if it first generates an OBJ file and then
invokes a linker? I'd have thought that eliminating that pointless
intermediate step would be more impressive!
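For anyone wanting to reproduce these timings, a test file of this
shape is easy to generate. This is just an illustrative sketch (the
file name, array name, and function name are my own invention, not
anything from the thread):

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a C source file declaring an n-entry unsigned char array,
   one initialiser per line, matching the layout shown above.
   Returns the number of values written, or -1 on error. */
long write_initialiser(const char *path, long n) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;

    fprintf(f, "unsigned char data[] = {\n");
    for (long i = 0; i < n; i++)
        fprintf(f, "%d,\n", (int)(rand() % 256));   /* one value per line */
    fprintf(f, "};\n");

    fclose(f);
    return n;
}
```

Calling write_initialiser("bigdata.c", 5000000) gives roughly the 5MB
test file; 120000000 gives the ~0.6GB one.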
Anyway, I thought of a way of speeding up initialisation of
byte-arrays: instead of parsing each value into its own AST node,
directly parse successive numeric values into a special data-string
object (similar to normal strings, and identical to the data-strings
used for embedded data). Then there is only one AST node containing one
'string' value, instead of 5M or 120M nodes.

This produced a timing, for 5M lines, of 0.34s (0.28s optimised), a
throughput of 15-18MB/sec.

When I applied this to the 120M-line data (which is a 0.6GB source
file), it finished in 6.5 seconds (5.5 optimised), or 18-21MB/sec.
Previously that took 144 seconds.

However, I can't keep that experimental code: if it turns out not all
values are constant expressions, it has to revert to normal processing,
which is tricky to do (it may already have read 1M numbers and would
need to backtrack). This was just to see how fast it could be.

Processing 120MB as binary rather than text is still faster; that works
at up to 110MB/sec with an optimised compiler.

>> As a reminder, when using my version of 'embed' in my language,
>> embedding a 120MB binary file took 1.3 seconds, about 90MB/second.
>>
>>> But both are much faster than compiling through text. Even "slow"
>>> 40MB/3 is 6-7 times faster than the fastest of compilers in my
>>> tests.
>>
>> Do you have a C compiler that supports #embed?
>
> No, I just blindly believe the paper.

Funny that no one else has access to an implementation! Those figures
have been around for a while.

> But it probably would be available in clang this year and in gcc
> around the start of the next year. At least I hope so.

>> It's generally understood that processing text is slow when
>> representing byte-at-a-time data. If byte arrays could be represented
>> as sequences of i64 constants, it would improve matters. That could
>> be done in C, but awkwardly, by aliasing a byte-array with an
>> i64-array.
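The data-string idea above can be sketched in a few lines of C. This is
not bart's actual compiler code, just a minimal illustration (the
function name is made up): scan successive decimal constants straight
into a byte buffer, so the whole list becomes one value rather than one
AST node per element, bailing out when a token is not a plain constant
(the case where a real compiler would have to backtrack):

```c
#include <ctype.h>

/* Parse a comma-separated list of decimal constants directly into a
   byte buffer. Returns the number of values stored, or -1 if a token
   is not a plain decimal constant or the buffer is full -- the point
   at which a compiler would need to fall back to normal parsing. */
long parse_byte_list(const char *src, unsigned char *out, long max) {
    long n = 0;
    while (*src) {
        /* skip separators and whitespace between values */
        while (isspace((unsigned char)*src) || *src == ',') src++;
        if (!*src) break;
        if (!isdigit((unsigned char)*src)) return -1;  /* not a constant */

        unsigned v = 0;
        while (isdigit((unsigned char)*src))
            v = v * 10 + (unsigned)(*src++ - '0');

        if (n >= max) return -1;
        out[n++] = (unsigned char)v;     /* append to the "data string" */
    }
    return n;
}
```

The key property is that the inner loop does no allocation at all: each
value costs a few character tests and one store, instead of an AST node
plus later type-checking and code generation.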
>>
>
> I don't think that conversion from text to binary is a significant
> bottleneck here.

That's not quite what I meant. That conversion is the lexical part of
processing source code, and it can be very fast. It is the parsing, and
especially constructing a list of 5M or 120M AST nodes, each containing
one expression, plus the subsequent type-checking and code generation,
that take the time.

However, your benchmark looks intriguing and I'll have a closer look
later.
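The "awkward" aliasing approach mentioned earlier, where byte data is
written as i64 constants so the compiler processes one expression per
eight bytes, might look like this in C. A hedged sketch with made-up
names and illustrative values; note that the byte view depends on the
target's endianness:

```c
#include <stdint.h>

/* View the same storage as 64-bit words (for cheap initialisation:
   8x fewer initialiser expressions) or as bytes (for actual use).
   Reading bytes through the union is well-defined in C, but the
   byte order shown in comments assumes a little-endian target. */
union blob {
    uint64_t      words[2];
    unsigned char bytes[16];
};

/* Two 64-bit constants stand in for sixteen byte initialisers;
   on little-endian hardware bytes[0..15] are 0x00, 0x01, ... 0x0f. */
static const union blob data = {
    .words = { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL }
};
```

This works, but as the post says it is awkward: the values have to be
packed into words by hand (or by a generator tool), and the source no
longer reads as a list of bytes.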