Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: xxd -i vs DIY Was: C23 thoughts and opinions
Date: Tue, 28 May 2024 17:34:19 +0200
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <v34thr$letg$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
 <20240528144118.00002012@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 May 2024 17:34:19 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d21593105ce034ab606a049d42e9f782";
	logging-data="703408"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18dQya/UFeXBZZWZ0fnTehzCY2AbPNvhjQ="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:GIAYNLFfuGXXaoiKizgiAZdfBsU=
Content-Language: en-GB
In-Reply-To: <20240528144118.00002012@yahoo.com>
Bytes: 4074

On 28/05/2024 13:41, Michael S wrote:
> On Sun, 26 May 2024 13:09:36 +0200
> David Brown <david.brown@hesbynett.no> wrote:
> 
>>
>> No, it does /not/.  That's the /whole/ point of #embed, and the main
>> motivation for its existence.  People have always managed to embed
>> binary source files into their binary output files - using linker
>> tricks, or using xxd or other tools (common or specialised) to turn
>> binary files into initialisers for constant arrays (or structs).
>> I've done so myself on many projects, all integrated together in
>> makefiles.
>>
> 
> Let's start another round of private parts' measurements tournament!
> 'xxd -i' vs DIY
> 

I used 100 MB of random data:

dd if=/dev/urandom bs=1M count=100 of=100MB

I compiled your code with "gcc-11 -O2 -march=native".

I ran everything in a tmpfs filesystem, completely in RAM.


xxd took 5.4 seconds - that's the baseline.

Your simple C code took 4.35 seconds.  Your second program took 0.9 
seconds - a big improvement.

One line of Python code took 8 seconds:

print(", ".join([hex(b) for b in open("100MB", "rb").read()]))


A slightly nicer Python program took 14.3 seconds:

import sys
bs = open(sys.argv[1], "rb").read()
# Each byte becomes a 6-character entry such as " 0x1f,"
xs = "".join([" 0x%02x," % b for b in bs])
ln = len(xs)
# 72 characters per line = 12 entries per line
print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))


Like "xxd -i", that one splits the output into lines of 12 bytes.  Some
compilers might not like a single 300-600 MB line!
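
A minimal sketch of a streaming variant of the same idea, for anyone
who would rather not hold the whole 100 MB input and the 600 MB output
string in memory at once.  The function name write_initialisers and its
parameters per_line and chunk_size are made up for illustration, and
this version was not part of the timings above:

```python
def write_initialisers(src, dst, per_line=12, chunk_size=1 << 16):
    """Copy bytes from src to dst as ' 0x..,' entries, per_line per line.

    src is a binary file object, dst a text file object.
    """
    # Format each of the 256 possible byte values once, up front.
    table = [" 0x%02x," % b for b in range(256)]
    pending = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Emit only complete lines; carry the remainder into the next chunk.
        whole = len(pending) - len(pending) % per_line
        for i in range(0, whole, per_line):
            line = "".join(table[b] for b in pending[i:i + per_line])
            dst.write(line + "\n")
        pending = pending[whole:]
    # Final, possibly short, line.
    if pending:
        dst.write("".join(table[b] for b in pending) + "\n")
```

It could be called as write_initialisers(open("100MB", "rb"), sys.stdout)
after an "import sys".  For non-empty input the output should match the
program above byte for byte; whether it is any faster is untested.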


I didn't try compiling a test file from the 100 MB source data, but gcc 
took about 16 seconds for an include file generated from 20 MB of random 
data.  It didn't make a significant difference if the data was in 
decimal or hex, one line or multiple lines.

But since compilation took about ten times as long as the single line of
Python code (scaling the 16 seconds for 20 MB up to 100 MB gives roughly
80 seconds), my conclusion is that the speed of generating the include
file is pretty much irrelevant.  Compared to the one-line Python code
and considering the generation and compilation combined, using xxd saves
5% of the time and your best code saves 9% - out of a possible 10%
saving.
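
The arithmetic behind that comparison can be sketched as below.  The
generation times are the ones reported above; the compile time is an
assumption, namely that gcc's 16 seconds for 20 MB scales linearly to
100 MB.  The labels in the dictionary are just names chosen here:

```python
# Generation times from the post, in seconds, for 100 MB of input.
generate = {
    "python one-liner": 8.0,
    "xxd -i": 5.4,
    "fastest C program": 0.9,
}

# Assumption: 16 s for 20 MB scales linearly to 100 MB.
compile_time = 16.0 * (100 / 20)

# Total build time when the one-line Python script does the generation.
baseline = generate["python one-liner"] + compile_time


def saving(name):
    """Fraction of the baseline build time saved by using generator `name`."""
    return (generate["python one-liner"] - generate[name]) / baseline


for name, seconds in generate.items():
    total = seconds + compile_time
    print(f"{name:18s} total {total:5.1f} s, saving {saving(name):.1%}")

# Best case: generation costs nothing at all.
upper_bound = generate["python one-liner"] / baseline
print(f"upper bound: {upper_bound:.1%}")
```

Under that linear-scaling assumption the savings come out nearer 3% and
8% than the rounded 5% and 9% quoted above, with an upper bound of about
9%.  The conclusion is the same either way.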

Thus if you want to save build time when including large arrays of data 
in the generated executable, time spent on beating xxd is wasted - 
implementing optimised #embed is the only way to make an impact.


(I have had reason to include a 0.5 MB file in a statically linked 
single binary - I'm not sure when you'd need very fast handling of 
multi-megabyte embeds.)