Article <v35kkl$pis1$1@dont-email.me>


Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: xxd -i vs DIY Was: C23 thoughts and opinions
Date: Tue, 28 May 2024 23:08:22 +0100
Organization: A noiseless patient Spider
Lines: 109
Message-ID: <v35kkl$pis1$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me> <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me> <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com> <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com> <v2v59g$3cr0f$1@dont-email.me>
 <20240528144118.00002012@yahoo.com> <v34odg$kh7a$1@dont-email.me>
 <20240528185624.00002494@yahoo.com> <v359f1$nknu$1@dont-email.me>
 <20240528232315.00006a58@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 29 May 2024 00:08:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f50a0a3313aa9a4c5e71b452f1561318";
	logging-data="838529"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+lbf5kz8Enerr+6po4DwGb"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:dF4SdtY7+b/3shAgHA2s0pQ1aA0=
In-Reply-To: <20240528232315.00006a58@yahoo.com>
Content-Language: en-GB
Bytes: 5673

On 28/05/2024 21:23, Michael S wrote:
> On Tue, 28 May 2024 19:57:38 +0100
> bart <bc@freeuk.com> wrote:
> 

>> OK, I had a go with your program. I used a random data file of exactly
>> 100M bytes.
>>
>> Runtimes varied from 4.1 to 5 seconds depending on compiler. The
>> fastest time was with gcc -O3.

> 
> It sounds like your mass storage device is much slower than the aging SSD
> on my test machine and a lot slower than David Brown's SSD.


My machine uses an SSD.

However, the tests were run on Windows, so I ran your program again under
WSL; now it took 14 seconds (with both gcc -O3 and gcc -O2).


>> I then tried a simple program in my language, which took 10 seconds.
>>
>> I looked more closely at yours, and saw you used a clever method of a
>> table of precalculated stringified numbers.
>>
>> Using a similar table, plus more direct string handling, the fastest
>> timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was
>> supposed to match your layout, but that turned out to be variable.)
>>
> 
> Yes, I try to keep the line length almost fixed (77 to 80 characters) and
> make no attempt to control the number of entries per line.
> Since you used a random generator, the density advantage of my approach
> is smaller than in more typical situations, where 2-digit numbers are
> more common than 3-digit numbers.
> 
> Also, I think that random numbers are close to the worst case for the
> branch predictor / loop-length predictor in my inner loop.
> Had I been thinking about the random case up front, I'd have coded the
> inner loop differently. I'd always copy 4 octets (the comma would be
> stored in the same table). After that I would advance outptr by a length
> taken from an additional table, similar, but not identical, to your
> method below.

For N input bytes, the output sizes can differ by a factor of 2:1 at most
(all "1," versus all "123,", for example).

> There exist files that have near-random distribution, e.g. anything
> zipped or anything encrypted, but I would think that we rarely want
> them embedded.

>> This hardcodes the input filename. 'readfile' is a function in my
>> library.
>>
>> --------------------------------
>>
>> [0:256]ichar numtable
>> [0:256]int numlengths
>>
>> proc main=
>>       ref byte data
>>       [256]char str
>>       const perline=21
>>       int m, n, slen
>>       byte bb
>>       ichar s, p
>>
>>       for i in 0..255 do
>>           numtable[i] := strdup(strint(i))
>>           numlengths[i] := strlen(numtable[i])
>>       od
>>
>>       data := readfile("/c/data100")

> Reading the whole file up front is undoubtedly faster than interleaving
> reads and writes. But by the unwritten rules I imposed on myself, it is
> cheating.

Why is that cheating? Isn't the whole point to have a practical tool that
is faster than xxd?

I never use buffered file input, or request file data one character at a
time from a file-system API; who knows how inefficient that might be.

I looked at your code again, and saw you're using fwrite to output each 
line. If I adapt my 'readln' (which ends up calling 'printf') to use 
fwrite too, then my timing reduces from 3.1 to 2.1 seconds.

Of that 2.1 seconds, the file-loading time is 0.03 seconds. If I switch
to an fgetc loop (after determining the file size; it still loads the
whole file), file loading takes nearly 2 seconds, and the overall timing
becomes only a little faster than the gcc-compiled C code.

(My compiler doesn't have an equivalent optimiser, but this is mostly 
about I/O and algorithm.)

My view is that my approach leads to a simpler program.

Very large files that won't fit into memory, e.g. a 2GB binary, might be
a problem. But remember that the output would then be a text file of
4-8GB, which for now has to be processed by a compiler that must build
in-memory data structures to represent it, taking even more space.

So they would be unviable anyway.

A proper 'embed' feature would most likely have to load the entire 
binary file into memory too.