Article <v47n0q$jtir$1@dont-email.me>



Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 21:17:30 +0100
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <v47n0q$jtir$1@dont-email.me>
References: <v3snu1$1io29$2@dont-email.me> <v45iak$3t1l5$1@dont-email.me>
 <v465h9$76f0$1@dont-email.me> <87tti03co9.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 10 Jun 2024 22:17:31 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a5f2fee6498babfeedcde7339d6d2227";
	logging-data="652891"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/3QaTCzE6IZ21T2SDBL7xf+MVTGbdk7RM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:wHQQZhD43/43+7BHdHImqR9EetQ=
Content-Language: en-GB
In-Reply-To: <87tti03co9.fsf@bsb.me.uk>
Bytes: 2798

On 10/06/2024 18:55, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> 
>> We have a fixed Huffman tree which is part of the algorithm and optimised
>> for ASCII. And we take each line of text, and compress it to a binary string,
>> using the Huffman table. Then we code the binary string six bits at a time
>> using a 64 character subset of ASCII. And then we append a special character
>> which is chosen to be visually distinctive.
>>
>> So the input is
>>
>> Mary had a little lamb,
>> it's fleece was white as snow,
>> and eveywhere that Mary went,
>> the lamb was sure to. go.
>>
>> And we get the output.
>>
>> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
> 
> It would be more like
> 
> pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB
> 
> (That's an actual example using an optimal Huffman encoding for that
> input and the conventional base 64 encoding.  I can post the code table,
> if you like.)
> 
>> And whether it is shorter or not depends on whether the fixed Huffman table
>> is any good.
> 
> If I use a bigger corpus of English text to derive the Huffman codes,
> the encoding becomes less efficient (of course) so those 110 characters
> need more like 83 base 64 encoded bytes to represent them.  Is 75% of
> the size worth it?
> 
> What is the use-case where there is so much English text that a little
> compression is worthwhile?
> 
The FileSystem XML files. They are uncompressed, and as you can take in 
entire folders, they can be very large.

But the compression is rather disappointing.
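
For reference, the six-bits-at-a-time step could look something like the sketch below. It is only an illustration: the function names are made up, the alphabet is the conventional base 64 one (the scheme only calls for some 64 character subset of ASCII), and the marker character is left to the caller. The Huffman stage is not shown; the input is assumed to be its output, written as a string of '0' and '1' characters. On the figures quoted above, 83 characters of 6 bits each carry at most 498 bits, so roughly 4.5 bits per character for the 110 characters of input.

```c
#include <stddef.h>
#include <string.h>

/* Conventional base 64 alphabet. The scheme only asks for "a 64
   character subset of ASCII", so this particular choice is an
   assumption. */
static const char alphabet64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Number of output characters needed for nbits of compressed data,
   not counting the end-of-line marker. */
size_t sixbit_length(size_t nbits)
{
    return (nbits + 5) / 6;
}

/* Pack a string of '0'/'1' characters six bits at a time into out.
   The final group is padded with zero bits. The marker character is
   appended, followed by a terminating nul. out must have room for
   sixbit_length(strlen(bits)) + 2 bytes. Returns the number of
   characters written, not counting the nul. */
size_t sixbit_encode(const char *bits, char marker, char *out)
{
    size_t nbits = strlen(bits);
    size_t n = 0;
    size_t i, j;

    for (i = 0; i < nbits; i += 6) {
        unsigned value = 0;

        for (j = 0; j < 6; j++) {
            value <<= 1;
            if (i + j < nbits && bits[i + j] == '1')
                value |= 1;
        }
        out[n++] = alphabet64[value];
    }
    out[n++] = marker;
    out[n] = '\0';
    return n;
}
```

Calling sixbit_encode("000001111111", '~', buf) writes "B/~" into buf. A decoder would also need to know how many padding bits were added to the last group, which this sketch does not record.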

-- 
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc