Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 21:28:39 +0100
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <87o78835l4.fsf@bsb.me.uk>
References: <v3snu1$1io29$2@dont-email.me> <v45iak$3t1l5$1@dont-email.me>
	<v465h9$76f0$1@dont-email.me> <87tti03co9.fsf@bsb.me.uk>
	<v47n0q$jtir$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 10 Jun 2024 22:28:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b018a74a51e8de81a68590f7334ceb3f";
	logging-data="660536"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19LK8juGjh4xMTnttn5Mb6d29HYJjxeqIo="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:Zw+23xfUtunHsNE5b8U3NfJoWWk=
	sha1:w3gieNrf388iBR7LSBqohbh7xnI=
X-BSB-Auth: 1.fead64b772e977d3801d.20240610212839BST.87o78835l4.fsf@bsb.me.uk
Bytes: 3230

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On 10/06/2024 18:55, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>> 
>>> We have a fixed Huffman tree which is part of the algorithm and optimised
>>> for ASCII. And we take each line of text, and compress it to a binary string,
>>> using the Huffman table. Then we code the binary string six bits at a time
>>> using a 64 character subset of ASCII. And then we append a special character
>>> which is chosen to be visually distinctive.
>>>
>>> So the input is
>>>
>>> Mary had a little lamb,
>>> it's fleece was white as snow,
>>> and eveywhere that Mary went,
>>> the lamb was sure to. go.
>>>
>>> And we get the output.
>>>
>>> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
>> It would be more like
>> pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB
>> (That's an actual example using an optimal Huffman encoding for that
>> input and the conventional base 64 encoding.  I can post the code table,
>> if you like.)
>> 
>>> And if it shorter or not depends on whether the fixed Huffman table is any
>>> good.
>> If I use a bigger corpus of English text to derive the Huffman codes,
>> the encoding becomes less efficient (of course) so those 110 characters
>> need more like 83 base 64 encoded bytes to represent them.  Is 75% of
>> the size worth it?
>> What is the use-case where there is so much English text that a little
>> compression is worthwhile?
>> 
> The FileSystem XML files. They are uncompressed, and as you can take in
> entire folders, they can be very large.

I don't know what the XML file system is for either so explaining one by
the other doesn't help.  I was hoping for a use -- a user story -- that
would help me understand what the point of all this is.

Tell me as a story: A user wants to ... what?  And having a directory of
large text files in a structured XML text helps because they can
.... what?  And if they were a quarter of the size it would be better
because ... why?
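
For concreteness, the step of re-coding a Huffman bit string with a 64
character alphabet can be sketched in C.  This is only a minimal sketch
under stated assumptions: the name pack_bits_base64 is hypothetical, the
Huffman output is represented as a string of '0' and '1' characters purely
for illustration, and it is not the code that produced the example quoted
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Conventional base 64 alphabet (RFC 4648), without '=' padding. */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Hypothetical helper: pack a string of '0'/'1' characters (standing in
 * for the Huffman-coded bit stream) into base 64 characters, six bits per
 * output character.  The final group is padded with zero bits.
 * Returns the number of characters written (excluding the terminator),
 * or 0 if the output buffer is too small or the input is malformed.
 */
size_t pack_bits_base64(const char *bits, char *out, size_t outsize)
{
    assert(bits != NULL && out != NULL);

    size_t nbits = strlen(bits);
    size_t nchars = (nbits + 5) / 6;            /* ceil(nbits / 6) */
    if (outsize < nchars + 1)
        return 0;

    for (size_t i = 0; i < nchars; i++) {
        unsigned value = 0;
        for (size_t j = 0; j < 6; j++) {
            size_t k = i * 6 + j;
            char c = k < nbits ? bits[k] : '0'; /* zero padding at the end */
            if (c != '0' && c != '1')
                return 0;
            value = value << 1 | (unsigned)(c - '0');
        }
        out[i] = B64[value];
    }
    out[nchars] = '\0';
    return nchars;
}
```

Since each output character carries six bits, a bit stream of 493 to 498
bits packs into 83 characters, which for a 110 character input works out
to roughly 4.5 bits per input character.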

-- 
Ben.