From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 21:28:39 +0100
Organization: A noiseless patient Spider
Message-ID: <87o78835l4.fsf@bsb.me.uk>
References: <v3snu1$1io29$2@dont-email.me> <v45iak$3t1l5$1@dont-email.me> <v465h9$76f0$1@dont-email.me> <87tti03co9.fsf@bsb.me.uk> <v47n0q$jtir$1@dont-email.me>

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On 10/06/2024 18:55, Ben Bacarisse wrote:
>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>
>>> We have a fixed Huffman tree which is part of the algorithm and optimised
>>> for ASCII. And we take each line of text, and compress it to a binary string,
>>> using the Huffman table. Then we code the binary string six bits at a time
>>> using a 64 character subset of ASCII. And then we append a special character
>>> which is chosen to be visually distinctive.
>>>
>>> So the input is
>>>
>>> Mary had a little lamb,
>>> it's fleece was white as snow,
>>> and eveywhere that Mary went,
>>> the lamb was sure to. go.
>>>
>>> And we get the output.
>>>
>>> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
>> It would be more like
>> pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB
>> (That's an actual example using an optimal Huffman encoding for that
>> input and the conventional base 64 encoding. I can post the code table,
>> if you like.)
>>
>>> And if it is shorter or not depends on whether the fixed Huffman table is any
>>> good.
>> If I use a bigger corpus of English text to derive the Huffman codes,
>> the encoding becomes less efficient (of course) so those 110 characters
>> need more like 83 base 64 encoded bytes to represent them. Is 75% of
>> the size worth it?
>> What is the use-case where there is so much English text that a little
>> compression is worthwhile?
>>
> The FileSystem XML files. They are uncompressed, and as you can take in
> entire folders, they can be very large.

I don't know what the XML file system is for either so explaining one by
the other doesn't help. I was hoping for a use -- a user story -- that
would help me understand what the point of all this is.

Tell me as a story: A user wants to ... what? And having a directory of
large text files in a structured XML text helps because they can
... what? And if they were a quarter of the size it would be better
because ... why?

-- 
Ben.
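
The size arithmetic discussed above (Huffman-coded bits, then one base 64 character per six bits) can be checked with a short C sketch. This is not the code used for the figures in the post, and the code table offered there is not reproduced here; the function names huffman_bits and base64_chars are made up for illustration. It assumes the Huffman code is built from the input's own byte frequencies, which is the "optimal for that input" case, and it ignores base 64 padding and any end-of-line marker character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total bits needed to encode `text` with a Huffman
   code that is optimal for that text's own byte frequencies.  It relies on
   the fact that the encoded length equals the sum of the weights of all
   internal nodes created while building the Huffman tree. */
unsigned long huffman_bits(const char *text)
{
    unsigned long freq[256] = {0};
    unsigned long weight[256];
    unsigned long total = 0;
    size_t n = 0;

    assert(text != NULL);

    /* Count how often each byte value occurs. */
    for (const unsigned char *p = (const unsigned char *)text; *p; p++)
        freq[*p]++;

    /* Collect the weights of the symbols that actually occur. */
    for (int i = 0; i < 256; i++)
        if (freq[i])
            weight[n++] = freq[i];

    if (n == 0)
        return 0;
    if (n == 1)
        return weight[0];       /* one distinct symbol: one bit each */

    while (n > 1) {
        /* Find the two smallest weights (a is smallest, b is next). */
        size_t a = 0, b = 1;
        if (weight[b] < weight[a]) { a = 1; b = 0; }
        for (size_t i = 2; i < n; i++) {
            if (weight[i] < weight[a]) { b = a; a = i; }
            else if (weight[i] < weight[b]) { b = i; }
        }

        /* Merge them into one internal node and add its weight. */
        unsigned long merged = weight[a] + weight[b];
        total += merged;

        /* Keep the merged node, drop one slot by moving the last entry. */
        weight[a] = merged;
        weight[b] = weight[n - 1];
        n--;
    }
    return total;
}

/* Hypothetical helper: base 64 characters needed for `bits` bits,
   six bits per character, rounded up, no '=' padding. */
unsigned long base64_chars(unsigned long bits)
{
    return (bits + 5) / 6;
}
```

Calling base64_chars(huffman_bits(text)) on the quoted input gives the character count for a code tuned to that input. Modelling a fixed table derived from a larger corpus would need the per-symbol code lengths of that table rather than only the total computed here.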