Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 22:52:31 +0100
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <v47siv$l29v$1@dont-email.me>
References: <v3snu1$1io29$2@dont-email.me> <v45iak$3t1l5$1@dont-email.me> <v465h9$76f0$1@dont-email.me> <87tti03co9.fsf@bsb.me.uk> <v47n0q$jtir$1@dont-email.me> <87o78835l4.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 10 Jun 2024 23:52:32 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a5f2fee6498babfeedcde7339d6d2227"; logging-data="690495"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19o/a88oOTazLhT43rf6EaSTx2X/ZZqxCk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:THeC2vjsnRzNDXzrDEcw6QRGWnE=
In-Reply-To: <87o78835l4.fsf@bsb.me.uk>
Content-Language: en-GB
Bytes: 3782

On 10/06/2024 21:28, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>
>> On 10/06/2024 18:55, Ben Bacarisse wrote:
>>> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>>>
>>>> We have a fixed Huffman tree which is part of the algorithm and optimised
>>>> for ASCII. And we take each line of text, and compress it to a binary string,
>>>> using the Huffman table. Then we code the binary string six bits at a time
>>>> using a 64 character subset of ASCII. And then we append a special character
>>>> which is chosen to be visually distinctive.
>>>>
>>>> So the input is
>>>>
>>>> Mary had a little lamb,
>>>> it's fleece was white as snow,
>>>> and eveywhere that Mary went,
>>>> the lamb was sure to. go.
>>>>
>>>> And we get the output.
>>>>
>>>> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
>>> It would be more like
>>> pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB
>>> (That's an actual example using an optimal Huffman encoding for that
>>> input and the conventional base 64 encoding. I can post the code table,
>>> if you like.)
>>>
>>>> And if it is shorter or not depends on whether the fixed Huffman table is any
>>>> good.
>>> If I use a bigger corpus of English text to derive the Huffman codes,
>>> the encoding becomes less efficient (of course) so those 110 characters
>>> need more like 83 base 64 encoded bytes to represent them. Is 75% of
>>> the size worth it?
>>> What is the use-case where there is so much English text that a little
>>> compression is worthwhile?
>>>
>> The FileSystem XML files. They are uncompressed, and as you can take in
>> entire folders, they can be very large.
>
> I don't know what the XML file system is for either so explaining one by
> the other doesn't help. I was hoping for a use -- a user story -- that
> would help me understand what the point of all this is.
>
> Tell me as a story: A user wants to ... what? And having a directory of
> large text files in a structured XML text helps because they can
> ... what? And if they were a quarter of the size it would be better
> because ... why?
>
The idea is that you package up a directory, and use it as an embedded
resource inside the application. So it has access to an internal
filing system. That's the point of the Baby X Filesystem sub component,
BabyXFS, which I am currently adding to the resource compiler
repository. And that's what this stage of the hobby project is all about.

-- 
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
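
For anyone following the thread, the second stage of the scheme quoted above (taking the Huffman-coded bit string and writing it out six bits per printable character) can be sketched as below. This is only an illustration, not code from Baby X: the function names `chars_for_bits` and `pack_bits_base64` are made up for the example, the bit string is assumed to arrive as a C string of '0' and '1' characters, the alphabet is assumed to be the conventional base 64 one, and the Huffman stage and the trailing marker character are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Conventional base 64 alphabet (an assumption; the thread only says
   "a 64 character subset of ASCII"). */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Number of 6-bit characters needed to carry nbits bits. */
size_t chars_for_bits(size_t nbits)
{
    return (nbits + 5) / 6;
}

/*
 * Hypothetical helper: encode a string of '0'/'1' characters six bits
 * at a time. The last group is padded on the right with zero bits.
 * Returns the number of characters written (not counting the NUL),
 * or (size_t)-1 if the input is malformed or out is too small.
 */
size_t pack_bits_base64(const char *bits, char *out, size_t outcap)
{
    size_t nbits, nchars, i;

    assert(bits != NULL && out != NULL);
    nbits = strlen(bits);
    nchars = chars_for_bits(nbits);
    if (outcap < nchars + 1)
        return (size_t)-1;

    for (i = 0; i < nchars; i++) {
        unsigned value = 0;
        size_t j;

        for (j = 0; j < 6; j++) {
            size_t pos = i * 6 + j;
            char c = pos < nbits ? bits[pos] : '0';  /* zero padding */

            if (c != '0' && c != '1')
                return (size_t)-1;
            value = (value << 1) | (unsigned)(c - '0');
        }
        out[i] = B64[value];
    }
    out[nchars] = '\0';
    return nchars;
}
```

Each output character carries six bits, so the 83 characters quoted above for the 110-character input hold at most 498 bits, roughly 4.5 bits per input character. That is where the 75% figure comes from.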