Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Fri, 7 Jun 2024 19:49:05 +0100
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <v3vkn1$265uv$2@dont-email.me>
References: <v3snu1$1io29$2@dont-email.me> <874ja657s9.fsf@bsb.me.uk> <v3t1gf$1kia9$2@dont-email.me> <v3u5a3$1ul3c$1@dont-email.me> <v3ui89$20jte$1@dont-email.me> <v3uvq9$22s77$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 07 Jun 2024 20:49:05 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3d5cf5214f2d8b45c65425247320c56d"; logging-data="2299871"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QD2IWt1m4HraBE4g0yh5pc9gv+1euXNA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:vCfSkU6/3ZGm6DtSU5WBr/D1/Jg=
In-Reply-To: <v3uvq9$22s77$1@dont-email.me>
Content-Language: en-GB
Bytes: 2790

On 07/06/2024 13:52, Mikko wrote:
> On 2024-06-07 09:00:57 +0000, Malcolm McLean said:
>
>> Yes, but Huffman is easy to decode. It's the sort of project you give
>> to people who have just got past the beginner stage but aren't very
>> experienced programmers yet, whilst implementing Lempel-Ziv is a job
>> for someone who knows what he is doing.
>>
>> Because the lines will often be very short, adaptive Huffman coding is
>> no good. I need a fixed Huffman table with 128 entries for each 7 bit
>> value plus one for "stop". I wonder if any such standard table exists.
>
> You don't need a standard table. You need statistics. Once you have the
> statistics the table is easy to construct with Huffman's algorithm.
>
No, you do.
The text might be very short, like "Mary had a little lamb", and you will compress it because you know that you are being fed meaningful ASCII. For example, even this tiny fragment contains the letter "e", which would have a short Huffman code. And four a's and two t's, which are the third and the second most common letters. So it should compress.

And we're compressing each line independently, and choosing a visually distinctive ASCII character as the line break. So anyone seeing the compressed data will immediately be able to home in on the line breaks, and will be able to fix any corruption without special tools.

And you have a standard table which never changes. And so that makes the decompressor much easier to write.

-- 
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm
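
For anyone who wants to try the fixed-table idea, here is a minimal C sketch; it is not code from this thread. It builds code lengths for 129 symbols (the 128 seven-bit values plus a stop symbol) with the simple O(n^2) form of Huffman's algorithm, then counts the bits one line would take. The weights in fill_weights() are made-up placeholders, not measured statistics, and the names NSYM, STOP, code_len, build_lengths and encoded_bits are invented for the example. A real table would be derived once from real statistics and then never changed.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM 129   /* 128 seven-bit values plus one "stop" symbol */
#define STOP 128   /* index chosen here for the hypothetical stop symbol */

/* Code length in bits for each symbol, filled in by build_lengths(). */
unsigned char code_len[NSYM];

/* Placeholder weights: made up for illustration, NOT measured statistics.
   Every symbol that is not listed gets weight 1. */
static void fill_weights(unsigned long *w)
{
    static const struct { int sym; unsigned long weight; } common[] = {
        {' ', 180}, {'e', 100}, {'t', 75}, {'a', 65}, {'o', 60}, {'i', 55},
        {'n', 55}, {'s', 50}, {'h', 50}, {'r', 50}, {'d', 35}, {'l', 35},
        {STOP, 30}, {'u', 25}, {'c', 20}, {'m', 20}, {'w', 20}, {'f', 18},
        {'g', 16}, {'y', 16}, {'p', 15}, {'b', 12}, {'v', 8}, {'k', 6}
    };
    size_t i;

    for (i = 0; i < NSYM; i++)
        w[i] = 1;
    for (i = 0; i < sizeof common / sizeof common[0]; i++)
        w[common[i].sym] = common[i].weight;
}

/* Huffman's algorithm, quadratic version: repeatedly merge the two
   lightest live nodes, then read each leaf's depth off the parent links. */
void build_lengths(void)
{
    unsigned long w[2 * NSYM - 1];
    int parent[2 * NSYM - 1];
    int live[2 * NSYM - 1];
    int n = NSYM, i;

    fill_weights(w);
    for (i = 0; i < 2 * NSYM - 1; i++) {
        parent[i] = -1;
        live[i] = i < NSYM;          /* only the leaves exist at first */
    }
    while (n < 2 * NSYM - 1) {
        int lo1 = -1, lo2 = -1;      /* lightest and second lightest */
        for (i = 0; i < n; i++) {
            if (!live[i])
                continue;
            if (lo1 < 0 || w[i] < w[lo1]) {
                lo2 = lo1;
                lo1 = i;
            } else if (lo2 < 0 || w[i] < w[lo2]) {
                lo2 = i;
            }
        }
        assert(lo1 >= 0 && lo2 >= 0);
        w[n] = w[lo1] + w[lo2];      /* new internal node */
        parent[lo1] = parent[lo2] = n;
        live[lo1] = live[lo2] = 0;
        live[n] = 1;
        n++;
    }
    for (i = 0; i < NSYM; i++) {
        int depth = 0, p;
        for (p = parent[i]; p >= 0; p = parent[p])
            depth++;
        code_len[i] = (unsigned char)depth;
    }
}

/* Bits needed for one line: each character's code plus the stop code.
   Characters are masked to 7 bits, since the input is assumed to be ASCII. */
unsigned long encoded_bits(const char *line)
{
    unsigned long bits = code_len[STOP];

    for (; *line; line++)
        bits += code_len[(unsigned char)*line & 0x7F];
    return bits;
}
```

After one call to build_lengths(), encoded_bits("Mary had a little lamb") can be compared with the 7 * 22 = 154 bits the raw 7-bit text needs. With these placeholder weights it comes out smaller, even though the capital "M" gets one of the long codes. Only the lengths are computed here; assigning the actual bit patterns from the lengths, and the ASCII-safe output stage, are left out.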