Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Fri, 7 Jun 2024 19:49:05 +0100
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <v3vkn1$265uv$2@dont-email.me>
References: <v3snu1$1io29$2@dont-email.me> <874ja657s9.fsf@bsb.me.uk>
 <v3t1gf$1kia9$2@dont-email.me> <v3u5a3$1ul3c$1@dont-email.me>
 <v3ui89$20jte$1@dont-email.me> <v3uvq9$22s77$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 07 Jun 2024 20:49:05 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3d5cf5214f2d8b45c65425247320c56d";
	logging-data="2299871"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/QD2IWt1m4HraBE4g0yh5pc9gv+1euXNA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:vCfSkU6/3ZGm6DtSU5WBr/D1/Jg=
In-Reply-To: <v3uvq9$22s77$1@dont-email.me>
Content-Language: en-GB
Bytes: 2790

On 07/06/2024 13:52, Mikko wrote:
> On 2024-06-07 09:00:57 +0000, Malcolm McLean said:
> 
>> Yes, but Huffman is easy to decode. It's the sort of project you give 
>> to people who have just got past the beginner stage but aren't very 
>> experienced programmers yet, whilst implementing Lempel-Ziv is a job 
>> for someone who knows what he is doing.
>>
>> Because the lines will often be very short, adaptive Huffman coding is 
>> no good. I need a fixed Huffman table with 128 entries for each 7 bit 
>> value plus one for "stop". I wonder if any such standard table exists.
> 
> You don't need a standard table. You need statistics. Once you have the
> statistics the table is easy to construct with Huffman's algorithm.
> 
No, you do need a standard table. The text might be very short, like 
"Mary had a little lamb", and you can still compress it because you know 
that you are being fed meaningful ASCII. For example, even this tiny 
fragment contains the letter "e", which would have a short Huffman code, 
and four a's and two t's, which are the third and second most common 
letters. So it should compress.
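
To put a rough number on that, here is a minimal sketch in C. The code
lengths are hypothetical, invented for illustration and not taken from
any real table: 3 bits for space and "e", 4 bits for "t" and "a", 6 bits
for the other lower-case letters, and 9 bits for everything else,
including the stop symbol. With 129 symbols those lengths give a Kraft
sum of 478/512, so a prefix code with these lengths does exist. The
names code_length, estimate_bits and STOP_SYMBOL are my own.

```c
#include <stddef.h>

#define STOP_SYMBOL 128   /* assumed: the 129th symbol, marking end of line */

/* Hypothetical code lengths in bits; NOT from any real standard table. */
static unsigned code_length(int c)
{
    if (c == ' ' || c == 'e')
        return 3;
    if (c == 't' || c == 'a')
        return 4;
    if (c >= 'a' && c <= 'z')
        return 6;
    return 9;   /* capitals, digits, punctuation, controls, stop */
}

/* Bits needed to encode one line, including the trailing stop symbol. */
size_t estimate_bits(const char *line)
{
    size_t bits = 0;

    for (; *line != '\0'; line++)
        bits += code_length((unsigned char)*line);
    return bits + code_length(STOP_SYMBOL);
}
```

Under those assumed lengths, "Mary had a little lamb" comes to 117 bits
including the stop symbol, against 22 * 7 = 154 bits for the raw 7-bit
characters. A table built from real statistics would give different
figures.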

And we're compressing each line independently, and choosing a visually 
distinctive ASCII character as the line break. So anyone seeing the 
compressed data will immediately be able to home in on the line breaks, 
and will be able to fix any corruption without special tools.
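
A minimal sketch of that framing in C, assuming '|' as the line break
(I haven't fixed on a character; this one is just for illustration) and
assuming the compressed payload never contains the break character
itself. The function name find_record is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed line-break character; chosen only for illustration. */
static const char line_break[] = "|";

/*
 * Locate compressed record number `index` (counting from 0) in `data`.
 * Returns a pointer to its first character and stores its length in
 * *len, or returns NULL if there are fewer records than that.
 * Earlier records are skipped without being decoded.
 */
const char *find_record(const char *data, size_t index, size_t *len)
{
    while (index > 0) {
        const char *brk = strchr(data, line_break[0]);

        if (brk == NULL)
            return NULL;          /* ran out of records */
        data = brk + 1;
        index--;
    }
    *len = strcspn(data, line_break);
    return data;
}
```

Because each record stands alone, a damaged line can be skipped or
repaired by hand without touching its neighbours.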

And you have a standard table which never changes, which makes the 
decompressor much easier to write.
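
To show how little code a fixed-table decoder needs, here is a sketch in
C. The six-entry table is a toy I made up for illustration, not a real
standard table, and the bits are passed as a string of '0' and '1'
characters purely for readability. The names decode_line, table and STOP
are hypothetical. A real table would have 129 entries, and a real
decoder would probably walk a tree rather than search linearly.

```c
#include <stddef.h>
#include <string.h>

#define STOP 128   /* assumed end-of-line symbol */

struct entry {
    const char *code;   /* prefix-free code, written as '0'/'1' text */
    int symbol;
};

/* Toy table for illustration only; compiled in and never changes. */
static const struct entry table[] = {
    { "00",  ' ' }, { "01",  'a' }, { "100", 'e' },
    { "101", 't' }, { "110", 'l' }, { "111", STOP },
};

/*
 * Decode one line of '0'/'1' text into out (NUL-terminated).
 * Returns the number of characters decoded, or -1 if the input is
 * corrupt, lacks a stop symbol, or does not fit in out.
 */
int decode_line(const char *bits, char *out, size_t outsize)
{
    const size_t ntab = sizeof table / sizeof table[0];
    size_t n = 0;

    if (outsize == 0)
        return -1;
    while (*bits != '\0') {
        size_t i;

        /* The code is prefix-free, so at most one entry can match. */
        for (i = 0; i < ntab; i++) {
            size_t codelen = strlen(table[i].code);
            if (strncmp(bits, table[i].code, codelen) == 0)
                break;
        }
        if (i == ntab)
            return -1;            /* no code matches: corrupt input */
        bits += strlen(table[i].code);
        if (table[i].symbol == STOP) {
            out[n] = '\0';
            return (int)n;
        }
        if (n + 1 >= outsize)
            return -1;            /* output buffer too small */
        out[n++] = (char)table[i].symbol;
    }
    return -1;                    /* bits ran out before the stop symbol */
}
```

With this toy table, the bits for l, a, t, e, space, t, e, a followed by
the stop code decode back to "late tea". There is no table to read or
rebuild at run time, which is the whole point.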
-- 
Check out Basic Algorithms and my other books:
https://www.lulu.com/spotlight/bgy1mm