Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 18:55:34 +0100
Organization: A noiseless patient Spider
Lines: 40
Message-ID: <87tti03co9.fsf@bsb.me.uk>
References: <v3snu1$1io29$2@dont-email.me> <v45iak$3t1l5$1@dont-email.me>
	<v465h9$76f0$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 10 Jun 2024 19:55:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b018a74a51e8de81a68590f7334ceb3f";
	logging-data="608526"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18dNiT0kUKHiQ+xH05VYJOMy6NYJSXBBmg="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:Pqgarly6K5zzNEBWi5Bmo+hmDRU=
	sha1:iV5wpPjEAWUP3Y+1ryN/8YCt1Nw=
X-BSB-Auth: 1.fe72d74e9ef02018b5a6.20240610185534BST.87tti03co9.fsf@bsb.me.uk
Bytes: 2459

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> We have a fixed Huffman tree which is part of the algorithm and optimised
> for ASCII. And we take each line of text, and compress it to a binary string,
> using the Huffman table. Then we code the binary string six bytes at a time
> using a 64 character subset of ASCII. And then we append a special character
> which is chosen to be visually distinctive.
>
> So the input is
>
> Mary had a little lamb,
> it's fleece was white as snow,
> and eveywhere that Mary went,
> the lamb was sure to. go.
>
> And we get the output.
>
> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-

It would be more like

pOHcDdz8v3cz5Nl7WP2gno5krTqU6g/ZynQYlawju8rxyhMT6B30nDusHrWaE+TZf1KdKmJ9Fb6orB

(That's an actual example using an optimal Huffman encoding for that
input and the conventional base 64 encoding.  I can post the code table,
if you like.)
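
For concreteness, here is a minimal sketch of the second stage only: taking
the coded bit string six bits at a time (each character of a 64 character
alphabet carries six bits) and mapping each group to the conventional
base 64 alphabet.  It is not the code used to produce the line above.  The
names sixbit_length and sixbit_encode are made up for illustration, the
Huffman coder's output is assumed to arrive as a string of '0' and '1'
characters, and padding the final group with zero bits is an assumption too.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional base 64 alphabet (RFC 4648). */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Number of output characters needed for nbits of coded data. */
size_t sixbit_length(size_t nbits)
{
    return (nbits + 5) / 6;
}

/*
 * Encode a string of '0'/'1' characters (assumed to be the output of
 * some Huffman coder) six bits at a time.  A short final group is
 * padded on the right with zero bits (an assumption of this sketch).
 * out must have room for sixbit_length(strlen(bits)) + 1 chars.
 * Returns the number of characters written, excluding the terminator.
 */
size_t sixbit_encode(const char *bits, char *out)
{
    size_t n = 0;
    unsigned group = 0;   /* bits collected so far for this group */
    int count = 0;        /* how many bits are in group */

    assert(bits != NULL && out != NULL);

    for (; *bits; bits++) {
        group = group << 1 | (*bits == '1');
        if (++count == 6) {
            out[n++] = b64_alphabet[group];
            group = 0;
            count = 0;
        }
    }
    if (count > 0)
        out[n++] = b64_alphabet[group << (6 - count)];
    out[n] = '\0';
    return n;
}
```

With this scheme any coded string of 463 to 468 bits comes out as 78
characters, which is the length of the example line above.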

> And whether it is shorter or not depends on whether the fixed Huffman table
> is any good.

If I use a bigger corpus of English text to derive the Huffman codes,
the encoding becomes less efficient for this input (of course), so those
110 characters need more like 83 base 64 encoded bytes to represent them.
Is 75% of the size worth it?
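
The arithmetic behind that figure is simple enough to sketch.  The helper
names below are made up for illustration; the only inputs are the counts
already given (110 input characters, 83 encoded ones, and the 78 character
example line further up).

```c
#include <stddef.h>

/* Upper bound on the coded bits carried by nchars base 64 characters. */
size_t max_payload_bits(size_t nchars)
{
    return nchars * 6;
}

/*
 * Encoded size as a percentage of the original size, rounded to the
 * nearest whole percent.  original must be non-zero.
 */
unsigned percent_of_original(size_t encoded, size_t original)
{
    return (unsigned)((encoded * 100 + original / 2) / original);
}
```

percent_of_original(83, 110) gives 75 and percent_of_original(78, 110)
gives 71, so the corpus-derived table costs about four percentage points
on this input.  max_payload_bits(83) is 498, i.e. at most about 4.5 bits
per input character.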

What is the use-case where there is so much English text that a little
compression is worthwhile?

-- 
Ben.