Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 14:29:30 +0300
Organization: A noiseless patient Spider
Lines: 79
Message-ID: <20240610142930.00005c8a@yahoo.com>
References: <v3snu1$1io29$2@dont-email.me>
	<v45iak$3t1l5$1@dont-email.me>
	<v465h9$76f0$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Date: Mon, 10 Jun 2024 13:29:16 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="880dcd519fee37eed7f03a0e29de7d3f";
	logging-data="434032"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18PaFp1UoZGBvIcwFZa4YNiOn0MT0IILAc="
Cancel-Lock: sha1:XvdRvsjKvS6qQgOhstqKg8rWdeM=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
Bytes: 4136

On Mon, 10 Jun 2024 07:12:57 +0100
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:

> On 10/06/2024 01:45, Lew Pitcher wrote:
> > On Thu, 06 Jun 2024 17:25:37 +0100, Malcolm McLean wrote:
> >
> >> Not strictly a C programming question, but smart people will see
> >> the relevance to the topicality, which is portability.
> >>
> >> Is there a compression algorithm which converts human language
> >> ASCII text to compressed ASCII, preferably only "isgraph"
> >> characters?
> >>
> >> So "Mary had a little lamb, its fleece was white as snow".
> >>
> >> Would become
> >>
> >> QWE£$543GtT£$"||x|VVBB?
> >
> > I'm afraid that you have conflicting requirements here. In effect,
> > you want to take an array of values (each within the range of
> > 0 to 127) and
> > a) make the array shorter ("compress it"), and
> > b) express the individual elements of this shorter array with
> >     a range of 96 values ("isgraph() characters")
> >
> > Because you reduce the number of values each result element
> > can carry, each result element can only express a fraction
> > (96/128'ths) of the corresponding source element. Thus,
> > with the isgraph() requirement, the result will take /more/
> > elements to express the same data as the source did.
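To put numbers on the alphabet sizes, here is a minimal C sketch (the function names are illustrative only) that counts the 7-bit codes accepted by isgraph() and the whole number of bits one symbol can carry when a power-of-two subset of an alphabet is used. A program starts in the "C" locale, where isgraph() accepts '!' through '~', i.e. 94 characters; the largest power-of-two subset of those has 64 members, giving 6 bits per output character against 7 bits per ASCII input character.

```c
#include <ctype.h>

/* Count the 7-bit codes for which isgraph() is true.  A program starts in
   the "C" locale, where these are '!' (0x21) through '~' (0x7E). */
static int count_isgraph(void)
{
    int n = 0;
    for (int c = 0; c < 128; c++)
        if (isgraph(c))
            n++;
    return n;
}

/* Largest b with 2^b <= alphabet_size: the whole bits one symbol can carry
   when a power-of-two subset of the alphabet is used. */
static int whole_bits_per_symbol(int alphabet_size)
{
    int b = 0;
    while ((1 << (b + 1)) <= alphabet_size)
        b++;
    return b;
}
```

With 6 bits per output character, the output is only shorter than the input if the compression stage averages fewer than 6 bits per input character.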
> >
> > However, you want /compression/, which implies that you want
> > the result to be smaller than the source. And, therein lies
> > the conflict.
> >
> > Can you help clarify this for me?
> >
> We have a fixed Huffman tree which is part of the algorithm and
> optimised for ASCII. We take each line of text and compress it to a
> binary string, using the Huffman table. Then we code the binary string
> six bits at a time using a 64-character subset of ASCII. And then we
> append a special character which is chosen to be visually
> distinctive.
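The six-bits-at-a-time step can be sketched in C as below, assuming the Huffman stage has already produced a bit string stored MSB-first in a byte array. The alphabet ALPHABET64 is a hypothetical choice (the base64 letters, digits, '+' and '/'); the post does not say which 64 characters are used, and the terminator character is left out. The final group is zero-padded, so unpacking returns a multiple of 6 bits and the Huffman decoder has to be told, or be able to tell, where the real data ends.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-character alphabet; every character passes isgraph(). */
static const char ALPHABET64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Pack nbits bits (stored MSB-first in bits[]) six at a time into
   characters of ALPHABET64, zero-padding the final group.
   out needs room for (nbits + 5) / 6 + 1 bytes.  Returns chars written. */
static size_t pack6(const unsigned char *bits, size_t nbits, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nbits; i += 6) {
        unsigned v = 0;
        for (size_t j = 0; j < 6; j++) {
            size_t k = i + j;
            unsigned bit = k < nbits ? (bits[k / 8] >> (7 - k % 8)) & 1u : 0u;
            v = (v << 1) | bit;
        }
        out[n++] = ALPHABET64[v];
    }
    out[n] = '\0';
    return n;
}

/* Inverse of pack6: expand each character back into 6 bits (MSB-first)
   in bits[], which holds cap_bytes bytes.  Returns the number of bits
   written, or (size_t)-1 for a character outside ALPHABET64 or overflow. */
static size_t unpack6(const char *in, unsigned char *bits, size_t cap_bytes)
{
    size_t nbits = 0;
    memset(bits, 0, cap_bytes);
    for (; *in != '\0'; in++) {
        const char *p = strchr(ALPHABET64, *in);
        if (p == NULL)
            return (size_t)-1;
        unsigned v = (unsigned)(p - ALPHABET64);
        for (int j = 5; j >= 0; j--, nbits++) {
            if (nbits / 8 >= cap_bytes)
                return (size_t)-1;
            if ((v >> j) & 1u)
                bits[nbits / 8] |= (unsigned char)(0x80u >> (nbits % 8));
        }
    }
    return nbits;
}
```

For example, the 12 bits 11111111 0000 pack to "/w", and unpacking "/w" gives back 12 bits with the same values.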
>
> So the input is
>
> Mary had a little lamb,
> it's fleece was white as snow,
> and everywhere that Mary went,
> the lamb was sure to go.
>
> And we get the output.
>
> CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-
>
>
> And whether it is shorter or not depends on whether the fixed Huffman
> table is any good.
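Whether the table is "any good" can be checked before writing the full codec, since the output size follows directly from the code lengths. A minimal C sketch, assuming a hypothetical array code_len[] holding the fixed Huffman code length in bits for each 7-bit character, 6 bits per output character with the last group padded, and one terminator character per line:

```c
#include <stddef.h>

/* Encoded length, in characters, of text under a fixed prefix code whose
   code lengths (in bits) are given per 7-bit character in code_len[]:
   the total bits are packed 6 per output character (last group padded)
   and one terminator character is appended. */
static size_t encoded_length(const char *text, const unsigned char code_len[128])
{
    size_t bits = 0;
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++)
        bits += code_len[*p & 0x7F];
    return (bits + 5) / 6 + 1;
}
```

For a 6-character line, a table that spends exactly 6 bits on every character gives (36 + 5) / 6 + 1 = 7 output characters, one more than the input, so on short lines the table has to average noticeably below 6 bits per character to pay for the padding and the terminator.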
>

Take something that is a little bigger than the text above. It does not
have to be much bigger. One page from any book will do ("Alice's
Adventures in Wonderland" is used most often for that purpose).
Apply your compression procedure.
Then run an automatic test that applies every possible single-bit flip,
decompresses, and counts the number of mismatches against the original
text. The test will report the case with the maximal number of mismatches.
Look at the most corrupted text.
If your fixed Huffman table is any good, you'll see that the output is
corrupted rather seriously; most likely at least one sentence will be
unrecognizable.
Alternatively, if your fixed Huffman table is no good, your output will
be as big as or bigger than the input.
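The single-bit-flip test can be sketched in C as follows. The decoder interface decode_fn and the copy_decode stand-in are hypothetical; the real decompressor would be plugged in place of copy_decode. Mismatches are counted position by position, any length difference counts as extra mismatches, and an input the decoder rejects is scored as a fully corrupted text.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder interface: decode in_len bytes into out (capacity
   out_cap) and return the decoded length, or (size_t)-1 if the input is
   rejected. */
typedef size_t (*decode_fn)(const char *in, size_t in_len,
                            char *out, size_t out_cap);

/* Positions where a and b differ; any length difference counts too. */
static size_t count_mismatches(const char *a, size_t alen,
                               const char *b, size_t blen)
{
    size_t n = alen > blen ? alen - blen : blen - alen;
    size_t common = alen < blen ? alen : blen;
    for (size_t i = 0; i < common; i++)
        if (a[i] != b[i])
            n++;
    return n;
}

/* Flip every bit of enc in turn, decode, and return the largest mismatch
   count against original; *worst_bit receives the first bit index that
   produced it.  A rejected input counts as orig_len mismatches.
   Returns (size_t)-1 if memory cannot be allocated. */
static size_t worst_single_bit_flip(const char *enc, size_t enc_len,
                                    const char *original, size_t orig_len,
                                    decode_fn decode, size_t *worst_bit)
{
    size_t cap = 2 * orig_len + 64;            /* arbitrary slack */
    unsigned char *work = malloc(enc_len ? enc_len : 1);
    char *out = malloc(cap);
    size_t worst = 0;

    *worst_bit = 0;
    if (work == NULL || out == NULL) {
        free(work);
        free(out);
        return (size_t)-1;
    }
    for (size_t bit = 0; bit < 8 * enc_len; bit++) {
        memcpy(work, enc, enc_len);
        work[bit / 8] ^= (unsigned char)(1u << (bit % 8));
        size_t dlen = decode((const char *)work, enc_len, out, cap);
        size_t mism = dlen == (size_t)-1
                          ? orig_len
                          : count_mismatches(out, dlen, original, orig_len);
        if (mism > worst) {
            worst = mism;
            *worst_bit = bit;
        }
    }
    free(work);
    free(out);
    return worst;
}

/* Stand-in decoder so the harness can be tried: a plain copy. */
static size_t copy_decode(const char *in, size_t in_len,
                          char *out, size_t out_cap)
{
    if (in_len > out_cap)
        return (size_t)-1;
    memcpy(out, in, in_len);
    return in_len;
}
```

With copy_decode every flip damages exactly one character, so the worst case is 1. With a Huffman-based decoder a flip can move the code boundaries and change how many characters are decoded; every later character then lands at a different position and counts as a mismatch.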

A popular corpus of samples for compression tests:
https://corpus.canterbury.ac.nz/descriptions/
http://corpus.canterbury.ac.nz/resources/cantrbry.zip