From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Mon, 10 Jun 2024 07:12:57 +0100
Organization: A noiseless patient Spider
Message-ID: <v465h9$76f0$1@dont-email.me>
References: <v3snu1$1io29$2@dont-email.me> <v45iak$3t1l5$1@dont-email.me>
In-Reply-To: <v45iak$3t1l5$1@dont-email.me>

On 10/06/2024 01:45, Lew Pitcher wrote:
> On Thu, 06 Jun 2024 17:25:37 +0100, Malcolm McLean wrote:
>
>> Not strictly a C programming question, but smart people will see the
>> relevance to the topicality, which is portability.
>>
>> Is there a compression algorithm which converts human language ASCII
>> text to compressed ASCII, preferably only "isgraph" characters?
>>
>> So "Mary had a little lamb, its fleece was white as snow".
>>
>> Would become
>>
>> QWE£$543GtT£$"||x|VVBB?
>
> I'm afraid that you have conflicting requirements here. In effect,
> you want to take an array of values (each within the range of
> 0 to 127) and
> a) make the array shorter ("compress it"), and
> b) express the individual elements of this shorter array with
>    a range of 96 values ("isgraph() characters")
>
> Because you reduce the number of values each result element
> can carry, each result element can only express a fraction
> (96/128'ths) of the corresponding source element.
> Thus, with the isgraph() requirement, the result will take /more/
> elements to express the same data as the source did.
>
> However, you want /compression/, which implies that you want
> the result to be smaller than the source. And, therein lies
> the conflict.
>
> Can you help clarify this for me?
>

We have a fixed Huffman tree which is part of the algorithm and
optimised for ASCII. We take each line of text and compress it to a
binary string, using the Huffman table. Then we code the binary string
six bits at a time using a 64-character subset of ASCII. And then we
append a special character which is chosen to be visually distinctive.

So the input is

Mary had a little lamb, its fleece was white as snow, and everywhere
that Mary went, the lamb was sure to go.

And we get the output

CVbGNh£-H$£*MMH&-VVdsE3w2as3-vv$G^&ggf-

And whether it is shorter or not depends on whether the fixed Huffman
table is any good.

-- 
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
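
A minimal sketch of the six-bit coding step described in the post, under stated assumptions. The post does not say which 64 characters are used or which character terminates a line, so ALPHABET (the familiar Base64 set) and TERMINATOR ('~') are stand-ins, as is the function name pack6. The Huffman stage is not shown; the bit string is simply passed in as text of '0' and '1' characters, and the last group is zero-padded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed 64-character alphabet; every member satisfies isgraph()
   in the C locale. The original post does not specify the subset. */
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Assumed end-of-line marker; deliberately not a member of ALPHABET. */
#define TERMINATOR '~'

/*
 * Pack a bit string (text of '0' and '1' characters) into 6-bit groups
 * and map each group to one character of ALPHABET. The final group is
 * padded with zero bits. TERMINATOR and a NUL are appended.
 *
 * Returns the number of characters written, counting TERMINATOR but
 * not the NUL, or 0 if out is too small.
 */
size_t pack6(const char *bits, char *out, size_t outsize)
{
    size_t nbits, nchars, i, j;

    assert(bits != NULL && out != NULL);

    nbits = strlen(bits);
    nchars = (nbits + 5) / 6;          /* ceil(nbits / 6) */

    if (outsize < nchars + 2)          /* room for TERMINATOR and NUL */
        return 0;

    for (i = 0; i < nchars; i++) {
        unsigned v = 0;
        for (j = 0; j < 6; j++) {
            size_t k = i * 6 + j;
            /* bits past the end of the string count as 0 (padding) */
            v = (v << 1) | (unsigned)(k < nbits && bits[k] == '1');
        }
        out[i] = ALPHABET[v];
    }
    out[nchars] = TERMINATOR;
    out[nchars + 1] = '\0';
    return nchars + 1;
}
```

With this layout, B Huffman bits become ceil(B/6) + 1 output characters, so an n-character line only gets shorter when ceil(B/6) + 1 < n, that is, when the fixed table averages somewhat under six bits per input character.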