Path: ...!weretis.net!feeder9.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: ASCII to ASCII compression.
Date: Sun, 9 Jun 2024 13:35:33 +0100
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <v447il$3g7v7$1@dont-email.me>
References: <v3snu1$1io29$2@dont-email.me> <v3u3c4$1ubqm$1@dont-email.me> <v3uidi$20jte$2@dont-email.me> <20240609114413.00003e57@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 09 Jun 2024 14:35:33 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="402fd69fd9e3d4f943b81f027737026a"; logging-data="3678183"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ZlK1V9mGNvb9rYlMaiSN8u+C5SSoyvQw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:wjSH5uESzfdu1H+s17At7Zt21+k=
Content-Language: en-GB
In-Reply-To: <20240609114413.00003e57@yahoo.com>
Bytes: 2993

On 09/06/2024 09:44, Michael S wrote:
> On Fri, 7 Jun 2024 10:03:46 +0100
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
>
>> On 07/06/2024 05:47, Mikko wrote:
>>> On 2024-06-06 16:25:37 +0000, Malcolm McLean said:
>>>
>>>> Not strictly a C programming question, but smart people will see
>>>> the relevance to the topicality, which is portability.
>>>>
>>>> Is there a compression algorithm which converts human language
>>>> ASCII text to compressed ASCII, preferably only "isgraph"
>>>> characters?
>>>>
>>>> So "Mary had a little lamb, its fleece was white as snow".
>>>>
>>>> Would become
>>>>
>>>> QWE£$543GtT£$"||x|VVBB?
>>>
>>> There are compression algorithms that can be adapted to any possible
>>> size of input and output character sets, including that both are
>>> ASCII and that the output character set is a subset of the input
>>> set.
>>>
>>> Restricting the input set to ASCII may be too strong. Files that
>>> should be ASCII files sometimes contain non-ASCII bytes. The output
>>> should be restricted to the 94 visible characters, but the
>>> decompressor should accept at least full ASCII and skip the invalid
>>> characters as insignificant.
>>> That permits addition of line breaks and perhaps other spaces that
>>> could be useful, for example, when the file is printed for debugging.
>>>
>> That's exactly the idea. The system is robust to white space. You can
>> add spaces to your heart's content, and they are just skipped.
>
> Robustness to white spaces necessarily weakens robustness to bit flips.
> Not that your set of requirements made much sense to start with...
>
No, because you can usually detect the bit flip with a text editor.

--
Check out my hobby project, the Baby X Resource compiler
http://malcolmmclean.github.io/babyxrc
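Mikko's suggestion (output restricted to the 94 visible characters, decoder skipping anything else) can be sketched in C. This is a minimal sketch under stated assumptions, not Malcolm's method: it assumes the text has already been compressed to bytes by some ordinary compressor (not shown), and only handles the last step, re-expressing those bytes in base 94 using '!' (0x21) through '~' (0x7E), which are exactly the isgraph() characters in the "C" locale. The names b94_encode, b94_decode and b94_flush are hypothetical. Like Ascii85, it packs 4 bytes into 5 characters (94^5 = 7,339,040,224 exceeds 2^32), so this stage on its own expands the data by 25%.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define B94_FIRST '!'  /* lowest graphic ASCII character, 0x21 */
#define B94_RADIX 94u  /* '!'..'~' gives 94 symbols */

/* Hypothetical encoder: each 4-byte group becomes 5 base-94 digits; a
   final group of n < 4 bytes is zero-padded and only its first n + 1
   digits are emitted, as in Ascii85. out needs 5 * ceil(len / 4) + 1
   chars. Returns the number of characters written (excluding the NUL). */
size_t b94_encode(const unsigned char *in, size_t len, char *out)
{
    assert(out != NULL && (in != NULL || len == 0));
    size_t o = 0;
    for (size_t i = 0; i < len; i += 4) {
        size_t n = len - i < 4 ? len - i : 4;
        uint32_t v = 0;
        for (size_t k = 0; k < 4; k++)
            v = (v << 8) | (uint32_t)(k < n ? in[i + k] : 0);
        char digits[5];
        for (int d = 4; d >= 0; d--) {   /* fill least significant digit last */
            digits[d] = (char)(B94_FIRST + v % B94_RADIX);
            v /= B94_RADIX;
        }
        memcpy(out + o, digits, n + 1);
        o += n + 1;
    }
    out[o] = '\0';
    return o;
}

/* Turn one group of `have` digits (2..5) back into have - 1 bytes.
   Missing digits are padded with the highest digit, which rounds the
   truncated value up without disturbing the leading bytes. */
static int b94_flush(const unsigned *digit, size_t have, unsigned char *out)
{
    if (have < 2)
        return -1;                       /* a lone trailing digit is invalid */
    uint64_t v = 0;
    for (size_t k = 0; k < 5; k++)
        v = v * B94_RADIX + (k < have ? digit[k] : B94_RADIX - 1);
    if (v > 0xFFFFFFFFu)
        return -1;                       /* cannot come from b94_encode */
    for (size_t k = 0; k + 1 < have; k++)
        out[k] = (unsigned char)(v >> (24 - 8 * k));
    return (int)(have - 1);
}

/* Hypothetical decoder: skips every byte outside '!'..'~' (spaces, line
   breaks, control characters, non-ASCII bytes). Returns the number of
   bytes written, or (size_t)-1 if the text is malformed. */
size_t b94_decode(const char *in, size_t len, unsigned char *out)
{
    assert(out != NULL && (in != NULL || len == 0));
    unsigned digit[5];
    size_t have = 0, o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < '!' || c > '~')
            continue;                    /* insignificant: skip it */
        digit[have++] = (unsigned)(c - B94_FIRST);
        if (have == 5) {
            int n = b94_flush(digit, have, out + o);
            if (n < 0)
                return (size_t)-1;
            o += (size_t)n;
            have = 0;
        }
    }
    if (have > 0) {                      /* final partial group */
        int n = b94_flush(digit, have, out + o);
        if (n < 0)
            return (size_t)-1;
        o += (size_t)n;
    }
    return o;
}
```

For example, encoding the 52-byte "Mary had a little lamb" line gives 65 graphic characters, and b94_decode returns the same 52 bytes even when spaces or line breaks are inserted anywhere in the encoded text. The same skipping is what Michael S is pointing at: a graphic character that a bit flip turns into a space ('!' is 0x21, a space is 0x20) is dropped silently like any other white space, so the damage shows up only as corrupted or rejected output from that point on.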