Path: ...!weretis.net!feeder6.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail From: fir Newsgroups: comp.lang.c Subject: Re: simple compression mathod to code itself? Date: Sat, 06 Apr 2024 21:57:13 +0200 Organization: i2pn2 (i2pn.org) Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Sat, 6 Apr 2024 19:57:06 -0000 (UTC) Injection-Info: i2pn2.org; logging-data="315690"; mail-complaints-to="usenet@i2pn2.org"; posting-account="+ydHcGjgSeBt3Wz3WTfKefUptpAWaXduqfw5xdfsuS0"; User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:27.0) Gecko/20100101 Firefox/27.0 SeaMonkey/2.24 In-Reply-To: X-Spam-Checker-Version: SpamAssassin 4.0.0 Bytes: 5109 Lines: 89 David LaRue wrote: > fir wrote in news:uurd34$8ga0$1@i2pn2.org: > >> David LaRue wrote: >>> fir wrote in news:uupflr$5t1n$1@i2pn2.org: >>> >>>> i not code at all in recent years >>>> (recently i coded half of my compuler with a plan to write second half >>>> but its to ambitious to lazy coding mood i got recently) >>>> but recently i started this slamm coding modd and found >>>> ite pleasurable >>>> >>>> searching for lazy side things i could code in such mood >>>> i thought maybe i wopuld like to have compression routine >>>> but i would like to write it myself sortla like i use quicksort >>>> or so but i also want to write it myself >>>> >>>> so is there maybe some method i could use i mean some simple >>>> routine that compresses and uncompresses an array of bytes but >>>> maybe something a bit more complex than RLE (rune length encoding) >>>> - only thing i know from this domain >>>> >>> >>> Hello fir, >>> >>> A method a bit more complex that you might try builds a table of all >>> bytes as you scan them from the input. The compressed output is a >>> reference to the table you just built (wnhem repeated byte strings are >>> found, otherwise feed the input to the compressed image so that the >>> expansion method can build the sme dynamic table the encoder built. >>> The table generally has a limit on the number of entries (usually a >>> good size) and allows the table of bytes to dynamically change as new >>> patterns are read from the input. >>> >>> This is a well known and documented compression/expansion algorithm. >>> PKZIP and other engines use this as one of their compression methodz. >>> Look for the description of that if you need more details to figure out >>> what you need to write. >>> >>> Expansion is the reverse. Read the source (now the compressed image) >>> and build the compression table from the bytes. As encoded references >>> to the compression table are read from the compressed image output the >>> source byte sequences. The output should be the same as what your >>> encoder originally read. >>> >>> A good check on the final code is to compare the original input with >>> the eventual output and make sure they agree exactly. >>> >>> Have fun, >>> >>> David >>> >> this could be good but i dont quite understood that .. but eventually >> could be good... >> >> i thinged something abut that if rle search for repetitions of 1 >> byte then maybe after that search for repetitions of 2 bytes, then 3 >> bytes, 4 bytes and so on.. then do some "report" how many found and then >> find a way to encode that >> >> need to think a bit becouse if rle only stores repetitions that are >> one after another then this method should store repetitions that have >> various distances among them >> >> >> i also do nto want to spend a much time on this 1-2 days eventually >> > > Hello fir, > > The method above can do the same thing for each repetition of a single > byte. Alone it has the advantage that compression and decompression are > single pass operations. It is similar to RLE but allows for more > variations to be used along the way. The encoder also needs some table > searching to determine best matches, new byte sequences, and improved > length sequences. An ordered table of byte strings is typically used. > I've also seen more advanced search methods that use branch tables to > represent the encoded sequences with the final entry being the value to put > in the compressed output. Very easy to search and gives some improvements > to the basic search design to make performance better. This can be done in > a few days or less, depending on how much time you spend on deciding the > table size and so on. I've also seen the search part as a coding > requirement for a one hour test to see how the coder breaks the idea down. > > I like all the ideas you have. > > David > okay though i will answer to it alter as i decided to do it but not today yet