Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: fir <fir@grunge.pl>
Newsgroups: comp.lang.c
Subject: Re: program to remove duplicates
Date: Sun, 22 Sep 2024 16:06:39 +0200
Organization: i2pn2 (i2pn.org)
Message-ID: <66F0246F.2010800@grunge.pl>
References: <ecb505e80df00f96c99d813c534177115f3d2b15@i2pn2.org>
 <vcnfbi$1ocq6$1@dont-email.me>
 <8630bec343aec589a6cdc42bb19dae28120ceabf@i2pn2.org>
 <vcnu3p$1vkui$2@dont-email.me> <66EF8293.30803@grunge.pl>
 <vcoh04$24ioi$1@dont-email.me> <66EFF046.8010709@grunge.pl>
 <vcos2o$264lk$1@dont-email.me> <66F01194.5030706@grunge.pl>
 <66F0120B.8090404@grunge.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: i2pn2.org; logging-data="3008460"; mail-complaints-to="usenet@i2pn2.org"; posting-account="+ydHcGjgSeBt3Wz3WTfKefUptpAWaXduqfw5xdfsuS0";
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:27.0) Gecko/20100101 Firefox/27.0 SeaMonkey/2.24
To: Bart <bc@freeuk.com>
X-Spam-Checker-Version: SpamAssassin 4.0.0
In-Reply-To: <66F0120B.8090404@grunge.pl>
Bytes: 4170
Lines: 70

fir wrote:
> fir wrote:
>> Bart wrote:
>>> On 22/09/2024 11:24, fir wrote:
>>>> Paul wrote:
>>>
>>>>> The normal way to do this, is do a hash check on the
>>>>> files and compare the hash. You can use MD5SUM, SHA1SUM, SHA256SUM,
>>>>> as a means to compare two files. If you want to be picky about
>>>>> it, stick with SHA256SUM.
>>>
>>>
>>>> the code i posted works ok, and if someone has windows and mingw/tdm
>>>> they may compile it and check the application if they want
>>>>
>>>> hashing is not necessary imo, though it could probably speed things
>>>> up - i'm not strongly convinced that the probability of a mistake
>>>> with this hashing is strictly zero (as i have never used it and would
>>>> probably need to write my own hashing).. it's probably mathematically
>>>> proven that it's almost zero, but for now at least it is more
>>>> interesting for me whether the code i posted is ok
>>>
>>> I was going to post similar ideas (doing a linear pass working out
>>> checksums for each file, sorting the list by checksum and size, then
>>> candidates for a byte-by-byte comparison, if you want to do that, will
>>> be grouped together).
>>>
>>> But if you're going to reject everyone's suggestions in favour of your
>>> own already working solution, then I wonder why you bothered posting.
>>>
>>> (I didn't post after all because I knew it would be futile.)
>>>
>>>
>> i want to discuss, not to do everything that is mentioned.. is it hard
>> to understand? so i may read about options but i've literally got no
>> time to implement even good ideas - this program i wrote proved to
>> work and i'm now using it
>
> also note i posted a whole working program while some others just say
> what can be done... working code was my main goal, not really entering
> a contest of what is fastest (this is also an interesting topic but not
> the main goal)

another interesting thing is how it behaves in the system... i'm used to
writing cpu-intensive applications and to controlling frame times and cpu
usage, but i have generally never written disk-based apps; the one above
is the first

i use sysinternals on windows, and when i run this program it has about
3 stages:

1) read directory info (if it's big, like 30k files, it may take some
   time)
2) the square part, which reads file contents, compares them, and sets
   duplicate flags on the list
3) the rename part - i mean i call the "rename" function on duplicates

the square part takes most of the time; the disk usage indicator is
full and cpu usage is 50%, which probably means one core is fully used

the disk indicator in the tray shows (in the square phase) something like

R: 1.6 GB
O: 635 KB
W: 198 B

i don't know what it is; R is for read for sure and W is for write, but
what is it exactly?
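
The square part (stage 2) can be sketched roughly as below. This is not the posted program: the `file_entry` struct, its field names, and the helpers `same_contents` and `mark_duplicates` are hypothetical names made up for illustration, and the size check before the byte-by-byte comparison is an assumption about how such a pass might reasonably be written.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one directory entry; the field names are
   assumptions, not taken from the posted program. */
struct file_entry {
    const char *path;
    long size;          /* size in bytes, assumed filled in by stage 1 */
    int is_duplicate;   /* set by stage 2, to be consumed by stage 3 */
};

/* Return 1 if both files have identical contents, 0 if they differ,
   -1 if a file cannot be opened or a read error occurs. */
static int same_contents(const char *a, const char *b)
{
    unsigned char buf_a[4096], buf_b[4096];
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = 1;

    if (!fa || !fb) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 0;             /* lengths or bytes differ */
            break;
        }
        if (na == 0) {              /* both files ended at the same point */
            if (ferror(fa) || ferror(fb))
                result = -1;
            break;
        }
    }
    fclose(fa);
    fclose(fb);
    return result;
}

/* Stage 2: compare every file against every later file and flag the
   later one when the contents match. Returns the number flagged. */
static int mark_duplicates(struct file_entry *files, int count)
{
    int found = 0;
    for (int i = 0; i < count; i++) {
        if (files[i].is_duplicate)
            continue;
        for (int j = i + 1; j < count; j++) {
            if (files[j].is_duplicate)
                continue;
            if (files[i].size != files[j].size)
                continue;           /* cheap reject, no disk read needed */
            if (same_contents(files[i].path, files[j].path) == 1) {
                files[j].is_duplicate = 1;
                found++;
            }
        }
    }
    return found;
}
```

In this sketch every pair with matching sizes costs a read of both files until the first difference, so the pass is bound by disk reads rather than by the cpu.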

there is also the question of whether closing or killing the program in
those phases may cause some disk damage - since the square phase mostly
does reads, i'm quite sure that closing it in the read phase will not
cause any errors, but i'm not sure about the renaming phase
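
One way to sidestep the question for the rename phase, at least for Ctrl-C, is to let the program stop only between two `rename` calls and never in the middle of one. The sketch below uses the standard `signal` mechanism for that; `install_stop_handler`, `rename_all`, and `stop_requested` are hypothetical names, not part of the posted program. It says nothing about what the filesystem does when the process is forcibly killed, since a forced kill cannot be intercepted by any handler.

```c
#include <signal.h>
#include <stdio.h>

/* Set from the signal handler, checked between renames. */
static volatile sig_atomic_t stop_requested = 0;

static void on_interrupt(int sig)
{
    (void)sig;
    stop_requested = 1;     /* only record the request, do nothing else */
}

/* Route Ctrl-C (SIGINT) to the flag instead of ending the process. */
static void install_stop_handler(void)
{
    signal(SIGINT, on_interrupt);
}

/* Rename from[i] to to[i] for each pair. Stops before the next pair if
   an interrupt was requested, or after the first failure. Returns the
   number of files actually renamed. */
static int rename_all(const char *const *from, const char *const *to,
                      int count)
{
    int done = 0;
    for (int i = 0; i < count && !stop_requested; i++) {
        if (rename(from[i], to[i]) != 0) {
            perror(from[i]);    /* report and stop; earlier renames stay */
            break;
        }
        done++;
    }
    return done;
}
```

A caller would run `install_stop_handler()` once before stage 3 and then pass the list of duplicates to `rename_all`; after a Ctrl-C the loop finishes the rename in progress and returns how many were completed.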