From: "Chris M. Thomasson"
Newsgroups: comp.lang.c
Subject: Re: program to remove duplicates
Date: Sat, 21 Sep 2024 16:46:09 -0700

On 9/21/2024 3:18 PM, fir wrote:
> Chris M. Thomasson wrote:
>> On 9/21/2024 11:53 AM, fir wrote:
>>>
>>>
>>> i think if to write a simple command-line program
>>> that removes duplicates in a given folder
>> [...]
>>
>> Not sure if this will help you or not... ;^o
>>
>> Fwiw, I have to sort and remove duplicates in this experimental locking
>> system that I called the multex. Here is the C++ code I used to do it. I
>> sort and then remove any duplicates, so say a thread's local lock set was:
>>
>> 31, 59, 69, 31, 4, 1, 1, 5
>>
>> would become:
>>
>> 1, 4, 5, 31, 59, 69
>>
>> this ensures no deadlocks. As for the algorithm for removing duplicates,
>> well, there is more than one. Actually, I don't know which one my C++
>> impl is using right now.
>>
>> https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/Ti8LFyH4CgAJ
>>
>> // Deadlock free baby!
>> void ensure_locking_order()
>> {
>>    // sort and remove duplicates
>>
>>    std::sort(m_lock_idxs.begin(), m_lock_idxs.end());
>>
>>    m_lock_idxs.erase(std::unique(m_lock_idxs.begin(),
>>      m_lock_idxs.end()), m_lock_idxs.end());
>> }
>>
>> Using the std C++ template lib.
>
> I'm not sure what you're talking about, but I'm writing about finding file
> duplicates (by binary contents, not by name).. it is a disk thing and I
> don't think mutexes are needed - you just need to read all files in the
> folder and compare each one byte by byte to the other files in the folder
> of the same size

It's just that there are many different ways to sort and remove
duplicates, and sometimes doing both is required...
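
Fwiw, here is a minimal standalone sketch of the sort-then-unique idiom from the quoted code, checked against the lock set example above. The names sorted_unique and check_example are made up for illustration, and std::size_t as the index type is an assumption; the post does not show how m_lock_idxs is declared. std::unique only collapses adjacent equal elements, which is why the std::sort has to come first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the multex code): returns the lock
// indices sorted in ascending order with duplicates removed.
std::vector<std::size_t> sorted_unique(std::vector<std::size_t> idxs)
{
    // Sorting puts equal values next to each other.
    std::sort(idxs.begin(), idxs.end());

    // std::unique shifts the first of each run of equal values forward
    // and returns the new logical end; erase drops the leftover tail.
    idxs.erase(std::unique(idxs.begin(), idxs.end()), idxs.end());

    return idxs;
}

// Checks the lock set example from the post.
void check_example()
{
    const std::vector<std::size_t> expected{1, 4, 5, 31, 59, 69};
    assert(sorted_unique({31, 59, 69, 31, 4, 1, 1, 5}) == expected);
}
```

Taking the vector by value keeps the caller's copy untouched; the member function in the quoted code modifies m_lock_idxs in place instead, which avoids the copy.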