From: Rich
Newsgroups: comp.os.linux.misc
Subject: Re: Piping commands to a shell but keeping interactivity
Date: Fri, 8 Mar 2024 03:39:56 -0000 (UTC)

James Harris wrote:
> That's right. What this is for is code to list files in a folder
> which are duplicates of those in another folder (same name, same
> relative place in the folder hierarchy, etc). For example, say there
> are two folders, c and d, and one is potentially a copy of the other.
> The command
>
> $ ./lsdup.py -r c d
>
> lists files in d (and subdirectories due to the -r recurse option) which
> are duplicates of those in c.
>
> In answer to your other point about using xargs, I would use it if it
> would do what's required, and do so consistently,

It will, provided you output file names with ASCII null terminators
instead of ASCII newlines, and use the -0 option to tell xargs the
filenames are null separated.

> but I am not sure whether I can trust it or not.

You can. Provided you feed it ASCII null terminated filenames, it will
work properly, no matter what other weird characters might be in the
filenames.

> Here's a full example.
>
> $ ./lsdup.py c d -r | sed 's/^/rm -v /'
>
> which generates commands such as
>
> rm -v 'd/contentsame.txt'
> rm -v 'd/hardlink.txt'
> rm -v 'd/subdir/filesame.txt'

Provided you modified lsdup.py to output null terminated filenames, you
could do:

  $ ./lsdup.py c d -r | xargs -0 rm -v

And the chosen files would be removed, no matter what odd characters or
spaces they might contain.

Note that if you want to 'inspect' before you delete, then you might
want to add a "-0" option to lsdup.py to toggle between ASCII newline
and ASCII null terminators.

> Once the commands have been checked, if required, I pipe them into sh to
> actually delete the files.

xargs, with -0, into "rm" or "rm -v" will be much more efficient (i.e.,
faster) because xargs will call rm with the maximum number of files it
can pass per invocation. That avoids calling rm once per file, and it
avoids having the shell parse each line and then fork and exec rm for
every single file.

> (A separate command, lsempty.py, is used to delete the resultant
> empty folders.)

A separate python command is completely unnecessary. Removing empty
leaf directories is already easy using the tools provided by the
system:

  $ find c d -type d -empty -print0 | xargs -0 rmdir

If you also want to remove parent empty directories should all their
children go away, change to:

  $ find c d -type d -empty -print0 | xargs -0 rmdir -p

This will complain until the parent is empty, but the complaints can be
ignored.

> In answer to the point about filenames with spaces and odd characters, I
> currently output names in single quotes, as above.

Single quote is also a possible filename character, so if, by chance,
you end up with a file name containing one ' somewhere, your wrapping in
single quotes will fail at that point.
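
To make the "-0" toggle and the single quote problem concrete, here is
a minimal Python sketch. I have not seen the source of lsdup.py, so the
function names (write_names, naive_quote, is_valid_shell_word) are made
up for illustration and are not taken from it. The first function
writes names with either newline or null terminators. The other two
show why plain single-quote wrapping breaks on a name containing a ',
while the standard library's shlex.quote does not.

```python
import os
import shlex


def write_names(names, out, null_terminated=False):
    """Write file names to a binary stream, one terminator after each name.

    With null_terminated=True the output suits `xargs -0`; otherwise it is
    newline-terminated, which is easier to inspect by eye.
    """
    terminator = b"\0" if null_terminated else b"\n"
    for name in names:
        # os.fsencode converts the name to bytes using the filesystem encoding.
        out.write(os.fsencode(name) + terminator)


def naive_quote(name):
    """Wrap a name in single quotes with no escaping (the fragile approach)."""
    return "'" + name + "'"


def is_valid_shell_word(text):
    """Return True if text parses under POSIX shell quoting rules."""
    try:
        shlex.split(text)
        return True
    except ValueError:
        # Raised for an unterminated quote, e.g. after a stray ' in a name.
        return False
```

In a real script the stream would presumably be sys.stdout.buffer, with
null_terminated driven by a hypothetical "-0" command line flag. If you
do want to keep generating shell commands for inspection, shlex.quote(name)
is a safer replacement for the hand-rolled single quotes.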