From: Rich <rich@example.invalid>
Newsgroups: comp.os.linux.misc
Subject: Re: Piping commands to a shell but keeping interactivity
Date: Sat, 9 Mar 2024 22:02:28 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 164
Message-ID: <usim9j$2hqge$1@dont-email.me>
References: <urhjc6$2enkl$1@dont-email.me> <l44pliF6eejU2@mid.individual.net> <urjj7j$3038r$1@dont-email.me> <wwvzfvmi9bp.fsf@LkoBDZeT.terraraq.uk> <l47faiFkqi7U1@mid.individual.net> <uscpnt$14tr1$1@dont-email.me> <use1ac$1gd9k$1@dont-email.me> <usiidu$2gve6$1@dont-email.me>
Injection-Date: Sat, 9 Mar 2024 22:02:28 -0000 (UTC)
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.139 (x86_64))
Bytes: 7279

James Harris <james.harris.1@gmail.com> wrote:
> On 08/03/2024 03:39, Rich wrote:
>> James Harris <james.harris.1@gmail.com> wrote:
>>> That's right.  What this is for is code to list files in a folder
>>> which are duplicates of those in another folder (same name, same
>>> relative place in the folder hierarchy, etc).  For example, say there
>>> are two folders, c and d, and one is potentially a copy of the other.
>>> The command
>>>
>>> $ ./lsdup.py -r c d
>>>
>>> lists files in d (and subdirectories due to the -r recurse option) which
>>> are duplicates of those in c.
>>>
>>> In answer to your other point about using xargs, I would use it if it
>>> would do what's required, and do so consistently,
>> 
>> It will, provided you output file names with ASCII null terminators
>> instead of ASCII newlines, and use the -0 option to tell xargs the
>> filenames are null separated.
>> 
>>> but I am not sure whether I can trust it or not.
>> 
>> You can.  Provided you feed it ASCII null terminated filenames, it will
>> work properly, no matter what other weird characters might be in the
>> filenames.
> 
> You are right: unusual characters in file names is the kind of issue I 
> am wary about with xargs.

And this is exactly why xargs was modified to consume null terminated 
names, so that no otherwise legal filename character would create a 
problem.  And ASCII newline is also a legal filename character.
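
A minimal Python sketch of what a -0 output mode in lsdup.py might look like. The function name emit_null_terminated is hypothetical and is not taken from the script. It writes each name as raw bytes followed by a NUL. It also shows why this matters: a name containing a newline survives NUL splitting, but it falls apart when the list is split on newlines.

```python
import io
import os
import sys


def emit_null_terminated(paths, out=None):
    """Write each path followed by a NUL byte (hypothetical -0 output mode)."""
    # Write raw bytes so names that are not valid UTF-8 pass through unchanged.
    out = out if out is not None else sys.stdout.buffer
    for p in paths:
        out.write(os.fsencode(p) + b"\0")


names = ["d/plain.txt", "d/won't scan.txt", "d/two\nlines.txt"]
buf = io.BytesIO()
emit_null_terminated(names, buf)
data = buf.getvalue()

# Splitting on NUL recovers exactly the original three names ...
by_nul = [os.fsdecode(n) for n in data.split(b"\0") if n]
# ... while splitting on newline wrongly yields four "names".
by_newline = [n for n in data.replace(b"\0", b"\n").split(b"\n") if n]
```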

> Running a command to do automatic deletion requires a lot of trust in 
> the command's operation.

Agreed.

> I /have/ been thinking about adding a -0 option but it has issues such as:
> 
> (1) With -0 one cannot easily postprocess the list of file names, if 
> required.

That depends upon what you mean by 'postprocessing'.

GNU sort has -z, --zero-terminated options to inform it to process 
"lines" with ASCII null terminators.

GNU grep has -z, --null-data options to also inform it to process 
"lines" that are ASCII null terminated.

If you want to "preview" in something like less, you can add 
"tr '\0' '\n'" to the pipeline before less to convert the nulls into 
newlines, so that any filename other than one with an actual newline in 
it appears as exactly one line in less.

> (2) With xargs interactivity (e.g. the -i in rm -i) is lost.

This is true, so if you value using -i to acknowledge each file 
individually then yes, using xargs to launch rm is not in the cards.  

A partial workaround would be to use the -p (--interactive) option of 
xargs (GNU xargs at least) combined with -n 1, so that xargs runs one rm 
per filename and prompts (instead of rm) before each.  This gives up the 
efficiency of running a single rm with as many filenames as fit on one 
command line, but it does restore 'interactivity'.
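
If the prompting were done in the script itself, a minimal Python sketch might look like the following. delete_with_prompt is a hypothetical helper, and it assumes that only answers starting with "y" mean yes. The ask parameter is injectable so the logic can be exercised without a terminal.

```python
import os
import tempfile


def delete_with_prompt(paths, ask=input):
    """Ask before each deletion, in the spirit of `xargs -0 -n 1 -p rm`.

    Only answers starting with "y" or "Y" delete the file.
    """
    removed = []
    for p in paths:
        reply = ask(f"rm {p!r}? ")
        if reply.strip().lower().startswith("y"):
            os.remove(p)
            removed.append(p)
    return removed


# Demo with canned answers instead of a terminal: yes to the first, no to the second.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "won't scan.txt")
    second = os.path.join(tmp, "keep.txt")
    for p in (first, second):
        with open(p, "w"):
            pass
    answers = iter(["y", "n"])
    removed = delete_with_prompt([first, second], ask=lambda prompt: next(answers))
    first_gone = not os.path.exists(first)
    second_kept = os.path.exists(second)
```

A design note: in a pipeline, stdin already carries the file list. GNU xargs -p therefore reads its answers from the terminal rather than from stdin. A real script fed by a pipe would need to do something similar instead of calling plain input().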

> Yes, there are ways round such issues but they may require altering a 
> command line /after/ it has been found to be correct. That was the issue 
> which led to this thread. If I generate a list of delete commands with
> a series of commands such as
> 
>   A | B
> 
> then once I am happy with them I would prefer simply to append | sh as in
> 
>   A | B | sh
> 
> rather than changing the form to
> 
>   sh <(A | B)

With xargs and nulls, the change becomes:

  A | B | tr '\0' '\n' | less

to

  A | B | xargs -0 rm

or, for interactive deletes,

  A | B | xargs -0 -n 1 -p rm

> This is not so much about convenience as about making sure less can go 
> wrong - always a good idea when deleting files by means of a command.

Yes, if you have no backup with which to recover then agreed, verifying 
the deletion before deleting is important.

>>> (A separate command, lsempty.py, is used to delete the resultant
>>> empty folders.)
>> 
>> A separate Python command is completely unnecessary.  Removing empty 
>> leaf directories is already easy using the tools provided by the 
>> system:
>> 
>> $ find c d -type d -empty -print0 | xargs -0 rmdir
>> 
>> If you also want to remove parent empty directories should all their
>> children go away, change to:
>> 
>> $ find c d -type d -empty -print0 | xargs -0 rmdir -p
> 
> Thanks. That looks as though it would work, albeit that it would print a 
> few spurious error messages which add to the work of the person running 
> the code. For example,

If you dislike the spurious errors, then add the (GNU rmdir) 
--ignore-fail-on-non-empty option, which suppresses the 'not empty' 
errors as rmdir walks back up the tree.

>>> In answer to the point about filenames with spaces and odd 
>>> characters, I currently output names in single quotes, as above.  
>>> This will complain until the parent is empty, but the complaints 
>>> can be ignored.
>> 
>> Single quote is also a possible filename character, so if, by 
>> chance, you end up with a file containing one ' somewhere your 
>> wrapping in single quotes will result in a fail at that point.
> 
> I deal with that (at present) by juxtaposing single-quoted strings.  
> For example, if there is a file called
> 
>   won't scan.txt
> 
> with an apostrophe and a space then I get the following results. First, 
> the command:
> 
> $ ./lsdup.py c d -r | grep won | sed 's/^/rm -v /'
> rm -v 'd/won'\''t scan.txt'

Ok, you had not yet revealed you were doing so, and we can't read your 
mind remotely over Usenet to know you are handling single quotes 
properly (at least for one single quote in a filename).

> That said, I am not sure that this will work in all cases and expect 

Provided you replace every input instance of ' with '\'' it would work 
(at least for ').
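
A minimal Python version of that rule follows, with the caveat that it only holds for POSIX-style shells. sh_single_quote is a hypothetical name. shlex.quote, from Python's standard library, does the same job for POSIX shells but spells the embedded quote as '"'"' instead of '\''.

```python
import shlex


def sh_single_quote(s: str) -> str:
    """Quote s for a POSIX sh by wrapping it in single quotes.

    Inside single quotes every character is literal except ' itself, so each
    ' becomes '\\'' : close the quote, add an escaped quote, reopen the quote.
    """
    return "'" + s.replace("'", "'\\''") + "'"


name = "d/won't scan.txt"
print("rm -v " + sh_single_quote(name))  # same form as the lsdup.py output above
print("rm -v " + shlex.quote(name))      # stdlib equivalent, different spelling
```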

The minefield you are wandering into is that each shell has a different 
set of metacharacters that are 'special' to it, so the proper 
"escaping" depends upon /which/ shell the output is being piped into.
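
One cheap sanity check, as a sketch only: round-trip some hostile names through Python's shlex.split in POSIX mode. shlex.split only tokenizes and never executes anything. Passing this check shows the quoting survives POSIX-style parsing. It says nothing about non-POSIX shells such as csh or fish, which is exactly the point above. The list of tricky names is made up for illustration.

```python
import shlex


def sh_single_quote(s: str) -> str:
    # Same '\'' technique as in the thread: close quote, escaped quote, reopen.
    return "'" + s.replace("'", "'\\''") + "'"


tricky = [
    "won't scan.txt",
    "two\nlines.txt",
    "$(touch pwned)",
    "`id`; rm -fr ~",
    "back\\slash and \"double\" quotes",
]

# Each command line should split back into exactly: rm, -v, <original name>.
parsed = [shlex.split("rm -v " + sh_single_quote(n)) for n in tricky]
all_round_trip = all(p == ["rm", "-v", n] for p, n in zip(parsed, tricky))
```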

And by piping into the shell, if your escaping is not 100% perfect, you 
open yourself up to someone crafting a filename that results in your 
shell doing one of:

   "rm -fr /" 

or 

   "rm -fr ~"

depending upon whether you run the deletion as root or a non-root user.
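
If the deletion is driven from Python anyway, one way to sidestep the quoting problem is to never involve a shell at all. Each name goes to rm as its own argv entry, which is what xargs -0 does too. remove_files is a hypothetical helper, and this is a sketch rather than a drop-in replacement for lsdup.py.

```python
import os
import subprocess
import tempfile


def remove_files(paths):
    """Delete paths by handing them to rm as separate argv entries.

    subprocess.run() with a list (and the default shell=False) runs rm
    directly, so no shell ever sees the names: quotes, $(...), backticks
    and newlines in a filename are passed through untouched. The "--"
    stops rm from reading a name such as "-rf" as options.
    """
    if paths:
        subprocess.run(["rm", "--", *paths], check=True)


# Demo: filenames that would be dangerous if pasted unquoted into a shell.
with tempfile.TemporaryDirectory() as tmp:
    names = ["won't scan.txt", "$(touch pwned)", "-rf", "two\nlines.txt"]
    paths = [os.path.join(tmp, n) for n in names]
    for p in paths:
        with open(p, "w"):
            pass
    remove_files(paths)
    leftover = os.listdir(tmp)
```

Unlike xargs, this does not split a very long list across several rm invocations, so a huge list could exceed the system's argument-length limit.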