Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: db <dieterhansbritz@gmail.com>
Newsgroups: comp.text.pdf
Subject: Re: pdf grep?
Date: Wed, 3 Apr 2024 15:19:24 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 23
Message-ID: <uujs1s$7u0$3@dont-email.me>
References: <uujj10$3tv68$2@dont-email.me>
	<XB6cnYfPsZMk_JD7nZ2dnZfqnPGdnZ2d@giganews.com>
	<grep-20240403151634@ram.dialup.fu-berlin.de>
	<search-20240403152924@ram.dialup.fu-berlin.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 03 Apr 2024 15:19:24 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3fc59fc5b164e30958ab4c2ac5ec4c56";
	logging-data="8128"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/ruhJn43xRmQgQeqvwqBUNjdbeWethttw="
User-Agent: Pan/0.149 (Bellevue; 4c157ba)
Cancel-Lock: sha1:g5dkZRgK81gQJJzE7717xw5oqLg=
Bytes: 2128

On 3 Apr 2024 14:29:40 GMT, Stefan Ram wrote:

> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>Text in PDFs is sometimes compressed. So one can either use programs
>>like "Agent Ransack" to search for text in PDFs or tools like
>>"pdftotext" to first create a text file for every PDF file and then grep
>>those text files.
> 
>   PS: "Agent Ransack" is Windows software. "pdftotext" is also available
>   for Linux. Converting all PDFs to text files needs to be done only
>   once, and then search operations on those text files are faster than
>   scanning the PDF files for text on every search!
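
The convert-once approach described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `pdftotext` command-line tool must be on PATH, each NAME.pdf gets a NAME.txt written next to it, and the function name `convert_stale_pdfs` is hypothetical. A PDF is converted again only when its text file is missing or older than the PDF, so rerunning it after adding files stays cheap.

```python
import subprocess
from pathlib import Path


def convert_stale_pdfs(folder: str) -> list[Path]:
    """Convert each *.pdf in `folder` to a .txt file beside it, once.

    Hypothetical helper. Assumes the `pdftotext` CLI is on PATH.
    A PDF is skipped when its .txt exists and is at least as new as the PDF.
    Note: the glob is case-sensitive on Linux, so *.PDF files are not matched.
    Returns the list of PDFs that were (re)converted on this run.
    """
    converted = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        if txt.exists() and txt.stat().st_mtime >= pdf.stat().st_mtime:
            continue  # cached text is up to date, nothing to do
        # pdftotext INPUT OUTPUT; check=True raises CalledProcessError on failure
        subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)
        converted.append(pdf)
    return converted
```

After running it on a folder, the text files can be searched with the usual `grep blabla *.txt`.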

I should perhaps have elaborated a bit. Sometimes I
remember a certain phrase or word but forget which
PDF it is in. With text files I can do

  grep blabla *.txt

and I wanted an equivalent for PDFs. Using pdftotext
would mean running it on every suspect PDF first.
Since a lot of PDF files are searchable, I figured
that such a command might already exist.
But if there really is a pdfgrep command, that might
do the job. I will do some googling, thanks.
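
Whether or not a pdfgrep tool is installed, a rough equivalent of `grep blabla *.txt` for PDFs can be sketched in Python without leaving text files behind. Assumptions: `pdftotext` is on PATH, passing "-" as its output name sends the text to stdout, and pages in its output are separated by form feed characters (its default behaviour). The names `search_text` and `grep_pdfs` are hypothetical; this is a sketch, not how pdfgrep itself works.

```python
import re
import subprocess
from pathlib import Path


def search_text(text: str, pattern: str, ignore_case: bool = False):
    """Yield (page, line) for each line of pdftotext output matching pattern.

    Pages are assumed to be separated by form feed characters, so the
    page number is the 1-based index of the form-feed-separated chunk.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line in page.splitlines():
            if regex.search(line):
                yield page_no, line


def grep_pdfs(folder: str, pattern: str, ignore_case: bool = False) -> None:
    """Print FILE:PAGE: LINE for matches in every *.pdf below folder.

    Assumes the `pdftotext` CLI is on PATH; "-" writes text to stdout.
    """
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        result = subprocess.run(
            ["pdftotext", str(pdf), "-"],
            capture_output=True, encoding="utf-8", errors="replace",
        )
        if result.returncode != 0:
            continue  # skip files pdftotext cannot read (e.g. damaged ones)
        for page_no, line in search_text(result.stdout, pattern, ignore_case):
            print(f"{pdf}:{page_no}: {line}")
```

For example, `grep_pdfs(".", "blabla", ignore_case=True)` searches every PDF below the current directory. Matching is done line by line, so a phrase that the PDF layout breaks across two lines will be missed. Each search also reruns pdftotext on every file, which is the cost the convert-once approach avoids.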