Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: hobby electronics
Date: Fri, 5 Jul 2024 09:51:26 -0700
Organization: A noiseless patient Spider
Lines: 150
Message-ID: <v698an$3c5jp$2@dont-email.me>
References: <j5a88jhm7pge920n2io4jnhs101i8ntb2g@4ax.com>
 <v635o1$24goj$1@dont-email.me> <v63k0i$271d8$1@dont-email.me>
 <v63ldd$26rbm$2@dont-email.me> <v667qj$2p9gt$4@dont-email.me>
 <v66doo$2q0be$1@dont-email.me> <v68tfj$3abt3$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 05 Jul 2024 18:51:37 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="84f4481ffa32c5eaa549edc266280f56";
	logging-data="3544697"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19OiP9BaWSVR4Jucv0Xmf7u"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:dCthbjdVN4rmtOCNOaYVkgt6gbs=
In-Reply-To: <v68tfj$3abt3$1@dont-email.me>
Content-Language: en-US
Bytes: 8224

On 7/5/2024 6:46 AM, BillGill wrote:
> I have a large paper library.  I am also getting old.

Ditto -- on both counts.  When I moved here, I had *80*
"xerox paper" cartons (the sort that hold ten 500-sheet reams)
full of just paperback novels.  The older paperbacks were tiny
things -- maybe 250pp.  So, I would read several each week.
(Having bought the title, it was silly not to KEEP it)

> I may have to go into some sort of assisted living
> when I can't go on living by myself.  When I do I will
> not be able to take my library with me.  So I have

In my case, I simply didn't have the room for all that paper.
Yet, wanted to retain access to the *content* as I am often
looking for some story that I'd read "some time ago" and one
easy way to find it is to look through MY titles (if I read
it, I still *have* it!)

> been building a digital library.  Most books that I have
> are available in digital format.  But I realized that many
> of the older books are not available.  They are mainly fiction,
> mysteries, SF, even a few romances.  And mostly from
> the time when books were mostly a one time event.  A

Agreed.  Or, are oddball titles:  _Mouthsounds_, _Ben & Jerry's
Ice Cream & Dessert Book_, _Joyce Chen Cookbook_, _The Fabulous
Furry Freak Brothers in 'The Idiots Abroad'_, _TeXniques_,
_Optimal Strategy for Pai Gow Poker_, etc.  Adopting the PDF
container means I can preserve any illustrations in the texts,
as well.

I also scan paper documents ("research papers") that are no longer
available on-line.  And, a variety of different "manuals" (I had a
few cubic feet of MULTICS manuals that now occupy zero space on my
shelf!  :> )    These tend to be larger page sizes so I need to
view them on a larger screen than my eReaders -- I will eventually
buy an oversized tablet to use for this (instead of my monitors).

Thankfully, a lot of other "reference" titles were published
in "Perfect" bindings.  As with the paperbacks, it's easy to chop
the binding edge off of the book (I have a paper cutter that will
cut up to a 1" thick stack of paper, "straight" -- the "slicing"
kind will leave you with different size pages!).  Anything too
thick for the cutter is manually cut (or "sliced" with a box cutter)
along the *inside* of the binding to produce 1" thick chunks.

Then, place the stack on the scanner and let it scan them,
sequentially (both sides) to TIFFs and package those in PDFs.
If all of the pages are similar size (true for most things
except service manuals with larger fold-outs) *and* the
same "type" (i.e., all B&W print instead of some "color
inserts"), then they can be scanned pretty quickly.  I think
the main scanner that I use does 20 or 30 double-sided pages
per minute.

(If I have to scan an 11x17 "fold out", I have to do so on a manual,
flatbed scanner -- which takes MINUTES by the time you set the ONE
page in place)

The "small" scanner claims I have scanned 94931 double-sided sheets
(i.e., ~190K images)
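
As a rough cross-check of what that counter represents, here is a
minimal Python sketch of the feed time.  It assumes the "20 or 30
double-sided pages per minute" figure means sheets per minute and
ignores loading, jams and fold-outs; the function name is just for
illustration.

```python
# Rough feed time for the sheet-fed ("small") scanner.
# Assumption: the quoted rate is in double-sided SHEETS per minute.
SHEETS = 94931  # counter value reported by the scanner


def feed_hours(sheets: int, sheets_per_minute: float) -> float:
    """Hours of pure scanning time, ignoring loading and jams."""
    return sheets / sheets_per_minute / 60


images = SHEETS * 2  # both sides of every sheet are captured

for rate in (20, 30):
    hours = feed_hours(SHEETS, rate)
    print(f"{rate} sheets/min: about {hours:.0f} hours for {images} images")
```

Under those assumptions it comes to about 79 hours at 20 sheets/min
and about 53 hours at 30 sheets/min.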

For the already small page size (of old paperbacks), my
eReaders can display PDFs at full size -- or larger.

> few old time authors, such as Agatha Christie, are still
> in print and available as print or digital, but many
> are not.  So I decided to digitize those books for myself.
> While most of them are in copyright, I have no idea how
> to get permission.

I think you can probably argue that they are for your own
use and, having had the originals, there is no difference
in having PHOTOGRAPHS of the original pages.

I think *distributing* same would run the risk of some legal
action.  I save the front covers as "proof" of having owned
the book (a stack of covers takes up relatively little space)

> I suspect that is why many of them are
> not in digital format.  So I have been digitizing them for
> my own use.  I will not distribute them in any way.  They
> are strictly for my own use.  If any of them show up in
> digital format I will buy that edition.

I made a systematic effort to find "original" (PDF) copies of
most of the research papers in my collection.  That's where *my*
paper copies originated -- I just failed to preserve the PDFs
in favor of print copies, "back then".

For each title found, I would discard my paper copy in favor of
the digital version -- regardless of whether it was a low resolution
scan, "true" PDF, etc.  I did this mainly to get "cleaner" copies
of the documents (not stained/dog-eared).

> So I have been doing non-destructive scanning.  This is a
> rather long process, since I am creating epub formatted books.
> Epub is a format based on HTML so that it can be automatically
> reformatted to fit on any screen.

Yes, but this only works well with "pure text" documents
(e.g., old "pocket" paperbacks).  Anything with illustrations,
tables, etc. tends to be poorly suited to epubs.  As my goal
is just to replace the paper, a "collection of TIFFs" achieves
that goal *quickly*.

[Depending on the material and the size of the typeface,
I scan at 600 or 1200 dpi -- so I can postprocess the TIFFs
with OCR /at a later date/, if I choose to do so]
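
To give a feel for what the two resolutions cost, here is a small
Python sketch of the pixel dimensions and raw size of one page image.
The 4.25 x 7 inch page size is an assumed "pocket paperback" size, not
a measured one, and the byte counts are for uncompressed 1-bit (B&W)
data; real TIFFs are normally compressed and will be smaller.

```python
# Pixel dimensions and raw 1-bit size of one scanned page.
# Assumptions: 4.25 x 7 inch page (hypothetical), bilevel, uncompressed.
PAGE_WIDTH_IN = 4.25
PAGE_HEIGHT_IN = 7.0


def bilevel_page(width_in: float, height_in: float, dpi: int) -> tuple[int, int, int]:
    """Return (width_px, height_px, raw_bytes) for a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px // 8  # 8 pixels per byte
    return width_px, height_px, raw_bytes


for dpi in (600, 1200):
    w, h, size = bilevel_page(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, {size / 1e6:.2f} MB raw")
```

Doubling the resolution quadruples the pixel count, so 1200 dpi is
only worth it where the typeface is small enough to need it.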

> But that means extra
> work.  It takes anywhere from 3 days to a week, depending on
> the size and quality of the book.  First I scan it using
> my DIY scanner.  This involves taking a photo of each page,
> then converting the photos to text, using Optical Character
> Recognition (OCR) software.  After that is the slow part.

Ah, I would consider capturing the images in this manner to
be slow.  You have to manually flip pages and reposition the
book in the scanner -- ?  It's got to take 10+ (20+??) seconds
to perform that action?  So, even a 250pp "pocket paperback"
would be > 1200 (2400??) seconds just to scan!  And then "collect"?

[I.e., 95K scans would have taken 950K seconds -- ~16000 minutes
(~260 hours)]
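
The arithmetic behind those guesses, as a minimal Python sketch.  The
10 and 20 second figures are the guesses above, not measurements; the
paperback case assumes each page flip exposes two facing pages, and
the 95K case assumes one capture per scan.

```python
# Back-of-envelope time for photographing pages by hand.
# Assumptions: 10 or 20 seconds per flip/capture (guessed, not measured).


def manual_scan_seconds(pages: int, seconds_per_capture: float,
                        pages_per_capture: int = 1) -> float:
    """Total seconds spent flipping/positioning, ignoring setup and OCR."""
    return pages / pages_per_capture * seconds_per_capture


# A 250-page pocket paperback, two facing pages per flip.
for secs in (10, 20):
    total = manual_scan_seconds(250, secs, pages_per_capture=2)
    print(f"250pp at {secs} s/flip: {total:.0f} s ({total / 60:.0f} min)")

# 95K captures at 10 seconds each.
total = manual_scan_seconds(95_000, 10)
print(f"95K captures: {total:.0f} s = {total / 60:.0f} min = {total / 3600:.0f} h")
```

That gives 1250 to 2500 seconds for the paperback, and roughly
264 hours for the 95K captures.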

> I insert the text into a word processor and proof it to
> correct all the many errors the OCR makes in the process.

The (my) scanner can do the OCR but it leaves you (me) with
these problems you've outlined.  If you forgo the ability
to do searches, then having a "photo" of the page and
relying on your own brain for the OCR seems more expedient.

> How many errors depends partly on the quality of the source.
> Then it is fairly simple to convert it to the epub format,
> or into the AZW3 format that can be read by kindle.

But, you still have those books lying around?  Here, you
could donate them to the local library -- but, they will
simply be sold ($1/each) to raise funds for "other uses".
Their content will only be available to a person who stumbles
upon the title on the "for sale" rack.  (I'd rather just
donate monies and discard the "paper")

Good luck with your effort!  I can recall digitizing 35mm
slides -- a similarly slow process.  Thankfully, I didn't
have more than a few hundred to process...