Article <v6h0oa$u3fa$1@dont-email.me>

Path: ...!weretis.net!feeder9.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: hobby electronics
Date: Mon, 8 Jul 2024 08:31:19 -0700
Organization: A noiseless patient Spider
Lines: 193
Message-ID: <v6h0oa$u3fa$1@dont-email.me>
References: <j5a88jhm7pge920n2io4jnhs101i8ntb2g@4ax.com>
 <v635o1$24goj$1@dont-email.me> <v63k0i$271d8$1@dont-email.me>
 <v63ldd$26rbm$2@dont-email.me> <v667qj$2p9gt$4@dont-email.me>
 <v66doo$2q0be$1@dont-email.me> <v68tfj$3abt3$1@dont-email.me>
 <v698an$3c5jp$2@dont-email.me> <v6bge1$3qrun$1@dont-email.me>
 <v6busc$3t51f$1@dont-email.me> <v6e5ut$bs5n$1@dont-email.me>
 <v6ed9e$cpga$1@dont-email.me> <v6gqh7$t2eb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 08 Jul 2024 17:31:24 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1204b6b780de24b9cb690be0c57d3575";
	logging-data="986602"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/ACibLbFq9s+HAKuJVHh4d"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:GQ12rnyzJ1mVh0krswNvDS6Pjrc=
In-Reply-To: <v6gqh7$t2eb$1@dont-email.me>
Content-Language: en-US
Bytes: 10509

On 7/8/2024 6:45 AM, BillGill wrote:

>>> As you can see I have a mirror in the base of the scanner
>>> that I used to verify that the page is correctly placed.  It
>>> doesn't zoom in to fit the page, it just overscans.
>>
>> If the USB i/f worked, you wouldn't need the mirror (?)

> There is some software that works with some cameras that will
> send the images directly to the computer.

But, would it show "viewfinder" images or just *snapped* images?
I.e., could you use the images delivered "in real time" at
the computer to PREVIEW the photos that will be snapped?  Or,
does it only transfer the images after they have been taken?
(which would be more tedious to try to use to align the book)

> When I was KISSing
> the scanner I went with the simplest approach.  This way it
> will work with any camera, you just have to build it so that
> the camera is at the correct distance from the platen.

Makes sense.  And, as you are shooting THROUGH the glass,
this distance is constant (whereas those builds that shot
the "open book" have to accommodate the distance changing
as the number of pages increases the "thickness" as they
are sequentially "flipped").

>> Your approach seems more like the Reading Machine used
>> (in "paper handling") -- though it used a moving camera-illuminator
>> to scan the actual page (which meant the book had to remain in place for
>> a considerable length of time):
>> <https://life.ieee.org/wp-content/uploads/Harvey-Lauer-with-Kurzweil-Reading-Machine-1200x819.png>
>
> Well, I see that the way the book is set on the scanner is
> similar, but of course my scanner takes the whole page at
> once.

The point was that the approach makes it easier to access deeper
into the gutter.  I would imagine the builds that have the book face
up, opened to a pair of pages have to contend with pages infringing
on the gutter as the position *in* the book changes (and more paper
piles up on one side or the other).

> The goal is, of course, different.  The reading machine
> is turning the printed text into sound, which is a whole
> different thing from turning it into text.

Actually, The Reading Machine was one of the first commercial
"omnifont" OCR machines.  The scanner (a linear CCD) assembles
the image of the characters and then the characters are
recognized, fed to the text-to-speech system and converted
to sound (by the speech synthesizer).

An early attempt to commercialize the OCR capabilities was
The Data Entry System.  There, the text-to-speech module and
speech synthesizer were elided from the system with the
recognized text as the primary output.  A graphic display
allowed a *sighted* operator (The Reading Machine was targeted
to the visually impaired) to verify and correct the OCR
as the pages were being scanned.

Note that this was the mid/late 70's, so these sorts of capabilities
didn't otherwise exist.  How would you get a *digital* copy of a
telephone book (name, address, phone)?  Or, copies of
published newspapers (AS they were being published)?

>>> I don't do much manipulation of the images before I OCR them.
>>> I use ABBYY FineReader 14 which does a pretty good job of
>>> picking out the text.  I stick with 14 because it works well
>>> and newer versions are only available as subscriptions.
>>
>> All OCR tools "have problems".  My scanner will do OCR but then I
>> lose the original images (so how do I sort out what the OCR *should* have
>> been once the original is gone?).  I've also had some luck with
>> Omnipage.

> Since I have the original scans in the computer, rather than running
> them straight through the OCR and losing the originals, that is not
> a problem.  And of course I still have the original books so that I
> can proof the text with confidence.

Yes, but this means KEEPING extra "stuff" -- the original books,
the original scans, AND the output of your process.

My approach keeps the scans *as* the output -- so the books are
redundant.  And, being TIFFs, they are lossless so I can do
(or RE-do) the OCR at any time -- including as I am reading them.

>>> Understand that I am making ebooks that I can carry around on
>>> different devices, not PDFs that can also be viewed on different
>>> devices, but don't necessarily have all the text correct.
>>
>> The PDF doesn't have to get the text correct; it can store the
>> image of the page (and let your eyes/brain do the OCR).
>>
>> I can store the OCRed text "behind" the image so that you can select
>> the text with your cursor (in a PC application).  But, again, you
>> are stuck relying on the quality of the OCR algorithm.
>
> Does the PDF reflow the text so that the text size is the same
> size on all devices, including a phone?  The EPUB format does
> that. It also resizes any illustrations so that they will fit.

No.  The PDF is a (lossless) photo of the page.  For "pocket books"
(i.e., the paperbacks of the 60's), my ereader screens are large
enough that it is as if I was holding the original book in my hand
(but only seeing recto or verso page-at-a-time).

If I want to read a technical paper typically typeset in 8.5x11
format, I have to use a larger display -- or, flip the ereader to
landscape mode (so the display is 8.5" wide) and scroll through
the image.  This is tedious for multicolumn layouts.  But, I
could also view them "full size" on a PC's display (24" diagonal
displays are about 11 inches tall) or on a small (~14") laptop,
"sideways".
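
The "about 11 inches tall" figure can be checked with a little
arithmetic.  A rough sketch, assuming a 16:9 panel (the aspect ratio
is my assumption; it is not stated above) and hypothetical helper
names:

```python
import math

def display_size(diagonal_in, aspect_w, aspect_h):
    """Return (width, height) in inches for a given diagonal and aspect ratio."""
    # The diagonal is the hypotenuse of the aspect-ratio triangle.
    unit = diagonal_in / math.hypot(aspect_w, aspect_h)
    return aspect_w * unit, aspect_h * unit

def fits_full_size(page_w, page_h, disp_w, disp_h):
    """True if the page can be shown at 1:1 scale without scrolling."""
    return page_w <= disp_w and page_h <= disp_h

# 24" diagonal, ASSUMED 16:9 (a 16:10 panel would be somewhat taller)
w, h = display_size(24, 16, 9)
print(f"{w:.1f} x {h:.1f} in")          # 20.9 x 11.8 in
print(fits_full_size(8.5, 11, w, h))    # True: a letter page fits upright
```

A different aspect ratio changes the numbers, so treat the result as
a ballpark rather than a spec for any particular monitor.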

Eventually, I will buy a larger tablet and install all of these
documents in its internal memory; so, the tablet will be my "library".

Even larger pages (e.g., B-size foldout schematics) require even larger
displays.  But, you also would likely want the ability to easily
zoom and pan the display to examine the finer details in such documents.

> Illustrations of course have to be handled separately.  I run
> any page with illustrations through a graphics program, such as
> GIMP, to do any cleanup, such as cropping the image to provide
> only the illustration.  Then I reinsert the illustration into
> the text file at the appropriate location.

Yes, I have to do this with "foldout" pages that exceed the
capabilities of the "small" scanner.  This makes scanning service
manuals a bit tedious as they may have five 8.5x11 pages followed
by three 11x17 foldouts followed by more 8.5x11's, etc.

But, assembling the final document (PDF) is relatively easy in Acrobat;
I just import ALL of the images and then rearrange their order using
the graphical thumbnails.  If a page got scanned upside down, I can
flip it.  If pages were typeset to be read in landscape mode (e.g.,
rotate the document to read the table on page 27), I can perform that
rotation in the PDF so the user doesn't have to turn the screen
sideways.
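
For anyone who would rather script that bookkeeping than drag
thumbnails around, the reorder-and-rotate step might be sketched
like this.  It is NOT how Acrobat does it; the file names, the Scan
record and assembly_plan() are all hypothetical, and the resulting
plan would still have to be handed to whatever PDF tool you use:

```python
from dataclasses import dataclass

@dataclass
class Scan:
    filename: str    # hypothetical name of the scanned image file
    page: int        # position the page should have in the final document
    rotate: int = 0  # clockwise degrees to apply (a multiple of 90)

def assembly_plan(scans):
    """Return (filename, rotation) pairs in final reading order."""
    for s in scans:
        if s.rotate % 90 != 0:
            raise ValueError(f"{s.filename}: rotation must be a multiple of 90")
    ordered = sorted(scans, key=lambda s: s.page)
    return [(s.filename, s.rotate % 360) for s in ordered]

# Scans listed in the order they happened to come off the scanners
scans = [
    Scan("foldout_a.tif", 6),          # 11x17 foldout, scanned separately
    Scan("p001.tif", 1),
    Scan("p027.tif", 27, rotate=90),   # table typeset in landscape
    Scan("p002.tif", 2, rotate=180),   # scanned upside down
]
plan = assembly_plan(scans)
for name, degrees in plan:
    print(name, degrees)
```

Keeping the rotation as data, separate from the image files, means
the original scans are never modified.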

It's also helpful as I can add other content to the "container"
to preserve it as originally packaged.  E.g., audio files that
accompany the text or program listings that really want to be
*attachments* and not "in-lined".

>>> Also I don't want to destroy my paper books.  I like reading
>>> books on paper. After all that is how I grew up.
>>
>> Agreed.  But, if you are proactively safeguarding your collection against
>> the possibility of downsizing into a different living situation, you've
>> already decided that they will be discarded -- even if not "destroyed".
>
> As I say, I prefer real books, and have the space to keep them.

Then you are scanning as a preemptive action in the hope that
when you need to be rid of the paper, *someone* will be able
to do that (I make my plans assuming that I may not have the
same physical or mental competencies as I do, now).

E.g., SWMBO would curse me up and down if *she* had to sort
through all of my books -- even if she KNEW that they should
all be discarded ("Why the hell didn't HE do this??").

Ditto my business records, software archive, financial
records, etc.  (I've seen too many people "rushed" by
"unexpected events" that have had to take a broad brush
approach to discarding "stuff" because they didn't have
the time or abilities to more selectively filter it).

========== REMAINDER OF ARTICLE TRUNCATED ==========