Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: hobby electronics
Date: Fri, 5 Jul 2024 09:51:26 -0700
Organization: A noiseless patient Spider
Lines: 150
Message-ID: <v698an$3c5jp$2@dont-email.me>
References: <j5a88jhm7pge920n2io4jnhs101i8ntb2g@4ax.com> <v635o1$24goj$1@dont-email.me> <v63k0i$271d8$1@dont-email.me> <v63ldd$26rbm$2@dont-email.me> <v667qj$2p9gt$4@dont-email.me> <v66doo$2q0be$1@dont-email.me> <v68tfj$3abt3$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 05 Jul 2024 18:51:37 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="84f4481ffa32c5eaa549edc266280f56"; logging-data="3544697"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19OiP9BaWSVR4Jucv0Xmf7u"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
Cancel-Lock: sha1:dCthbjdVN4rmtOCNOaYVkgt6gbs=
In-Reply-To: <v68tfj$3abt3$1@dont-email.me>
Content-Language: en-US
Bytes: 8224

On 7/5/2024 6:46 AM, BillGill wrote:
> I have a large paper library. I am also getting old.

Ditto -- on both counts.

When I moved here, I had *80* "xerox paper" cartons (the sort that hold ten 500 sheet reams) full of just paperback novels. The older paperbacks were tiny things -- maybe 250pp. So, I would read several each week. (Having bought the title, it was silly not to KEEP it)

> I may have to go into some sort of assisted living
> when I can't go on living by myself. When I do I will
> not be able to take my library with me. So I have

In my case, I simply didn't have the room for all that paper. Yet, wanted to retain access to the *content* as I am often looking for some story that I'd read "some time ago" and one easy way to find it is to look through MY titles (if I read it, I still *have* it!)

> been building a digital library. Most books that I have
> are available in digital format. But I realized that many
> of the older books are not available. They are mainly fiction,
> mysteries, SF, even a few romances. And mostly from
> the time when books were mostly a one time event. A

Agreed. Or, are oddball titles: _Mouthsounds_, _Ben & Jerry's Ice Cream & Dessert Book_, _Joyce Chen Cookbook_, _The Fabulous Furry Freak Brothers in 'The Idiots Abroad'_, _TeXniques_, _Optimal Strategy for Pai Gow Poker_, etc. Adopting the PDF container means I can preserve any illustrations in the texts, as well.

I also scan paper documents ("research papers") that are no longer available on-line. And, a variety of different "manuals" (I had a few cubic feet of MULTICS manuals that now occupy zero space on my shelf! :> ) These tend to be larger page sizes so I need to view them on a larger screen than my eReaders -- I will eventually buy an oversized tablet to use for this (instead of my monitors).

Thankfully, a lot of other "reference" titles were published in "Perfect" bindings. As with the paperbacks, it's easy to chop the binding edge off of the book (I have a paper cutter that will cut up to a 1" thick stack of paper, "straight" -- the "slicing" kind will leave you with different size pages!). Anything too thick for the cutter is manually cut (or "sliced" with a box cutter) along the *inside* of the binding to produce 1" thick chunks.

Then, place the stack on the scanner and let it scan them, sequentially (both sides) to TIFFs and package those in PDFs. If all of the pages are similar size (true for most things except service manuals with larger fold-outs) *and* the same "type" (i.e., all B&W print instead of some "color inserts"), then they can be scanned pretty quickly. I think the main scanner that I use does 20 or 30 double-sided pages per minute.
(If I have to scan an 11x17 "fold out", I have to do so on a manual, flatbed scanner -- which takes MINUTES by the time you set the ONE page in place)

The "small" scanner claims I have scanned 94931 double-sided sheets (i.e., ~190K images)

For the already small page size (of old paperbacks), my eReaders can display PDFs at full size -- or larger.

> few old time authors, such as Agatha Christie, are still
> in print and available as print or digital, but many
> are not. So I decided to digitize those books for myself.
> While most of them are in copyright, I have no idea how
> to get permission.

I think you can probably argue that they are for your own use and, having had the originals, there is no difference in having PHOTOGRAPHS of the original pages. I think *distributing* same would run the risk of some legal action.

I save the front covers as "proof" of having owned the book (a stack of covers takes up relatively little space)

> I suspect that is why many of them are
> not in digital format. So I have been digitizing them for
> my own use. I will not distribute them in any way. They
> are strictly for my own use. If any of them show up in
> digital format I will buy that edition.

I made a systematic effort to find "original" (PDF) copies of most of the research papers in my collection. That's where *my* paper copies originated -- I just failed to preserve the PDFs in favor of print copies, "back then". For each title found, I would discard my paper copy in favor of the digital version -- regardless of whether it was a low resolution scan, "true" PDF, etc. I did this mainly to get "cleaner" copies of the documents (not stained/dog-eared).

> So I have been doing non-destructive scanning. This is a
> rather long process, since I am creating epub formatted books
> epub is a format based on HTML so that it can be automatically
> reformatted to fit on any screen.

Yes, but this only works well with "pure text" documents (e.g., old "pocket" paperbacks).
Anything with illustrations, tables, etc. tends to be poorly suited for epubs. As my goal is just to replace the paper, a "collection of TIFFs" achieves that goal *quickly*.

[Depending on the material and the size of the typeface, I scan at 600 or 1200 dpi -- so I can postprocess the TIFFs with OCR /at a later date/, if I choose to do so]

> But that means extra
> work. It takes anyplace for 3 days to a week, depending on
> the size and quality of the book. First I scan it using
> my DIY scanner. This involves taking a photo of each page,
> then converting the photos to text, using Optical Character
> Recognition (OCR) software. After that is the slow part.

Ah, I would consider capturing the images in this manner to be slow. You have to manually flip pages and reposition the book in the scanner -- ? It's got to take 10+ (20+??) seconds to perform that action? So, even a 250p "pocket paperback" would be > 1200 (2400??) seconds just to scan! And then "collect"?

[I.e., 95K scans would have taken 950K seconds -- roughly 16000 minutes (~260 hours)]

> I insert the text into a word processor and proof it to
> correct all the many errors the OCR makes in the process.

The (my) scanner can do the OCR but it leaves you (me) with these problems you've outlined. If you forego the ability to do searches, then having a "photo" of the page and relying on your own brain for the OCR seems more expedient.

> How many errors depends partly on the quality of the source.
> Then it is fairly simple to convert it to the epub format,
> or into the AZW3 format that can be read by kindle.

But, you still have those books lying around? Here, you could donate them to the local library -- but, they will simply be sold ($1/each) to raise funds for "other uses". Their content will only be available to a person who stumbles upon the title on the "for sale" rack. (I'd rather just donate monies and discard the "paper")

Good luck with your effort! I can recall digitizing 35mm slides -- a similarly slow process.
Thankfully, I didn't have more than a few hundred to process...
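
For anyone who wants to redo the back-of-envelope timing above with their own numbers, here is a rough Python sketch. It rests on assumptions that are not stated in the post: that each manual page flip captures a two-page spread (which is what makes 250 pages come out just over 1200 seconds at 10 s per flip), and that "20 or 30 double-sided pages per minute" means sheets per minute through a duplex feeder. The function names are purely illustrative.

```python
def capture_seconds(pages, seconds_per_capture, pages_per_capture=2):
    """Time to photograph a book by hand.

    Assumption: one capture per page flip, covering `pages_per_capture`
    pages (2 = a full spread). Set it to 1 for one photo per page.
    """
    captures = -(-pages // pages_per_capture)  # ceiling division
    return captures * seconds_per_capture


def feeder_seconds(pages, sheets_per_minute):
    """Time for a duplex sheet-fed scanner.

    Assumption: each sheet carries two pages and both sides are
    scanned in a single pass.
    """
    sheets = -(-pages // 2)  # ceiling division
    return sheets * 60.0 / sheets_per_minute


if __name__ == "__main__":
    pages = 250  # a typical old "pocket" paperback
    for secs in (10, 20):
        print(f"manual, {secs} s/flip: {capture_seconds(pages, secs)} s")
    for rate in (20, 30):
        print(f"feeder, {rate} sheets/min: {feeder_seconds(pages, rate):.0f} s")

    # 95K captures at 10 s each, as in the bracketed estimate
    total = 95_000 * 10
    print(f"95K captures: {total} s, {total / 60:.0f} min, {total / 3600:.0f} h")
```

Under those assumptions a 250-page paperback takes 1250 to 2500 seconds by hand versus 250 to 375 seconds through the feeder, not counting the time to cut the binding or proof any OCR.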