From: Don Y
Newsgroups: sci.electronics.design
Subject: Re: OT: central limit theorem
Date: Mon, 13 May 2024 09:19:10 -0700

On 5/13/2024 3:48 AM, albert@spenarnc.xs4all.nl wrote:
> In article ,
> Don Y wrote:
>> On 4/27/2024 3:15 AM, albert@spenarnc.xs4all.nl wrote:
>>>>> I eventually scanned everything with a Perfect Binding and now fit those
>>>>> same books on a single microSD card (in a Nook; PDFs on a 12" tablet).
>>>>
>>>> PDFs are a dreadful format! Maybe there's a high-end e-ink that
>>>> processes them effectively but they look like shit on the cheaper ones
>>>> like most of the Kindles with e-ink displays.
>>>
>>> PDF's are fearful. It is under control of one company, Adobe.
>>> It is changed without notice. I regret the change away from PostScript
>>> that at least was defined.
>>
>> How is this any different than other file formats "controlled" by their
>> originators? MS can't even access THEIR older versions of THEIR format.
>> I have PCB layout tools that can't read THEIR earlier (one version)
>> files, etc.
>
> PDF is better that WORD.

See my other response. PDF is somewhat open and stable.
Most other formats are closed and subject to the whims of their
creators/owners.

>> You could always render your document to a TIFF (and then encapsulate it
>> in a PDF!), losing the textual nature in the process...
>
> What? That is stupid. I don't go for the looks.

PDF is ALL about the page layout. If all you care about is the
*content* (and not the format/layout), then you could use HTML
to encapsulate the document (assuming you have other media
besides just "ASCII text").

> OTOH convert a pdf document into UTF8 rather than a graphical
> format.

There are tools that will make these conversions (scanned images OCRed,
PDF/PS to text, etc.). But, you lose all of the non-text content.

I use PDFs as a versatile container format that lets me show content
exactly how I want it presented, include graphics, audio, video/animation,
etc. I can describe a piece of code and "attach" the code to the
explanation (without having to "include" all of that text IN the
presentation).

How do I -- in prose -- describe the different audio characteristics
of speech created with two different glottal waveform generators?
And, be reasonably sure that the reader truly understands the (audible)
pros and cons of each?

Or, illustrate which classes of cubic Béziers exhibit discontinuities?
Which have degenerate forms? You can describe these mathematically...
but, it is far simpler to just SHOW them, graphically.

> I'm writing a program to read TIFF's in behalf of ocr.

There are tools that will already do this for you. You can have an
invisible "text" layer that sits "under" the corresponding TIFF
imagery in a PDF.

These are funky documents to use, as selecting text based on the *visible*
imagery actually highlights the regions occupied by the INvisible text.
So, highlighting "these words" may actually show "these words ma"
as highlighted -- even though pasting that selection will deliver the
expected results! :< WYSInWYG!

> In TIFF there are several compression schemes that are possible, e.g.
You can also render to different pixel depths - 1, 4, 16, etc.
I created some documents with a *2* bit pixel representation
(which seemed "legal" per the definition of the TIFF format)
but they were not recognized by most tools available at that time.
So, I had to "inflate" them to a 4-bit representation in order
to render them.

> one of those is the black and white Fax machines. TIFF is worse than
> PDF, and you couldn't search it for text content.

You use TIFF as a semi-photographic rendering of the page.

I often request technical documents from my local public library.
Invariably, these are *FAXed* to the library. Then, printed for
delivery to me.

So, the original document was scanned (at some resolution), FAXed
(with some potential for resampling in the FAX software), printed
(yet another resolution/resampling) and, finally, *I* scan it
(so I don't have to keep track of piles of paper) in yet another
resampling.

OTOH, I end up with a readable document, including any illustrations
that it may have had (often, color -- and greyscale -- is stripped
in the processing).

This is far preferable to a searchable document that I *don't* have! :-/

(File names for documents are really important. Folks who deliver
documents with names like C484915.pdf should be flogged and then shot!
Are they hosting those documents on an MSDOS FAT12 filesystem that
can't handle long DESCRIPTIVE file names????)
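
On the 2-bit TIFF point above: Baseline TIFF 6.0 lists only 4- and 8-bit
samples for grayscale (and 1-bit for bilevel), which may explain why most
tools balked. Below is a minimal sketch of the kind of "inflation"
described, not the actual tool used. It assumes one uncompressed greyscale
row, packed four pixels per byte with the leftmost pixel in the top two
bits (the TIFF default fill order). The function name and the choice to
scale 0..3 to 0, 5, 10, 15 are illustrative assumptions.

```python
def inflate_2bpp_to_4bpp(row: bytes, width: int) -> bytes:
    """Expand one packed 2-bit-per-pixel row into a packed 4-bit row.

    Hypothetical helper: assumes MSB-first packing and linear greyscale.
    """
    if len(row) * 4 < width:
        raise ValueError("row is too short for the stated width")

    samples = []
    for x in range(width):
        byte = row[x // 4]
        shift = 6 - 2 * (x % 4)              # leftmost pixel sits in bits 7..6
        value = (byte >> shift) & 0b11       # 2-bit sample, 0..3
        samples.append(value * 5)            # scale to 0, 5, 10, 15

    if len(samples) % 2:
        samples.append(0)                    # pad the row to a byte boundary

    # Pack two 4-bit samples per byte, leftmost pixel in the high nibble.
    return bytes((samples[i] << 4) | samples[i + 1]
                 for i in range(0, len(samples), 2))
```

Multiplying by 5 keeps both ends of the range fixed (0 stays 0, 3 becomes
15), so the black/white sense of the image is unchanged. The
BitsPerSample tag and the strip byte counts in the file would still have
to be rewritten to match; that part is not shown.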