From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: OT: central limit theorem
Date: Mon, 13 May 2024 09:19:10 -0700
Organization: A noiseless patient Spider
Lines: 95
Message-ID: <v1tei0$3ibo0$2@dont-email.me>
References: <662bf69c$0$8484$882e4bbb@reader.netnews.com>
 <662bffdf$0$8488$882e4bbb@reader.netnews.com>
 <nnd$09c3fa57$3f7b285d@1c24b118c7da3bd9> <v0jhmd$h25f$3@dont-email.me>
 <nnd$7a1499b4$09efd010@2879689e07bdffe9>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 13 May 2024 18:19:12 +0200 (CEST)
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Content-Language: en-US
In-Reply-To: <nnd$7a1499b4$09efd010@2879689e07bdffe9>

On 5/13/2024 3:48 AM, albert@spenarnc.xs4all.nl wrote:
> In article <v0jhmd$h25f$3@dont-email.me>,
> Don Y  <blockedofcourse@foo.invalid> wrote:
>> On 4/27/2024 3:15 AM, albert@spenarnc.xs4all.nl wrote:
>>>>> I eventually scanned everything with a Perfect Binding and now fit those
>>>>> same books on a single microSD card (in a Nook; PDFs on a 12" tablet).
>>>>
>>>> PDFs are a dreadful format! Maybe there's a high-end e-ink that
>>>> processes them effectively but they look like shit on the cheaper ones
>>>> like most of the Kindles with e-ink displays.
>>>
>>> PDF's are fearful. It is under control of one company, Adobe.
>>> It is changed without notice. I regret the change away from PostScript
>>> that at least was defined.
>>
>> How is this any different than other file formats "controlled" by their
>> originators?  MS can't even access THEIR older versions of THEIR format.
>> I have PCB layout tools that can't read THEIR earlier (one version)
>> files, etc.
> 
> PDF is better than WORD. See my other response.

PDF is somewhat open and stable (it has been an ISO standard, ISO 32000,
since 2008).  Most other formats are closed and subject to the whims of
their creators/owners.

>> You could always render your document to a TIFF (and then encapsulate it
>> in a PDF!), losing the textual nature in the process...
> 
> What? That is stupid. I don't go for the looks.

PDF is ALL about the page layout.  If all you care about is the *content*
(and not the format/layout), then you could use HTML to encapsulate
the document (assuming you have other media besides just "ASCII text";
otherwise, plain text would do).

> OTOH convert a pdf document into UTF8 rather than a graphical
> format.

There are tools that will make these conversions (scanned images OCRed,
PDF/PS to text, etc.).  But, you lose all of the non-text content.

I use PDFs as a versatile container format that lets me show content
exactly how I want it presented, include graphics, audio, video/animation,
etc.  I can describe a piece of code and "attach" the code to the explanation
(without having to "include" all of that text IN the presentation).

How do I -- in prose -- describe the different audio characteristics
of speech created with two different glottal waveform generators,
and be reasonably sure that the reader truly understands the (audible)
pros and cons of each?

Or, illustrate which classes of cubic Beziers exhibit cusps (discontinuities
in the tangent direction)?  Which have degenerate forms?  You can describe
these mathematically... but it is far simpler to just SHOW them, graphically.
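A minimal sketch of one way to find such cases numerically, assuming control points given as plain (x, y) tuples. A cusp can only occur where the derivative vanishes, and B'(t)/3 = (1-t)^2 (P1-P0) + 2t(1-t) (P2-P1) + t^2 (P3-P2) is a quadratic in each coordinate, so it is enough to find the roots of x'(t) in [0, 1] and check y'(t) there. The function names (`classify_cubic` and its helpers) and the tolerances are hypothetical, and "degenerate" is taken here to mean that all four control points lie on one line.

```python
import math

Point = tuple[float, float]


def _derivative_coeffs(p0: float, p1: float, p2: float, p3: float) -> tuple[float, float, float]:
    """Coefficients (c2, c1, c0) of B'(t)/3 = c2*t**2 + c1*t + c0 for one coordinate."""
    a, b, c = p1 - p0, p2 - p1, p3 - p2
    return a - 2 * b + c, 2 * (b - a), a


def _roots_in_unit_interval(c2: float, c1: float, c0: float, eps: float = 1e-12):
    """Real roots in [0, 1] of c2*t**2 + c1*t + c0, or None if it is identically zero."""
    if abs(c2) < eps:
        if abs(c1) < eps:
            return None if abs(c0) < eps else []
        roots = [-c0 / c1]
    else:
        disc = c1 * c1 - 4 * c2 * c0
        if disc < -eps:
            return []
        s = math.sqrt(max(disc, 0.0))
        roots = [(-c1 - s) / (2 * c2), (-c1 + s) / (2 * c2)]
    return [t for t in roots if -eps <= t <= 1 + eps]


def classify_cubic(P: list[Point], eps: float = 1e-9):
    """Return (collinear, stationary_ts) for the cubic Bezier with control points P.

    stationary_ts lists the t in [0, 1] where B'(t) = 0 (the only places a cusp can
    appear); it is None when all four control points coincide (the curve is a point).
    """
    cx = _derivative_coeffs(*(p[0] for p in P))
    cy = _derivative_coeffs(*(p[1] for p in P))

    def value(c, t):
        return (c[0] * t + c[1]) * t + c[2]

    rx = _roots_in_unit_interval(*cx)
    ry = _roots_in_unit_interval(*cy)
    if rx is None and ry is None:
        stationary = None
    else:
        # If x'(t) is zero everywhere, the candidates are the roots of y'(t) instead.
        candidates = ry if rx is None else rx
        stationary = sorted({round(t, 9) for t in candidates
                             if abs(value(cx, t)) < eps and abs(value(cy, t)) < eps})

    # All control points on one line <=> every pair of offsets from P[0] has zero cross product.
    d = [(p[0] - P[0][0], p[1] - P[0][1]) for p in P[1:]]
    collinear = all(abs(d[i][0] * d[j][1] - d[i][1] * d[j][0]) < eps
                    for i in range(3) for j in range(i + 1, 3))
    return collinear, stationary
```

For control points (0,0), (1,1), (0,1), (1,0) this reports a stationary point at t = 0.5 (the classic cusp); for four points evenly spaced on a line it reports collinear with no stationary points. A stationary point at t = 0 or t = 1 (e.g. P1 = P0) is reported as well, even though it produces no visible cusp, and a collinear curve that doubles back reports its turnaround point, so the numbers still have to be interpreted.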

> I'm writing a program to read TIFF's in behalf of ocr.

There are tools that will already do this for you.  You can have an
invisible "text" layer that sits "under" the corresponding TIFF imagery
in a PDF.  These are funky documents to use: selecting text based on
the *visible* imagery actually highlights the regions occupied by the
INvisible text.  So, highlighting "these words" may actually show
"these words ma" as highlighted -- even though pasting that selection
will deliver the expected results!  :<   WYSInWYG!
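A toy model of that mismatch, assuming a single line of text, uniform glyph advances, and an OCR text layer (drawn with PDF text rendering mode 3, i.e. invisible) whose characters match the scan exactly but are set slightly wider than the scanned glyphs. `selection_mismatch` and its parameters are hypothetical; real viewers place each glyph individually, so this only shows the geometry of the problem.

```python
def selection_mismatch(text: str, n_selected: int,
                       visible_advance: float, hidden_advance: float) -> tuple[str, str]:
    """Toy model of an OCR'd PDF whose invisible text layer has the same characters as the scan.

    Returns (what the highlight appears to cover on the scan, what gets pasted).
    """
    # The highlight is drawn over the *invisible* glyphs: x from 0 to n * hidden_advance.
    highlight_end = n_selected * hidden_advance
    # Any scanned glyph whose box starts inside that span looks highlighted.
    looks_selected = "".join(ch for i, ch in enumerate(text)
                             if i * visible_advance < highlight_end)
    # The clipboard receives the invisible text itself.
    pasted = text[:n_selected]
    return looks_selected, pasted
```

With scanned glyphs 10 units wide and invisible ones 12.5 units wide, selecting the 11 characters "these words" paints a highlight 137.5 units long, which reaches into the "m" and "a" of the scan, while the clipboard still gets "these words".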

> In TIFF there are several compression schemes that are possible, e.g.

You can also render to different pixel depths - 1, 4, 16, etc.  I created
some documents with a *2*-bit pixel representation (which seemed "legal"
per the TIFF specification), but most of the tools available at that time
did not recognize them (baseline TIFF readers are only required to handle
4- and 8-bit grayscale and palette images).  So, I had to "inflate" them
to a 4-bit representation in order to render them.
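A minimal sketch of that inflation, assuming uncompressed, single-sample (grayscale or palette) data with TIFF's default FillOrder of 1 (leftmost pixel in the high-order bits) and each row starting on a byte boundary. The function names are hypothetical. For grayscale, each 2-bit value v is scaled to 5v so that 0..3 spreads over the full 0..15 range; for palette images the indices should be left alone (`scale=False`).

```python
def inflate_2bpp_row_to_4bpp(row: bytes, width: int, scale: bool = True) -> bytes:
    """Unpack one 2-bit-per-pixel row (MSB first) and repack it at 4 bits per pixel."""
    if len(row) * 4 < width:
        raise ValueError("row too short for the given width")
    pixels = []
    for i in range(width):
        shift = 6 - 2 * (i % 4)               # pixel 0 in bits 7-6, pixel 3 in bits 1-0
        value = (row[i // 4] >> shift) & 0b11
        pixels.append(value * 5 if scale else value)  # grayscale: 0,1,2,3 -> 0,5,10,15
    if len(pixels) % 2:
        pixels.append(0)                      # pad the last byte of the row
    return bytes((pixels[j] << 4) | pixels[j + 1] for j in range(0, len(pixels), 2))


def inflate_image(data: bytes, width: int, height: int, scale: bool = True) -> bytes:
    """Inflate a whole uncompressed 2-bit image, row by row."""
    in_stride = (width * 2 + 7) // 8          # bytes per 2-bit row, rounded up
    return b"".join(
        inflate_2bpp_row_to_4bpp(data[r * in_stride:(r + 1) * in_stride], width, scale)
        for r in range(height)
    )
```

After inflating, the BitsPerSample tag (258) has to change from 2 to 4, and for a palette image the ColorMap (320) has to grow from 3 x 4 to 3 x 16 entries (the extra entries can simply go unused).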

> one of those is the black and white Fax machines. TIFF is worse than
> PDF, and you couldn't search it for text content.

You use TIFF as a semi-photographic rendering of the page.  I often
request technical documents from my local public library.  Invariably,
these are *FAXed* to the library and then printed for delivery to me.

So, the original document was scanned (at some resolution), FAXed
(with some potential for resampling in the FAX software), printed
(yet another resolution/resampling) and, finally, *I* scan it (so
I don't have to keep track of piles of paper) in yet another
resampling.
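A toy illustration of that generation loss, assuming nearest-neighbour resampling at each stage and hypothetical resolutions: 600 dpi for the original scan, about 204 dpi across the page for the FAX (the usual Group 3 figure), 300 dpi for the library's printer and 600 dpi for my rescan. Real scanners and printers filter rather than pick the nearest sample, so this only shows the shape of the problem: detail finer than the coarsest stage is gone for good, while coarser features survive every hop.

```python
def resample_nearest(row: list[int], src_dpi: int, dst_dpi: int) -> list[int]:
    """Nearest-neighbour resample of one scan line from src_dpi to dst_dpi."""
    n_out = len(row) * dst_dpi // src_dpi
    return [row[min(i * src_dpi // dst_dpi, len(row) - 1)] for i in range(n_out)]


def transitions(row: list[int]) -> int:
    """Number of black/white edges along the line."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def through_chain(row: list[int], chain: list[int]) -> list[int]:
    """Pass the line through each successive resolution in chain (in dpi)."""
    for src, dst in zip(chain, chain[1:]):
        row = resample_nearest(row, src, dst)
    return row


# Hypothetical resolutions: original scan, FAX (~204 dpi across), printer, my rescan.
CHAIN = [600, 204, 300, 600]
fine = [j % 2 for j in range(600)]             # 1-pixel stripes: 599 edges
coarse = [(j // 20) % 2 for j in range(600)]   # 20-pixel stripes: 29 edges
```

The 20-pixel stripes come out with all 29 of their edges intact; the 1-pixel stripes come back as an aliased pattern with at most 203 edges (the 204-dpi stage can't hold more), which is roughly why text stays readable while fine detail in illustrations suffers.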

OTOH, I end up with a readable document, including any illustrations
that it may have had (often, color -- and greyscale -- is stripped
in the processing).  This is far preferable to a searchable document
that I *don't* have!  :-/

(File names for documents are really important.  Folks who deliver documents
with names like C484915.pdf should be flogged and then shot!  Are they
hosting those documents on an MSDOS FAT12 filesystem that can't handle
long DESCRIPTIVE file names????)