Path: ...!feeder1.cambriumusenet.nl!feed.tweak.nl!217.73.144.44.MISMATCH!feeder.ecngs.de!ecngs!feeder2.ecngs.de!144.76.237.92.MISMATCH!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: Chinese downloads overloading my website
Date: Tue, 12 Mar 2024 15:05:00 -0700
Organization: A noiseless patient Spider
Lines: 89
Message-ID: <usqjih$h74g$1@dont-email.me>
References: <7qujui58fjds1isls4ohpcnp5d7dt20ggk@4ax.com>
 <6lekuihu1heui4th3ogtnqk9ph8msobmj3@4ax.com> <usec35$130bu$1@solani.org>
 <u14quid1e74r81n0ajol0quthaumsd65md@4ax.com> <usjiog$15kaq$1@solani.org>
 <t7rrui5ohh07vlvn5vnl277eec6bmvo4p9@4ax.com> <usm6v6$17e2c$1@solani.org>
 <gabuui56k0fn9iovps09um30lhiqhvc61t@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 12 Mar 2024 22:05:06 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="cd61f5e18330181594e65cc325aef3d5";
	logging-data="564368"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18Mhe6GvfVVrXHa3x2tLRR4"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:HNbTuRcHFR9cL3Ej0zkfEtQOo9c=
Content-Language: en-US
In-Reply-To: <gabuui56k0fn9iovps09um30lhiqhvc61t@4ax.com>
Bytes: 5348

On 3/11/2024 9:48 AM, legg wrote:
>>>> When I ask Google for "how to add a captcha to your website"
>>>> I see many solutions, for example this:
>>>> https://www.oodlestechnologies.com/blogs/create-a-captcha-validation-in-html-and-javascript/
>>>>
>>>> Maybe some HTML guru here knows?
>>>
>>> That looks like it's good for accessing an HTML page.
>>> So far the Chinese are accessing the top-level index, where
>>> files are offered for download at a click.
>>>
>>> Ideally, if they can't access the top level, a direct address
>>> access to the files might be prevented?
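
Partly, if you can touch the server config.  Assuming Apache (an
assumption on my part, with example.com standing in for the real
domain), a Referer check in .htaccess refuses file fetches that
don't arrive via the site's own pages:

   RewriteEngine On
   # Refuse downloads whose Referer isn't a page on this site.
   # Caveats: this also breaks direct links posted elsewhere (e.g.
   # in a newsgroup), and a determined bot can forge the header.
   RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
   RewriteRule \.(zip|pdf|jpe?g|png)$ - [F]
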
> 
> Using barebones (Netscape) SeaMonkey Composer, the Oodlestech
> script generates a web page with a 4-figure, manually entered
> human test.
> 
> How do I get a correct response to open the protected web page?

Why not visit a page that uses it and inspect the source?
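
FWIW, the usual pattern is only a few lines of client-side script:
generate a code, compare the visitor's entry against it, and
redirect on a match.  A minimal sketch (not the Oodlestech code
itself; "protected.html" is a hypothetical name for the page being
guarded):

   <form onsubmit="return check()">
     Type this code: <b id="code"></b>
     <input type="text" id="answer">
     <input type="submit" value="Verify">
   </form>
   <script>
   // Generate a 4-digit challenge and show it to the visitor.
   var code = String(Math.floor(1000 + Math.random() * 9000));
   document.getElementById("code").textContent = code;

   function check() {
     if (document.getElementById("answer").value === code) {
       // Correct response: open the protected page.
       window.location.href = "protected.html";
     } else {
       alert("Wrong code, try again.");
     }
     return false;  // never actually submit the form
   }
   </script>

Because the check runs entirely in the visitor's browser, it only
filters out bots that don't execute JavaScript; anybody who already
holds a direct URL to a file sails right past it, which is the
direct-address concern raised above.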

>> What I am doing now is using an https://mywebsite/pub/ directory
>> with lots of files in it that I want to publish in, for example, this newsgroup;
>> I then just post a direct link to that file.
>> So it has no index file and no links to it from the main site.
>> It has many sub directories too.
>> https://panteltje.nl/pub/GPS_to_USB_module_component_site_IXIMG_1360.JPG
>> https://panteltje.nl/pub/pwfax-0.1/README
>>
>> So you need the exact link to access anything
>> fine for publishing here...
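
One thing worth checking: with no index file, a stock Apache (an
assumption about the server) will happily auto-generate a listing
of /pub and hand the whole tree to anyone who asks for the bare
directory.  One line in .htaccess turns that off:

   # Never auto-generate a directory listing.
   Options -Indexes
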
> <snip>
> 
> The top (~index) web page of my site has lists of direct links
> to subdirectories, for double-click download by user.

You could omit the actual links and just leave the TEXT for a link
present (i.e., highlight text, copy, paste into address bar) to
see if the "clients" are exploring all of your *links* or are
actually parsing the *text*.
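
For example, with placeholder names:

   <!-- A real anchor:  any crawler that follows href attributes
        will fetch the file. -->
   <a href="https://example.com/pub/file.zip">file.zip</a>

   <!-- Bare text:  humans copy/paste it into the address bar; only
        a scraper hunting for URL-shaped strings in the page text
        will find it. -->
   https://example.com/pub/file.zip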

> It also has links to other web pages that, in turn, offer links or
> downloads to on-site and off-site locations. A great number of

Whether or not you choose to "protect" those assets is a separate
issue that only you can resolve (what's your "obligation" to a site that
you've referenced on YOUR page?)

> off-site links are invalid, after ~10-20 years of neglect. They'll
> probably stay that way until something or somebody convinces me
> that it's all not just a waste of time.
> 
> At present, I only maintain data links or electronic publications
> that need it. This may not be necessary, as the files are generally
> small enough for the Wayback machine to have scooped up most of the
> databases and spreadsheets. They're also showing up in other places,
> with my blessing. Hell - Wayback even has tube curve pages from the
> 'Conductance Curve Design Manual' - they've got to be buried 4 folders
> deep - and each is a hefty image.

You can see if bitsavers has an interest in preserving them in a
more "categorical" framework.

> Somebody, please tell me that the 'Internet Archive' is NOT owned
> by Google?
> 
> Some off-site links for large image-bound mfr-logo-ident web pages
> (c/o geek@scorpiorising) seem already to have introduced a
> captcha-type routine. Wouldn't need many bot hits to bump that
> location into a data limit. Those pages take a long time
> simply to load.

There is an art to designing all forms of documentation
(web pages just being one).  Too abridged and folks spend forever
chasing links (even if it's as easy as "NEXT").  Too verbose and
the page takes a long time to load.

OTOH, when I'm looking to scrape documentation for <whatever>,
I will always take the "one large document" option, if offered.
It's just too damn difficult to rebuild a site's structure,
off-line, in (e.g.) a PDF.  And, load times for large LOCAL documents
are insignificant.

> Anyway - how to get the Oodlestech script to open the appropriate
> page, after vetting the user as being human?

No examples, there?