From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: Chinese downloads overloading my website
Date: Sun, 10 Mar 2024 13:48:54 -0700
Organization: A noiseless patient Spider
Message-ID: <usl6bn$35gfh$1@dont-email.me>
References: <7qujui58fjds1isls4ohpcnp5d7dt20ggk@4ax.com>
 <6lekuihu1heui4th3ogtnqk9ph8msobmj3@4ax.com> <usec35$130bu$1@solani.org>
 <u14quid1e74r81n0ajol0quthaumsd65md@4ax.com> <usjiog$15kaq$1@solani.org>
 <t7rrui5ohh07vlvn5vnl277eec6bmvo4p9@4ax.com>

On 3/10/2024 10:47 AM, legg wrote:
> So far the Chinese are accessing the top level index, where
> files are offered for download at a click.
> 
> Ideally, if they can't access the top level, a direct address
> access to the files might be prevented?

Many file sharing services deliberately do NOT offer access
to a "folder index" for similar reasons.  This allows the
owner of the file(s) to publish specific links to individual files
while keeping the folder, itself, hidden.

This is done by creating unique URLs for each file.
I.e., instead of ..../foldername/filename you publish
..../foldername/pseudorandomappearingstring/filename
where "foldername" is some bogus sequence of characters
and pseudorandomappearingstring varies from file to file!
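
A minimal sketch of how such per-file links might be generated, assuming
Python on whatever machine builds the link list.  The base URL, the folder
name and the helper name are all made up for illustration; only
secrets.token_urlsafe() and urllib.parse.quote() are real library calls.

```python
import secrets
from urllib.parse import quote


def make_private_link(base_url: str, folder: str, filename: str,
                      nbytes: int = 16) -> str:
    """Build .../folder/<random token>/filename with a fresh token per call.

    The token comes from the OS random source, so knowing one link
    gives no help in guessing another.
    """
    token = secrets.token_urlsafe(nbytes)  # 16 bytes -> 22 URL-safe chars
    return f"{base_url}/{folder}/{token}/{quote(filename)}"


if __name__ == "__main__":
    # Hypothetical site and file names, purely for illustration.
    for name in ("schematic.pdf", "firmware v2.zip"):
        print(make_private_link("https://example.invalid", "x7Qp", name))
```

The files then have to be stored (or mapped) under those token paths, and
the server must refuse to list the folder, or the tokens buy nothing.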

> The website's down after a fifth excursion pushed volumes above
> 85G on a 70G temporary extension. What's the bet it was 17G
> accumulated in 262 'visits'.
> 
> Can't ID that final host's IP address while I'm locked out.
> 
> Luckily (~) for users, you can still access most of the useful
> files, updated in January 2024, through the Wayback Machine.
> 
> https://web.archive.org/web/20240000000000*/http://www.ve3ute.ca/
> 
> Probably the best place for it, in some people's opinion, anyways.

There's no guarantee that the *files* will be accessible via those
links.  I have often gone looking for something that has disappeared
from its original home and been able to find the *pages* that reference
it but not the actual *payloads*.  (This happened as recently as
yesterday.)

Pages take up far less space than payloads, typically, so it is
understandable that they would capture the page but not the
files referenced from it.
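
One way to check this ahead of time is to ask the archive about each
*file* URL separately from the page that links to it.  A rough sketch in
Python follows.  The endpoint and the JSON layout are my recollection of
the Internet Archive's "availability" API, so treat both as assumptions
to verify; the helper names are invented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback availability API; verify before relying on it.
API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the lookup URL for one page or file address."""
    return f"{API}?{urlencode({'url': url})}"


def closest_snapshot(response_text: str):
    """Return the snapshot URL from a JSON reply, or None if nothing is held.

    Assumes the reply looks like
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def is_archived(url: str, timeout: float = 10.0) -> bool:
    """Network call: True if the archive reports a snapshot for this URL."""
    with urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8")) is not None
```

Running is_archived() over every download link on a page, not just the
page address, shows which payloads were actually captured.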

> YOU can make stuff available to others, in the future, by 'suggesting'
> relevant site addresses to the Internet Archive, if they're not
> already being covered.
> 
> Once a 'captcha' or other security device is added, you can kiss
> Wayback updates goodbye, as most bots will get the message.
> I don't mind bots - they can do good work.
> 
> Pity you can't just put stuff up in the public domain without
> this kind of bullshit.

Making it accessible to *all* means you have to expect *all* to
access it.  Hard to blame your ISP for wanting to put a limit on the
traffic to the site.  (My AUP forbids me from operating a public
server so I have to use more clandestine means of "publishing".)

If demand is low enough (you can determine that by looking at past
"legitimate" traffic), you can insert yourself in the process by
requesting a form completion:  "These are the things that I have
available.  Type the name of the item into the box provided"

This eliminates LINKS on the page and requires someone who can
read the text to identify the item(s) of interest.  This allows
you to intervene even if the "user" is not a 'bot but a poorly
paid urchin trying to harvest content.
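
The server side of such a form can be very small.  A sketch in Python,
assuming the item names shown on the page map to hidden file paths; the
catalog entries, paths and function names below are hypothetical.

```python
from typing import Optional

# Hypothetical catalog: the names are displayed as plain text on the page,
# the paths are never shown and never appear as links.
CATALOG = {
    "smps design notes": "files/k3Jd81/smps-design-notes.pdf",
    "transformer calculator": "files/Zq04pL/transformer-calc.zip",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so harmless typing differences pass."""
    return " ".join(text.lower().split())


def resolve_request(typed_name: str, catalog: dict) -> Optional[str]:
    """Return the hidden path for the item the visitor typed, or None.

    Only an exact (normalized) name match succeeds, so a crawler that
    merely follows links finds nothing to fetch.
    """
    return catalog.get(normalize(typed_name))


if __name__ == "__main__":
    print(resolve_request("  SMPS  Design Notes ", CATALOG))
    print(resolve_request("everything", CATALOG))
```

Logging each request that comes back None gives you a place to spot, and
step in on, anyone who is guessing at names.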