From: Don Y
Newsgroups: sci.electronics.design
Subject: Re: Chinese downloads overloading my website
Date: Sun, 10 Mar 2024 13:48:54 -0700

On 3/10/2024 10:47 AM, legg wrote:
> So far the Chinese are accessing the top level index, where
> files are offered for download at a click.
>
> Ideally, if they can't access the top level, a direct address
> access to the files might be prevented?

Many file sharing services deliberately do NOT offer access to a
"folder index" for similar reasons. This allows the owner of the
file(s) to publish specific links to individual files while keeping
the folder, itself, hidden.

This is done by creating unique URLs for each file. I.e., instead of
    ..../foldername/filename
you publish
    ..../foldername/pseudorandomappearingstring/filename
where "foldername" is some bogus sequence of characters and
pseudorandomappearingstring varies from file to file!

> The website's down after a fifth excursion pushed volumes above
> 85G on a 70G temporary extension. What's the bet it was 17G
> accumulated in 262 'visits'.
>
> Can't ID that final host's IP address while I'm locked out.
>
> Luckily (~) for users, you can still access most of the useful
> files, updated in January 2024, through the Wayback Machine.
>
> https://web.archive.org/web/20240000000000*/http://www.ve3ute.ca/
>
> Probably the best place for it, in some people's opinion, anyways.

There's no guarantee that the *files* will be accessible via those
links. I have often gone looking for something that has disappeared
from its original home and been able to find the *pages* that
reference it but not the actual *payloads*. (This happened as
recently as yesterday.)

Pages typically take up far less space than payloads, so it is
understandable that the archive would capture the page but not the
files referenced from it.

> YOU can make stuff available to others, in the future, by 'suggesting'
> relevant site addresses to the Internet Archive, if they're not
> already being covered.
>
> Once a 'captcha' or other security device is added, you can kiss
> Wayback updates goodbye, as most bots will get the message.
> I don't mind bots - they can do good work.
>
> Pity you can't just put stuff up in the public domain without
> this kind of bullshit.

Making it accessible to *all* means you have to expect *all* to
access it. Hard to blame your ISP for wanting to put a limit on the
traffic to the site. (My AUP forbids me from operating a public
server, so I have to use more clandestine means of "publishing".)

If demand is low enough (you can determine that by looking at past
"legitimate" traffic), you can insert yourself in the process by
requesting a form completion:
    "These are the things that I have available. Type the name of
    the item into the box provided."
This eliminates LINKS on the page and requires someone who can read
the text to identify the item(s) of interest.

This allows you to intervene even if the "user" is not a 'bot but
a poorly paid urchin trying to harvest content.
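
A rough sketch of the form-completion idea, in Python, in case it helps.
Everything named here is an assumption for illustration: the CATALOG
entries, the stored paths and the "/request" form action are made up,
and how you wire it into your web server is up to you. The page lists
item names as plain text with no links, and the typed name is matched
against the catalog on the server side.

```python
import html
from typing import Optional

# Hypothetical catalog: the name a human reads on the page -> the file served.
# Neither the names nor the paths come from any real site.
CATALOG = {
    "power supply notes": "store/0001.pdf",
    "filter design tables": "store/0002.zip",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so minor typing differences match."""
    return " ".join(text.lower().split())


def resolve_request(typed: str) -> Optional[str]:
    """Return the stored path for a typed item name, or None if unknown."""
    return CATALOG.get(normalize(typed))


def render_page() -> str:
    """Build a page that names the items as text only, with a single text box."""
    items = "\n".join(f"<li>{html.escape(name)}</li>" for name in CATALOG)
    return (
        "<p>These are the things that I have available. "
        "Type the name of the item into the box provided.</p>\n"
        f"<ul>\n{items}\n</ul>\n"
        '<form method="post" action="/request">'
        '<input type="text" name="item">'
        '<input type="submit" value="Request">'
        "</form>"
    )
```

Because resolve_request() is the only way to get from a name to a file,
that is also the place to log, rate-limit or simply refuse a request.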
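
Going back to the per-file URL scheme near the top of this post, here
is one way it might be done, again only as a sketch. I am assuming the
pseudorandom-appearing string is derived from the filename with an HMAC
so that no lookup table is needed; a stored random token per file would
work just as well. SECRET_KEY, FOLDER and the function names are
hypothetical.

```python
import hashlib
import hmac

# Hypothetical values: use your own long random secret and bogus folder name.
SECRET_KEY = b"replace-with-a-long-random-secret"
FOLDER = "q7Zk2vXw"


def token_for(filename: str) -> str:
    """Derive a pseudorandom-appearing string that differs from file to file."""
    digest = hmac.new(SECRET_KEY, filename.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:24]


def published_path(filename: str) -> str:
    """The path you hand out: /foldername/pseudorandomappearingstring/filename."""
    return f"/{FOLDER}/{token_for(filename)}/{filename}"


def is_valid_request(path: str) -> bool:
    """Accept only paths whose middle component matches the file's own token."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        return False  # covers the bare folder and /foldername/filename
    folder, token, filename = parts
    expected = token_for(filename)
    # Compare as bytes so a non-ASCII token cannot raise inside compare_digest.
    return folder == FOLDER and hmac.compare_digest(
        token.encode("utf-8"), expected.encode("utf-8")
    )
```

Knowing the link to one file tells a visitor nothing about the link to
any other, and a request for the folder itself is simply rejected.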