Message-ID: <67d9f16a@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: bad bot behavior
Newsgroups: comp.misc
References:
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 19 Mar 2025 08:19:22 +1000
Organization: Ausics - https://newsgroups.ausics.net
Lines: 28
X-Complaints: abuse@ausics.net
Path: ...!weretis.net!feeder9.news.weretis.net!news.bbs.nz!news.ausics.net!not-for-mail
Bytes: 1926

D Finnigan wrote:
> On 3/18/25 10:17 AM, Ben Collver wrote:
>> Please stop externalizing your costs directly into my face
>> ==========================================================
>> March 17, 2025 on Drew DeVault's blog
>>
>> Over the past few months, instead of working on our priorities at
>> SourceHut, I have spent anywhere from 20-100% of my time in any given
>> week mitigating hyper-aggressive LLM crawlers at scale.
>
> This is happening at my little web site, and if you have a web site,
> it's happening to you too. Don't be a victim.

Meh, my little Web site runs so light that even when Amazon's bot
got stuck in a recursive loop grabbing the same dynamic page tens
of times a second from different IPs, the server load was near nil
as usual. The main problem it caused was access logs of hundreds
of megabytes per day.

Amazon is still scraping the hell out of everything I put online
(even a mirror that's tens of GBs), and other bots squeeze into
the logs too. Maybe even a few humans view things sometimes? I
don't care, they're welcome to it, and they helped me find the bug
in the Apache configuration which allowed that recursive loop
(though I still don't get why bots started forming such URLs in
the first place).

-- 
__ __
#_ < |\| |< _#