Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Lawrence D'Oliveiro <ldo@nz.invalid>
Newsgroups: comp.misc
Subject: Re: Emigration from Usenet [was: Re: PTD was the most-respected of
 the AUE regulars ...]
Date: Sun, 28 Jul 2024 01:55:16 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <v848e4$3kh8o$3@dont-email.me>
References: <uvej5e$34pfl$8@dont-email.me> <v7mdjl$pq7n$3@dont-email.me>
	<nbcu9j5d7r8gbdngudbti83dg4agsl6knb@4ax.com>
	<lg948dF3fs6U1@mid.individual.net>
	<od21ajlllvqe32gi543hkft3lnhv2n3tus@4ax.com>
	<lgbo1lFfno3U1@mid.individual.net>
	<20240724115828.5d9d85d9305fe8300a91db5d@g{oogle}mail.com>
	<v7te4f$r6l$1@nnrp.usenet.blueworldhosting.com>
	<v7tmng$2abtm$1@dont-email.me> <v7tvu0$2c8e9$1@dont-email.me>
	<66a2d000@news.ausics.net>
	<20240726013343.02805fe30e4853cf7cd40797@gmail.moc>
	<66a31b29@news.ausics.net>
	<cceeb788-c131-4a84-588d-044e917fa810@example.net>
	<66a39428@news.ausics.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 28 Jul 2024 03:55:16 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3b558a041f0a0aed486aeb8fa027d259";
	logging-data="3818776"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/1OUXUZEy8pQTMkweSJU6V"
User-Agent: Pan/0.159 (Vovchansk; )
Cancel-Lock: sha1:bbAxvmrp1XswawdLNZ+3e+x9YKg=
Bytes: 2142

On 26 Jul 2024 22:18:48 +1000, Computer Nerd Kev wrote:

> I'm not really sure whether a HTML parser
> library would be helpful or just a pointless extra layer of complexity.
> So far I've just used regular expressions for scraping webpages.

I learned about BeautifulSoup early on, and never looked back. I use it 
for all my web-scraping projects nowadays.
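
To make the comparison concrete, here is a minimal sketch, assuming the bs4 package is installed and using Python's built-in "html.parser" backend. The sample markup, the naive pattern and the function names are all made up for illustration; they are not taken from anybody's actual scraper.

```python
import re

from bs4 import BeautifulSoup

# Made-up sample page: attribute order varies, one link uses single
# quotes and wraps its text, and one link is commented out.
HTML = """
<ul>
  <li><a class="ext" href="https://example.org/a">First</a></li>
  <li><a href='https://example.org/b' class="ext">Second
      item</a></li>
  <!-- <a href="https://example.org/hidden">commented out</a> -->
</ul>
"""


def links_regex(html):
    """Naive regex: assumes href comes first and is double-quoted."""
    return re.findall(r'<a href="([^"]+)"', html)


def links_soup(html):
    """Parse the markup and walk the real <a> elements instead."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        # Collapse runs of whitespace inside the link text.
        (a["href"], " ".join(a.get_text().split()))
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    print(links_regex(HTML))
    print(links_soup(HTML))
```

On this sample the naive pattern misses both live links and picks up only the commented-out one, while the parsed version returns the two real links with their text. A more careful regex could cope with this particular page, but each fix is one more special case the parser already handles.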

By the way, this is the kind of discussion you could not have on a 
platform like Discord. The last time I was on there, the server Ts&Cs had 
prohibitions against talking about web-scraping, since so many websites 
didn’t like it.