Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: grrrrrr
Date: Thu, 14 Mar 2024 15:43:59 -0700
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <usvujn$1slrq$3@dont-email.me>
References: <rv94vitgeqri7369q5jr01563bhrpmu1te@4ax.com>
 <5io5vi93r96fk7qf4tn7mhbvm5j3jdbvno@4ax.com>
 <3246vittognt7olmn3skt2c1opsrrvj8vs@4ax.com>
 <65f34444$0$2350502$882e4bbb@reader.netnews.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 14 Mar 2024 22:44:08 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="11f1a6c097d5e8318048522ef22246c2";
	logging-data="1988474"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18ydkoKSQGiwdT1C2nezaAa"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:Z3USWfj+H1wvEgpyarZGvvSC060=
Content-Language: en-US
In-Reply-To: <65f34444$0$2350502$882e4bbb@reader.netnews.com>
Bytes: 1759

On 3/14/2024 11:39 AM, bitrex wrote:
> They also seem to like to host them on the slowest servers imaginable and still 
> use FTP like it's the 90s.

FTP would require each asset to have a unique file name.
AND, it would let a client peruse the list of ALL available
assets -- scrape the server in one shot!
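
To illustrate the "peruse the list" point, here is a minimal sketch of a
recursive listing with Python's ftplib. It assumes the server supports the
MLSD command (not all do), and the host name in the usage note is made up.
walk_ftp is a hypothetical helper name, not anything from this thread.

```python
from ftplib import FTP
from typing import Iterator


def walk_ftp(ftp: FTP, path: str = "") -> Iterator[str]:
    """Yield the path of every file reachable under `path`.

    `ftp` is a connected, logged-in ftplib.FTP (or any object with a
    compatible mlsd method). Assumption: the server implements MLSD.
    """
    for name, facts in ftp.mlsd(path):
        kind = facts.get("type")
        # Skip the "current directory" and "parent directory" entries.
        if kind in ("cdir", "pdir"):
            continue
        child = f"{path}/{name}" if path else name
        if kind == "dir":
            # Descend into subdirectories.
            yield from walk_ftp(ftp, child)
        elif kind == "file":
            yield child
```

Usage would look roughly like this (placeholder host, anonymous login):

    with FTP("ftp.example.invalid") as ftp:
        ftp.login()
        for p in walk_ftp(ftp):
            print(p)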

Many sites deliver a "document.pdf" after you have clicked
through a set of pages (including an acceptance of license).

For "true PDFs", a simpler strategy is to scrape the site
and store the entire set of documents, indexed for full-text
search (FTS).
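
A rough sketch of the indexing half, assuming the text has already been
extracted from each PDF by some other tool. This is a plain in-memory
inverted index standing in for a real FTS engine; the function names and
the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document names that contain it.

    `docs` is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the documents containing ALL terms in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


if __name__ == "__main__":
    # Hypothetical extracted text, keyed by file name.
    docs = {
        "lm317.pdf": "Adjustable voltage regulator, 1.5 A output",
        "ne555.pdf": "Precision timer",
        "lm7805.pdf": "Fixed 5 V voltage regulator",
    }
    index = build_index(docs)
    print(sorted(search(index, "voltage regulator")))
```

A real setup would more likely hand the extracted text to a database or
search engine with FTS built in; the lookup idea is the same.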