Article <tnrlljdo004q71l6qc34ndr9k7ssriiuhl@4ax.com>

Path: ...!news.nobody.at!2.eu.feeder.erje.net!feeder.erje.net!feeds.news.ox.ac.uk!news.ox.ac.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.eyrie.org!beagle.ediacara.org!.POSTED.beagle.ediacara.org!not-for-mail
From: Martin Harran <martinharran@gmail.com>
Newsgroups: talk.origins
Subject: Re: Running out of data to train AI programs.
Date: Thu, 12 Dec 2024 14:17:34 +0000
Organization: A noiseless patient Spider
Lines: 53
Sender: to%beagle.ediacara.org
Approved: moderator@beagle.ediacara.org
Message-ID: <tnrlljdo004q71l6qc34ndr9k7ssriiuhl@4ax.com>
References: <vjd316$1o21s$2@dont-email.me> <90d51f69-3f1e-40ba-aef0-9b1de6ae4e28@gmail.com> <vjd5mk$1opdd$1@dont-email.me> <7bc90087-bcfc-48b1-8282-99bd309dea2b@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: beagle.ediacara.org; posting-host="beagle.ediacara.org:3.132.105.89";
	logging-data="90869"; mail-complaints-to="usenet@beagle.ediacara.org"
User-Agent: ForteAgent/8.00.32.1272
To: talk-origins@moderators.isc.org
Cancel-Lock: sha1:dRpDR5umdTy6K2Dljc4/opd9B18=
Bytes: 5159

On Wed, 11 Dec 2024 15:33:20 -0800, erik simpson
<eastside.erik@gmail.com> wrote:

>On 12/11/24 3:02 PM, RonO wrote:
>> On 12/11/2024 4:45 PM, erik simpson wrote:
>>> On 12/11/24 2:17 PM, RonO wrote:
>>>> https://www.nature.com/articles/d41586-024-03990-2
>>>>
>>>> The claim in this article is that soon the AI programmers will run out 
>>>> of data to train their AI on.  If they want to improve their AI they 
>>>> will have to create their own data, but how are they going to do that?
>>>>
>>>> My guess is that they will identify what data would be most 
>>>> beneficial to have and try to generate it.  It could direct medical 
>>>> research into generating useful data.
>>>>
>>>> They could also spend decades weeding through the data that has 
>>>> already been used and throw out the trash data.  They could also go 
>>>> through the scientific experiments in a field and use the good data, 
>>>> but remove the conclusions of the researchers, and see what 
>>>> conclusions the AI can come up with, cross check the conclusions to 
>>>> see if there was anything missed by the original researchers and use 
>>>> that to train some other AI.
>>>>
>>>> Ron Okimoto
>>>>
>>> Why not let AI create its own data.  Then we wouldn't have to worry 
>>> about it.  And think of the great literature and movies!
>>>
>> That results in increased bogus output by the AI.
>> 
>> The AI start to "hallucinate" when fed data generated by other AI 
>> according to one article that I recall reading.  I think one poster 
>> responded that AI hallucination can be caused by other factors too. 
>> Maybe the AI generate data in a format that they were not trained to 
>> deal with, and that causes issues in assimilating AI generated data.
>> 
>> Ron Okimoto
>> 
>I was joking about AI -> AI generating anything sensible.  AI is very 
>useful detecting patterns in all kinds of data.  Applying it to weather 
>data could save millions if not billions of dollars.  AI navel-gazing 
>isn't going to work.

I realise you were being tongue-in-cheek, but I would actually worry
about the possibility of this.  We see something along the same lines
in Internet tracking where things you like are constantly monitored
and you are shown ever more things that are the same or similar. This
is a particular problem in social media where people are fed more and
more stuff to reinforce their political and cultural biases. I shudder
to think of what sort of stuff we might get if AI turns to a feeding
frenzy upon its own output, bearing in mind how people generally are
becoming less and less adept at sifting out the utter rubbish :(