New subject: AI Training is Copyright Infringement

Sept. 12, 2024

On this point

On 12/09/2024 03:33, Giacomo Tesio wrote:
...
You were commenting on this paper: https://arxiv.org/pdf/2301.13867
Let's read how the dataset used to test the "mathematical skills"
of OpenAI's LLMs is composed (page 4):
- books that are widely used in universities to teach upper
undergraduate or first-year graduate courses in a degree in
mathematics
- math.stackexchange.com, a collection of books, and the
MATH dataset
- the book Problem-Solving Strategies, that is often used
to prepare for mathematical competitions
- the dataset from https://arxiv.org/abs/1912.01412, which contains
dozens of exercises... and their solutions.
Read it carefully and think it over: don't you notice anything?
These researchers did NOT give ChatGPT and GPT-4 previously unseen
problems, but problems taken from exercise books available online.
Now, if you imagine some kind of "artificial intelligence" grappling
with all these problems, it is reasonable to find the results
summarized in the abstract I had quoted "astonishing" [1].
But if you have a clear picture of the process of compilation /
compression of the source texts that produces the LLM, you find those
results rather obvious: the LLM output the solutions encoded in the
executable matrices.
I think it is worth pointing out what François Chollet recently wrote
in a series of posts on X (starting here:
https://x.com/fchollet/status/1800577565717148143)

Here's the thing: what current AI (e.g. LLMs) is really doing is 
memorizing millions of patterns seen in human-generated data, and 
reapplying them on new inputs. That works great when you're dealing with 
a well-known problem – until you introduce any amount of novelty.

But the nature of intelligence is precisely to adapt to things you don't 
expect. To figure out what to do when you don't have a solution already 
memorized.
If your AI can't adapt to novelty, it will never be able to deal with 
the variability and fluidity of the real world.

And that's why LLMs aren't on the path to AGI. They cannot reason – they 
recite. They by-pass the need for intelligence by leveraging 
memorization instead – on a scale that boggles the mind.

As a "testbed" for truly measuring intelligence, Chollet has proposed
the "ARC Prize" (http://arcprize.org), that is, solving problems that
are simple for human beings but resistant to memorization (and
therefore hard for LLMs). In his words
(https://x.com/fchollet/status/1800577423853195451)

ARC tasks are easy for humans. They aren't complex. They don't require 
specialized knowledge – a child can solve them. But modern AI struggles 
with them.
Because they have one very important property: they're designed to be 
resistant to memorization.

And ARC is like a flashing red light reminding you that we're missing 
something (he means for AGI = Artificial General Intelligence)

I hope this helps clarify what Giacomo is arguing.

Cheers, Enrico

-- 

-- EN

https://www.hoepli.it/libro/la-rivoluzione-informatica/9788896069516.html
======================================================
Prof. Enrico Nardelli
Past President of "Informatics Europe"
Director of the National Laboratory "Informatica e Scuola" of CINI
Dipartimento di Matematica - Università di Roma "Tor Vergata"
Via della Ricerca Scientifica snc - 00133 Roma
home page: https://www.mat.uniroma2.it/~nardelli
blog: https://link-and-think.blogspot.it/
tel: +39 06 7259.4204 fax: +39 06 7259.4699
mobile: +39 335 590.2331 e-mail: nardelli@mat.uniroma2.it
online meeting: https://blue.meet.garr.it/b/enr-y7f-t0q-ont
======================================================
