Re: [nexa] AI, work, imaginaries

Oct. 16, 2023

A nice example of why Explainable AI does not work, at least not the way its proponents would like.

The explanation provided is a rational reconstruction of the answer, much like what human experts do when they have to explain their answers.
Indeed, none of us would slavishly apply the rules of grammar to answer the question: we would answer off the cuff.
Then, if pressed to explain why, we would dig through our recollections of middle-school grammar for a justification.
Likewise, if you ask a doctor to justify a diagnosis, it takes far more words and time than it took to produce it, because that is not how the diagnosis was reached.

Deep Learning operates at the level of System 1, in Daniel Kahneman's terminology, that is, knowledge “compiled” for direct use.

— Beppe
...
On 15 Oct 2023, at 20:48, Guido Vetere <vetere.guido@gmail.com> wrote:
User
is the sentence "il topo è inseguito dal gatto" [the mouse is chased by the cat] active or passive?
ChatGPT
The sentence "il topo è inseguito dal gatto" is in the passive form.
User
how do you know?
ChatGPT
I can determine that the sentence "il topo è inseguito dal gatto" is in the passive form from some grammatical features typical of passive sentences in Italian:
    Sentence structure: In passive sentences, the subject receives the action rather than performing it. In the example given, the mouse (the subject) receives the action of being chased.
    Verb: The sentence uses a form of the verb "essere" (to be) followed by the past participle of the main verb. In this case, "è inseguito".
    Agent of the passive: In passive sentences, the entity performing the action is often introduced by the preposition "da". In this example, the agent of the action is "il gatto", and it is indeed introduced by "dal".
Taken together, these features make it clear that the sentence is in the passive form.
User
to determine the form of a sentence, do you apply exactly the rules you have just listed?
ChatGPT
No, I do not use a list of rules in a rigid, sequential way as a traditional language-processing algorithm might. Instead, my model is based on a vast amount of textual data that I analyzed during my training. When I "evaluate" a sentence, I recognize linguistic patterns and structures based on that vast corpus of data.
However, when I gave you those grammatical features, I wanted to explain explicitly and understandably how a passive sentence in Italian can be recognized. It was a way of giving you a clear, didactic explanation. In the model's internal "thinking", the determination happens in a much more fluid way, based on recognizing patterns in the data.
That's all for now, over to you :-)
G.
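For contrast with the transcript above, here is a minimal Python sketch of what applying those cues "in a rigid, sequential way" could look like. It is only an illustration: the word lists, the participle pattern, and the names passive_cues and looks_passive are assumptions made for this example, not part of any real NLP library, and only the two surface cues (auxiliary plus participle, agent introduced by "da") are checked.

```python
import re

# Partial, hand-written list of forms of "essere" (to be); an assumption for this sketch.
ESSERE_FORMS = {"è", "sono", "sei", "siamo", "siete", "era", "erano",
                "fu", "furono", "sarà", "saranno"}
# Plain and articulated forms of the preposition "da", which may introduce the agent.
DA_FORMS = {"da", "dal", "dallo", "dalla", "dall'", "dai", "dagli", "dalle"}
# Crude pattern for regular past participles (-ato, -uto, -ito and their inflections).
PARTICIPLE_RE = re.compile(r"\w+(?:at|ut|it)[oaie]")


def passive_cues(sentence):
    """Report which surface cues of the Italian passive appear in the sentence."""
    # Keep a trailing apostrophe attached, so "dall'uomo" splits into "dall'" + "uomo".
    tokens = re.findall(r"\w+'?", sentence.lower())
    cues = {"aux_plus_participle": False, "agent_with_da": False}
    for k in range(len(tokens) - 1):
        if tokens[k] in ESSERE_FORMS and PARTICIPLE_RE.fullmatch(tokens[k + 1]):
            cues["aux_plus_participle"] = True
            # An agent phrase, when present, follows the participle.
            cues["agent_with_da"] = any(t in DA_FORMS for t in tokens[k + 2:])
            break
    return cues


def looks_passive(sentence):
    """Apply the verb cue as a hard rule; the agent cue is optional ("often")."""
    return passive_cues(sentence)["aux_plus_participle"]


if __name__ == "__main__":
    for s in ["il topo è inseguito dal gatto",
              "il gatto insegue il topo",
              "il gatto è andato via"]:
        print(s, "->", looks_passive(s), passive_cues(s))
```

The surface rules alone misfire: "il gatto è andato via" also matches "essere" plus past participle although it is active, because intransitive verbs form their compound past with "essere". Telling the two apart needs information about the verb that the listed rules do not state.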
On Sun, 15 Oct 2023 at 20:21, 380° <g380@biscuolo.net> wrote:
...
Good evening,
Giuseppe Attardi <attardi@di.unipi.it> writes:
...
There is a whole area of research, known as BERTology,
that analyzes the question; it has been shown, for example, that from the
relations present in the attention matrices of sentences one can recover
the entire syntax tree of the sentence:
https://aclanthology.org/N19-1419.pdf
«A Structural Probe for Finding Syntax in Word Representations»
John Hewitt, Christopher D. Manning
Published 1 June 2019
(via
https://www.semanticscholar.org/paper/A-Structural-Probe-for-Finding-Syntax-...)
--8<---------------cut here---------------start------------->8---
Recent work has improved our ability to detect linguistic knowledge in
word representations. However, current methods for detecting syntactic
knowledge do not test whether syntax trees are represented in their
entirety. In this work, we propose a structural probe, which evaluates
whether syntax trees are embedded in a linear transformation of a neural
network’s word representation space. The probe identifies a linear
transformation under which squared L2 distance encodes the distance
between words in the parse tree, and one in which squared L2 norm
encodes depth in the parse tree. Using our probe, we show that such
transformations exist for both ELMo and BERT but not in baselines,
providing evidence that entire syntax trees are embedded implicitly in
deep models’ vector geometry.
--8<---------------cut here---------------end--------------->8---
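The geometric claim in the abstract above can be illustrated with a small Python sketch. This is a hand-built toy, not the paper's trained probe: the dependency heads in HEADS are a hypothetical parse written for this example, the "hidden" vectors are constructed so that a suitable linear map exists, and the matrix B is written by hand rather than learned from data. It shows only what it means for squared L2 distance to encode tree distance and for squared L2 norm to encode depth.

```python
# Hypothetical dependency parse of "the mouse is chased by the cat":
# HEADS[i] is the index of the head of token i, -1 marks the root ("chased").
TOKENS = ["the", "mouse", "is", "chased", "by", "the", "cat"]
HEADS = [1, 3, 3, -1, 6, 6, 3]


def path_to_root(i):
    """Nodes from token i up to the root, both included."""
    path = [i]
    while HEADS[path[-1]] != -1:
        path.append(HEADS[path[-1]])
    return path


def depth(i):
    """Number of edges between token i and the root."""
    return len(path_to_root(i)) - 1


def tree_distance(i, j):
    """Number of edges on the tree path between tokens i and j."""
    pi, pj = path_to_root(i), path_to_root(j)
    common = len(set(pi) & set(pj))  # shared ancestors, lowest common one included
    return (len(pi) - common) + (len(pj) - common)


def edge_indicator(i):
    """1.0 for every edge (child k -> its head) on the path from i to the root."""
    on_path = set(path_to_root(i))
    return [1.0 if (k in on_path and HEADS[k] != -1) else 0.0
            for k in range(len(TOKENS))]


# Toy "hidden states": the edge code plus two nuisance dimensions unrelated to syntax.
HIDDEN = [edge_indicator(i) + [float(i), 0.5 * i] for i in range(len(TOKENS))]

# Hand-written probe matrix B (N x D): keeps the edge code, zeroes the nuisance dims.
N, D = len(TOKENS), len(TOKENS) + 2
B = [[1.0 if c == r else 0.0 for c in range(D)] for r in range(N)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def probe_sq_distance(M, hi, hj):
    """Squared L2 distance between M*hi and M*hj."""
    return sum(x * x for x in matvec(M, [a - b for a, b in zip(hi, hj)]))


def probe_sq_norm(M, h):
    """Squared L2 norm of M*h."""
    return sum(x * x for x in matvec(M, h))


def raw_sq_distance(hi, hj):
    """Squared L2 distance in the untransformed space, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(hi, hj))


if __name__ == "__main__":
    for i, tok in enumerate(TOKENS):
        print(tok, "depth:", depth(i), "probe norm^2:", probe_sq_norm(B, HIDDEN[i]))
    print("tree distance the-cat:", tree_distance(0, 6),
          "probe:", probe_sq_distance(B, HIDDEN[0], HIDDEN[6]),
          "raw:", raw_sq_distance(HIDDEN[0], HIDDEN[6]))
```

In this toy the structure is recoverable by construction, so it says nothing about whether BERT or ELMo actually encode syntax this way; that is the empirical question the paper tests by fitting the transformation to real model representations.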
Interesting, but if we really want to talk about BERTology then, I repeat
[1], Vol. 8 of the Transactions of the Association for Computational
Linguistics (MIT Press, 2020) includes this article:
«A Primer in BERTology: What We Know About How BERT Works»
by Anna Rogers, Olga Kovaleva, Anna Rumshisky
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-...
which cites the paper above among its sources when describing BERT's
syntactic knowledge, while further on, in sections "3.3 World Knowledge"
and "3.4 Limitations", it says:
--8<---------------cut here---------------start------------->8---
However, BERT cannot reason based on its world knowledge. Forbes et
al. (2019) show that BERT can “guess” the affordances and properties of
many objects, but cannot reason about the relationship between
properties and affordances. For example, it “knows” that people can walk
into houses, and that houses are big, but it cannot infer that houses
are bigger than people. Zhou et al. (2020) and Richardson and Sabharwal
(2019) also show that the performance drops with the number of necessary
inference steps. Some of BERT’s world knowledge success comes from
learning stereotypical associations (Poerner et al., 2019), for example,
a person with an Italian-sounding name is predicted to be Italian, even
when it is incorrect.
3.4 Limitations
Multiple probing studies in section 3 and section 4 report that BERT
possesses a surprising amount of syntactic, semantic, and world
knowledge. However, Tenney et al. (2019a) remark, “the fact that a
linguistic pattern is not observed by our probing classifier does not
guarantee that it is not there, and the observation of a pattern does
not tell us how it is used.” There is also the issue of how complex a
probe should be allowed to be (Liu et al., 2019a). If a more complex
probe recovers more information, to what extent are we still relying on
the original model?
Furthermore, different probing methods may lead to complementary or even
contradictory conclusions, which makes a single test (as in most
studies) insufficient (Warstadt et al., 2019). A given method might also
favor one model over another, for example, RoBERTa trails BERT with one
tree extraction method, but leads with another (Htut et al., 2019). The
choice of linguistic formalism also matters (Kuznetsov and Gurevych,
2020).
[...]
--8<---------------cut here---------------end--------------->8---
In 2020 the researchers quoted above maintained that BERT is not able
to reason.
The same volume of the journal also includes this article:
«What BERT Is Not: Lessons from a New Suite of Psycholinguistic
Diagnostics for Language Models»
by Allyson Ettinger
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00298/43535/What-BERT...
quoting from the conclusions:
--8<---------------cut here---------------start------------->8---
In this paper we have introduced a suite of diagnostic tests for
language models to better our understanding of the linguistic
competencies acquired by pre-training via language modeling. We draw our
tests from psycholinguistic studies, allowing us to target a range of
linguistic capacities by testing word prediction accuracies and
sensitivity of model probabilities to linguistic distinctions. As a case
study, we apply these tests to analyze strengths and weaknesses of the
popular BERT model, finding that it shows sensitivity to role reversal
and same-category distinctions, albeit less than humans, and it succeeds
with noun hypernyms, but it struggles with challenging inferences and
role-based event prediction—and it shows clear failures with the meaning
of negation. We make all test sets and experiment code available (see
Footnote 1), for further experiments.
The capacities targeted by these test sets are by no means
comprehensive, and future work can build on the foundation of these
datasets to expand to other aspects of language processing. Because
these sets are small, we must also be conservative in the strength of
our conclusions—different formulations may yield different performance,
and future work can expand to verify the generality of these results. In
parallel, we hope that the weaknesses highlighted by these diagnostics
can help to identify areas of need for establishing robust and
generalizable models for language understanding.
--8<---------------cut here---------------end--------------->8---
So I repeat my question: are there new studies showing that the
limitations highlighted in the tests of BERT's logical/linguistic
competence have been overcome by other LLMs?
Because I believe that nobody has anything at all to object to the
syntactic _performance_ of LLMs.
Regards, 380°
[...]
[1] Message-id: 87o7ux2i16.fsf@xelera.eu
https://server-nexa.polito.it/pipermail/nexa/2022-September/049508.html
-- 
380° (Giovanni Biscuolo public alter ego)
«We, incompetent as we are,
 have no standing to suggest anything»
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>.
_______________________________________________
nexa mailing list
nexa@server-nexa.polito.it
https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa