It is not just that bibliographic references are not integrated: they are not even integrable, at least within a generative text model that is probabilistic and not based on any database.

I would add that 'correcting' the model's errors means working for free to improve it. One can also choose not to do so, or to do the opposite.


I find Wikipedia's policy on the use of LLMs (large language models) very interesting and timely.

Imagine how polluting a Wikipedia page generator based on these methods could be.

It offers interesting starting points for regulation.


<https://en.wikipedia.org/wiki/Wikipedia:Large_language_models>

LLM risks and pitfalls

This page clarifies key policies as they pertain to the use of LLMs on the project, i.e. how such use generally raises issues under those policies, especially where the creation of encyclopedic content is concerned.

As the technology advances, it may be claimed that a specific large language model has reached the point where, given a well-engineered prompt, it can on its own produce text compatible with the encyclopedia's requirements. However, not everyone will always be using the most advanced and most Wikipedia-compliant model, or coming up with suitable prompts; at any given moment, individuals are probably using a range of generations and varieties of the technology, and generations whose deficiencies the community has already recognized may persist, if in lingering form, for a rather long time.

Using LLMs

Generating text

LLMs are assistive tools, and cannot replace human judgment.

Articles

LLMs are likely to make false claims. Their output is only a starting point, and must be considered inaccurate until proven otherwise. You must not publish the output of an LLM directly into a Wikipedia article without rigorously scrutinizing it for verifiability, neutrality, absence of original research, compliance with copyright, and compliance with all other applicable policies. If an LLM generates citations, you must personally check that they exist, and that they properly verify each statement. The use of language models must be clearly disclosed in your edit summary.

Even if you find reliable sources for every statement, you should still ensure that your additions do not give undue prominence to irrelevant details or minority viewpoints. You should ensure that your LLM-assisted edits reflect the weight placed by reliable sources on each aspect of a subject. You are encouraged to check what the most reliable sources have to say about a subject, and to ensure your edit follows their tone and balance.

Especially with respect to copyright, editors should use extreme caution when adding significant portions of AI-generated text, either verbatim or user-revised. It is their responsibility to ensure that their addition does not infringe anyone's copyright, and they must familiarize themselves with both the copyright and the sharing policies of their AI provider.

Drafts

If an LLM is used to create the initial version of a draft or userspace draft, the user who created the draft must bring it into compliance with all applicable Wikipedia policies, add reliable sourcing, and rigorously check the draft's accuracy prior to submitting it for review. If such a draft is submitted for review without having been brought into compliance, it should be declined. Repeated submissions of unaltered (or insufficiently altered) LLM outputs may lead to a revocation of draft privileges.

Talk pages

While you may include an LLM's raw output in your talk page comments for the purposes of discussion, you should not use LLMs to "argue your case for you" in talk page discussions. Wikipedia editors want to interact with other humans, not with large language models.

Be constructive

Wikipedia relies on volunteer efforts to review new content for compliance with our core content policies. This is often time-consuming. The informal social contract on Wikipedia is that editors will put significant effort into their contributions, so that other editors do not need to "clean up after them". Editors must ensure that their LLM-assisted edits are a net positive to the encyclopedia, and do not increase the maintenance burden on other volunteers. Repeated violations form a pattern of disruptive editing, and may lead to a block or ban.

Do not, under any circumstances, use LLMs to generate hoaxes or disinformation. This includes knowingly adding false information to test our ability to detect and remove it. Repeated misuse of LLMs may be considered disruptive and lead to a block or ban.

Wikipedia is not a testing ground for LLM development. Entities and people associated with LLM development are prohibited from running experiments or trials on Wikipedia. Edits to Wikipedia are made to advance the encyclopedia, not a technology. This is not meant to prohibit editors from responsibly experimenting with LLMs in their userspace for the purposes of improving Wikipedia.

Declare LLM use

Every edit which incorporates LLM output must be marked as LLM-assisted in the edit summary. This applies to all namespaces. If you make significant LLM-assisted changes (a paragraph or more) to an article or draft, add the {{AI generated notification}} template to its talk page, in addition to mentioning your use of an LLM in your edit summary.

Additionally, AI providers may have their own policies requiring in-text attribution at the bottom of the page, not just attribution in the edit summary. A template is currently available for providing attribution to OpenAI: {{OpenAI|[GPT-3, ChatGPT etc.]}}.

Experience is required

LLM-assisted edits should comply with Wikipedia policies. Before using an LLM, editors should have substantial prior experience doing the same or a more advanced task without LLM assistance. Editors are expected to familiarize themselves with a given LLM's limitations, and to use careful judgment to determine whether that LLM is appropriate for a given purpose. Inexperienced editors should be especially careful when using these tools; if needed, do not hesitate to ask for help at the Wikipedia:Teahouse.

Editors should have enough familiarity with the subject matter to recognize when an LLM is providing false information – if an LLM is asked to paraphrase something (i.e. source material or existing article content), editors should not assume that it will retain the meaning.

High-speed editing

Human editors are expected to pay attention to the edits they make, and ensure that they do not sacrifice quality in the pursuit of speed or quantity. For the purpose of dispute resolution, it is irrelevant whether high-speed or large-scale edits that a) are contrary to consensus or b) cause errors an attentive human would not make are actually being performed by a bot, by a human assisted by a script, or even by a human without any programmatic assistance. No matter the method, the disruptive editing must stop or the user may end up blocked. However, merely editing quickly, particularly for a short time, is not by itself disruptive. Consequently, if you are using LLMs to edit Wikipedia, you must do so in a manner that complies with Wikipedia:Bot policy, specifically WP:MEATBOT.

Productive uses of LLMs

If you are using LLMs to edit Wikipedia, you must overcome their inherent limitations, and ensure your edits comply with relevant guidelines and policies.

Despite the aforementioned limitations of LLMs, it is assumed that experienced editors can, with a reasonable amount of effort, offset LLM deficiencies and produce compliant edits in some scenarios.

Riskier use cases

The following use cases are tolerated rather than recommended, since they pose higher risks (see the §LLM risks and pitfalls section). They are reserved for experienced editors, who take full responsibility for their edits' compliance with Wikipedia policies.

Handling suspected LLM-generated content

Identification and tagging

Editors who identify LLM-originated content that does not comply with our core content policies should consider placing {{AI-generated|date=February 2023}} at the top of the affected article or draft, unless they are capable of immediately resolving the identified issues themselves.

This template should not be used in biographies of living persons. In BLPs, such non-compliant content should be removed immediately and without waiting for discussion.

Verification

All known or suspected LLM output must be checked for accuracy and is assumed to be fabricated until proven otherwise. LLMs are known to falsify sources such as books, journal articles and web URLs, so be sure to first check that the referenced work actually exists. All factual claims must then be verified against the provided sources. LLM-originated content that is contentious or fails verification must be removed immediately.
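The existence check in particular lends itself to partial automation. Below is a minimal Python sketch (illustrative only, not part of the policy) that queries the public Crossref REST API to test whether a cited work exists at all. The helper name and the exact-title matching heuristic are assumptions made for this example, and coverage gaps (e.g. books absent from Crossref) mean a negative result is only a prompt for manual checking, never proof of fabrication.

    # Illustrative sketch: test whether a citation plausibly exists by
    # querying the Crossref REST API (https://api.crossref.org).
    # This checks existence only; it does NOT verify that the source
    # actually supports the statement it is cited for.
    import requests

    def citation_exists(title: str, author: str = "") -> bool:
        # citation_exists is a hypothetical helper, not an established tool.
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": f"{title} {author}".strip(), "rows": 5},
            timeout=10,
        )
        resp.raise_for_status()
        wanted = title.lower().strip()
        # Crossref returns candidate records; require an exact title match
        # so that fabricated titles are not rescued by fuzzy matching.
        for item in resp.json()["message"]["items"]:
            for found_title in item.get("title", []):
                if found_title.lower().strip() == wanted:
                    return True
        return False

    # A genuine work indexed by Crossref should return True:
    print(citation_exists("Deep learning", "LeCun"))
    # A typical LLM-fabricated reference will usually return False.

Even when such a check passes, each factual claim must still be verified against the source itself, as required above.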

Deletion

If removal as described above would result in deletion of the entire contents of the article, it then becomes a candidate for deletion. If the entire article appears to be factually incorrect or relies on fabricated sources, speedy deletion via WP:G3 (Pure vandalism and blatant hoaxes) may be appropriate.

Citing LLM-generated content

For the purposes of sourcing: it is assumed that any LLM-generated material is not reliable, unless it is clear from the circumstances of publication that it is substantially a human work, insofar as an entity with a reputation for fact-checking and accuracy took care to modify the output in every way needed to ensure that the work meets a consistently high standard.

Any source (work) originating from entities (news organizations, etc.) known to generally produce content using LLMs, where there is no clear indication of whether humans were involved, should be treated as unreliable; this applies especially to publications that attempt to deceive readers by crediting content that appears to be primarily LLM-generated to human authors (named, unnamed, or fictitious).


On 19/02/23 19:51, Andrea Bolioli via nexa wrote:
Today I had another unpleasant surprise from ChatGPT and GPT-3, which I want to flag: they invented bibliographic references to non-existent books and articles, randomly combining authors, titles, years, journals and publishers, some incorrect, some non-existent.
I won't reproduce the dialogues, which I have saved.
At first they made me laugh; they sounded like the answers of a likeable charlatan... ( - "Can you give me references to books and scientific articles on topic XX?" - "Certainly! Blah blah blah"). I told GPT it was wrong; it apologized several times and offered me other references to books and articles, some correct, some non-existent. I tried in Italian and in English: same behavior.
I gave up.
I did not expect this kind of error, because it is not very hard to check the correctness (or at least the existence) of bibliographic references. Evidently it has not been among OpenAI's priorities so far; perhaps they have not yet integrated bibliographic databases?

Have a good evening,
AB






On Sat, 18 Feb 2023 at 17:34, Alessandro Brolpito <abrolpito@gmail.com> wrote:
Thank you, Guido, for the clarity of your message, which captures in an extreme synthesis the audacity of LLMs, but also the human limits that we all have, and that I have first-hand, in handling information and reasoning.
Of course, I can do little damage on my own, while an LLM system on the Internet is an entirely different order of firepower.

But it is a fact that "data" and their indexing will keep growing and become ever more sophisticated: good questions will matter more and more, even more than answers, or rather they will be essential for obtaining reasonable answers, whoever they are addressed to.

To resistance I would add the importance of developing critical thinking, to be cultivated along the entire educational path, from the very beginning, from ages 0-6 onward.
And this is where we must act; together with some friends on this list, we are reflecting on how.

Alessandro




