Re: [nexa] AI based GitHub copilot

July 3, 2021

      sorry, I'll stop now, but I have one last bit of info to add

380° <g380@biscuolo.net> writes:

[...]
...
...
...
I imagine MS reasoned carefully about the legal aspects of
this.
It would seem not.
Are you sure, Giacomo? Are we sure that GitHub's terms of service don't
include clauses allowing Microsoft (and possibly third parties) to use
the code uploaded there as training data for their OpenAI (GPT-3)?
https://docs.github.com/en/github/site-policy/github-terms-of-service#4-lice...

says:

--8<---------------cut here---------------start------------->8---

 This license includes the right to do things like copy it to our
 database and make backups; show it to you and other users; parse it
 into a search index or otherwise analyze it on our servers; share it
 with other users;

--8<---------------cut here---------------end--------------->8---

https://docs.github.com/en/github/site-policy/github-terms-of-service#7-mora...

--8<---------------cut here---------------start------------->8---

However, you waive these rights and agree not to assert them against us,
to enable us to reasonably exercise the rights granted in Section D.4,
but not otherwise.

To the extent this agreement is not enforceable by applicable law, you
grant GitHub the rights we need to use Your Content without attribution
and to make reasonable adaptations of Your Content as necessary to
render the Website and provide the Service.

--8<---------------cut here---------------end--------------->8---

So (IANAL) Copilot analyzes the "user-generated content" (source code
included, obviously), CAN ignore attribution (a moral right), and can
share the code with other users through the VSCode plugin. Is that
enough to allow them to train the neural network and "not care"
whether the resulting model might be a derivative work?

In this article, https://fosspost.org/github-copilot/
«Should GitHub Be Sued For Training Copilot on GPL Code?», I read:

--8<---------------cut here---------------start------------->8---

That’s why we see that regardless of whether US courts see it as fair
use or not, it is OK from an ethical point to use publicly available
data to everyone to train a computational model to provide a service to
users, whether for free or profit. Since this data is normally
accessible to the everyday end-user then there should be nothing that
prevents a computational AI or bot from accessing it as well.

--8<---------------cut here---------------end--------------->8---

So the reasoning is: since POTENTIALLY anyone could do it, we don't
see what's wrong with it.

I'll give you a hint: the TRACKING of users through the VSCode plugin
AND the fact that the whole thing is SaaS (complete with terms of use
and a privacy policy). BOTH are required to use the service, and
through them GitHub enriches the already enormous mass of data it
extorts... ahem, acquires from its users.

That is: we are not talking about an "appliance" you buy and "keep at
home" to help you find code snippets (assuming, without conceding,
that it is even useful). We are talking, AGAIN, about handing a huge
amount of data to the usual suspects.

In other words: it's fine to question the lawfulness of what GitHub
does with respect to free licenses and copyright, but please let's not
overlook what, in my humble opinion, is the REAL crux of the
matter.

[...]

Regards, 380°

-- 
380° (Giovanni Biscuolo public alter ego)

«We, incompetent as we are,
 have no standing whatsoever to suggest anything»

Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>.