Hello Nexa,

I'd like to point you to a new thread [0], started by the OSI community manager, inviting the community to list the problems still present in draft 0.0.9 of the Open Source AI Definition [1]. The thread is titled "We heard you: let's focus on substantive discussion". [2] It has turned out to be rather interesting, at times surprising, especially for its analysis of the working group's decision-making processes, which are anything but conventional. [3]

For now, the problems that have emerged are:

- Data transparency: the data used to train an AI system should be openly available, as it is essential for understanding and improving the model.
- Pretraining dataset distribution: the dataset used for pre-training should also be accessible, to ensure transparency and allow for further development.
- Dataset documentation: the documentation of training datasets should be thorough and accurate, to address potential issues.
- Versioning: to maintain consistency and reproducibility, versioned data is crucial for training AI systems.
- Open licensing: data used to train Open Source AI systems should be licensed under an open license.
- Reproducibility: an Open Source AI must be reproducible using the original training data, scripts, logs and everything else used by the original developer.
- Inherent user (in)security: without access to the whole training data, it is possible to plant undetectable backdoors in machine learning models.
- Implicit or unspecified formal requirements: if ambiguities in the OSAID are to be resolved for each candidate AI system through a formal certificate issued by OSI, such a formal requirement should be explicitly stated in the OSAID.
- OSI as a single point of failure: since every new version of every candidate Open Source AI system worldwide would have to undergo the certification process again, this would turn OSI into a vulnerable bottleneck in AI development and the target of unprecedented lobbying from the industry.
- Open-washing AI: any definition that a black box could pass would both damage the credibility of the whole open source ecosystem and open a huge loophole in European legislation (the AI Act).

All these problems are extensively documented in the thread or in the other threads linked from it; still, if you have spotted further problems, or would like to comment on these, I suggest you raise them as soon as possible.

Giacomo

PS: As it happens, every one of the problems raised can be solved by requiring the availability of the training data, as proposed in the thread that the same community manager closed after silencing me [4]

[0] https://discuss.opensource.org/t/we-heard-you-lets-focus-on-substantive-disc...
[1] https://opensource.org/deepdive/drafts
[2] it really does say "heard you", yet some users are still silenced
[3] https://discuss.opensource.org/t/we-heard-you-lets-focus-on-substantive-disc...
[4] https://discuss.opensource.org/t/rfc-separating-concerns-between-source-data...