[nexa] Factual error rate for the 4 new ChatGPT models

July 6, 2025

I suppose these are the same models used for military purposes (<https://defensescoop.com/2025/01/16/openais-gpt-4o-gets-green-light-for-top-...>).

OpenAI study shows factual error rate for the 4 new ChatGPT models, with hallucinations getting much worse: 48-90%.

GPT-4o-mini: 8.6% correct answers, 0.9% unanswered, and 90.5% incorrect.

o1-mini: 8.1% correct answers, 28.5% unanswered, and 63.4% incorrect.

GPT-4o: 38.2% correct answers, 1.0% unanswered, and 60.8% incorrect.

o1-preview: The top performer, with 42.7% correct answers, 9.2% unanswered, and 48% incorrect.
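
As a quick sanity check on the figures quoted above, the following minimal Python sketch recomputes the range of incorrect-answer rates and the per-model totals. The numbers are copied from this post, not taken from OpenAI's data directly. The dictionary layout and the `summarize` helper are assumptions made for illustration only.

```python
# Figures as quoted in this post: (correct, unanswered, incorrect), in percent.
# Keys use OpenAI's model names; the data structure itself is illustrative.
results = {
    "GPT-4o-mini": (8.6, 0.9, 90.5),
    "o1-mini": (8.1, 28.5, 63.4),
    "GPT-4o": (38.2, 1.0, 60.8),
    "o1-preview": (42.7, 9.2, 48.0),
}


def summarize(rows):
    """Return (lowest incorrect %, highest incorrect %, per-model totals)."""
    incorrect = [r[2] for r in rows.values()]
    # Round each total to one decimal to absorb floating-point noise.
    totals = {name: round(sum(r), 1) for name, r in rows.items()}
    return min(incorrect), max(incorrect), totals


low, high, totals = summarize(results)
print(f"incorrect range: {low}% to {high}%")
for name, total in totals.items():
    print(f"{name}: categories sum to {total}%")
```

The lowest and highest incorrect rates match the "48-90%" range in the summary, and each model's three categories sum to roughly 100%, with the small shortfall for o1-preview explained by the rounded "48%" figure.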

link here:
<https://openai.com/index/introducing-simpleqa/>

Reported by Ewan Morrison:
<https://xcancel.com/MrEwanMorrison/status/1941627600096366662>

Daniela Tafani