I haven’t seen this mentioned in all the chatter about the impossibility of truly open large language models.
I first heard about it in an interview with Bruce Schneier.
The Swiss have built a non-exploitative large language model called Apertus:
[quote]
Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins.
[/quote]
https://www.swiss-ai.org/apertus
My memory of Schneier’s description is that they trained it using idle cycles on an existing Swiss supercomputer, and I believe it also ran on renewable energy.
The technical report is here:
https://github.com/swiss-ai/apertus-tech-report/raw/main/Apertus_Tech_Report.pdf
#llm #llms #opensource #opendata #ai #aiethics #generativeAI #genai