Here’s a summary of Microsoft’s new investments in European languages and culture for AI, according to the Neowin article.
Key Points:
Preserving European Linguistic and Cultural Heritage
Microsoft announced two major initiatives in Paris to better represent Europe’s diverse languages and cultural assets in large language models (LLMs).
The initiative is part of Microsoft’s broader European Digital Commitments to expand AI/cloud infrastructure, strengthen data privacy, and support Europe’s digital competitiveness.
Problem: English Dominance in AI Training Data
Europe has over 200 languages with rich cultural legacies, but online content and AI training data are skewed towards English and American perspectives.
For example, the Llama 3.1 model scores over 15 points lower in Greek and 25 points lower in Latvian than it does in English.
Microsoft’s Response
Microsoft will set up teams in Strasbourg, France, at its Open Innovation Centre and AI for Good Lab to develop and curate multilingual datasets on Microsoft Azure.
The focus is on expanding training data in ten under-represented European languages, including Estonian, Alsatian, Slovak, Greek, and Maltese.
A call for proposals seeks digital texts, transcripts, and similar material for AI development, with grants (Azure credits plus technical support) from September 1st, 2025.
Cultural Project: Digital Replica of Notre Dame
This autumn, Microsoft will extend its Culture AI program to create a high-fidelity digital replica of Notre Dame Cathedral in collaboration with the French Ministry of Culture and Iconem.
Previous Culture AI projects include digital preservation of Ancient Olympia (Greece), Mont-Saint-Michel (France), St Peter’s Basilica (Rome), and Normandy’s Allied landing sites.
Microsoft’s Language Experience
Microsoft has over 40 years of language localization experience.
Windows supports over 90 languages (all EU official languages + regional languages like Basque, Catalan, Galician, Luxembourgish, Valencian).
Microsoft 365 Office interfaces are available in 30+ European languages.
Approach
Microsoft states that these steps are purely supportive: it is contributing open data, tools, and expertise rather than proprietary assets.