In Inner Mongolia, a region in northern China, the Mongolian language is facing threats to its preservation. Authorities announced in 2020 that the language would no longer be used as the medium of instruction in schools, sparking fears among ethnic Mongolians that their language would disappear. In response, parents took to the streets to protest. However, these protests were different from the norm in China, as the Mongolian script proved to be a challenge for automated surveillance technologies, allowing protestors to coordinate with relative freedom.

The digitization of most writing systems has been facilitated by a standardized code known as Unicode. However, Mongolian script has been poorly encoded, leading to limited usability. Consequently, Mongolian speakers resort to using various incompatible programs or sharing screenshots of text instead of typing in Mongolian. While this may be inconvenient, it also presents a challenge for authorities monitoring and censoring communications, as images are more difficult to monitor than text.
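To make the contrast between typed text and screenshots concrete, here is a minimal Python sketch of how easily software can detect Mongolian script once it is typed as Unicode text. It assumes the Mongolian block spans U+1800 to U+18AF; the helper names `mongolian_chars` and `describe` are invented for illustration and do not come from any real monitoring tool. A screenshot of the same words contains no such code points, so a check like this finds nothing without an additional image-recognition step.

```python
import unicodedata

# Assumption: Unicode's Mongolian block covers U+1800 through U+18AF.
MONGOLIAN_BLOCK = range(0x1800, 0x18B0)


def mongolian_chars(text: str) -> list[str]:
    """Return the characters of `text` that fall inside the Mongolian block."""
    return [ch for ch in text if ord(ch) in MONGOLIAN_BLOCK]


def describe(text: str) -> list[str]:
    """List each Mongolian character with its code point and Unicode name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in mongolian_chars(text)
    ]


# A short sample of Mongolian letters, written as escapes so the example
# does not depend on font or editor support for the script.
sample = "\u182E\u1823\u1829\u182D\u1823\u182F"

if __name__ == "__main__":
    for line in describe(sample + " plus some Latin text"):
        print(line)
```

Typed text exposes every character to a simple scan like this one, which is the property that makes it both convenient to search and easy to monitor.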

Mongolian is just one of the many “low-resource” languages that struggle to find representation on the internet. As technology advances, these languages may gain better access to online tools and markets. However, this also means they become more visible to surveillance and censorship. There is a trade-off between convenience and freedom from surveillance for these marginalized languages.

Efforts to suppress the Mongolian language within China have only grown stronger since the protests. A machine learning system being developed at Inner Mongolia University aims to enable computers to read Mongolian script, and it could potentially be used for state security projects.

Artificial intelligence (AI) development has primarily focused on major languages with large amounts of available text. Companies like Google and Meta have recently announced projects to develop AI for under-resourced languages. However, the effectiveness and quality of AI output for these languages remain questionable. Language models trained on limited texts often lack cultural specificity and may miss important nuances. AI used for social media moderation in under-resourced languages faces challenges in understanding the cultural, historical, and political complexities of language usage.

Preserving endangered languages in the digital age requires ongoing investment in AI development and sensitivity to cultural nuances. With the advancements in technology, there is hope for marginalized languages to thrive online, but efforts must be made to ensure their preservation and protection.

