Decolonizing NLP for “Low-resource Languages”

Applying Abeba Birhane’s Relational Ethics


  • Tolúlọpẹ́ Ògúnrẹ̀mí Stanford University
  • Wilhelmina Onyothi Nekoto Masakhane
  • Saron Samuel Stanford University


Today, African languages are spoken by more than a billion people, yet in the world of machine translation and natural language processing (NLP), they are considered “low-resource languages” (LRLs) because they lack the data, linguistic resources, computerization, and researcher expertise available to “high-resource languages” such as French and English (Cieri, 2016). The reasons African languages remain “low resource,” however, extend far beyond data availability; they reflect marginalization in a global society dominated by Western technology (Nekoto et al., 2020). Indeed, of the roughly 7,000 languages currently in use worldwide, over 2,000 are African languages, yet machine translation focuses on a mere 20 global languages (Joshi et al., 2020). As Africans build datasets for their languages, they continue to struggle to gain agency over their own data and stories (Abebe et al., 2021). Given the history of colonialism in Africa and its linguistic domination, Dr. Abeba Birhane’s article, “Algorithmic Injustice: A Relational Ethics Approach” (Birhane, 2020), offers an important framework for developing machine translation for “low-resource” African languages. Our response to Birhane considers the impact of NLP on Africa and applies Birhane’s ethics to support the decolonization of African data and data subjects.