Whether you use Google or Bing or any other search engine, Wikipedia is one of the websites that’s always close to the top of search results. This makes the website a primary source of information for a lot of internet users. In fact, over the years, ‘The free encyclopedia’ has had a profound impact on how topics are discussed online.
The platform provides exhaustive content in many languages, including English, which has more than 50 Mn articles, with 563 articles added every day. Not every language is as well served, though, and some have only a handful of Wikipedia articles for their native speakers.
For example, Wikipedia’s content in Indian languages is rather limited. Nevertheless, numerous people across the country participate in Wikipedia ‘editathons’, where they add new content in regional languages every day.
In India, Wikipedia’s parent organisation Wikimedia collaborated with Bengaluru-based Centre for Internet and Society-Access to Knowledge (CIS-A2K) in 2017 to start ‘Project Tiger’, with the aim of generating more content for Wikipedia in Indian regional languages.
The project kicked off in December 2017, and its first phase was completed in May 2018. Another phase of this project is scheduled to begin this month. With the help of contributors across the country, this project has generated content in several Indian languages such as Tamil, Telugu, Malayalam, Kannada, Marathi, Hindi, Bengali, Punjabi, Odia, Urdu, and Gujarati.
The Google Angle In Wikipedia Translations
While Wikipedia is run by an independent non-profit, it does work closely with Google to identify pages that need quicker updates or more translations based on search volumes. Google has had a say in ‘Project Tiger’ as well. It collaborated with the Wikimedia Foundation and its partner organisation CIS-A2K to give Chromebooks to volunteers so that contributing content to Wikipedia becomes easier. Besides, the Wikimedia Foundation offers contributors a stipend for an internet connection.
However, Google’s interest is not in translating all pages for Indian regional language internet users. At least not right now. The company has provided a list of articles that are searched more frequently by Indian regional language users. Based on this list, Indian Wikipedia contributors are tasked with translating popular articles into Indic languages.
“The list of articles that were provided were mostly trending articles on Google. However, we did a consultation, and we found out that a few active editors didn’t participate because they found the list to be irrelevant and later a local list was provided,” said Gopala Krishna A, a community advocate who has been associated with Project Tiger.
To address this gap, Google will provide a list of articles in different languages while Project Tiger will also translate some articles selected by the community.
“For Kannada, contributors may want to contribute articles on, say, places to see in Bengaluru. For Tamil people, it might be Tamil literature. So for the next phase, the contributions will also be based on the subjects of interest of the contributor,” added Gopala.
CIS-A2K told Inc42 that it has not yet received the grant for the next phase, but is expecting it soon. Currently, with the help of over 200 participants, there have been more than 4,400 articles contributed to Wikipedia in Indian languages that have garnered more than 2 Lakh expanded pageviews.
Filling The Gender Gap
Another major problem in Wikipedia’s latest translation efforts in India is the choice of topics, or rather the decision to not focus on certain topics. Dhanushree, a Maharashtra-based independent contributor who used to write for Wikipedia, feels that there is a gender gap in the translation community, and in the way articles or webpages are assigned for translation under Wikipedia or Project Tiger.
“I was looking at the Wikipedia Marathi content for a while, and I felt that the main areas people were not looking at were female sexual health and menstrual health. Some topics which are considered taboo also need to be addressed,” Dhanushree said.
To address the gender gap, the Wikimedia Foundation has been trying to bring more women contributors to the platform to contribute more on topics related to women. Besides, the Wikipedia community is addressing the challenges of representation by budgeting money and resources for outreach to encourage people from underrepresented communities to edit Wikipedia.
“Starting from about 2014 there has been a transition to encourage all projects to improve the diversity of the contributor base and broaden the scope of content covered. Addressing the gender gap includes better representation of women and the LGBT+ community among editors and also the articles within Wikipedia. Wikipedia is not a wealthy organization that can spend money to fix all problems, but with the money and resources that we have, we allocate it in this way,” said Lane Rasberry, a representative of Wikimedia.
Gopala further explained how the organisation is working towards bridging the gender gap. “In the last phase of Project Tiger, we provided 50 Chromebooks to different contributors out of 270+ requests. Out of those, 15 Chromebooks went to women. Of the 44 internet connections we provided, 12 went to women. And any woman contributor can sign up for Project Tiger,” he said.
“In the last phase, we encouraged writing thematic topics related to women. There were also topics like endometrial cancer, infertility, and others. In the next phase, we will make sure to include more women-related topics,” Gopala said.
Why Focus On Indian Regional Languages?
Over time, apps and websites have adapted themselves to the linguistic diversity of India, adding many Indian regional languages to their platforms in the form of original content or translations or native apps.
The Indian regional language internet market is a big opportunity for startups, as around 69% of the Indian population lives in rural India, where only a fraction know and converse in English. According to the 2001 census, around 10% of the entire population understood or spoke English, a majority of them residing in urban India. Hindi was used by less than half the population (43.5%), so the rest is made up of speakers of languages other than Hindi and English. Given India’s population of over 1.3 Bn, that’s a sizeable number.
International OTT players are heavily investing in regional language content. In Q1 2019, digital media and entertainment startups received $167 Mn of funding through 11 deals, on the back of their commitment to the Indian market. US companies Netflix and Amazon Prime have turned their focus to Indian language content as regional language internet users are expected to grow to 536 Mn by 2021.