After the Cambridge Analytica scandal and numerous allegations of ignoring user privacy and rampant data collection, Facebook is once again caught in a controversy. This time, the problem is that the company has not told users what it actually does with their posts, videos and photos.
Facebook has long admitted that it analyses and processes user content to improve the main feed and the kind of posts that users see. A new report this week says that it’s not only machines reading your posts.
According to a Reuters report, at least 260 third-party workers scan through user content on Facebook, and that’s just from one company in one location in India. These workers label images, status updates and other posts so that Facebook can understand the context of each post, surface related ads and improve the main News Feed algorithm.
According to the report, the social network also gets these workers to label private posts. Labelling, or data annotation, is one of the fastest-growing industries as companies race to train AI and machine learning systems. Employees label images, text, logos and symbols to help computers understand the context and contents of an image or a piece of text. This labelled data is then used to develop consumer-facing AI features such as text OCR or object recognition inside camera apps.
Facebook’s Year of Controversies
Facebook has not yet recovered from the Cambridge Analytica scandal which broke out in early 2018 but had been brewing in the background since mid-2017.
Before 2016, British marketing analytics firm Cambridge Analytica allegedly mined data without user consent through an innocuous Facebook app. That data was subsequently used by Russia’s Internet Research Agency (IRA) and the 2016 presidential campaign of Donald Trump to run ads, promote disinformation, and spread fake news and misleading content. Trump’s Facebook campaigns proved instrumental in leading him to the White House.
Many aspects of Facebook’s involvement with Cambridge Analytica are still under investigation. Facebook and CEO Mark Zuckerberg were grilled by lawmakers in the US and UK over the company’s involvement with the British firm.
Facebook has tried everything to recover from the PR nightmare that was the Cambridge Analytica scandal. Zuckerberg, at the F8 conference last week, said that Facebook is going to emphasise privacy in its features and services from this year onwards. However, the Reuters report suggests the company has not revealed the full scale of what it does with user data.
How Is Facebook User Data Being Labelled?
The report says that Indian company Wipro has a contract with Facebook for the labelling operations. Wipro runs this operation from its Hyderabad office, where 260 ‘workers’ hired by Facebook through Wipro manually tag or label photos, posts, links shared on timelines, stories and videos. Each item is sorted into categories according to the ‘five dimensions’ that Facebook considers key for its AI datasets.
The datasets are used to train the AI and machine learning algorithms on Facebook’s platform in order to improve content and ad suggestions. However, Facebook is running human labelling without consent from the user. Although it may fall in a legal grey area, annotation or labelling of personal posts on Facebook is certainly unethical on the grounds that users have no idea their private posts are being read by other humans, who may also be Facebook users.
A Facebook spokesperson told Inc42, “We’re building AI systems that help people across a variety of Facebook products, from reducing policy violating content to helping people with visual impairments connect better with their friends and family. Labelled data is important to train the models that make this possible.”
Further, on who has access to this data, Facebook says, “We provide information and content to vendors and service providers who support our business, such as by providing technical infrastructure services, analyzing how our products are used, providing customer service, facilitating payments or conducting surveys.”
Asked to what extent data is shared with the likes of Wipro, the Facebook spokesperson told Inc42, “We treat the privacy of our users with utmost importance when labelling content for the purposes of improving the user experience for all that use our products.”
What Is Data Annotation?
As more and more companies embrace AI, the data that goes into teaching an AI system is increasingly becoming proprietary. In this scenario, the need for ‘data annotation’ companies is only set to increase with time.
One such company is iMerit, a data-training startup which counts eBay, Getty Images and Microsoft as its clients. More than 1,400 iMerit employees around the world are trained to label photos on behalf of clients in a way that eliminates bias. 90% of iMerit’s clients are US-based. Another company, Kerala-based Infolks, which started just three years ago with an investment of INR 25K, now has enough cash to employ 200 people.
The industry, still in its early stage, is creating a lot of employment or semi-employment opportunities in tier 2 and tier 3 cities, as most annotation and labelling work is outsourced by the major companies developing AI.
In March, Sangeeta Gupta, senior vice president and chief strategy officer at tech industry body Nasscom, said, “This is an emerging sector… in India and everybody has begun to realise the humongous opportunity it presents.”
Commenting on the opportunity annotation presents in India, she added, “AI requires properly annotated, classified and anonymised data. For this, whether you like it or not, you will use automation but you will also have to use skilled human workforce, and that is the opportunity it presents for India.”
(This story has been written by Ankur Bhardwaj and Nikhil Subramanium)