Every couple of months, a collage of plain-looking portraits of human faces drives up chatter on social media, with a headline asking you to guess which of them are real people and which were generated by AI.
The technique that allows computer engineers to create these kinds of lifelike faces was invented in 2014 by Ian J. Goodfellow, an AI researcher who went on to work at Google Brain and is now the director of machine learning at Apple.
Goodfellow’s seminal paper showed how Generative Adversarial Networks, or GANs, can employ two AI models to play a game of cat and mouse, with one trying to create an image and the other trying to tell a real image from a computer-generated one, to iteratively make AI-generated media more lifelike.
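For readers who want the formal version, this cat-and-mouse game is usually written as a minimax objective. The notation below follows the standard textbook formulation and is not taken from Rephrase.ai: G is the generator, D is the discriminator, x is a real image and z is the random noise the generator starts from.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator tries to push this value up by scoring real images near 1 and generated ones near 0, while the generator tries to push it down by producing images the discriminator scores as real.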
Building on this technique, Bengaluru-based startup Rephrase.ai is looking to help businesses create personalised marketing videos through a SaaS dashboard. Rephrase’s tech uses pre-recorded audio and video clips of models to create multimedia based on the user preferences.
The synthetic media startup was founded in March 2018 by Ashray Malhotra, Nisheeth Lahoti and Shivam Mangla. This is the second entrepreneurial venture for Malhotra and Lahoti; the first was SoundRex, a startup that worked on decreasing latencies in audio transmission using machine learning algorithms. Mangla was a software engineer at Facebook’s Menlo Park headquarters before founding Rephrase with his IIT Bombay batchmates.
“Nisheeth mentioned to me that he wanted to build an engine which can take text as input and create a Hollywood level movie as output so that we can just take a script and create a movie without ever shooting anything,” said Ashray Malhotra, who is also the CEO of Rephrase.ai.
While that ambition lies far in the future, when the technology has advanced enough, the startup has decided to focus on a short-term, monetisable goal: automating the creation of videos for businesses in which a single front-facing model speaks into a camera.
A SaaS Dashboard Meets Video Automation
Rephrase.ai’s tech is built around existing footage in which models speak directly to a camera for 10 minutes. This helps map facial expressions and lip movements to the speech; at 30 frames per second, this creates a database of 18,000 frames for each model. The startup’s GAN-based ML algorithms, created by a team of six deep learning engineers, use predictive analytics to overlay any audio file on any video in the database to generate a lifelike video.
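The frame count follows from simple arithmetic on the figures quoted above. A minimal sketch, in which the function name `frames_in_footage` is hypothetical and not part of Rephrase.ai's software:

```python
def frames_in_footage(minutes: float, fps: int) -> int:
    """Return the number of video frames in footage of the given length."""
    seconds = minutes * 60          # convert footage length to seconds
    return int(seconds * fps)       # one frame per 1/fps of a second


# Figures from the article: 10 minutes of footage at 30 frames per second.
print(frames_in_footage(10, 30))  # 18000
```

The same calculation shows how quickly the database grows if the startup records longer sessions or more models.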
This means that a user can just select any model on the SaaS dashboard and type the text which the character will then speak. On top of this base of AI-generated video, a user can also select costumes, custom backgrounds, caption placements, and choose audio from a menu of 40 languages.
“In general circumstances, what you would have to do is first conduct model auditions, then get a cinematographer, rent a studio, shoot and then do post production shoot editing. We eliminated all of those tasks and made the entire process only a few minutes long,” said Malhotra.
Currently, the ease of video creation that Rephrase.ai offers has been adopted for four use cases: personalised marketing videos that clients send out to prospective customers (the primary use), creation of edtech video courses, converting text chatbots into video chatbots, and A/B testing of advertising videos.
Malhotra said that Rephrase.ai has created a video course on edtech platform Udemy to showcase what it does in the sector. “Creating video content is a big cost for an edtech company which limits the scalability of the process. We can create hours of video content in just a matter of a few minutes with a simple input which can help edtech companies scale really well,” he claimed.
In digital marketing, budgets are often small and companies don’t have enough time and resources to test out different versions of an ad campaign. This is where Rephrase.ai found a problem it could fix. “We help companies create multiple versions of the same video at a very reasonable cost and very, very quickly. And they are then able to run all of these different videos on social media and see what works best for them,” said the founder.
While bringing in revenue from these use cases, the company plans to further develop its AI algorithms to progressively add more variations such as multiple emotions, hand gestures, head movements, different camera angles and dramatically different backgrounds.
“With each of these small tech jumps, you’re going to see us get a step closer to the virtual movie idea,” Malhotra said.
Facing Competition In Two Markets
Since Rephrase.ai has its eyes mainly on the marketing content use case, it has to contend with competition from established players such as Vidyard, Loom, Wochit, Animoto and Wibbitz, among others. Though these platforms also automate video creation at scale with variations in text, background tracks, visual templates and so on, they can’t be used to generate videos of human models using AI.
On the synthetic media front, London-based startup Synthesia, founded in 2017, offers the same set of features as Rephrase.ai. It also boasts funding of over $4 Mn and a stellar client list that includes Reuters, WPP, McCann, BBC and Accenture. In comparison, the Bengaluru-based startup has raised $1.5 Mn in seed funding led by Lightspeed Ventures and AV8 Ventures, and has yet to reveal any major clients.
However, generative AI startups are still in the early stages, and most of them offer text, voice or animation-based solutions. While GPT-3 is the best example of text-generating AI, voice synthesis is something that companies like IBM, Google, Amazon and Apple have mastered over time with their text-to-speech engines.
AI-generated human-like characters are generally made by creating a 3D character and projecting the movements of body parts onto it. This leads to the uncanny valley problem, where the models look very close to real humans and yet retain a difference that can make people feel uneasy.
This problem is usually dealt with through VFX: a model is shot from multiple camera angles and the footage is then reconciled with the animated character, which turns out to be an expensive affair. “Now instead of something which used to cost a hundred thousand dollars, and require dedicated VFX artists, we’re able to make it affordable for everyday use cases across a bunch of other different industries and across a bunch of different sized companies,” said Malhotra.
Dealing With The Danger Of Fake Videos
Every company that creates media using AI faces the problem of deepfakes: content realistic enough to mislead the audience into believing it is authentic. According to a Georgetown University report, the Russian disinformation ploy in the 2016 US presidential election could be a template for both foreign governments and non-state actors to use AI/ML deepfakes to destabilise countries.
Since the accepted definition of deepfakes is limited to swapping facial expressions, body movements or dialogue between two characters, Rephrase.ai’s technology doesn’t technically fall under the deepfake category.
But what’s to stop a user from manipulating hundreds of hours of camera-facing footage and audio tracks of political leaders to create a fake video? Malhotra said that the startup doesn’t allow any political content to be created on its platform and that businesses are made to sign an agreement not to use it with any ulterior motives.
Besides, all requests for video creation are currently monitored by the company, and as the number of videos produced on the platform scales up, Rephrase.ai plans to deploy technology to track all the content generated. Moreover, though companies are allowed to onboard footage of their own sales representatives to the platform, each of those is vetted by the startup.
“As founders we left our earlier jobs because we see that there’s a big enough commercial value for good uses of this technology. And that’s where we want to build the company,” said Malhotra.