We have all, at some point, been fascinated by what technology can do. Whether it is Artificial Intelligence (AI) in fun photography apps or machine learning used to detect cancer, the use cases of technology now extend to every sector. Bengaluru-based startup Deepsync is hoping to leverage these emerging technologies and deep learning to solve a major issue for audio content creators and publishers.
Deepsync’s software allows creators and publishers to clone voices and automatically create content without any manual intervention or recording.
Thanks to highly affordable data rates and localised content, people everywhere can now easily consume content that resonates with them. But this brings exponential demand for content and a need to change the way that content is produced. India is still at the stage where content is produced manually, which is both slow and costly.
It thus becomes extremely hard for businesses in the sector to scale without compromising on quality. Throw multiple languages into the mix, and businesses find it hard to keep up with the demand of Indian consumers. This is where Deepsync believes it has found a solution.
“We use voice-cloning to essentially produce audio-content in a particular individual’s voice without manual intervention, bringing down cost and time by 90% but at a quality that is comparable to a studio. We capture the voice with all the intricacies from the speed of speaking to emotions. We are aiming at the audio-first market in India,” explained Deepsync cofounder and CEO Ishan Sharma.
Sharma met his cofounder and CTO, Rishikesh Kumar, through GitHub, and incorporated Deepsync in 2018. “We met and instantly connected on the belief that we need to bring this technology to the Indian market. We both believe that India as a nation is driven by stories and thus audio will prove to be a very great medium,” added Sharma.
Since it is building content-production technology, the startup believes that it isn’t competing with the audio-production software in the country.
“Potentially, audio-creators from brands to book publishers who are building for audio will get a better deal with us because we can produce studio level quality at a speed and cost that is 90% faster and cheaper. This means companies such as Gaana (stories), Spotify (podcasts) and Audible will be much better off,” added Sharma.
While the startup is currently only providing its services in English, it wishes to produce high-quality content in more Indian languages and genres. Sharma further told Inc42 that Deepsync’s clients have not been able to distinguish between the cloned voice and the real one in unbiased tests on many voice samples — 95% of people find the cloned and the original voice similar. “We currently work with prominent edtech companies and are in the process of working with audio-creators going forward,” he added.
As startups work on providing innovative tech solutions to their users, cloud infrastructure service providers such as DigitalOcean help them along the way, offering not only smart and secure storage for sensitive audio data but also cloud solutions for processing and service delivery. This ensures seamless operations and helps in building robust infrastructure provisioning, management, maintenance and more.
With voice data being personal for many users, startups also have the demanding task of maintaining high levels of security. This is crucial, and resource-intensive if done in-house. Using solutions such as firewalls and server security from DigitalOcean enabled Deepsync to focus on its product and to scale when it was ready. For these reasons, Sharma told us, moving to the cloud is a choice every startup needs to make from day one of operations, and that is what Deepsync did.
Deepsync has been a part of DigitalOcean’s global startup programme, Hatch, and Sharma said the credits enabled it to save a lot of costs. After a year of working with DigitalOcean, Sharma credits its affordability and flexibility for Deepsync continuing with it.
“We have benefited a lot by being a part of DO’s Hatch Programme. One of the most essential ways it has helped us is in minimising the cost of operations. For instance, one of their programmes allows startups to pay only 50% of the cost for six months, which has been very helpful,” he added.
When asked about the data privacy and security, Sharma explained, “We process terabytes of data and use complex encryption to ensure privacy for our users and their voices.”
Here are a few excerpts from Inc42’s interaction with Sharma, in which he explained Deepsync’s journey and how it is revolutionising content consumption in India.
Inc42: You are a team of two. How do you manage the operations and the work among the two of you?
Ishan Sharma: We really believe that no team members are better than bad team members because they slow you down. We both have experience in building both technology and products, so we take care of it ourselves. We have spent a lot of time crafting our core AI in the last one year and parallelly spent time speaking to stakeholders in India, which led us to get paying customers. We were part of many prominent groups such as Lightspeed’s Extreme Entrepreneurs programme, where we got to learn a lot.
India does not have a very rich history of technology innovation but I believe that is changing with our generation. We are currently in the process of closing our seed round and plan to expand then. This means hiring smart people who are looking to work on challenging problems.
Inc42: How are you helping your clients exactly? Could you give us a brief idea of the work you have done so far?
Ishan Sharma: Our focus is on audio-production and making it 10x faster and cheaper. We do this by cloning a person’s voice. Currently, we have cloned voices of many voice actors for our clients across multiple languages and accents. Once cloned, they are able to produce the same studio-level quality faster and for far less money. Eventually, this means getting rid of manual production and equipment required to do so.
Inc42: Take us through the journey and challenges of building a startup in the audio production sector, which is fairly new in India.
Ishan Sharma: Challenges for a startup are mostly the same irrespective of the industry, but building a deeptech startup in India is definitely difficult, if not outright impossible. India has seen business model innovations but not many technological breakthroughs. Fortunately, this is now slowly changing.
Investors, being a source of capital, are now focussing on deeptech which will push startups to take bigger risks and seek bigger rewards.
We have to keep in mind that it’s mostly the exponential technologies (zero to one) that pay off in the long run. We are focussed on content production and use AI to augment production which means building intelligent speech algorithms. We have designed it ourselves from ground-up in the last 1.5 years of operation and are now live with paying customers. We are definitely proud that we did this with a small funding amount of $20K from an accelerator.
Inc42: Deepfakes are all the rage these days. What is your take on this and how do you plan to counter this with your startup?
Ishan Sharma: Technology is as good as the people behind it want it to be. It all comes down to one’s intention. My view is that we are witnessing one of the greatest moments in human history when AI begins to become creative in the areas of video and audio, in art and eventually in science. If we got Da Vinci to today’s world, he would definitely be stunned to witness non-biological systems creating art. Deepfakes are only a small hiccup for a very bright future ahead.
But this does not mean we should not be careful. We must develop a decentralised infrastructure where each consumer can be sure that they are looking at real data. Deepfakes are a blessing in disguise because people can now realise that they need to be more conscious of what they see or hear on the internet. I believe blockchain technology can offer a future where we can put trust into the internet like we were supposed to.
Inc42: Cloud technology is revolutionising things across operations for companies. Tell us how DigitalOcean is helping you on that front?
Ishan Sharma: Cloud is a definite plus that enables startups in providing the best service to their clients as it allows one to focus on their own competencies and not worry about managing data across devices. From the Deepsync perspective, cloud has allowed us to focus on things which we are good at and not reinvent the wheel in managing server and storage complexity. We manage terabytes of data which is not possible to do without cloud. Cloud also offers redundancy for backups and most of all a seamless experience for any consumer around the globe with very little latency.
We use DigitalOcean for many operations, ranging from hosting our website to our storage needs. They have helped us a lot. Apart from their fantastic cloud technology and redundancy measures (we have never experienced any downtime), we also love the easy-to-use and friendly user interface.
Inc42: The podcast industry is growing at an exponential rate in the global market today. Where does the industry stand in the Indian context?
Ishan Sharma: India has always been a nation driven by stories, and podcasts are just a fancy name for it. Given India’s diverse language and culture, stories are an essential part of our daily routine. The important distinction is that this form of audio-content is aimed at an audience that is willing to pay more and thus understand the value of content.
My view is that audio will eventually overtake video consumption on a per hour basis, which we are seeing signs for today. Imagine listening to an audiobook in the background while you are working on your 8-hour job instead of random music. That’s 30 audiobooks in a month. You are now more intelligent than you were a month ago and all this in the form of passive consumption. You can’t do that with video.
Inc42: With the advent of startups such as yours, how do you see the industry growing and evolving?
Ishan Sharma: Our industry is called synthetic media which is a fancy name for AI-generated media. If you search for it online, you will mostly find things related to Deepfakes. But in the background, companies working on this in forms of audio and video have raised millions of dollars because of their potential.
Imagine this, today all major work from movies to music to art to architecture is done by humans. Once AI is able to create original content, we humans will be highly augmented in all areas of our daily creative work. It’s inevitable.
This article is part of the Inc42 and DigitalOcean ongoing series — “Conversations On Cloud: Driving The New Wave Of Disruption”. As part of the series, we present you the stories of startups that are leveraging cloud computing to create digital disruptions.