The last decade of tech was in large part defined by the advent of Deep Supervised Learning (DL). The availability of cheap data at scale, computational power, and researcher interest have made it the de facto family of algorithms for most pattern recognition problems. Face recognition on social media, product recommendations on sites, and voice assistants like Google Assistant, Alexa, and Siri are some examples largely powered by DL.
The issue with deep learning is that the same resources that fuelled its rise are also creating inequities. Today, it is tough for startups to beat ‘big tech’ companies like Apple, Google, Amazon, and Microsoft at deep learning, whether through better research capabilities or better data.
My prediction is that in the 2020s we shall see this inequity broken down, thanks to the rise of Deep Reinforcement Learning (RL) as a prominent approach to such problems.
RL, in essence, mimics how humans learn. Let’s take the example of a kid learning to ride a bike. The kid has no understanding of what steps to take, but keeps trying to ride for longer without falling down and learns in the process. You can’t explain how you ride a bike, just that you can ride it. RL works in a similar way. Given an environment, it learns to optimise for a goal through repeated trial and error.
“… I believe that in some sense reinforcement learning is the future of AI … an intelligent system must be able to learn on its own, without constant supervision …” – Richard Sutton, Founding Father of Reinforcement Learning
To go a bit deeper into the tech, in simplified terms, RL has three components: the state, the policy, and the action. The state is a description of what the environment is like right now. The policy looks at the state and decides what to do next in pursuit of the goal set for the algorithm.
The action is the step suggested by the policy and taken by the algorithm to move towards the goal. RL algorithms run in a loop: they observe the state, use the policy to generate an action, run the action, and then use the environment’s feedback, called the reward, to adjust the policy so that it produces more goal-oriented actions.
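The state, policy, action, and reward loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not any particular product's method: the corridor environment, the function names (`step`, `choose_action`, `train`, `greedy_policy`), and the hyperparameters are all assumptions made for this example, and the learning rule is tabular Q-learning, one simple RL algorithm chosen here for brevity rather than the deep variants discussed in this post.

```python
import random

# Hypothetical toy environment: positions 0..5 on a line, with the goal at 5.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # feedback only arrives at the goal
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Policy: usually take the best-valued action, occasionally explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run the state -> action -> reward -> policy-update loop."""
    rng = random.Random(seed)
    # q[state][action] is the learned estimate of how good an action is.
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate towards the reward
            # plus the discounted value of the best next action.
            future = 0.0 if done else gamma * max(q[next_state].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the best action for every non-goal position."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nobody tells the agent that "right" is the correct direction. It starts out wandering at random, and the learned policy should end up choosing +1 (step right) at every position purely because of the reward it received along the way.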
In this manner, RL allows us to solve many problems without needing as much supervised/labelled data as a traditional DL model does, since it keeps generating its own data. Of course, there’s the caveat that RL doesn’t solve the same set of problems as DL, but there is a strong intersection. RL can therefore level the playing field, as data may no longer be the moat it once was.
The biggest applications of RL that we’ve seen until now have been in games: AlphaGo Zero, DeepMind’s expert-level AI for the board game Go; AlphaStar, DeepMind’s effort to master a multi-agent game like StarCraft; and OpenAI’s research that shows multiple agents playing hide-and-seek. These all leverage RL.
In the future, I see RL changing how control systems are built for complex machines. Machines will leverage RL for 3-dimensional path and motion planning. RL will improve systems with conversational interfaces, using each conversation to improve the policy. RL could potentially be used for most decision-making processes in extremely complex environments with little precedent data. This will be the decade of RL.