Reinforcement Learning: The Algorithms Changing How Computers Make Decisions

SUMMARY

The issue with Deep Learning is that the resources that led to its rise are also giving rise to inequities

Reinforcement Learning is mimicking what humans do

The biggest application of RL that we’ve seen until now has been in games

The last decade of tech was in large part defined by the advent of Deep Supervised Learning (DL). The availability of cheap data at scale, computational power, and researcher interest has made it the de facto school of algorithms for most pattern recognition problems. Face recognition on social media, product recommendations on websites, and voice assistants like Google Assistant, Alexa, and Siri are some examples largely powered by DL.

The issue with deep learning is that the resources that led to its rise are also giving rise to inequities. Today, it is tough for startups to beat ‘big tech’ like Apple, Google, Amazon, and Microsoft in deep learning through better research capabilities or better data.

My prediction is that in the 2020s, we shall see this inequity broken down. This shall be due to the rise of Deep Reinforcement Learning (RL) as a prominent class of algorithms for such problems.

RL, in essence, mimics how humans learn. Take the example of a kid learning to ride a bike. The kid has no understanding of what steps to take, but keeps trying to ride for longer without falling down and learns in the process. You can’t explain how you ride a bike, just that you can ride it. RL works in a similar way. Given an environment, it learns to optimise for a goal through repeated trial and error.

“…  I believe that in some sense reinforcement learning is the future of AI … an intelligent system must be able to learn on its own, without constant supervision …” – Richard Sutton, Founding Father of Reinforcement Learning

To go a bit deeper into the tech in a watered-down way, RL has three components – the state, the policy, and the action. The state is a description of what the environment is like right now. The policy evaluates the state and decides which step is most likely to lead towards the goal set for the algorithm.

The action is the step suggested by the policy and taken by the algorithm to reach the goal. RL algorithms iteratively observe a state, use their policy to generate an action, run the action, and, given the environment’s feedback (called the reward), optimise the policy to give more goal-oriented actions.
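The state, policy, action, and reward loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from any of the systems mentioned in this article: the five-cell corridor environment, the function names, and the parameter values are all made up for the example, and tabular Q-learning is used simply because it is one of the simplest ways to "optimise the policy" from rewards.

```python
import random

N_STATES = 5          # hypothetical corridor: positions 0..4, position 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Toy environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # feedback is given only on reaching the goal
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy policy: usually take the best-known action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run the loop: observe state, pick action, act, use the reward to update."""
    rng = random.Random(seed)
    # q[state][action] estimates how good an action is in a given state
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate towards reward + discounted future value
            future = 0.0 if done else gamma * max(q[next_state].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the learned policy: the best action for each non-goal state."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nobody labels the correct move for any position here. The agent generates its own experience by acting, and after training the learned policy steps right (towards the goal) from every position.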

In this manner, RL allows us to solve many problems without needing as much supervised/labelled data as a traditional DL model does, since it keeps generating its own data. Of course, there’s the caveat that RL doesn’t solve the same set of problems as DL, but there is a strong intersection. RL can therefore level the playing field, as data may not necessarily be the moat it once was.

The biggest application of RL that we’ve seen until now has been in games: AlphaGo Zero, DeepMind’s expert-level AI for the board game Go; AlphaStar, DeepMind’s effort to master a multi-agent game like StarCraft; and OpenAI’s research that shows multiple agents playing hide-and-seek. These all leverage RL.

In the future, I see RL changing how control systems are built for complex machines. Machines will leverage RL for 3-dimensional path and motion planning. RL will improve systems with conversational interfaces, leveraging each conversation to improve the policy. RL could potentially be used for most decision-making processes in extremely complex environments with little precedent data. This will be the decade of RL.

Note: The views and opinions expressed are solely those of the author and do not necessarily reflect the views held by Inc42, its creators or employees. Inc42 is not responsible for the accuracy of any of the information supplied by guest bloggers.
