The [course_title] course is designed for people with a keen interest in machine learning. Throughout the course, you will explore an area of machine learning called reinforcement learning.

Rooted in behaviourist psychology, reinforcement learning studies how software agents ought to take actions in an environment so as to maximise some notion of cumulative reward. The course examines efficient algorithms, where they exist, for single-agent and multi-agent planning, along with approaches to learning near-optimal decisions from experience.
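To give a flavour of the idea, here is a minimal tabular Q-learning sketch on a toy "chain" environment (the environment and all names are illustrative, not taken from the course materials): the agent receives only a reward signal, yet learns that moving right is the best action in every state.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=300,
               alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    # Toy chain MDP (hypothetical example): states 0..n_states-1,
    # action 1 moves right, action 0 moves left, and reaching the
    # rightmost state yields reward 1 and ends the episode.
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            reward = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
# Greedy policy per non-terminal state (1 = "right").
print([max(range(2), key=lambda i: Q[s][i]) for s in range(4)])
```

The course develops the theory behind sketches like this one: why such updates converge, how to explore efficiently, and what happens with function approximation, partial observability, and multiple agents.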

Upon completion, you will be able to replicate a result from a published paper in reinforcement learning.

**Assessment**

This course does not involve any written exams. To complete the course, students answer five assignment questions, submitted as written work in PDF or Word format. Students can write the answers in their own time. Each answer needs to be around 200 words (one page). Once the answers are submitted, the tutor will check and assess the work.

**Certification**

Edukite courses are free to study. To successfully complete a course, you must submit all of its assignments as part of the assessment. Upon successful completion, you can choose to make your achievement formal by obtaining your Certificate at a cost of £49.

Having an Official Edukite Certification is a great way to celebrate and share your success. You can:

- Add the certificate to your CV or resume to boost your career prospects
- Share it as proof of your achievement

Course Credit: Brown University

### Course Curriculum

**Introduction to Reinforcement Learning**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:04:00 |

**Smoov & Curly's Bogus Journey**

| Lesson | Duration |
| --- | --- |
| Let’s Do the Time Warp Agaaaain! | 00:02:00 |
| Introduction | 00:02:00 |
| Decision Making & Reinforcement Learning | 00:03:00 |
| The World – 1 | 00:03:00 |
| The World – 2 | 00:02:00 |
| Markov Decision Processes – 1 | 00:03:00 |
| Markov Decision Processes – 2 | 00:05:00 |
| Markov Decision Processes – 3 | 00:05:00 |
| Markov Decision Processes – 4 | 00:07:00 |
| More About Rewards – 1 | 00:05:00 |
| More About Rewards – 2 | 00:06:00 |
| More About Rewards – 3 | 00:02:00 |
| Sequences of Rewards – 1 | 00:10:00 |
| Sequences of Rewards – 2 | 00:02:00 |
| Sequences of Rewards – 3 | 00:01:00 |
| Sequences of Rewards – 4 | 00:08:00 |
| Assumptions | 00:03:00 |
| Policies – 1 | 00:05:00 |
| Policies – 2 | 00:06:00 |
| Finding Policies – 1 | 00:04:00 |
| Finding Policies – 2 | 00:05:00 |
| Finding Policies – 3 | 00:02:00 |
| Finding Policies – 4 | 00:06:00 |
| Back to the Future | 00:01:00 |
| The Bellman Equations – 1 | 00:03:00 |
| The Bellman Equations – 2 | 00:03:00 |
| The Bellman Equations – 3 | 00:02:00 |
| The Third Bellman Equation | 00:02:00 |
| Bellman Equation Relations | 00:01:00 |
| What Have We Learned? | 00:02:00 |

**Reinforcement Learning Basics**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:03:00 |
| Mystery Game – 1 | 00:04:00 |
| Mystery Game – 2 | 00:06:00 |
| Behavior Structures – 1 | 00:04:00 |
| Behavior Structures – 2 | 00:03:00 |
| Evaluating a Policy | 00:05:00 |
| Evaluating a Learner | 00:04:00 |
| What Have We Learned? | 00:02:00 |

**TD and Friends**

| Lesson | Duration |
| --- | --- |
| Temporal Difference Learning | 00:01:00 |
| RL Context – 1 | 00:01:00 |
| RL Context – 2 | 00:04:00 |
| TD Lambda | 00:02:00 |
| Value Computation Example | 00:01:00 |
| Estimating from Data | 00:01:00 |
| Computing Estimates Incrementally | 00:00:00 |
| Properties of Learning Rates | 00:02:00 |
| Selecting Learning Rates | 00:01:00 |
| TD(1) Rule | 00:03:00 |
| TD(1) Example – 1 | 00:04:00 |
| TD(1) Example – 2 | 00:04:00 |
| TD(1) Example – 3 | 00:02:00 |
| Why TD(1) Is “Wrong” | 00:03:00 |
| TD(0) Rule | 00:04:00 |
| TD(Lambda) Rule | 00:07:00 |
| K-Step Estimators | 00:04:00 |
| K-Step Estimators and TD(Lambda) | 00:06:00 |
| TD(Lambda) Empirical Performance | 00:03:00 |
| What Have We Learned? | 00:07:00 |

**Convergence**

| Lesson | Duration |
| --- | --- |
| Convergence: TD with Control | 00:01:00 |
| Bellman Equations | 00:02:00 |
| Bellman Equations with Actions | 00:05:00 |
| Bellman Operator – 1 | 00:02:00 |
| Bellman Operator – 2 | 00:01:00 |
| Contraction Mappings | 00:03:00 |
| Contraction Mapping Quiz | 00:01:00 |
| Contraction Properties | 00:04:00 |
| The Bellman Operator Contracts – 1 | 00:03:00 |
| The Bellman Operator Contracts – 2 | 00:03:00 |
| Max Is a Non-expansion | 00:02:00 |
| Proof That Max Is a Non-expansion – 1 | 00:04:00 |
| Proof That Max Is a Non-expansion – 2 | 00:03:00 |
| Convergence – 1 | 00:02:00 |
| Convergence – 2 | 00:04:00 |
| Convergence Theorem Explained – 1 | 00:04:00 |
| Convergence Theorem Explained – 2 | 00:05:00 |
| Generalized MDPs | 00:05:00 |
| Generalized MDPs – Solution – 1 | 00:04:00 |
| Generalized MDPs – Solution – 2 | 00:05:00 |
| Generalized MDPs – Solution – 3 | 00:02:00 |
| What Have We Learned? | 00:05:00 |

**Advanced Algorithmic Analysis**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:01:00 |
| More on Value Iteration – 1 | 00:04:00 |
| More on Value Iteration – 2 | 00:05:00 |
| More on Value Iteration – 3 | 00:01:00 |
| Linear Programming – 1 | 00:04:00 |
| Linear Programming – 2 | 00:03:00 |
| Linear Programming – 3 | 00:05:00 |
| Policy Iteration | 00:04:00 |
| Domination | 00:02:00 |
| Why Does Policy Iteration Work? | 00:01:00 |
| B_2 Is Monotonic | 00:04:00 |
| Another Property in Policy Iteration – 1 | 00:05:00 |
| Policy Iteration Proof | 00:01:00 |
| Another Property in Policy Iteration – 2 | 00:05:00 |
| What Have We Learned? | 00:06:00 |

**Messing with Rewards**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:06:00 |
| Changing the Reward Function | 00:02:00 |
| Multiplying by a Scalar | 00:02:00 |
| Adding a Scalar | 00:01:00 |
| Reward Shaping | 00:03:00 |
| Shaping in RL | 00:06:00 |
| Potential-based Shaping in RL | 00:02:00 |
| State-based Bonuses | 00:03:00 |
| Potential-based Shaping – 1 | 00:04:00 |
| Potential-based Shaping – 2 | 00:03:00 |
| Q-Learning with Potentials – 1 | 00:03:00 |
| Q-Learning with Potentials – 2 | 00:04:00 |
| What Have We Learned? | 00:03:00 |

**Exploring Exploration**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:01:00 |
| K-armed Bandits – 1 | 00:04:00 |
| K-armed Bandits – 2 | 00:01:00 |
| Confidence-based Exploration – 1 | 00:02:00 |
| Confidence-based Exploration – 2 | 00:06:00 |
| Metrics for Bandits – 1 | 00:02:00 |
| Metrics for Bandits – 2 | 00:03:00 |
| Metrics for Bandits – 3 | 00:05:00 |
| Metrics for Bandits – 4 | 00:01:00 |
| Find Best Implies Few Mistakes | 00:03:00 |
| Few Mistakes Implies Do Well – 1 | 00:02:00 |
| Few Mistakes Implies Do Well – 2 | 00:04:00 |
| Do Well Implies Find Best | 00:05:00 |
| Putting It Together | 00:02:00 |
| Hoeffding | 00:04:00 |
| Combining Arm Info – 1 | 00:01:00 |
| Combining Arm Info – 2 | 00:04:00 |
| Combining Arm Info – 3 | 00:02:00 |
| How Many Samples? – 1 | 00:01:00 |
| How Many Samples? – 2 | 00:01:00 |
| Exploring Deterministic MDPs – 1 | 00:03:00 |
| MDP Optimization Criteria | 00:04:00 |
| Exploring Deterministic MDPs – 2 | 00:06:00 |
| Exploring Deterministic MDPs – 3 | 00:01:00 |
| Rmax Analysis – 1 | 00:02:00 |
| Rmax Analysis – 2 | 00:03:00 |
| Rmax Analysis – 3 | 00:06:00 |
| Lower Bound | 00:06:00 |
| General Stochastic MDPs | 00:01:00 |
| General Rmax | 00:02:00 |
| Simulation Lemma – 1 | 00:03:00 |
| Simulation Lemma – 2 | 00:05:00 |
| Explore-or-Exploit Lemma | 00:04:00 |
| What Have We Learned? | 00:04:00 |

**Generalization**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:01:00 |
| Example: Taxi | 00:04:00 |
| Generalization Idea | 00:04:00 |
| Basic Update Rule | 00:04:00 |
| Linear Value Function Approximation | 00:03:00 |
| Calculus! | 00:01:00 |
| Does It Work? – 1 | 00:03:00 |
| Does It Work? – 2 | 00:04:00 |
| Does It Work? – 3 | 00:03:00 |
| Baird’s Counterexample – 1 | 00:03:00 |
| Baird’s Counterexample – 2 | 00:02:00 |
| Bad Update Sequence – 1 | 00:01:00 |
| Bad Update Sequence – 2 | 00:01:00 |
| Bad Update Sequence – 3 | 00:03:00 |
| Bad Update Sequence – 4 | 00:01:00 |
| Averagers – 1 | 00:04:00 |
| Averagers – 2 | 00:02:00 |
| Averagers – 3 | 00:01:00 |
| Connection to MDPs | 00:04:00 |
| What Have We Learned? – 1 | 00:05:00 |
| What Have We Learned? – 2 | 00:04:00 |

**Partially Observable MDPs**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:01:00 |
| POMDPs | 00:03:00 |
| POMDPs Generalize MDPs | 00:01:00 |
| POMDP Example – 1 | 00:02:00 |
| POMDP Example – 2 | 00:02:00 |
| State Estimation – 1 | 00:04:00 |
| State Estimation – 2 | 00:04:00 |
| Value Iteration in POMDPs – 1 | 00:05:00 |
| Value Iteration in POMDPs – 2 | 00:01:00 |
| Piecewise-Linear & Convex – 1 | 00:01:00 |
| Piecewise-Linear & Convex – 2 | 00:02:00 |
| Piecewise-Linear & Convex – 3 | 00:04:00 |
| Piecewise-Linear & Convex – 4 | 00:04:00 |
| Algorithmic Approach | 00:04:00 |
| Domination | 00:01:00 |
| RL for POMDPs – 1 | 00:03:00 |
| RL for POMDPs – 2 | 00:03:00 |
| Learning a POMDP | 00:01:00 |
| Learning Memoryless Policies – 1 | 00:02:00 |
| Learning Memoryless Policies – 2 | 00:02:00 |
| Learning Memoryless Policies – 3 | 00:04:00 |
| Bayesian RL – 1 | 00:04:00 |
| Bayesian RL – 2 | 00:01:00 |
| Bayesian RL – 3 | 00:03:00 |
| Predictive State Representation | 00:03:00 |
| PSR Example – 1 | 00:05:00 |
| PSR Example – 2 | 00:01:00 |
| PSR Theorem | 00:05:00 |
| What Have We Learned? – 1 | 00:05:00 |
| What Have We Learned? – 2 | 00:02:00 |

**Options**

| Lesson | Duration |
| --- | --- |
| Generalizing Generalizing | 00:01:00 |
| What Makes RL Hard? | 00:06:00 |
| Temporal Abstraction – 1 | 00:05:00 |
| Temporal Abstraction – 2 | 00:03:00 |
| Temporal Abstraction – 3 | 00:04:00 |
| Temporal Abstraction Options – 1 | 00:03:00 |
| Temporal Abstraction Options – 2 | 00:03:00 |
| Temporal Abstraction Option Function – 1 | 00:04:00 |
| Temporal Abstraction Option Function – 2 | 00:05:00 |
| Temporal Abstraction Option Function – 3 | 00:03:00 |
| Temporal Abstraction Option Function – 4 | 00:03:00 |
| Temporal Abstraction Option Function – 5 | 00:05:00 |
| Pac-Man Problems – 1 | 00:04:00 |
| Pac-Man Problems – 2 | 00:06:00 |
| Pac-Man Problems – 3 | 00:02:00 |
| Pac-Man Problems – 4 | 00:04:00 |
| How It Comes Together – 1 | 00:03:00 |
| How It Comes Together – 2 | 00:03:00 |
| Goal Abstraction – 1 | 00:04:00 |
| Goal Abstraction – 2 | 00:04:00 |
| Goal Abstraction – 3 | 00:05:00 |
| Goal Abstraction – 4 | 00:03:00 |
| Goal Abstraction – 5 | 00:04:00 |
| Monte Carlo Tree Search – 1 | 00:05:00 |
| Monte Carlo Tree Search – 2 | 00:05:00 |
| Monte Carlo Tree Search – 3 | 00:05:00 |
| Monte Carlo Tree Search – 4 | 00:04:00 |
| Monte Carlo Tree Search – 5 | 00:04:00 |
| Monte Carlo Tree Properties – 1 | 00:04:00 |
| Monte Carlo Tree Properties – 2 | 00:02:00 |
| What Have We Learned? – 1 | 00:03:00 |
| What Have We Learned? – 2 | 00:04:00 |

**Game Theory**

| Lesson | Duration |
| --- | --- |
| Scooby Dooby Doo! | 00:01:00 |
| Game Theory | 00:01:00 |
| What Is Game Theory? | 00:04:00 |
| A Simple Game – 1 | 00:06:00 |
| A Simple Game – 2 | 00:01:00 |
| A Simple Game – 3 | 00:03:00 |
| Minimax | 00:03:00 |
| Fundamental Result | 00:05:00 |
| Game Tree – 1 | 00:03:00 |
| Game Tree – 2 | 00:01:00 |
| Von Neumann | 00:02:00 |
| Minipoker | 00:05:00 |
| Minipoker Tree | 00:04:00 |
| Mixed Strategy | 00:03:00 |
| Lines | 00:01:00 |
| Center Game | 00:08:00 |
| Snitch – 1 | 00:05:00 |
| Snitch – 2 | 00:07:00 |
| Snitch – 3 | 00:03:00 |
| A Beautiful Equilibrium – 1 | 00:04:00 |
| A Beautiful Equilibrium – 2 | 00:01:00 |
| A Beautiful Equilibrium – 3 | 00:03:00 |
| The Two-Step | 00:04:00 |
| 2Step2Furious | 00:06:00 |
| What Have We Learned? | 00:07:00 |

**Game Theory Reloaded**

| Lesson | Duration |
| --- | --- |
| The Sequencing | 00:01:00 |
| Iterated Prisoner’s Dilemma | 00:02:00 |
| Uncertain End | 00:03:00 |
| Tit-for-Tat – 1 | 00:02:00 |
| Tit-for-Tat – 2 | 00:01:00 |
| Facing TfT | 00:03:00 |
| Finite State Strategy | 00:05:00 |
| Best Responses in IPD | 00:01:00 |
| Folk Theorem | 00:04:00 |
| Repeated Games – 1 | 00:02:00 |
| Repeated Games – 2 | 00:01:00 |
| Minmax Profile | 00:04:00 |
| Security Level Profile | 00:02:00 |
| Folksy Theorem | 00:02:00 |
| Grim Trigger | 00:02:00 |
| Implausible Threats | 00:05:00 |
| TfT vs. TfT | 00:01:00 |
| Pavlov | 00:02:00 |
| Pavlov vs. Pavlov | 00:01:00 |
| Pavlov Is Subgame Perfect | 00:04:00 |
| Computational Folk Theorem | 00:03:00 |
| Stochastic Games and Multiagent RL | 00:07:00 |
| Stochastic Games | 00:02:00 |
| Models & Stochastic Games | 00:01:00 |
| Zero-Sum Stochastic Games – 1 | 00:05:00 |
| Zero-Sum Stochastic Games – 2 | 00:03:00 |
| General-Sum Games | 00:04:00 |
| Lots of Ideas | 00:03:00 |
| What Have We Learned? | 00:03:00 |

**Game Theory Revolutions**

| Lesson | Duration |
| --- | --- |
| Game Theory Revolutions | 00:01:00 |
| Solution Concepts | 00:03:00 |
| General Tso Chicken – 1 | 00:03:00 |
| General Tso Chicken – 2 | 00:01:00 |
| General Tso Chicken – 3 | 00:04:00 |
| Correlated GTC – 1 | 00:04:00 |
| Correlated GTC – 2 | 00:01:00 |
| Correlated GTC – 3 | 00:01:00 |
| Correlated Facts | 00:02:00 |
| Solution Concepts Revisited | 00:01:00 |
| Coco Values – 1 | 00:05:00 |
| Coco Values – 2 | 00:02:00 |
| Coco Definition | 00:05:00 |
| Coco Example | 00:04:00 |
| Coco Properties | 00:05:00 |
| Mechanism Design | 00:04:00 |
| Peer Teaching – 1 | 00:04:00 |
| Peer Teaching – 2 | 00:04:00 |
| Peer Teaching – 3 | 00:04:00 |
| Peer Teaching – 4 | 00:05:00 |
| Peer Teaching – 5 | 00:01:00 |
| King Solomon – 1 | 00:04:00 |
| King Solomon – 2 | 00:04:00 |
| King Solomon – 3 | 00:03:00 |
| King Solomon – 4 | 00:04:00 |
| King Solomon – 5 | 00:04:00 |
| King Solomon – 6 | 00:04:00 |
| King Solomon – 7 | 00:02:00 |
| What Have We Learned? – 1 | 00:03:00 |
| What Have We Learned? – 2 | 00:03:00 |

**CCC**

| Lesson | Duration |
| --- | --- |
| Introduction | 00:02:00 |
| Coordinating and Communicating | 00:03:00 |
| DEC-POMDP | 00:03:00 |
| DEC-POMDP Properties | 00:02:00 |
| DEC-POMDP Example | 00:04:00 |
| Communicating and Coaching | 00:02:00 |
| Inverse Reinforcement Learning | 00:01:00 |
| Inverse Reinforcement Learning Example | 00:04:00 |
| Output of MLIRL | 00:03:00 |
| What Have We Learned… (or Have We?) | 00:02:00 |
| Curly, Beam Me Up | 00:01:00 |
| What We Will Have Learned | 00:03:00 |
| Not Reward Shaping | 00:04:00 |
| Policy Shaping – 1 | 00:04:00 |
| Policy Shaping – 2 | 00:04:00 |
| Policy Shaping – 3 | 00:01:00 |
| Policy Shaping – 4 | 00:02:00 |
| Policy Shaping – 5 | 00:04:00 |
| Policy Shaping – 6 | 00:03:00 |
| Policy Shaping – 7 | 00:02:00 |
| Multiple Sources – 1 | 00:04:00 |
| Multiple Sources – 2 | 00:03:00 |
| Multiple Sources – 3 | 00:01:00 |
| Drama Management – 1 | 00:05:00 |
| Drama Management – 2 | 00:05:00 |
| Trajectories as MDPs | 00:04:00 |
| Trajectories as TTD MDPs – 1 | 00:05:00 |
| Trajectories as TTD MDPs – 2 | 00:03:00 |
| What Have We Learned? | 00:04:00 |

**Outroduction to Reinforcement Learning**

| Lesson | Duration |
| --- | --- |
| Outroduction – Part 1 | 00:04:00 |
| Outroduction – Part 2 | 00:06:00 |

**Assessment**

| Lesson | Duration |
| --- | --- |
| Submit Your Assignment | 00:00:00 |
| Certification | 00:00:00 |
