This course is designed for people with a keen interest in machine learning. Throughout the course, you will explore an area of machine learning called Reinforcement Learning.

Rooted in behaviourist psychology, reinforcement learning is concerned with how software agents ought to take actions in an environment to maximise some notion of cumulative reward. The course examines efficient algorithms, where they exist, for single-agent and multi-agent planning, along with approaches to learning near-optimal decisions from experience.
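
As a flavour of the kind of method covered, the short sketch below implements tabular Q-learning, a classic algorithm for learning near-optimal decisions from experience, on a tiny made-up "chain" world. It is an illustrative sketch only, not part of the course materials: the four-state environment, the +1 reward at the goal state, and the hyperparameter values are assumptions chosen purely for demonstration.

    # Minimal, illustrative tabular Q-learning sketch (not official course code).
    # The 4-state chain environment, its rewards, and the hyperparameters below
    # are assumptions made purely for demonstration.
    import random

    N_STATES, ACTIONS = 4, [0, 1]          # states 0..3; actions: 0 = left, 1 = right
    GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1  # discount, learning rate, exploration rate

    def step(state, action):
        """Move left or right along the chain; reaching state 3 pays +1 and ends the episode."""
        next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        return next_state, reward, next_state == N_STATES - 1

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for _ in range(500):                   # episodes
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = random.choice(ACTIONS) if random.random() < EPSILON \
                else max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2

    # Greedy policy recovered from the learned action values
    print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)})

Running the sketch prints the learned greedy policy; on this toy chain it should select "right" (action 1) in every non-terminal state.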

Upon completion, you will be able to replicate a result from a published paper in reinforcement learning.

Assessment

This course does not involve any written exams. To complete the course, students need to answer five assignment questions; the answers should be submitted as written work in PDF or Word format. Students can write the answers in their own time. Each answer needs to be around 200 words (one page). Once the answers are submitted, the tutor will check and assess the work.

Certification

Edukite courses are free to study. To successfully complete a course, you must submit all of its assignments as part of the assessment. Upon successful completion, you can choose to make your achievement formal by obtaining your Certificate at a cost of £49.

Having an Official Edukite Certification is a great way to celebrate and share your success. You can:

  • Add the certificate to your CV or resume to boost your career prospects
  • Show it to prove your success

Course Credit: Brown University

Course Curriculum

Introduction to Reinforcement Learning
Introduction 00:04:00
Smoov & Curly's Bogus Journey
Let’s Do the Time Warp Agaaaain! 00:02:00
Introduction 00:02:00
Decision Making & Reinforcement Learning 00:03:00
The World – 1 00:03:00
The World – 2 00:02:00
Markov Decision Processes – 1 00:03:00
Markov Decision Processes – 2 00:05:00
Markov Decision Processes – 3 00:05:00
Markov Decision Processes – 4 00:07:00
More About Rewards – 1 00:05:00
More About Rewards – 2 00:06:00
More About Rewards – 3 00:02:00
Sequences of Rewards – 1 00:10:00
Sequences of Rewards – 2 00:02:00
Sequences of Rewards – 3 00:01:00
Sequences of Rewards – 4 00:08:00
Assumptions 00:03:00
Policies – 1 00:05:00
Policies – 2 00:06:00
Finding Policies – 1 00:04:00
Finding Policies – 2 00:05:00
Finding Policies – 3 00:02:00
Finding Policies – 4 00:06:00
Back to the Future 00:01:00
The Bellman Equations – 1 00:03:00
The Bellman Equations – 2 00:03:00
The Bellman Equations – 3 00:02:00
The Third Bellman Equation 00:02:00
Bellman Equation Relations 00:01:00
What Have We Learned? 00:02:00
Reinforcement Learning Basics
Introduction 00:03:00
Mystery Game – 1 00:04:00
Mystery Game – 2 00:06:00
Behavior Structures – 1 00:04:00
Behavior Structures – 2 00:03:00
Evaluating a Policy 00:05:00
Evaluating a Learner 00:04:00
What Have We Learned? 00:02:00
TD and Friends
Temporal Difference Learning 00:01:00
RL Context – 1 00:01:00
RL Context – 2 00:04:00
TD Lambda 00:02:00
Value Computation Example 00:01:00
Estimating from Data 00:01:00
Computing Estimates Incrementally 00:00:00
Properties of Learning Rates 00:02:00
Selecting Learning Rates 00:01:00
TD(1) Rule 00:03:00
TD(1) Example – 1 00:04:00
TD(1) Example – 2 00:04:00
TD(1) Example – 3 00:02:00
Why TD(1) Is “Wrong” 00:03:00
TD(0) Rule 00:04:00
TD(Lambda) Rule 00:07:00
K-Step Estimators 00:04:00
K-Step Estimators and TD(Lambda) 00:06:00
TD(Lambda) Empirical Performance 00:03:00
What Have We Learned? 00:07:00
Convergence
Convergence: TD with Control 00:01:00
Bellman Equations 00:02:00
Bellman Equations with Actions 00:05:00
Bellman Operator – 1 00:02:00
Bellman Operator – 2 00:01:00
Contraction Mappings 00:03:00
Contraction Mapping Quiz 00:01:00
Contraction Properties 00:04:00
The Bellman Operator Contracts – 1 00:03:00
The Bellman Operator Contracts – 2 00:03:00
Max Is a Non-expansion 00:02:00
Proof That Max Is a Non-expansion – 1 00:04:00
Proof That Max Is a Non-expansion – 2 00:03:00
Convergence – 1 00:02:00
Convergence – 2 00:04:00
Convergence Theorem Explained – 1 00:04:00
Convergence Theorem Explained – 2 00:05:00
Generalized MDPs 00:05:00
Generalized MDPs – Solution – 1 00:04:00
Generalized MDPs – Solution – 2 00:05:00
Generalized MDPs – Solution – 3 00:02:00
What Have We Learned? 00:05:00
Advanced Algorithmic Analysis
Introduction 00:01:00
More on Value Iteration – 1 00:04:00
More on Value Iteration – 2 00:05:00
More on Value Iteration – 3 00:01:00
Linear Programming – 1 00:04:00
Linear Programming – 2 00:03:00
Linear Programming – 3 00:05:00
Policy Iteration 00:04:00
Domination 00:02:00
Why Does Policy Iteration Work? 00:01:00
B_2 Is Monotonic 00:04:00
Another Property in Policy Iteration – 1 00:05:00
Policy Iteration Proof 00:01:00
Another Property in Policy Iteration – 2 00:05:00
What Have We Learned? 00:06:00
Messing with Rewards
Introduction 00:06:00
Changing the Reward Function 00:02:00
Multiplying by a Scalar 00:02:00
Adding a Scalar 00:01:00
Reward Shaping 00:03:00
Shaping in RL 00:06:00
Potential-based Shaping in RL 00:02:00
State-based Bonuses 00:03:00
Potential-based Shaping – 1 00:04:00
Potential-based Shaping – 2 00:03:00
Q-Learning with Potentials – 1 00:03:00
Q-Learning With Potentials – 2 00:04:00
What Have We Learned? 00:03:00
Exploring Exploration
Introduction 00:01:00
K-armed Bandits – 1 00:04:00
K-armed Bandits – 2 00:01:00
Confidence-based Exploration – 1 00:02:00
Confidence-based Exploration – 2 00:06:00
Metrics for Bandits – 1 00:02:00
Metrics for Bandits – 2 00:03:00
Metrics for Bandits – 3 00:05:00
Metrics for Bandits – 4 00:01:00
Find Best Implies Few Mistakes 00:03:00
Few Mistakes Implies Do Well – 1 00:02:00
Few Mistakes Implies Do Well – 2 00:04:00
Do Well Implies Find Best 00:05:00
Putting It Together 00:02:00
Hoeffding 00:04:00
Combining Arm Info – 1 00:01:00
Combining Arm Info – 2 00:04:00
Combining Arm Info – 3 00:02:00
How Many Samples? – 1 00:01:00
How Many Samples? – 2 00:01:00
Exploring Deterministic MDPs – 1 00:03:00
MDP Optimization Criteria 00:04:00
Exploring Deterministic MDPs – 2 00:06:00
Exploring Deterministic MDPs – 3 00:01:00
Rmax Analysis – 1 00:02:00
Rmax Analysis – 2 00:03:00
Rmax Analysis – 3 00:06:00
Lower Bound 00:06:00
General Stochastic MDPs 00:01:00
General Rmax 00:02:00
Simulation Lemma – 1 00:03:00
Simulation Lemma – 2 00:05:00
Explore-or-Exploit Lemma 00:04:00
What Have We Learned? 00:04:00
Generalization
Introduction 00:01:00
Example: Taxi 00:04:00
Generalization Idea 00:04:00
Basic Update Rule 00:04:00
Linear Value Function Approximation 00:03:00
Calculus! 00:01:00
Does It Work? – 1 00:03:00
Does It Work? – 2 00:04:00
Does It Work? – 3 00:03:00
Baird’s Counterexample – 1 00:03:00
Baird’s Counterexample – 2 00:02:00
Bad Update Sequence – 1 00:01:00
Bad Update Sequence – 2 00:01:00
Bad Update Sequence – 3 00:03:00
Bad Update Sequence – 4 00:01:00
Averagers – 1 00:04:00
Averagers – 2 00:02:00
Averagers – 3 00:01:00
Connection to MDPs 00:04:00
What Have We Learned? – 1 00:05:00
What Have We Learned? – 2 00:04:00
Partially Observable MDPs
Introduction 00:01:00
POMDPs 00:03:00
POMDPs Generalize MDPs 00:01:00
POMDP Example – 1 00:02:00
POMDP Example – 2 00:02:00
State Estimation – 1 00:04:00
State Estimation – 2 00:04:00
Value Iteration in POMDPs – 1 00:05:00
Value Iteration in POMDPs – 2 00:01:00
Piecewise-Linear & Convex – 1 00:01:00
Piecewise-Linear & Convex – 2 00:02:00
Piecewise-Linear & Convex – 3 00:04:00
Piecewise-Linear & Convex – 4 00:04:00
Algorithmic Approach 00:04:00
Domination 00:01:00
RL for POMDPs – 1 00:03:00
RL for POMDPs – 2 00:03:00
Learning a POMDP 00:01:00
Learning Memoryless Policies – 1 00:02:00
Learning Memoryless Policies – 2 00:02:00
Learning Memoryless Policies – 3 00:04:00
Bayesian RL – 1 00:04:00
Bayesian RL – 2 00:01:00
Bayesian RL – 3 00:03:00
Predictive State Representation 00:03:00
PSR Example – 1 00:05:00
PSR Example – 2 00:01:00
PSR Theorem 00:05:00
What Have We Learned? – 1 00:05:00
What Have We Learned? – 2 00:02:00
Options
Generalizing Generalizing 00:01:00
What Makes RL Hard? 00:06:00
Temporal Abstraction – 1 00:05:00
Temporal Abstraction – 2 00:03:00
Temporal Abstraction – 3 00:04:00
Temporal Abstraction Options – 1 00:03:00
Temporal Abstraction Options – 2 00:03:00
Temporal Abstraction Option Function – 1 00:04:00
Temporal Abstraction Option Function – 2 00:05:00
Temporal Abstraction Option Function – 3 00:03:00
Temporal Abstraction Option Function – 4 00:03:00
Temporal Abstraction Option Function – 5 00:05:00
Pac-Man Problems – 1 00:04:00
Pac-Man Problems – 2 00:06:00
Pac-Man Problems – 3 00:02:00
Pac-Man Problems – 4 00:04:00
How It Comes Together – 1 00:03:00
How It Comes Together – 2 00:03:00
Goal Abstraction – 1 00:04:00
Goal Abstraction – 2 00:04:00
Goal Abstraction – 3 00:05:00
Goal Abstraction – 4 00:03:00
Goal Abstraction – 5 00:04:00
Monte Carlo Tree Search – 1 00:05:00
Monte Carlo Tree Search – 2 00:05:00
Monte Carlo Tree Search – 3 00:05:00
Monte Carlo Tree Search – 4 00:04:00
Monte Carlo Tree Search – 5 00:04:00
Monte Carlo Tree Properties – 1 00:04:00
Monte Carlo Tree Properties – 2 00:02:00
What Have We Learned? – 1 00:03:00
What Have We Learned? – 2 00:04:00
Game Theory
Scooby Dooby Doo! 00:01:00
Game Theory 00:01:00
What Is Game Theory? 00:04:00
A Simple Game – 1 00:06:00
A Simple Game – 2 00:01:00
A Simple Game – 3 00:03:00
Minimax 00:03:00
Fundamental Result 00:05:00
Game Tree – 1 00:03:00
Game Tree – 2 00:01:00
Von Neumann 00:02:00
Minipoker 00:05:00
Minipoker Tree 00:04:00
Mixed Strategy 00:03:00
Lines 00:01:00
Center Game 00:08:00
Snitch – 1 00:05:00
Snitch – 2 00:07:00
Snitch – 3 00:03:00
A Beautiful Equilibrium – 1 00:04:00
A Beautiful Equilibrium – 2 00:01:00
A Beautiful Equilibrium – 3 00:03:00
The Two-Step 00:04:00
2Step2Furious 00:06:00
What Have We Learned? 00:07:00
Game Theory Reloaded
The Sequencing 00:01:00
Iterated Prisoner’s Dilemma 00:02:00
Uncertain End 00:03:00
Tit-for-Tat – 1 00:02:00
Tit-for-Tat – 2 00:01:00
Facing TfT 00:03:00
Finite State Strategy 00:05:00
Best Responses in IPD 00:01:00
Folk Theorem 00:04:00
Repeated Games – 1 00:02:00
Repeated Games – 2 00:01:00
Minmax Profile 00:04:00
Security Level Profile 00:02:00
Folksy Theorem 00:02:00
Grim Trigger 00:02:00
Implausible Threats 00:05:00
TfT vs. TfT 00:01:00
Pavlov 00:02:00
Pavlov vs. Pavlov 00:01:00
Pavlov Is Subgame Perfect 00:04:00
Computational Folk Theorem 00:03:00
Stochastic Games and Multiagent RL 00:07:00
Stochastic Games 00:02:00
Models & Stochastic Games 00:01:00
Zero-Sum Stochastic Games – 1 00:05:00
Zero-Sum Stochastic Games – 2 00:03:00
General-Sum Games 00:04:00
Lots of Ideas 00:03:00
What Have We Learned? 00:03:00
Game Theory Revolutions
Game Theory Revolutions 00:01:00
Solution Concepts 00:03:00
General Tso Chicken – 1 00:03:00
General Tso Chicken – 2 00:01:00
General Tso Chicken – 3 00:04:00
Correlated GTC – 1 00:04:00
Correlated GTC – 2 00:01:00
Correlated GTC – 3 00:01:00
Correlated Facts 00:02:00
Solution Concepts Revisited 00:01:00
Coco Values – 1 00:05:00
Coco Values – 2 00:02:00
Coco Definition 00:05:00
Coco Example 00:04:00
Coco Properties 00:05:00
Mechanism Design 00:04:00
Peer Teaching 00:04:00
Peer Teaching – 2 00:04:00
Peer Teaching – 3 00:04:00
Peer Teaching – 4 00:05:00
Peer Teaching – 5 00:01:00
King Solomon – 1 00:04:00
King Solomon – 2 00:04:00
King Solomon – 3 00:03:00
King Solomon – 4 00:04:00
King Solomon – 5 00:04:00
King Solomon – 6 00:04:00
King Solomon – 7 00:02:00
What Have We Learned? – 1 00:03:00
What Have We Learned? – 2 00:03:00
CCC
Introduction 00:02:00
Coordinating and Communicating 00:03:00
DEC-POMDP 00:03:00
DEC-POMDP Properties 00:02:00
DEC-POMDP Example 00:04:00
Communicating and Coaching 00:02:00
Inverse Reinforcement Learning 00:01:00
Inverse Reinforcement Learning Example 00:04:00
Output of MLIRL 00:03:00
What Have We Learned… (or Have We?) 00:02:00
Curly, Beam Me Up 00:01:00
What We Will Have Learned 00:03:00
Not Reward Shaping 00:04:00
Policy Shaping – 1 00:04:00
Policy Shaping – 2 00:04:00
Policy Shaping – 3 00:01:00
Policy Shaping – 4 00:02:00
Policy Shaping – 5 00:04:00
Policy Shaping – 6 00:03:00
Policy Shaping – 7 00:02:00
Multiple Sources – 1 00:04:00
Multiple Sources – 2 00:03:00
Multiple Sources – 3 00:01:00
Drama Management 00:05:00
Drama Management – 2 00:05:00
Trajectories as MDPs 00:04:00
Trajectories as TTD MDPs – 1 00:05:00
Trajectories as TTD MDPs – 2 00:03:00
What Have We Learned? 00:04:00
Outroduction to Reinforcement Learning
Outroduction – Part 1 00:04:00
Outroduction – Part 2 00:06:00
Assessment
Submit Your Assignment 00:00:00
Certification 00:00:00

Course Reviews

4.7 average rating (9 ratings)

No reviews found for this course.

8 STUDENTS ENROLLED