Name: Reinforcement Learning
Rating: 4.7 (9 reviews)

The Reinforcement Learning course is designed for people who have a keen interest in machine learning. Throughout the course, you will explore an area of machine learning called reinforcement learning.

Inspired by behaviourist psychology, reinforcement learning studies how software agents ought to take actions in an environment to maximise some notion of cumulative reward. The course examines efficient algorithms, where they exist, for single-agent and multi-agent planning, along with approaches to learning near-optimal decisions from experience.
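
One common notion of cumulative reward is the discounted return, in which a reward received t steps in the future is weighted by gamma to the power t. The snippet below is a minimal sketch of that calculation only; the function name, the discount factor and the reward sequence are illustrative assumptions and do not come from the course material.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum(gamma**t * rewards[t]) over all time steps t.

    gamma is an assumed discount factor in [0, 1]; the course
    description does not specify one.
    """
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequence, chosen only for illustration.
example_rewards = [1.0, 0.0, 2.0]
example_return = discounted_return(example_rewards, gamma=0.5)
print(example_return)  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```

With gamma below 1, rewards that arrive sooner count for more than the same rewards arriving later, which is one way an agent can trade off immediate against future reward.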

Upon completion, you will be able to replicate a result from a published paper in reinforcement learning.

Assessment

This course does not involve any written exams. To complete the course, students need to answer 5 assignment questions; the answers are submitted as written work in PDF or Word format. Students can write the answers in their own time. Each answer needs to be 200 words (1 page). Once the answers are submitted, the tutor will check and assess the work.

Certification

Edukite courses are free to study. To successfully complete a course, you must submit all of its assignments as part of the assessment. Upon successful completion of a course, you can choose to make your achievement formal by obtaining your certificate at a cost of £49.

Having an Official Edukite Certification is a great way to celebrate and share your success. You can:

Add the certificate to your CV or resume and brighten up your career
Show it to prove your success

Course Credit: Brown University

Course Curriculum

Introduction to Reinforcement Learning
	Introduction	00:04:00
Smoov & Curly's Bogus Journey
	Let’s Do the Time Warp Agaaaain!	00:02:00
	Introduction	00:02:00
	Decision Making & Reinforcement Learning	00:03:00
	The World – 1	00:03:00
	The World – 2	00:02:00
	Markov Decision Processes – 1	00:03:00
	Markov Decision Processes – 2	00:05:00
	Markov Decision Processes – 3	00:05:00
	Markov Decision Processes – 4	00:07:00
	More About Rewards – 1	00:05:00
	More About Rewards – 2	00:06:00
	More About Rewards – 3	00:02:00
	Sequences of Rewards – 1	00:10:00
	Sequences of Rewards – 2	00:02:00
	Sequences of Rewards – 3	00:01:00
	Sequences of Rewards – 4	00:08:00
	Assumptions	00:03:00
	Policies – 1	00:05:00
	Policies – 2	00:06:00
	Finding Policies – 1	00:04:00
	Finding Policies – 2	00:05:00
	Finding Policies – 3	00:02:00
	Finding Policies – 4	00:06:00
	Back to the Future	00:01:00
	The Bellman Equations – 1	00:03:00
	The Bellman Equations – 2	00:03:00
	The Bellman Equations – 3	00:02:00
	The Third Bellman Equation	00:02:00
	Bellman Equation Relations	00:01:00
	What Have We Learned?	00:02:00
Reinforcement Learning Basics
	Introduction	00:03:00
	Mystery Game – 1	00:04:00
	Mystery Game – 2	00:06:00
	Behavior Structures – 1	00:04:00
	Behavior Structures – 2	00:03:00
	Evaluating a Policy	00:05:00
	Evaluating a Learner	00:04:00
	What Have We Learned?	00:02:00
TD and Friends
	Temporal Difference Learning	00:01:00
	RL Context – 1	00:01:00
	RL Context – 2	00:04:00
	TD Lambda	00:02:00
	Value Computation Example	00:01:00
	Estimating from Data	00:01:00
	Computing Estimates Incrementally	00:00:00
	Properties of Learning Rates	00:02:00
	Selecting Learning Rates	00:01:00
	TD(1) Rule	00:03:00
	TD(1) Example – 1	00:04:00
	TD(1) Example – 2	00:04:00
	TD(1) Example – 3	00:02:00
	Why TD(1) Is “Wrong”	00:03:00
	TD(0) Rule	00:04:00
	TD(Lambda) Rule	00:07:00
	K-Step Estimators	00:04:00
	K-Step Estimators and TD(Lambda)	00:06:00
	TD(Lambda) Empirical Performance	00:03:00
	What Have We Learned?	00:07:00
Convergence
	Convergence: TD with Control	00:01:00
	Bellman Equations	00:02:00
	Bellman Equations with Actions	00:05:00
	Bellman Operator – 1	00:02:00
	Bellman Operator – 2	00:01:00
	Contraction Mappings	00:03:00
	Contraction Mapping Quiz	00:01:00
	Contraction Properties	00:04:00
	The Bellman Operator Contracts – 1	00:03:00
	The Bellman Operator Contracts – 2	00:03:00
	Max Is a Non-expansion	00:02:00
	Proof That Max Is a Non-expansion – 1	00:04:00
	Proof That Max Is a Non-expansion – 2	00:03:00
	Convergence – 1	00:02:00
	Convergence – 2	00:04:00
	Convergence Theorem Explained – 1	00:04:00
	Convergence Theorem Explained – 2	00:05:00
	Generalized MDPs	00:05:00
	Generalized MDPs – Solution – 1	00:04:00
	Generalized MDPs – Solution – 2	00:05:00
	Generalized MDPs – Solution – 3	00:02:00
	What Have We Learned?	00:05:00
Advanced Algorithmic Analysis
	Introduction	00:01:00
	More on Value Iteration – 1	00:04:00
	More on Value Iteration – 2	00:05:00
	More on Value Iteration – 3	00:01:00
	Linear Programming – 1	00:04:00
	Linear Programming – 2	00:03:00
	Linear Programming – 3	00:05:00
	Policy Iteration	00:04:00
	Domination	00:02:00
	Why Does Policy Iteration Work?	00:01:00
	B_2 Is Monotonic	00:04:00
	Another Property in Policy Iteration – 1	00:05:00
	Policy Iteration Proof	00:01:00
	Another Property in Policy Iteration – 2	00:05:00
	What Have We Learned?	00:06:00
Messing with Rewards
	Introduction	00:06:00
	Changing the Reward Function	00:02:00
	Multiplying by a Scalar	00:02:00
	Adding a Scalar	00:01:00
	Reward Shaping	00:03:00
	Shaping in RL	00:06:00
	Potential-based Shaping in RL	00:02:00
	State-based Bonuses	00:03:00
	Potential-based Shaping – 1	00:04:00
	Potential-based Shaping – 2	00:03:00
	Q-Learning with Potentials – 1	00:03:00
	Q-Learning With Potentials – 2	00:04:00
	What Have We Learned?	00:03:00
Exploring Exploration
	Introduction	00:01:00
	K-armed Bandits – 1	00:04:00
	K-armed Bandits – 2	00:01:00
	Confidence-based Exploration – 1	00:02:00
	Confidence-based Exploration – 2	00:06:00
	Metrics for Bandits – 1	00:02:00
	Metrics for Bandits – 2	00:03:00
	Metrics for Bandits – 3	00:05:00
	Metrics for Bandits – 4	00:01:00
	Find Best Implies Few Mistakes	00:03:00
	Few Mistakes Implies Do Well – 1	00:02:00
	Few Mistakes Implies Do Well – 2	00:04:00
	Do Well Implies Find Best	00:05:00
	Putting It Together	00:02:00
	Hoeffding	00:04:00
	Combining Arm Info – 1	00:01:00
	Combining Arm Info – 2	00:04:00
	Combining Arm Info – 3	00:02:00
	How Many Samples? – 1	00:01:00
	How Many Samples? – 2	00:01:00
	Exploring Deterministic MDPs – 1	00:03:00
	MDP Optimization Criteria	00:04:00
	Exploring Deterministic MDPs – 2	00:06:00
	Exploring Deterministic MDPs – 3	00:01:00
	Rmax Analysis – 1	00:02:00
	Rmax Analysis – 2	00:03:00
	Rmax Analysis – 3	00:06:00
	Lower Bound	00:06:00
	General Stochastic MDPs	00:01:00
	General Rmax	00:02:00
	Simulation Lemma – 1	00:03:00
	Simulation Lemma – 2	00:05:00
	Explore-or-Exploit Lemma	00:04:00
	What Have We Learned?	00:04:00
Generalization
	Introduction	00:01:00
	Example: Taxi	00:04:00
	Generalization Idea	00:04:00
	Basic Update Rule	00:04:00
	Linear Value Function Approximation	00:03:00
	Calculus!	00:01:00
	Does It Work? – 1	00:03:00
	Does It Work? – 2	00:04:00
	Does It Work? – 3	00:03:00
	Baird’s Counterexample – 1	00:03:00
	Baird’s Counterexample – 2	00:02:00
	Bad Update Sequence – 1	00:01:00
	Bad Update Sequence – 2	00:01:00
	Bad Update Sequence – 3	00:03:00
	Bad Update Sequence – 4	00:01:00
	Averagers – 1	00:04:00
	Averagers – 2	00:02:00
	Averagers – 3	00:01:00
	Connection to MDPs	00:04:00
	What Have We Learned? – 1	00:05:00
	What Have We Learned? – 2	00:04:00
Partially Observable MDPs
	Introduction	00:01:00
	POMDPs	00:03:00
	POMDPs Generalize MDPs	00:01:00
	POMDP Example – 1	00:02:00
	POMDP Example – 2	00:02:00
	State Estimation – 1	00:04:00
	State Estimation – 2	00:04:00
	Value Iteration in POMDPs – 1	00:05:00
	Value Iteration in POMDPs – 2	00:01:00
	Piecewise-Linear & Convex – 1	00:01:00
	Piecewise-Linear & Convex – 2	00:02:00
	Piecewise-Linear & Convex – 3	00:04:00
	Piecewise-Linear & Convex – 4	00:04:00
	Algorithmic Approach	00:04:00
	Domination	00:01:00
	RL for POMDPs – 1	00:03:00
	RL for POMDPs – 2	00:03:00
	Learning a POMDP	00:01:00
	Learning Memoryless Policies – 1	00:02:00
	Learning Memoryless Policies – 2	00:02:00
	Learning Memoryless Policies – 3	00:04:00
	Bayesian RL – 1	00:04:00
	Bayesian RL – 2	00:01:00
	Bayesian RL – 3	00:03:00
	Predictive State Representation	00:03:00
	PSR Example – 1	00:05:00
	PSR Example – 2	00:01:00
	PSR Theorem	00:05:00
	What Have We Learned? – 1	00:05:00
	What Have We Learned? – 2	00:02:00
Options
	Generalizing Generalizing	00:01:00
	What Makes RL Hard?	00:06:00
	Temporal Abstraction – 1	00:05:00
	Temporal Abstraction – 2	00:03:00
	Temporal Abstraction – 3	00:04:00
	Temporal Abstraction Options – 1	00:03:00
	Temporal Abstraction Options – 2	00:03:00
	Temporal Abstraction Option Function – 1	00:04:00
	Temporal Abstraction Option Function – 2	00:05:00
	Temporal Abstraction Option Function – 3	00:03:00
	Temporal Abstraction Option Function – 4	00:03:00
	Temporal Abstraction Option Function – 5	00:05:00
	Pac-Man Problems – 1	00:04:00
	Pac-Man Problems – 2	00:06:00
	Pac-Man Problems – 3	00:02:00
	Pac-Man Problems – 4	00:04:00
	How It Comes Together – 1	00:03:00
	How It Comes Together – 2	00:03:00
	Goal Abstraction – 1	00:04:00
	Goal Abstraction – 2	00:04:00
	Goal Abstraction – 3	00:05:00
	Goal Abstraction – 4	00:03:00
	Goal Abstraction – 5	00:04:00
	Monte Carlo Tree Search – 1	00:05:00
	Monte Carlo Tree Search – 2	00:05:00
	Monte Carlo Tree Search – 3	00:05:00
	Monte Carlo Tree Search – 4	00:04:00
	Monte Carlo Tree Search – 5	00:04:00
	Monte Carlo Tree Properties – 1	00:04:00
	Monte Carlo Tree Properties – 2	00:02:00
	What Have We Learned? – 1	00:03:00
	What Have We Learned? – 2	00:04:00
Game Theory
	Scooby Dooby Doo!	00:01:00
	Game Theory	00:01:00
	What Is Game Theory?	00:04:00
	A Simple Game – 1	00:06:00
	A Simple Game – 2	00:01:00
	A Simple Game – 3	00:03:00
	Minimax	00:03:00
	Fundamental Result	00:05:00
	Game Tree – 1	00:03:00
	Game Tree – 2	00:01:00
	Von Neumann	00:02:00
	Minipoker	00:05:00
	Minipoker Tree	00:04:00
	Mixed Strategy	00:03:00
	Lines	00:01:00
	Center Game	00:08:00
	Snitch – 1	00:05:00
	Snitch – 2	00:07:00
	Snitch – 3	00:03:00
	A Beautiful Equilibrium – 1	00:04:00
	A Beautiful Equilibrium – 2	00:01:00
	A Beautiful Equilibrium – 3	00:03:00
	The Two-Step	00:04:00
	2Step2Furious	00:06:00
	What Have We Learned?	00:07:00
Game Theory Reloaded
	The Sequencing	00:01:00
	Iterated Prisoner’s Dilemma	00:02:00
	Uncertain End	00:03:00
	Tit-for-Tat – 1	00:02:00
	Tit-for-Tat – 2	00:01:00
	Facing TfT	00:03:00
	Finite State Strategy	00:05:00
	Best Responses in IPD	00:01:00
	Folk Theorem	00:04:00
	Repeated Games – 1	00:02:00
	Repeated Games – 2	00:01:00
	Minmax Profile	00:04:00
	Security Level Profile	00:02:00
	Folksy Theorem	00:02:00
	Grim Trigger	00:02:00
	Implausible Threats	00:05:00
	TfT vs. TfT	00:01:00
	Pavlov	00:02:00
	Pavlov vs. Pavlov	00:01:00
	Pavlov Is Subgame Perfect	00:04:00
	Computational Folk Theorem	00:03:00
	Stochastic Games and Multiagent RL	00:07:00
	Stochastic Games	00:02:00
	Models & Stochastic Games	00:01:00
	Zero-Sum Stochastic Games – 1	00:05:00
	Zero-Sum Stochastic Games – 2	00:03:00
	General-Sum Games	00:04:00
	Lots of Ideas	00:03:00
	What Have We Learned?	00:03:00
Game Theory Revolutions
	Game Theory Revolutions	00:01:00
	Solution Concepts	00:03:00
	General Tso Chicken – 1	00:03:00
	General Tso Chicken – 2	00:01:00
	General Tso Chicken – 3	00:04:00
	Correlated GTC – 1	00:04:00
	Correlated GTC – 2	00:01:00
	Correlated GTC – 3	00:01:00
	Correlated Facts	00:02:00
	Solution Concepts Revisited	00:01:00
	Coco Values – 1	00:05:00
	Coco Values – 2	00:02:00
	Coco Definition	00:05:00
	Coco Example	00:04:00
	Coco Properties	00:05:00
	Mechanism Design	00:04:00
	Peer Teaching	00:04:00
	Peer Teaching – 2	00:04:00
	Peer Teaching – 3	00:04:00
	Peer Teaching – 4	00:05:00
	Peer Teaching – 5	00:01:00
	King Solomon – 1	00:04:00
	King Solomon – 2	00:04:00
	King Solomon – 3	00:03:00
	King Solomon – 4	00:04:00
	King Solomon – 5	00:04:00
	King Solomon – 6	00:04:00
	King Solomon – 7	00:02:00
	What Have We Learned? – 1	00:03:00
	What Have We Learned? – 2	00:03:00
CCC
	Introduction	00:02:00
	Coordinating and Communicating	00:03:00
	DEC-POMDP	00:03:00
	DEC-POMDP Properties	00:02:00
	DEC-POMDP Example	00:04:00
	Communicating and Coaching	00:02:00
	Inverse Reinforcement Learning	00:01:00
	Inverse Reinforcement Learning Example	00:04:00
	Output of MLIRL	00:03:00
	What Have We Learned… (or Have We?)	00:02:00
	Curly, Beam Me Up	00:01:00
	What We Will Have Learned	00:03:00
	Not Reward Shaping	00:04:00
	Policy Shaping – 1	00:04:00
	Policy Shaping – 2	00:04:00
	Policy Shaping – 3	00:01:00
	Policy Shaping – 4	00:02:00
	Policy Shaping – 5	00:04:00
	Policy Shaping – 6	00:03:00
	Policy Shaping – 7	00:02:00
	Multiple Sources – 1	00:04:00
	Multiple Sources – 2	00:03:00
	Multiple Sources – 3	00:01:00
	Drama Management	00:05:00
	Drama Management – 2	00:05:00
	Trajectories as MDPs	00:04:00
	Trajectories as TTD MDPs – 1	00:05:00
	Trajectories as TTD MDPs – 2	00:03:00
	What Have We Learned?	00:04:00
Outroduction to Reinforcement Learning
	Outroduction – Part 1	00:04:00
	Outroduction – Part 2	00:06:00
Assessment
	Submit Your Assignment	00:00:00
	Certification	00:00:00

8 STUDENTS ENROLLED
