The goal of this course is to give solid foundations for developing, analyzing, and implementing parallel and locality-efficient algorithms. This course covers theoretical underpinnings. To give a practical feeling for how algorithms map to and behave on real systems, we’ll supplement algorithmic theory with hands-on exercises on modern HPC systems, such as Cilk Plus or OpenMP on shared memory nodes and CUDA for graphics co-processors (GPUs).
Assessment
This course does not involve any written exams. Students need to answer 5 assignment questions to complete the course, the answers will be in the form of written work in pdf or word. Students can write the answers in their own time. Each answer needs to be 200 words (1 Page). Once the answers are submitted, the tutor will check and assess the work.
Certification
Edukite courses are free to study. To successfully complete a course you must submit all the assignment of the course as part of assessment. Upon successful completion of a course, you can choose to make your achievement formal by obtaining your Certificate at a cost of £49.
Having an Official Edukite Certification is a great way to celebrate and share your success. You can:
- Add the certificate to your CV or resume and brighten up your career
- Show it to prove your success
Course Credit: Georgia Institute of Technology
Course Curriculum
Introduction to High Performance Computing | |||
Welcome | 00:01:00 | ||
Philosophy and Logistics | 00:02:00 | ||
What is HPC? | 00:01:00 | ||
Supercomputing in a Nutshell | 00:02:00 | ||
Topics | 00:04:00 | ||
Intro to the Work-Span Model | |||
Introduction | 00:02:00 | ||
The Multithreaded DAG Model | 00:03:00 | ||
Example Sample Reduction | 00:04:00 | ||
Quiz: A Reduction Tree | 00:01:00 | ||
Work and Span | 00:03:00 | ||
Quiz: Work and Span Quiz | 00:01:00 | ||
Quiz: Work and Span for Reduction | 00:01:00 | ||
Basic Work Span Laws | 00:03:00 | ||
Brent’s Theorem – Part 1 | 00:05:00 | ||
Quiz: Brent’s Theorem Aside | 00:01:00 | ||
Brent’s Theorem – Part 2 | 00:03:00 | ||
Quiz: Applying Brent’s Theorem | 00:01:00 | ||
Quiz: The Slack in Brent’s Bound – Part 1 | 00:02:00 | ||
Desiderata Speedup, Work Optimality and Weak scaling | 00:06:00 | ||
Quiz: Which Parallel Algorithm is Better | 00:01:00 | ||
Basic Concurrency Primitives | 00:05:00 | ||
Quiz: A Subtle Point about Spawn | 00:01:00 | ||
Basic Analysis of Work and Span | 00:03:00 | ||
Quiz: Solve a Recurrence | 00:01:00 | ||
Desiderata For Work and Span | 00:02:00 | ||
Concurrency Primitive Parallel For | 00:02:00 | ||
Quiz: Implementing Par For | 00:01:00 | ||
Implementing Par For – Part 2 | 00:02:00 | ||
Quiz: Matrix Vector Multiply | 00:03:00 | ||
Data Races and Race Conditions | 00:02:00 | ||
Quiz: Putting It All Together 1 | 00:01:00 | ||
Quiz: Putting It All Together 2 | 00:01:00 | ||
Vector Notation | 00:03:00 | ||
Conclusion | 00:02:00 | ||
Comparison-based Sorting | |||
Introduction | 00:01:00 | ||
Comparator Networks | 00:03:00 | ||
Quiz: Sort 4 Values | 00:01:00 | ||
Bitonic Sequences | 00:02:00 | ||
Quiz: Bitonic Sequences Quiz | 00:01:00 | ||
Bitonic Splits | 00:02:00 | ||
Quiz: Bitonic Split Quiz | 00:01:00 | ||
Bitonic Splits A Parallel Scheme | 00:01:00 | ||
Bitonic Merge | 00:02:00 | ||
Quiz: Bitonic Merge Networks | 00:02:00 | ||
Quiz: Generate a Bitonic Sequence | 00:01:00 | ||
Bitonic Sort | 00:02:00 | ||
Conclusion | 00:02:00 | ||
Scans and List Ranking | |||
Introduction | 00:03:00 | ||
Prefix Sums Definitions | 00:01:00 | ||
Quiz: Prefix Sums Quiz | 00:01:00 | ||
Scans | 00:02:00 | ||
Quiz: Parallel Scan | 00:01:00 | ||
Parallel Scans | 00:03:00 | ||
Quiz: Parallel Scans Quiz | 00:01:00 | ||
Parallel Scan Analysis | 00:02:00 | ||
Parallel Quicksort | 00:02:00 | ||
Quiz: Parallel Partitioning Quiz | 00:01:00 | ||
Conditional Gathers Using Scans | 00:05:00 | ||
Segmented Scans | 00:03:00 | ||
Quiz: Scan Ingredients | 00:01:00 | ||
List Ranking Definitions | 00:02:00 | ||
Quiz: List Ranking Quiz | 00:01:00 | ||
Linked Lists as Array Pools | 00:03:00 | ||
Quiz: Linked Lists as Array Pools Quiz | 00:01:00 | ||
Quiz: What Does This Do | 00:01:00 | ||
A Parallel List Ranker | 00:05:00 | ||
Quiz: How Many Jumps | 00:01:00 | ||
A Parallel List Ranker (Formal) | 00:01:00 | ||
Quiz: A Parallel List Ranker Quiz | 00:01:00 | ||
Conclusion | 00:01:00 | ||
Tree Computations | |||
Introduction | 00:02:00 | ||
Tree Warm Up | 00:04:00 | ||
Quiz: Parallel Root Finder | 00:01:00 | ||
Quiz: Work-Optimal List Scan/ Prefix-Sum – Part 1 | 00:02:00 | ||
Parallel Independent Sets – Part 1 | 00:04:00 | ||
Quiz: Parallel Independent Sets – Part 2 | 00:01:00 | ||
Quiz: Parallel Independent Sets – Part 3 | 00:01:00 | ||
Quiz: Parallel Independent Sets – Part 4 | 00:01:00 | ||
Work-Optimal List Scan/ Prefix-Sum – Part 2 | 00:03:00 | ||
Quiz: Work-Optimal List Scan/ Prefix-Sum – Part 3 | 00:01:00 | ||
A Seemingly Sequential Tree Algorithm | 00:02:00 | ||
Quiz: Another Tree Traversal | 00:01:00 | ||
The Euler Tour Technique | 00:03:00 | ||
Quiz: The Span of a Euler Tour | 00:01:00 | ||
Quiz: Computing Levels | 00:02:00 | ||
Implementing Euler Tours | 00:03:00 | ||
Quiz: Successor or Failure-or | 00:01:00 | ||
Conclusion | 00:01:00 | ||
Shared Memory Parallel BFS | |||
Introduction | 00:01:00 | ||
BFS 101 | 00:02:00 | ||
BFS 101 (cont) | 00:03:00 | ||
Quiz: BFS Example | 00:02:00 | ||
Analysis -Is BFS Inherently Sequential | 00:02:00 | ||
Intuition -Why we might do better | 00:02:00 | ||
Quiz: BFS Example 2 | 00:01:00 | ||
High Level Approach to Parallel BFS | 00:02:00 | ||
Bags: Key Properties | 00:02:00 | ||
Pennants, Building Blocks For Bags | 00:03:00 | ||
Quiz: Pennants | 00:19:00 | ||
Combining Pennants Into Bags | 00:02:00 | ||
Duality Between Bags and Binary Math | 00:02:00 | ||
Quiz: What is the Cost of Insertion | 00:01:00 | ||
Quiz: Combining Two Bags | 00:01:00 | ||
Quiz: Cost of Other Operations | 00:01:00 | ||
Bag Splitting | 00:02:00 | ||
Finishing the Parallel BFS with Bags | 00:03:00 | ||
Conclusion | 00:01:00 | ||
Intro to Dist. Memory Models | |||
Introduction | 00:02:00 | ||
Quiz: Simulating the Brain | 00:02:00 | ||
A Basic Model of Distributed Memory | 00:05:00 | ||
Quiz: Getting a Feel for the Alpha Beta Model | 00:02:00 | ||
Applying the Rules | 00:01:00 | ||
Quiz: Scenario 2 | 00:01:00 | ||
Quiz: Scenario 3 | 00:01:00 | ||
Quiz: Scenario 4 | 00:01:00 | ||
Collective Operations – Part 1 | 00:04:00 | ||
Point to Point | 00:04:00 | ||
Point to Point Completion Semantics | 00:03:00 | ||
Quiz: Send and Recieve in Action | 00:01:00 | ||
Quiz: Send and Recieve Revisited | 00:01:00 | ||
All to One Reduce Pseudocode | 00:02:00 | ||
Quiz: All to One Reduce Pseudocode | 00:01:00 | ||
Vector Reductions | 00:01:00 | ||
Quiz: Vector Reductions | 00:01:00 | ||
More Collectives | 00:03:00 | ||
A Pseudocode API for Collectives | 00:05:00 | ||
Quiz: All Gather, From Building Blocks | 00:01:00 | ||
Collectives Lower Bounds | 00:03:00 | ||
Quiz: All Gather Quiz | 00:01:00 | ||
Quiz: Implement Scatter Quiz | 00:01:00 | ||
Implementing Scatter and Gather – Part 2 | 00:03:00 | ||
Quiz: When to Use Tree Based Reduce | 00:01:00 | ||
Quiz: What’s Wrong with Tree Based Reduce | 00:01:00 | ||
Bucketing Algorithms for Collectives | 00:03:00 | ||
Quiz: Bandwidth Optimal Broadcast | 00:01:00 | ||
Quiz: All Reduce | 00:01:00 | ||
Conclusion | 00:01:00 | ||
Intro to MPI | |||
Introduction | 00:01:00 | ||
Topology | |||
Introduction | 00:01:00 | ||
Intro to Network Models Links and Diameter | 00:03:00 | ||
Quiz: Improve the Diameter of a 2-D Mesh | 00:01:00 | ||
Bisection (Band)Width | 00:03:00 | ||
Some Other Network Topologies | 00:05:00 | ||
Quiz: Diameter and Bisection | 00:01:00 | ||
Mappings and Congestion | 00:05:00 | ||
Quiz: 2-D to 1-D Congestion | 00:01:00 | ||
A Lower Bound on Congestion | 00:03:00 | ||
Exploiting Higher Dimensions | 00:03:00 | ||
Quiz: 2D Broadcast | 00:01:00 | ||
All to All Personalized Exchange | 00:06:00 | ||
Quiz: All to All in Higher Dimensions | 00:01:00 | ||
Conclusion | 00:01:00 | ||
Dist. Dense Matrix Multiply | |||
Introduction | 00:01:00 | ||
Matrix Multiply Basic Defiitions | 00:04:00 | ||
Quiz: Definitions Check | 00:01:00 | ||
A Geometrical View | 00:03:00 | ||
Quiz: Applying Loomis Whitney | 00:01:00 | ||
1D Algorithms | 00:04:00 | ||
Quiz: 1D Algorithm Cost | 00:01:00 | ||
Quiz: 1D Algorithm Cost – Part 2 | 00:01:00 | ||
Efficiency and the 1D Algorithm | 00:03:00 | ||
Quiz: Isoefficiency | 00:01:00 | ||
A 2D Algorithm SUMMA | 00:03:00 | ||
Quiz: SUMMA Communication Time | 00:01:00 | ||
Efficiency of 2D SUMMA | 00:02:00 | ||
Quiz: SUMMA Memory | 00:01:00 | ||
A Lower Bound on Communication | 00:07:00 | ||
Quiz: A Lower Bound on Communication Quiz | 00:01:00 | ||
Matching (Or Beating!) The Lower Bounds | 00:05:00 | ||
Conclusion | 00:02:00 | ||
Dist. Memory Sorting | |||
Introduction | 00:02:00 | ||
Distributed Bitonic Merge Via Binary Exchange | 00:04:00 | ||
Quiz: Pick a Network | 00:01:00 | ||
Quiz: Communication Cost of a Bitonic Merge | 00:01:00 | ||
Bitonic Merge Via Transposes | 00:03:00 | ||
Quiz: Butterfly Trivia | 00:01:00 | ||
Bitonic Sort Cost Computation | 00:02:00 | ||
Quiz: Bitonic Sort Cost Communication | 00:01:00 | ||
Linear Time Distributed Sort – Part 1 | 00:03:00 | ||
Quiz: Distributed Bucket Sort | 00:02:00 | ||
Linear Time Distributed Sort – Part 2 | 00:03:00 | ||
Quiz: Cost of Distributed Sample Sort | 00:01:00 | ||
Conclusion | 00:02:00 | ||
Distributed BFS | |||
Introduction | 00:01:00 | ||
Graphs and Adjacency Matrices | 00:02:00 | ||
Quiz: The Adjacency Matrix of a Directed Graph | 00:01:00 | ||
Quiz: Losing Your Direction | 00:01:00 | ||
Breadth First Search Review | 00:02:00 | ||
Matrix Based BFS | 00:03:00 | ||
Quiz: Matrix based BFS The Quiz! | 00:01:00 | ||
1 D Distributed BFS | 00:03:00 | ||
Quiz: 2 D Distributed BFS Quiz! | 00:01:00 | ||
Conclusion | 00:01:00 | ||
Graph Partitioning | |||
Introduction | 00:01:00 | ||
The Graph Partitioning Problem | 00:03:00 | ||
Quiz: Do You Really Want a Graph Partition? | 00:01:00 | ||
Graph Bisection and Planar Separators | 00:03:00 | ||
Quiz: Partitioning via Breadth First Search | 00:01:00 | ||
Kernighan Lin, Part 1: No Gain is Pain | 00:03:00 | ||
Quiz: Kernighan Lin Algorithm Quiz! | 00:01:00 | ||
Kernighan Lin Algorithm | 00:03:00 | ||
Graph Coarsening | 00:04:00 | ||
Quiz: Coarsen Me, Baby | 00:01:00 | ||
Maximal and Maximum Matchings | 00:02:00 | ||
Quiz: Find the Maximal Matching | 00:01:00 | ||
Quiz: A Fact about Maximal Matchings | 00:01:00 | ||
Computing a Maximal Matching | 00:03:00 | ||
Fine-to Coarse and Back Again | 00:02:00 | ||
Quiz: Partition Refinement | 00:01:00 | ||
Spectral Partitioning, Part 1: The Graph Laplacian | 00:04:00 | ||
Quiz: Graph Laplacian | 00:01:00 | ||
Spectral Partitioning. Part 2: Springs Fling | 00:04:00 | ||
Quiz: A 2-D Laplacian | 00:01:00 | ||
Quiz: Counting Edge Cuts | 00:02:00 | ||
Spectral Partitioning, Part 4 | 00:03:00 | ||
Conclusion | 00:02:00 | ||
Basic Model of Locality | |||
Introduction | 00:01:00 | ||
A First, Basic Model | 00:04:00 | ||
Quiz: Two Level Memories | 00:01:00 | ||
Quiz: Alignment | 00:01:00 | ||
Quiz: Minimum Transfers to Sort | 00:01:00 | ||
Quiz: Minimum Transfers to Multiply Matricies | 00:01:00 | ||
I/O Example Reduction | 00:03:00 | ||
Quiz: Matrix Vector Multiply | 00:03:00 | ||
Algorithmic Design Goals | 00:02:00 | ||
Quiz: Which is Better? | 00:01:00 | ||
Intensity, Balance and Time | 00:03:00 | ||
Quiz: Roofline Plots | 00:02:00 | ||
Quiz: Intensity of Conventional Matrix Multiply | 00:02:00 | ||
Quiz: Intensity of Conventional Matrix Multiply, Part 2 | 00:02:00 | ||
Quiz: Informing the Architecture | 00:02:00 | ||
Conclusion | 00:02:00 | ||
Algorithmic Time, Energy and Power | |||
Introduction | 00:01:00 | ||
Quiz: Speed Trends | 00:01:00 | ||
Quiz: Speed Limits | 00:02:00 | ||
Quiz: Space Limits | 00:01:00 | ||
Quiz: Balance in Time | 00:02:00 | ||
Balance Principles | 00:06:00 | ||
Quiz: Double, Double Toil and Trouble | 00:02:00 | ||
Power Limits | 00:03:00 | ||
The Dynamic Power Equation | 00:04:00 | ||
Quiz: Power Motivates Parallelism | 00:01:00 | ||
Quiz: Power Knobs | 00:01:00 | ||
Quiz: Powerless to Choose | 00:01:00 | ||
Quiz: Exploiting DVFS | 00:01:00 | ||
Quiz: Algorithmic Energy | 00:02:00 | ||
Quiz: Algorithmic Dynamic Power | 00:01:00 | ||
Quiz: Parallelism and DVFS | 00:02:00 | ||
Conclusion (Lesson 18) | 00:02:00 | ||
I/O-Avoiding Algorithms | |||
Introduction | 00:01:00 | ||
Quiz: A Sense of Scale | 00:03:00 | ||
External Memory Mergesort | 00:02:00 | ||
Quiz: Partitioned Sorting Step Analysis | 00:02:00 | ||
Two Way External Memory Merging | 00:03:00 | ||
Quiz: External Memory Mergesort | 00:01:00 | ||
Quiz: What’s Wrong with 2 Way Merging | 00:02:00 | ||
Mulitway Merging | 00:04:00 | ||
Quiz: Cost of Multiway Merge | 00:02:00 | ||
A Lower Bound on External Memory Sorting | 00:03:00 | ||
Quiz: How Many Transfers in Binary Search | 00:01:00 | ||
Quiz: Lower Bounds for Search | 00:01:00 | ||
Quiz: I/O Efficient Data Structures | 00:01:00 | ||
Conclusion | 00:02:00 | ||
Cache-Oblivious Algorithms | |||
Introduction | 00:01:00 | ||
An Introductory Example | 00:03:00 | ||
The Ideal-Cache Model | 00:05:00 | ||
Quiz: What Would an Ideal Cache Do? | 00:02:00 | ||
Quiz: LRU Replacement | 00:01:00 | ||
How Ideal is an Ideal Cache? | 00:03:00 | ||
Proof of the LRU-OPT Competitiveness Lemma | 00:04:00 | ||
Quiz: LRU Sorting | 00:01:00 | ||
The Tall-Cache Assumption | 00:03:00 | ||
Quiz: Which “Cache” is Tall? | 00:01:00 | ||
Cache-Oblivious Matrix Multiply | 00:05:00 | ||
Cache-Oblivious Binary Search | 00:04:00 | ||
Quiz: Van Emde Boas Layout | 00:01:00 | ||
Conclusion | 00:01:00 | ||
Conclusion | |||
So long and thanks for all the…spawns and syncs | 00:02:00 | ||
Thank you and goodnight | 00:02:00 | ||
Assessment | |||
Submit Your Assignment | 00:00:00 | ||
Certification | 00:00:00 |
Course Reviews
No Reviews found for this course.