Building Batch Data Analytics Solutions on AWS
This 1-day course surveys the batch analytics tools available in a modern data architecture on AWS.
In this course, you will learn how to build batch data analytics solutions using Amazon EMR, a managed service for running Apache Spark and Apache Hadoop. The training covers how Amazon EMR works with open-source tools such as Apache Hive, Hue, and HBase, as well as AWS services such as AWS Glue and AWS Lake Formation. You will explore data collection, ingestion, cataloging, storage, and processing, with a focus on Spark and Hadoop. You will also learn how to use EMR Notebooks for analytics and machine learning tasks, and how to apply best practices for security, performance, and cost management when using Amazon EMR.
This intermediate-level course includes presentations, interactive demos, practice labs, discussions, and exercises. By the end of the course, you will be able to compare data warehouses and data lakes, design and implement batch data analytics solutions, optimize data storage, select the right tools for data ingestion and transformation, and understand how data storage choices affect analysis and visualization. You will also learn how to secure data, monitor analytics workloads, and apply cost management strategies.
The course is aimed at data platform engineers and architects who manage data analytics pipelines. To benefit from it, you should have at least one year of experience with open-source data frameworks such as Apache Spark or Apache Hadoop. If you need a refresher, the AWS Hadoop Fundamentals course is recommended. You should also have completed either AWS Technical Essentials or Architecting on AWS, and either Building Data Lakes on AWS or Getting Started with AWS Glue.