Big Data Engineering

Master Spark, Hadoop, and cloud platforms to process and analyze massive datasets at scale. Build robust data pipelines for enterprise applications.

Course Overview

Dive into big data engineering and learn to build scalable systems that handle massive datasets. You'll work with the same tools and technologies that top tech companies use to process petabytes of data efficiently.

What You'll Learn

  • Apache Spark for distributed computing
  • Hadoop ecosystem and HDFS
  • Cloud platforms (AWS, Azure, GCP)
  • Stream processing with Kafka
  • Data pipeline orchestration
  • NoSQL databases (MongoDB, Cassandra)

Curriculum

Weeks 1-2: Big Data Fundamentals

Introduction to big data concepts, distributed systems, and data engineering principles

Weeks 3-5: Hadoop Ecosystem

HDFS, MapReduce, Hive, HBase, and cluster management

Weeks 6-8: Apache Spark

Spark Core, SQL, Streaming, MLlib, and performance optimization

Weeks 9-10: Cloud & Streaming

Cloud data services, Kafka, real-time processing, and microservices

Week 11: NoSQL & Data Lakes

NoSQL databases, data lake architecture, and modern data stack

Week 12: Capstone Project

Build and deploy a complete big data solution with real-time analytics

Course Details

Duration: 12 weeks
Level: Advanced
Students: 1,156
Rating: 4.5 (156 ratings)
Price: $379

Your Instructor


Michael Thompson

Principal Data Engineer at Netflix

Michael holds an MS in Computer Science from Stanford and has more than 12 years of experience in big data. He architected Netflix's recommendation data pipeline, which serves more than 200 million users.

Prerequisites

  • Strong programming skills (Python/Java)
  • Database and SQL knowledge
  • Basic distributed systems concepts
  • Linux command line experience

Hands-on Projects

Real-time Analytics Pipeline

Build a streaming data pipeline using Kafka and Spark for real-time user behavior analysis.

Kafka + Spark

Data Lake Architecture

Design and implement a cloud-based data lake using AWS services for petabyte-scale storage.

AWS + Hadoop

ML Pipeline at Scale

Create an end-to-end machine learning pipeline that processes millions of records daily.

Spark MLlib

Student Success Stories

"This course was exactly what I needed to advance from traditional databases to big data. Now I'm leading data architecture at Spotify!"

Ryan Chen
Senior Data Engineer, Spotify

"The hands-on projects were incredible. Building real-time pipelines with Michael's guidance prepared me perfectly for my role at Uber."

Priya Sharma
Data Platform Engineer, Uber