Data Engineering 0 to 100

Road Map

flowchart LR
    Python{{1. Python}} --> PythonA(Basic Python Tutorial)
    PythonA --> PythonB(Intermediate Python Tutorial) --> PythonC(OOP with Python)
    PythonC --> Python2{{1.2 Data Wrangling}} --> Numpy(NumPy) --> Pandas(Pandas) --> Pyarrow(PyArrow)
    PythonC --> Logging(Logging) --> Testing(Testing)
    PythonC --> DataClass(Data Classes)
    PythonC --> Async(Async)
    Python ---> SQL{{2. SQL}} ---> Spark{{3. Spark}} ---> Scheduling{{4. Scheduling}}
    Spark --> LearnSpark(Learning Spark Second Edition)
    Scheduling --> Airflow(Apache Airflow)


Python

  1. Basic Python Tutorial (YouTube Course)
  2. Intermediate Python Tutorial (YouTube Course)
  3. Object Oriented Programming with Python (YouTube Course)

Data Wrangling and Manipulation


NumPy

  1. NumPy For Machine Learning (YouTube Course - Playlist)


Pandas

  1. Pandas For Machine Learning (YouTube Course - Playlist)
  2. Pandas Illustrated: The Definitive Visual Guide to Pandas
  3. Modern Pandas
  4. Pandas GroupBy

PyArrow (Intermediate)

  1. Apache Arrow Basics
  2. A gentle introduction to Apache Arrow with Apache Spark and Pandas


Logging

  1. Logging in Python
  2. Python Logging Tutorial (YouTube Short Tutorial)
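
A minimal sketch of the standard-library `logging` module in a pipeline setting, assuming a hypothetical step called `load_orders` (not taken from the linked tutorials): configure one named logger once, then log at different levels instead of calling `print`.

```python
import logging

# Module-level logger; its name ("pipeline") appears in every record it emits.
logger = logging.getLogger("pipeline")


def configure_logging(level: int = logging.INFO) -> None:
    """Attach a single console handler with a timestamped format."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.handlers.clear()  # avoid duplicated output if this is called twice
    logger.addHandler(handler)
    logger.setLevel(level)


def load_orders(rows: list[dict]) -> int:
    """Hypothetical pipeline step: count rows that have an id, log the ones skipped."""
    valid = 0
    for row in rows:
        if "id" not in row:
            # %-style arguments are only formatted if the record is actually emitted.
            logger.warning("Skipping row without id: %r", row)
            continue
        valid += 1
    logger.info("Loaded %d of %d rows", valid, len(rows))
    return valid


if __name__ == "__main__":
    configure_logging()
    load_orders([{"id": 1}, {"name": "no id"}, {"id": 3}])
```

Switching to `configure_logging(logging.DEBUG)` raises verbosity for the whole pipeline without touching the individual log calls.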


Testing

  1. Getting Started With Testing in Python
  2. Pytest for Beginners
  3. Start Testing Your Python Code with pytest (YouTube Short Tutorial)
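
A small sketch of what a pytest test file can look like, assuming a hypothetical function `normalize_email` as the code under test. pytest collects files named `test_*.py` and functions named `test_*`, and plain `assert` statements are enough.

```python
# test_cleaning.py (hypothetical file name; pytest discovers test_*.py files)


def normalize_email(raw: str) -> str:
    """Hypothetical function under test: trim whitespace and lowercase an email."""
    if "@" not in raw:
        raise ValueError(f"not an email address: {raw!r}")
    return raw.strip().lower()


def test_strips_and_lowercases():
    # pytest rewrites plain asserts to show both sides of a failed comparison.
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"


def test_rejects_missing_at_sign():
    try:
        normalize_email("not-an-email")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Run it with `pytest test_cleaning.py`. With pytest imported, the second test would usually be written as `with pytest.raises(ValueError):`; the `try`/`except` form is used here only so the file also runs without pytest installed.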

Other Subjects

  1. Data Classes (YouTube Short Tutorial)
  2. How to Create an Async API Call with asyncio (YouTube Short Tutorial)
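
The two topics above combine naturally: a data class as the record type and `asyncio.gather` to issue several calls concurrently. This is a sketch under an explicit assumption: `fake_api_call` is a hypothetical stand-in that uses `asyncio.sleep` instead of a real HTTP request.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ApiResult:
    """Record for one response; @dataclass generates __init__, __repr__ and __eq__."""

    endpoint: str
    status: int
    # Mutable defaults must use default_factory so instances don't share one list.
    tags: list[str] = field(default_factory=list)


async def fake_api_call(endpoint: str, delay: float) -> ApiResult:
    """Hypothetical stand-in for an HTTP request; asyncio.sleep simulates latency."""
    await asyncio.sleep(delay)
    return ApiResult(endpoint=endpoint, status=200)


async def fetch_all(endpoints: list[str], delay: float = 0.1) -> list[ApiResult]:
    # gather() awaits the calls concurrently and returns results in input order.
    return list(await asyncio.gather(*(fake_api_call(e, delay) for e in endpoints)))


if __name__ == "__main__":
    for result in asyncio.run(fetch_all(["/users", "/orders", "/items"])):
        print(result)
```

Because the three calls wait concurrently, the total time is close to one `delay` rather than three. With an async HTTP client library, `fake_api_call` would await the client's request instead; the `gather` pattern stays the same.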


SQL

1. SQL Tutorial

  1. SQL Tutorial for beginners by W3 (Interactive Course)
  2. SQL Tutorial (Interactive Course)
  3. SQL Tutorial with MySQL (YouTube Course - Playlist)
  4. SQL Joins (Blog Post)
  5. SQL Window Functions (Blog Post)
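
Joins and window functions can be tried locally with Python's built-in `sqlite3` module and an in-memory database, no server needed. The tables and rows below are made up for illustration; window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database with two tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus'), (3, 'Grace');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 70.0), (3, 2, 50.0);
    """
)

# LEFT JOIN keeps customers without orders (Grace) and fills order columns with NULL.
join_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
    """
).fetchall()

# Window function: running total per customer, keeping every row (unlike GROUP BY).
window_rows = conn.execute(
    """
    SELECT c.name,
           o.amount,
           SUM(o.amount) OVER (PARTITION BY o.customer_id ORDER BY o.id) AS running_total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
    """
).fetchall()

for row in join_rows:
    print(row)
for row in window_rows:
    print(row)
```

Swapping `LEFT JOIN` for `JOIN` drops Grace from the first result, which is a quick way to see the difference between the two join types.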

2. SQL Playgrounds

  1. SQLime
  2. Relational Dataset Repository


Scheduling

1. Apache Airflow

  1. Airflow Tutorial for Beginners (YouTube Course)
  2. Airflow Tips for Beginners (YouTube Short Tutorial)
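
Airflow describes a pipeline as a DAG (directed acyclic graph) of tasks and starts a task only after its upstream tasks have finished. The sketch below does not use Airflow's API at all; it only illustrates that dependency-ordering idea with the standard-library `graphlib` module (Python 3.9+), using hypothetical task names.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on (upstream).
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all upstream tasks done,
    so tasks within one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking downstream tasks
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(run_order(dag), start=1):
        print(number, wave)
```

The two extract tasks land in the same first wave because neither depends on the other, which mirrors how a scheduler can run independent tasks side by side.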


Spark

1. Tutorials

  1. Learning Spark Second Edition (🆓 Free Book provided by Databricks)
  2. PySpark by Example
  3. PySpark by Example code samples
  4. PySpark Tutorial (YouTube Course)

2. Spark Book Suggestions

  1. Essential PySpark for Scalable Data Analytics (ℹ️ Highly Recommended)
  2. Spark: The Definitive Guide

Apache NiFi

  1. Apache NiFi Tutorial

Apache Kafka

  1. Apache Kafka® 101 for Beginners Course (YouTube Course - Playlist)
  2. Apache Kafka for Beginners (YouTube Course - Playlist)
  3. Apache Kafka Connect (YouTube Course - Playlist)
  4. Apache Kafka Streams (YouTube Course - Playlist)
  5. Apache Kafka KSQL for Stream Processing (YouTube Course - Playlist)


1. Power BI

  1. Hands-On Power BI Tutorial (YouTube Course)
  2. Data Modeling for Power BI (YouTube Course)

2. Streamlit

  1. Streamlit (Dashboard/Web Application in Python) (YouTube Course - Playlist)


Azure Fundamentals

  1. Azure Fundamentals (YouTube Course - Playlist)
  2. Introduction to Azure Data Services (YouTube Course)


Azure Databricks

  1. Azure Databricks for Beginners (YouTube Course - Playlist)

Azure Synapse

  1. Azure Synapse

Azure Data Factory (Cloud Data Integration Tool)

  1. Azure Data Factory: Beginner to Pro (YouTube Course)
  2. Azure Data Factory Tutorial (YouTube Course)
  3. Azure Data Factory: Data Flows (YouTube Course - Playlist)

Data Cleaning

  1. Six Data Cleaning Checks
  2. Data cleaning for data sharing
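
A few generic data-cleaning checks in plain Python: missing values, duplicate keys, and out-of-range values. These are common examples chosen for illustration and are not necessarily the six checks from the linked article; the records and column names are hypothetical.

```python
from collections import Counter

# Hypothetical raw records; in practice they would come from a file or a query.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": 41},
    {"id": 3, "email": "c@example.com", "age": -5},
]


def missing_values(records, column):
    """Count records where the column is absent or None."""
    return sum(1 for r in records if r.get(column) is None)


def duplicate_keys(records, key):
    """Return key values that appear more than once."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def out_of_range(records, column, low, high):
    """Return ids of records whose value falls outside [low, high]."""
    return [
        r["id"]
        for r in records
        if r.get(column) is not None and not low <= r[column] <= high
    ]


report = {
    "missing_email": missing_values(rows, "email"),
    "duplicate_ids": duplicate_keys(rows, "id"),
    "bad_age_ids": out_of_range(rows, "age", 0, 120),
}
print(report)
```

Running checks like these before loading data, and failing or alerting when a count is non-zero, keeps bad records from silently reaching downstream tables.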

Data Pipeline Design Patterns

  1. Data flow patterns
  2. Coding patterns in Python
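
One widely used Python coding pattern for pipelines is chaining generators so that each stage (extract, transform, load) streams records lazily instead of materializing the whole dataset. This is one example pattern, not necessarily the one described in the linked article; the CSV content and function names are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical CSV input; io.StringIO stands in for a real file handle.
RAW = "id,amount\n1,10.5\n2,\n3,7.25\n"


def extract(text: str) -> Iterator[dict]:
    """Stage 1: yield one dict per CSV row without reading everything at once."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: drop rows with an empty amount and cast the remaining fields."""
    for row in rows:
        if not row["amount"]:
            continue
        yield {"id": int(row["id"]), "amount": float(row["amount"])}


def load(rows: Iterable[dict], sink: list) -> int:
    """Stage 3: write rows to a sink (a list here, a database or file in practice)."""
    count = 0
    for row in rows:
        sink.append(row)
        count += 1
    return count


if __name__ == "__main__":
    target: list = []
    # Nothing is read until load() starts iterating over the chained generators.
    n = load(transform(extract(RAW)), target)
    print(n, target)
```

Each stage only depends on an iterable of dicts, so stages can be tested in isolation and swapped (for example, a different `extract` for another source) without changing the rest.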

Computer Science Fundamentals

  1. Computer Networks