Roadmap
Data Engineering 0 to 100
Road Map
flowchart LR
    Python{{1. Python}} --> PythonA(Basic Python Tutorial)
    PythonA --> PythonB(Intermediate Python Tutorial) --> PythonC(OOP with Python)
    PythonC --> Python2{{1.2 Data Wrangling}} --> Numpy(Numpy) --> Pandas(Pandas) --> Pyarrow(Pyarrow)
    PythonC --> Logging(Logging) --> Testing(Testing)
    PythonC --> DataClass(Data Classes)
    PythonC --> Async(Async)
    Python{{1. Python}} ---> SQL{{2. SQL}} ---> Spark{{3. Spark}} ---> Scheduling{{4. Scheduling}}
    Spark{{Spark}} --> LearnSpark(Learning Spark Second Edition)
    Scheduling{{Scheduling}} --> Airflow(Apache Airflow)
Python
- Basic Python Tutorial (YouTube Course)
- Intermediate Python Tutorial (YouTube Course)
- Object Oriented Programming with Python (YouTube Course)
Data Wrangling and Manipulation
NumPy
- Numpy For Machine Learning (YouTube Course - Playlist)
Pandas
- Pandas For Machine Learning (YouTube Course - Playlist)
- Pandas Illustrated: The Definitive Visual Guide to Pandas
- Modern Pandas
- Pandas GroupBy
PyArrow (Intermediate)
Logging
- Logging in Python
- Python Logging Tutorial (YouTube Short Tutorial)
Testing
- Getting Started With Testing in Python
- Pytest for Beginners
- Start Testing Your Python Code with pytest (YouTube Short Tutorial)
Other Subjects
- Data Classes (YouTube Short Tutorial)
- How to Create an Async API Call with asyncio (YouTube Short Tutorial)
SQL
1. SQL Tutorial
- SQL Tutorial for beginners by W3 (Interactive Course)
- SQL Tutorial (Interactive Course)
- SQL Tutorial with MySQL (YouTube Course - Playlist)
- SQL Joins (Blog Post)
- SQL Window Functions (Blog Post)
2. SQL Playgrounds
Scheduling
1. Apache Airflow
- Airflow Tutorial for Beginners (YouTube Course)
- Airflow Tips for Beginners (YouTube Short Tutorial)
Spark
1. Tutorials
- Learning Spark Second Edition (🆓 Free Book provided by Databricks)
- PySpark by example
- PySpark by example (code samples)
- PySpark Tutorial (YouTube Course)
2. Spark Book Suggestions
- Essential PySpark for Scalable Data Analytics (ℹ️ Highly Recommended)
- Spark: The Definitive Guide
Apache NiFi
Apache Kafka
- Apache Kafka® 101 for Beginners Course (YouTube Course - Playlist)
- Apache Kafka for Beginners (YouTube Course - Playlist)
- Apache Kafka Connect (YouTube Course - Playlist)
- Apache Kafka Streams (YouTube Course - Playlist)
- Apache Kafka KSQL for Stream Processing (YouTube Course - Playlist)
Dashboarding
1. Power BI
- Hands-On Power BI Tutorial (YouTube Course)
- Data Modeling for Power BI (YouTube Course)
2. Streamlit
- Streamlit (Dashboard/Web Application in Python) (YouTube Course - Playlist)
Cloud
Azure Fundamentals
- Azure Fundamentals (YouTube Course - Playlist)
- Introduction to Azure Data Services (YouTube Course)
Databricks
- Azure Databricks for Beginners (YouTube Course - Playlist)
Azure Synapse
Azure Data Factory (Cloud Data Integration Tool)
- Azure Data Factory: Beginner to Pro (YouTube Course)
- Azure Data Factory Tutorial (YouTube Course)
- Azure Data Factory: Data Flows (YouTube Course - Playlist)
Data Cleaning
Data Pipeline Design Patterns
Computer Science Fundamentals