FAANG-Level Apache Airflow Study Plan
1️⃣ Basics & Foundations
- What is Airflow? Why is it used?
- Airflow vs Cron / vs Luigi / vs Prefect / vs Dagster
- Core Components:
  - DAG, Task, Operator, Scheduler, Executor, Worker, Webserver, Metastore (metadata database)
- Directed Acyclic Graph (DAG) concept
- Anatomy of a simple DAG (with example)
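"Acyclic" is a hard requirement: Airflow refuses to load a DAG whose task dependencies form a cycle. The check can be made concrete with a plain-Python sketch of Kahn's topological sort. This is not Airflow's own code. The function name `topological_order` and the dict-of-lists input (task_id mapped to its direct downstream task_ids) are assumptions for illustration. It returns an execution order that respects every upstream -> downstream edge, or raises an error when a cycle exists.

```python
from collections import deque


def topological_order(edges: dict[str, list[str]]) -> list[str]:
    """Order tasks so every upstream task comes before its downstream tasks.

    `edges` maps each task_id to the task_ids directly downstream of it
    (a hypothetical input format for this sketch, not an Airflow structure).
    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    # Collect every node, including ones that only appear as downstream targets.
    nodes = set(edges)
    for downstream in edges.values():
        nodes.update(downstream)

    # Count how many direct upstream tasks each node is still waiting on.
    in_degree = {node: 0 for node in nodes}
    for downstream in edges.values():
        for node in downstream:
            in_degree[node] += 1

    # Start with tasks that have no upstream dependencies (sorted for determinism).
    ready = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # "Finishing" this node releases one dependency on each of its children.
        for child in sorted(edges.get(node, [])):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # Any node never released is waiting on itself through a cycle.
    if len(order) != len(nodes):
        raise ValueError("cycle detected: graph is not a DAG")
    return order


pipeline = {"extract": ["transform"], "transform": ["load"], "load": []}
print(topological_order(pipeline))  # ['extract', 'transform', 'load']
```

A cycle such as `{"a": ["b"], "b": ["a"]}` raises `ValueError`, because neither task ever reaches zero unfinished upstream dependencies.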
2️⃣ DAG Internals & Configuration
- DAG Structure:
  dag_id, start_date, schedule_interval (superseded by schedule in Airflow 2.4+), catchup, default_args, etc.
- DAG Execution Flow:
  - Dependencies:
    set_upstream(), set_downstream(), >>, <<
  - Trigger Rules & Run Conditions:
    TriggerRule, depends_on_past, wait_for_downstream
  - Runtime Parameters, Data Passing & Branching:
    params, Variable, XCom, BranchPythonOperator
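How `start_date`, the schedule, and `catchup` interact can be sketched with a simplified daily model. The assumptions: a fixed 24-hour interval, naive datetimes, no existing runs, and no `end_date` or `max_active_runs` limits. `daily_intervals` is a hypothetical helper, not an Airflow API. A run for the interval [start, end) only becomes eligible after `end` has passed; with `catchup=True` every missed interval since `start_date` is queued, and with `catchup=False` only the latest completed one is.

```python
from datetime import datetime, timedelta


def daily_intervals(start_date: datetime, now: datetime, catchup: bool) -> list[tuple[datetime, datetime]]:
    """Simplified model of which daily data intervals a scheduler would queue.

    Hypothetical helper: ignores timezones, existing runs, end_date and
    concurrency limits. An interval [start, end) is eligible once end <= now.
    """
    step = timedelta(days=1)
    intervals = []
    start = start_date
    # Collect every interval that has fully elapsed by `now`.
    while start + step <= now:
        intervals.append((start, start + step))
        start += step
    if not catchup:
        # Only the most recent completed interval (empty if none has completed).
        return intervals[-1:]
    return intervals


runs = daily_intervals(datetime(2024, 1, 1), datetime(2024, 1, 4, 6), catchup=True)
print(len(runs))  # 3
```

With a `start_date` of 2024-01-01 and the current time 2024-01-04 06:00, the catchup model yields three intervals (ending on the 2nd, 3rd, and 4th), while `catchup=False` yields only the interval from 2024-01-03 to 2024-01-04.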
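The `>>` and `<<` syntax works because Airflow overloads Python's bitshift operators on its task objects, so `a >> b` is equivalent to `a.set_downstream(b)` and `a << b` to `a.set_upstream(b)`. The toy `Task` class below is a hypothetical stand-in that models only this dependency wiring (no execution, no DAG object). It also shows why a list on the left of `>>` works: lists do not define `>>`, so Python falls back to the right operand's `__rrshift__`.

```python
class Task:
    """Toy stand-in for an Airflow task; models dependency wiring only."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.upstream = set()    # task_ids that must finish before this task
        self.downstream = set()  # task_ids that run after this task

    @staticmethod
    def _as_list(other):
        return other if isinstance(other, list) else [other]

    def set_downstream(self, other) -> None:
        for t in self._as_list(other):
            self.downstream.add(t.task_id)
            t.upstream.add(self.task_id)

    def set_upstream(self, other) -> None:
        for t in self._as_list(other):
            t.set_downstream(self)

    def __rshift__(self, other):
        # a >> b: b runs after a. Returning b lets chains like a >> b >> c work.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # a << b: a runs after b.
        self.set_upstream(other)
        return other

    def __rrshift__(self, other):
        # [a, b] >> c: Python calls c.__rrshift__([a, b]).
        self.set_upstream(other)
        return self

    def __rlshift__(self, other):
        # [a, b] << c: a and b run after c.
        self.set_downstream(other)
        return self


extract, clean, validate, load = (Task(n) for n in ["extract", "clean", "validate", "load"])
extract >> [clean, validate] >> load  # fan-out, then fan-in
print(sorted(load.upstream))  # ['clean', 'validate']
```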
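Trigger rules decide when a task may start, based on the states of its direct upstream tasks. The function below is a simplified sketch covering six of Airflow's rule names: `all_success` (the default), `all_failed`, `all_done`, `one_success`, `one_failed`, and `none_failed`. The state strings mirror Airflow's, but `can_run` is hypothetical, and the real scheduler additionally marks tasks `skipped` or `upstream_failed` instead of only answering yes or no.

```python
FINISHED = {"success", "failed", "skipped", "upstream_failed"}
FAILED = {"failed", "upstream_failed"}


def can_run(trigger_rule: str, upstream_states: list[str]) -> bool:
    """Simplified check of whether a task may start (hypothetical helper)."""
    done = all(s in FINISHED for s in upstream_states)
    if trigger_rule == "all_success":
        return all(s == "success" for s in upstream_states)
    if trigger_rule == "all_failed":
        return all(s in FAILED for s in upstream_states)
    if trigger_rule == "all_done":
        # Runs once everything upstream has finished, whatever the outcome.
        return done
    if trigger_rule == "one_success":
        # Fires as soon as one upstream succeeds; does not wait for the rest.
        return any(s == "success" for s in upstream_states)
    if trigger_rule == "one_failed":
        # Fires as soon as one upstream fails; does not wait for the rest.
        return any(s in FAILED for s in upstream_states)
    if trigger_rule == "none_failed":
        # All upstream finished and none failed (success or skipped is fine).
        return done and not any(s in FAILED for s in upstream_states)
    raise ValueError(f"rule not modelled in this sketch: {trigger_rule}")


print(can_run("all_done", ["success", "failed"]))     # True
print(can_run("all_success", ["success", "failed"]))  # False
```

A cleanup task with `all_done` still runs after an upstream failure, which is why that rule is the usual choice for tearing down temporary resources.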
3️⃣ Operators & Hooks
- Types of Operators:
  PythonOperator, BashOperator, DummyOperator (EmptyOperator since 2.3), BranchPythonOperator
  EmailOperator, Sensors, SubDagOperator (deprecated in 2.0 and removed in 3.0, but still asked)
  DockerOperator, KubernetesPodOperator
- Hooks and Connections
  - What are Hooks?
  - Common Hooks:
    PostgresHook, S3Hook, HttpHook, MySqlHook, etc.
- Sensors & Their Types:
  ExternalTaskSensor, TimeSensor, FileSensor, S3KeySensor
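`BranchPythonOperator` runs a callable that returns the `task_id` (or list of `task_id`s) to follow; the other directly downstream tasks are skipped. The sketch below models just that decision. `choose_path` and `apply_branch` are hypothetical helpers, and the task ids and the row-count threshold are invented for the example.

```python
from typing import Union


def choose_path(row_count: int) -> str:
    """Hypothetical branch callable: pick a path from a runtime value."""
    return "full_refresh" if row_count > 1_000_000 else "incremental_load"


def apply_branch(chosen: Union[str, list[str]], downstream: list[str]) -> dict[str, str]:
    """Mark each directly downstream task as 'run' or 'skipped'."""
    # A branch callable may return a single task_id or a list of them.
    chosen_ids = {chosen} if isinstance(chosen, str) else set(chosen)
    return {task_id: ("run" if task_id in chosen_ids else "skipped") for task_id in downstream}


states = apply_branch(choose_path(42), ["full_refresh", "incremental_load"])
print(states)  # {'full_refresh': 'skipped', 'incremental_load': 'run'}
```

Because the skipped state propagates, a join task placed after both branches usually needs a trigger rule such as `none_failed_min_one_success`; with the default `all_success` it would be skipped too.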
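Hooks hide connection details behind a `conn_id`. One place Airflow can read connections from is an environment variable named `AIRFLOW_CONN_<CONN_ID>` holding a URI. The sketch below parses such a URI with the standard library to show what a hook gets back from a connection lookup. `get_connection`, `Connection`, and `ToyPostgresHook` are hypothetical names; real hooks look connections up through Airflow's secrets backends (which include environment variables and the metadata database), not with this code.

```python
import os
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class Connection:
    """Hypothetical container mirroring the usual connection fields."""
    conn_type: str
    host: Optional[str]
    port: Optional[int]
    login: Optional[str]
    password: Optional[str]
    schema: str


def get_connection(conn_id: str) -> Connection:
    """Read AIRFLOW_CONN_<CONN_ID> (URI form) from the environment and parse it."""
    uri = os.environ[f"AIRFLOW_CONN_{conn_id.upper()}"]
    parts = urlsplit(uri)
    return Connection(
        conn_type=parts.scheme,
        host=parts.hostname,
        port=parts.port,
        login=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
        schema=parts.path.lstrip("/"),
    )


class ToyPostgresHook:
    """Hypothetical hook: resolves a conn_id once and builds a DSN for a client."""

    def __init__(self, conn_id: str):
        self.conn = get_connection(conn_id)

    def dsn(self) -> str:
        # Password deliberately left out so the string is safe to log.
        c = self.conn
        return f"postgresql://{c.login}@{c.host}:{c.port}/{c.schema}"


os.environ["AIRFLOW_CONN_WAREHOUSE"] = "postgres://etl_user:s3cret@db.internal:5432/analytics"
print(ToyPostgresHook("warehouse").dsn())  # postgresql://etl_user@db.internal:5432/analytics
```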
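A sensor in the default `poke` mode is essentially a polling loop: call a check every `poke_interval` seconds until it succeeds or `timeout` elapses. The sketch below models only that loop. `run_sensor` and `SensorTimeout` are hypothetical stand-ins (Airflow itself raises `AirflowSensorTimeout`), and it does not model `reschedule` mode, in which Airflow frees the worker slot between pokes.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Hypothetical stand-in for Airflow's AirflowSensorTimeout."""


def run_sensor(poke: Callable[[], bool], poke_interval: float, timeout: float) -> int:
    """Call `poke` every `poke_interval` seconds until it returns True.

    Returns how many pokes were needed; raises SensorTimeout once `timeout`
    seconds have passed without success (simplified 'poke' mode).
    """
    started = time.monotonic()
    pokes = 0
    while True:
        pokes += 1
        if poke():
            return pokes
        if time.monotonic() - started >= timeout:
            raise SensorTimeout(f"condition not met after {pokes} pokes")
        time.sleep(poke_interval)


# Simulated condition that becomes true on the third check.
attempts = iter([False, False, True])
print(run_sensor(lambda: next(attempts), poke_interval=0.01, timeout=1.0))  # 3
```

For a `FileSensor`-like check, the callable could be `lambda: os.path.exists(path)`.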
4️⃣ Scheduling & Execution Engine
- Scheduler Internals:
  - How the Scheduler picks DAGs
  - DAG Parsing vs DAG Execution