Apache Airflow and Prefect

Big asset managers like BlackRock and Vanguard historically relied on custom-built schedulers, cron-based orchestration, enterprise ETL tools, and monolithic job-control systems to manage workflows. These setups were often brittle and difficult to manage at scale, but they were the standard in the 2000s and early 2010s.

Finance had complex, interdependent systems decades before open-source orchestration tools existed. Regulations required auditability, determinism, and log retention, which pushed firms toward enterprise solutions. Latency constraints and overnight batch cycles made orchestrating large volumes of data mission-critical.

But now they have Apache Airflow (created at Airbnb in 2014) and Prefect. They increasingly adopt hybrid setups, e.g., running Airflow on top of legacy systems, or using AWS Step Functions for cloud-native orchestration. Python's rise to dominance has made tools like Airflow and Dagster appealing, especially for quant/dev teams.

Here is a simple Airflow DAG, simple_dag.py:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

# Define Python functions
def start_task():
    print("Workflow started!")

def print_current_date():
    print(f"Current date and time is: {datetime.now()}")

def end_task():
    print("Workflow completed!")

# Define the DAG
with DAG(
    dag_id='simple_etl_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',  # runs once per day (renamed to schedule= in Airflow 2.4+)
    catchup=False,               # don't backfill runs between start_date and now
    tags=['example'],
) as dag:

    start = PythonOperator(
        task_id='start',
        python_callable=start_task
    )

    print_date = PythonOperator(
        task_id='print_date',
        python_callable=print_current_date
    )

    end = PythonOperator(
        task_id='end',
        python_callable=end_task
    )

    # Define task order
    start >> print_date >> end
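
To try this out, drop the file into your dags/ folder and trigger it from the web UI. Alternatively, assuming Airflow 2.5+, you can smoke-test the DAG in-process with dag.test(); a minimal sketch appended to the same file:

# Local smoke test (requires Airflow 2.5+): runs the whole DAG once
# in this process, with no scheduler or workers needed.
if __name__ == "__main__":
    dag.test()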

So what are the pros and cons when comparing Airflow to Prefect?

Capability | Airflow | Prefect 2.x
Programming model | Declarative (DAGs defined statically) | Imperative (flows written like Python code)
State handling | Tasks run independently; DAG defines flow | Flows and tasks manage state natively
Error handling | Requires explicit try/except, retries | Built-in retry, timeout, caching options (see sketch below)
Dynamic workflows | Harder (e.g., branching, looping) | Easy (can use native if, for, etc.)
UI / observability | Rich web UI with logs, DAG view | Simple local UI, full cloud dashboard
Scheduling | Built-in scheduler | External (Prefect Cloud or CLI)
Deployment | Airflow Scheduler + Workers + DB | Prefect Agent + Python environment
Extensibility | Plugins, Operators, Hooks | Python-first decorators, integrations
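
To illustrate the error-handling and dynamic-workflow rows above, here is a minimal Prefect 2.x sketch; the simulated failure rate and the branching rule are invented for the demo:

from random import random
from prefect import flow, task

@task(retries=3, retry_delay_seconds=5)  # built-in retries, no try/except needed
def flaky_fetch(symbol: str) -> float:
    # Simulate an unreliable data source; Prefect reruns the task on failure.
    if random() < 0.3:
        raise RuntimeError(f"transient error fetching {symbol}")
    return 100.0

@flow
def dynamic_fetch_flow(symbols: list[str]):
    # Dynamic workflow: plain Python control flow shapes the task graph at runtime.
    prices = {}
    for symbol in symbols:
        if symbol.startswith("G"):  # arbitrary branch, just for illustration
            continue
        prices[symbol] = flaky_fetch(symbol)
    print(prices)

if __name__ == "__main__":
    dynamic_fetch_flow(["AAPL", "MSFT", "GOOG"])

In Airflow, the equivalent retry behavior lives in operator arguments (retries=, retry_delay=), but the branching would need BranchPythonOperator or similar rather than a plain if.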

Summary

Use Case | Recommendation
You want a proven, enterprise-grade batch scheduler with lots of integrations | 🟢 Use Airflow
You want simple Python-native workflows, better dynamic behavior, and modern DX | 🟢 Use Prefect
You're managing legacy ETL pipelines | 🟢 Stick with or migrate to Airflow gradually
You're starting from scratch or want to move fast in Python | 🟢 Prefer Prefect

Hence, for a Python-first team starting fresh, Prefect fits well.

A simple prefect_flow.py:

from prefect import flow, task

# Define the extract task
@task
def extract():
    data = {"AAPL": 150, "MSFT": 300, "GOOG": 2800}
    print("Extracted data:", data)
    return data

# Define the transform task
@task
def transform(data):
    # Add 5% to each price
    transformed_data = {k: round(v * 1.05, 2) for k, v in data.items()}
    print("Transformed data:", transformed_data)
    return transformed_data

# Define the load task
@task
def load(data):
    print("Loading data...")
    for k, v in data.items():
        print(f"{k}: ${v}")

# Define the flow that connects the tasks
@flow
def simple_etl_flow():
    raw_data = extract()
    processed_data = transform(raw_data)
    load(processed_data)

# Run the flow directly
if __name__ == "__main__":
    simple_etl_flow()
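
Running python prefect_flow.py executes the flow once locally. To put it on a schedule without standing up extra infrastructure, recent Prefect 2.x releases (2.10.17+) offer flow.serve(); a minimal sketch, with the deployment name invented for the example:

# Hypothetical scheduled deployment: serves the flow from this process,
# running it every hour (interval is in seconds).
if __name__ == "__main__":
    simple_etl_flow.serve(name="simple-etl-hourly", interval=3600)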
