Big asset managers like BlackRock and Vanguard have long relied on custom-built schedulers, cron-based orchestration, enterprise ETL tools, and monolithic job-control systems to manage workflows. These setups were often brittle and difficult to manage at scale, but they were the standard in the 2000s and early 2010s.
Finance had complex, interdependent systems decades before open-source orchestration tools existed. Regulations required auditability, determinism, and log retention, which pushed firms toward enterprise solutions. Tight latency windows and overnight batch cycles made orchestrating large volumes of data mission-critical.
But now they have Apache Airflow (created at Airbnb in 2014) and Prefect. Firms increasingly adopt hybrid setups, e.g., running Airflow on top of legacy systems, or using AWS Step Functions for cloud-native orchestration. Python's rise to dominance has made Airflow and Dagster appealing, especially for quant and dev teams.
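As a rough sketch of that hybrid pattern, a legacy end-of-day batch script can be triggered and monitored from an Airflow task while the script itself stays untouched. The script path and DAG name below are hypothetical placeholders:

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

# Minimal sketch of the hybrid pattern: Airflow schedules, retries, and
# logs a pre-existing legacy batch job. The path is a placeholder.
with DAG(
    dag_id='legacy_eod_wrapper',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    run_legacy_batch = BashOperator(
        task_id='run_legacy_batch',
        # Trailing space stops Airflow from treating the .sh path as a
        # Jinja template file to render.
        bash_command='/opt/legacy/run_eod_batch.sh ',
    )
```

Airflow then adds retries, alerting, and an audit trail around a job that would otherwise run from cron unobserved.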
Here is a simple example, simple_dag.py:
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

# Define Python functions
def start_task():
    print("Workflow started!")

def print_current_date():
    print(f"Current date and time is: {datetime.now()}")

def end_task():
    print("Workflow completed!")

# Define the DAG
with DAG(
    dag_id='simple_etl_example',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',  # Runs once per day
    catchup=False,               # Only run from today forward
    tags=['example'],
) as dag:
    start = PythonOperator(
        task_id='start',
        python_callable=start_task,
    )
    print_date = PythonOperator(
        task_id='print_date',
        python_callable=print_current_date,
    )
    end = PythonOperator(
        task_id='end',
        python_callable=end_task,
    )

    # Define task order
    start >> print_date >> end
```
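With Airflow installed, the DAG can be exercised end to end without a running scheduler via `airflow dags test simple_etl_example 2024-01-01`, which is handy for local debugging before deploying.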
So what are the pros and cons when comparing Airflow to Prefect?
| Capability | Airflow | Prefect 2.x |
|---|---|---|
| Programming Model | Declarative (DAGs defined statically) | Imperative (flows written like Python code) |
| State Handling | Tasks run independently; DAG defines flow | Flows and tasks manage state natively |
| Error Handling | Requires explicit try/except, retries | Built-in retry, timeout, caching options |
| Dynamic Workflows | Harder (e.g., branching, looping) | Easy (can use native if, for, etc.) |
| UI / Observability | Rich web UI with logs, DAG view | Simple local UI, full cloud dashboard |
| Scheduling | Built-in scheduler | External (Prefect Cloud or CLI) |
| Deployment | Airflow Scheduler + Workers + DB | Prefect Agent + Python environment |
| Extensibility | Plugins, Operators, Hooks | Python-first decorators, integrations |
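To make the Error Handling and Dynamic Workflows rows concrete, here is a minimal Prefect 2.x sketch; the task and flow names, thresholds, and return value are illustrative:

```python
from prefect import flow, task

# Illustrative sketch: retries and timeouts are declared on the task
# itself, and the flow uses ordinary Python control flow.
@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def fetch_price(symbol: str) -> float:
    # Placeholder for a flaky network call; Prefect retries it on failure.
    return 100.0

@flow
def dynamic_pricing_flow(symbols: list[str]):
    for symbol in symbols:          # native looping, no branch operators
        price = fetch_price(symbol)
        if price > 50:              # native branching
            print(f"{symbol} is above threshold: {price}")

if __name__ == "__main__":
    dynamic_pricing_flow(["AAPL", "MSFT"])
```

In Airflow, the equivalent branching and looping would typically go through dedicated constructs such as branch operators rather than plain `if`/`for` statements.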
Summary
| Use Case | Recommendation |
|---|---|
| You want a proven, enterprise-grade batch scheduler with lots of integrations | 🟢 Use Airflow |
| You want simple Python-native workflows, better dynamic behavior, and modern DX | 🟢 Use Prefect |
| You’re managing legacy ETL pipelines | 🟢 Stick with or migrate to Airflow gradually |
| You’re starting from scratch or want to move fast in Python | 🟢 Prefer Prefect |
Hence, for Python-native workflows built from scratch, Prefect fits well.
Here is a simple prefect_flow.py:
```python
from prefect import flow, task

# Define the extract task
@task
def extract():
    data = {"AAPL": 150, "MSFT": 300, "GOOG": 2800}
    print("Extracted data:", data)
    return data

# Define the transform task
@task
def transform(data):
    # Add 5% to each price
    transformed_data = {k: round(v * 1.05, 2) for k, v in data.items()}
    print("Transformed data:", transformed_data)
    return transformed_data

# Define the load task
@task
def load(data):
    print("Loading data...")
    for k, v in data.items():
        print(f"{k}: ${v}")

# Define the flow that connects the tasks
@flow
def simple_etl_flow():
    raw_data = extract()
    processed_data = transform(raw_data)
    load(processed_data)

# Run the flow directly
if __name__ == "__main__":
    simple_etl_flow()
```
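Running `python prefect_flow.py` executes the flow once, like any script. To put it on a schedule, recent Prefect 2.x releases let a flow serve itself; the deployment name and cron expression below are illustrative:

```python
# Illustrative: serve the flow on a daily schedule instead of a one-off run.
# Requires a reachable Prefect API (local server or Prefect Cloud).
if __name__ == "__main__":
    simple_etl_flow.serve(name="daily-etl", cron="0 6 * * *")
```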