Julian Mehne

Local Airflow setup.

python, airflow

I was looking for a virtual-environment-based setup for Airflow that lets me play around with DAGs. I want to keep my project code and the DAGs completely separate: the DAG e.g. just creates Operators that consume an image I build from the project code, and it never imports or otherwise touches the project code.
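
A minimal sketch of what such a DAG could look like (the DockerOperator, the image tag, and the my_project entry point are illustrative assumptions, not the actual project setup):

    # dags/my_project_dag.py -- the DAG only references the image built from
    # my_project; it never imports code from my_project itself.
    import pendulum
    from airflow import DAG
    from airflow.providers.docker.operators.docker import DockerOperator

    with DAG(
        dag_id="my_project_dag",
        start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
        schedule=None,
    ) as dag:
        run_step = DockerOperator(
            task_id="run_step",
            image="my-project:latest",  # assumed image tag
            command="python -m my_project.main",  # assumed entry point
        )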

Note: you can find the code in my playground repository in the local-airflow folder.

Having a separate virtual environment for Airflow

We want a separate virtual environment for airflow, because:

- airflow pins a large set of dependencies via its constraints files, which would otherwise conflict with or needlessly restrict the project's own dependencies.
- the DAGs never import the project code, so the two have no reason to share an environment.

File tree:

- my_project: Contains the actual project code, which is never touched directly by the DAGs.
- airflow-dags/dags: Contains the DAG files.
- airflow-dags/pyproject.toml: Contains the airflow dependencies.
- airflow-dags/tests: Airflow tests.
- scripts/local_airflow_setup.sh: Creates the separate airflow virtual environment and related files.
- airflow-dags/.airflow: airflow home directory, created by the setup script.
- airflow-dags/.venv: airflow virtual environment, created by the setup script.

Running scripts/local_airflow_setup.sh creates the virtual environment and the airflow home directory. Officially, airflow does not support poetry for installation, so I stuck with their pip-based approach. Call any executables from the airflow venv like this:

    PYTHONPATH=airflow-dags AIRFLOW_HOME=$PWD/airflow-dags/.airflow airflow-dags/.venv/bin/airflow dags list

Using the two virtual environments in VS Code

When editing the DAGs, I want VS Code to automatically use the airflow venv, and the project venv otherwise, so that I get proper code completion. I'm not 100% happy with the solution I found, but multi-root workspaces work okay in VS Code:

In VS Code, open .vscode/local-airflow.code-workspace.
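
A minimal sketch of what such a workspace file could look like (the folder paths follow the file tree above; the exact layout is an assumption):

    // .vscode/local-airflow.code-workspace
    {
        "folders": [
            { "path": ".." },              // project root, uses the project venv
            { "path": "../airflow-dags" }  // DAG code, uses the airflow venv
        ]
    }

Each folder can then point the VS Code Python extension at its own interpreter, e.g. via python.defaultInterpreterPath in its folder-local .vscode/settings.json.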

Testing airflow

Locally, I want to check whether airflow can import my DAG files. The Airflow docs contain a unit test snippet that uses DagBag, but I don't fully understand what happens if several DAGs are defined in the same module. Instead, I decided to write a test that finds all sub-modules of airflow-dags/dags and tries to import each of them.
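
A minimal sketch of such a test (assuming pytest and the file tree above; the discovery details are mine):

    # airflow-dags/tests/test_dag_imports.py
    import importlib.util
    import pathlib

    import pytest

    DAGS_DIR = pathlib.Path(__file__).parent.parent / "dags"  # assumed layout

    @pytest.mark.parametrize(
        "path", sorted(DAGS_DIR.rglob("*.py")), ids=lambda p: p.name
    )
    def test_dag_module_imports(path):
        # Importing the file surfaces syntax errors and missing
        # dependencies in every DAG module, one test per file.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

Run it with the airflow venv, with PYTHONPATH and AIRFLOW_HOME set as shown above.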

Future

I want to be able to use provider Operators in the DAG, like the AzureBatchOperator, and when I run the DAG locally with my_dag.test(), it should just use e.g. the ExternalPythonOperator with the project virtual environment. Using a dumb wrapper class should work.
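
A minimal sketch of what such a wrapper could look like (the LOCAL_MODE flag, the factory function, and the venv path are illustrative assumptions):

    import os

    from airflow.operators.python import ExternalPythonOperator

    LOCAL_MODE = os.environ.get("AIRFLOW_LOCAL_MODE") == "1"  # assumed flag

    def make_batch_task(task_id, python_callable, **provider_kwargs):
        """Return a local stand-in or the real provider Operator."""
        if LOCAL_MODE:
            # Run the callable in the project's virtual environment
            # instead of submitting it to Azure Batch.
            return ExternalPythonOperator(
                task_id=task_id,
                python="my_project/.venv/bin/python",  # assumed venv path
                python_callable=python_callable,
            )
        from airflow.providers.microsoft.azure.operators.batch import (
            AzureBatchOperator,
        )
        return AzureBatchOperator(task_id=task_id, **provider_kwargs)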