Local Airflow setup
I was looking for a virtual-environment-based setup for Airflow that lets me play around with DAGs. In my project setup, I want the project code and the DAGs completely separate: the DAG e.g. just creates Operators that consume an image I build from the project code, and it never imports or otherwise touches the project code.
Note: you can find the code in my playground repository in the `local-airflow` folder.
Having a separate virtual environment for Airflow
We want a separate virtual environment for Airflow, because:
- Airflow DAG code does not touch our project code.
- Thus, Airflow dependencies should not constrain our project dependencies (e.g., the Python version or pinned library versions).
File tree:
- `my_project`: Contains the actual project code, which is never touched directly by the DAGs.
- `airflow-dags/dags`: Contains the DAG files.
- `airflow-dags/pyproject.toml`: Contains the Airflow dependencies.
- `airflow-dags/tests`: Airflow tests.
- `scripts/local_airflow_setup.sh`: Creates the separate Airflow virtual environment and related artifacts.
- `airflow-dags/.airflow`: Airflow home directory, created by the setup script.
- `airflow-dags/.venv`: Airflow virtual environment, created by the setup script.
Running `scripts/local_airflow_setup.sh` creates the virtual environment and the Airflow config directory.
Officially, Airflow does not support Poetry for installation, so I stuck with the pip-based approach.
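The officially supported pip approach pins transitive dependencies via a constraints file. A rough sketch of what such a setup script can look like (the version numbers and exact commands here are illustrative, not the actual script):

```bash
#!/usr/bin/env bash
set -euo pipefail

AIRFLOW_VERSION="2.7.3"   # illustrative
PYTHON_VERSION="3.11"     # illustrative

# Create the separate venv and the project-local Airflow home directory.
python${PYTHON_VERSION} -m venv airflow-dags/.venv
mkdir -p airflow-dags/.airflow

# Install Airflow the officially supported way: pip plus a constraints file.
airflow-dags/.venv/bin/pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Initialize airflow.cfg and the local metadata DB under the project-local home.
AIRFLOW_HOME="$PWD/airflow-dags/.airflow" airflow-dags/.venv/bin/airflow db init
```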
Call any executable from the Airflow venv like this: `PYTHONPATH=airflow-dags AIRFLOW_HOME=$PWD/airflow-dags/.airflow airflow-dags/.venv/bin/airflow dags list`
Using the two virtual environments in VS Code
When editing the DAGs, I want VS Code to automatically use the Airflow venv, and the project venv everywhere else, so that code completion works in both. I'm not 100% happy with the solution I found, but multi-root workspaces work okay with VS Code:
- `.vscode/settings.json` excludes the `airflow-dags` directory.
- `.vscode/local-airflow.code-workspace` defines a separate workspace for `.` and `./airflow-dags`.

With VS Code, you open `.vscode/local-airflow.code-workspace`.
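For reference, my guess at the rough shape of such a workspace file (paths in a `.code-workspace` file are relative to the file's own directory, hence the `..`):

```json
{
  "folders": [
    { "path": ".." },
    { "path": "../airflow-dags" }
  ]
}
```

Each root can then select its own interpreter, e.g. by pointing `python.defaultInterpreterPath` at the respective `.venv` in that folder's local settings, which is what gives the DAG files completion against the Airflow venv.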
Testing Airflow
Locally, I want to check whether Airflow can import my files. The Airflow docs contain a unit test snippet. They use `DagBag`, but I don't fully understand what happens if you have several DAGs defined in the same module.
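That docs-style test looks roughly like this (condensed from memory, with a placeholder `dag_id`):

```python
import pytest
from airflow.models import DagBag


@pytest.fixture()
def dagbag():
    # Loads every DAG file found in the configured DAGs folder.
    return DagBag(include_examples=False)


def test_dag_loaded(dagbag):
    # Fails if any DAG file raised an exception during import.
    assert dagbag.import_errors == {}
    assert dagbag.get_dag(dag_id="my_dag") is not None  # placeholder dag_id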
Instead, I decided to write a test that finds all sub-modules of `airflow-dags/dags` and tries to import each of them.
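A sketch of that test, assuming `dags` is an importable package (i.e. it has an `__init__.py` and `airflow-dags` is on `PYTHONPATH`, as in the command above):

```python
import importlib
import pkgutil
from pathlib import Path

# tests/ and dags/ are siblings inside airflow-dags/.
DAGS_DIR = Path(__file__).resolve().parent.parent / "dags"


def test_all_dag_modules_are_importable():
    # Walk every (sub-)module under dags/ and import it; any error in a
    # DAG file surfaces as a plain test failure with a full traceback.
    for module_info in pkgutil.walk_packages([str(DAGS_DIR)], prefix="dags."):
        importlib.import_module(module_info.name)
```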
Future
I want to be able to use provider Operators in the DAG, e.g. for AzureBatch, and when I run the DAG locally with `my_dag.test()`, it should instead just use e.g. the `ExternalPythonOperator` with the project virtual environment.
Using a dumb wrapper class should work.
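One possible shape for such a wrapper, with a hypothetical `LOCAL_AIRFLOW` env var as the toggle and a hard-coded project interpreter path (both placeholders, not a finished design):

```python
import os

from airflow.operators.python import ExternalPythonOperator

# Hypothetical path to the project's own venv interpreter.
PROJECT_PYTHON = "/path/to/my_project/.venv/bin/python"


def batch_task(task_id: str, python_callable, **kwargs):
    """Build the real provider operator, or a local stand-in for my_dag.test()."""
    if os.environ.get("LOCAL_AIRFLOW"):
        # Locally: run the callable in the project venv instead of
        # submitting a batch job.
        return ExternalPythonOperator(
            task_id=task_id,
            python=PROJECT_PYTHON,
            python_callable=python_callable,
            **kwargs,
        )
    # In production this branch would construct e.g. an AzureBatchOperator.
    raise NotImplementedError
```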