Analytics SQL queries tend to get rather complex. To keep me sane, it is important to improve the development experience with tools like sqlfluff that can format and lint the queries. The following article describes the usage of sqlfluff with Jinja 2 templates (obligatory: don’t use untrusted input for you query arguments).

Let’s assume I have a simple project layout with a single configuration file and an SQL folder:

./sql/foo.sql
./sql/bar.sql
pyproject.toml

First, let’s configure sqlfluff, for example:

[tool.sqlfluff.core]
templater = "jinja"
sql_file_exts = ".jinja2.sql,.sql,.sql.jinja2"
dialect = "athena"
max_line_length = 120

Important: sqlfluff uses nesting configuration, where it basically traverses all directories to a file it acts on, tries to find another config file, and if it does overrides/updates values. The closer to the final file, the higher the priority of the settings. Finally, in-file configuration is also possible with comments like -- sqlfluff:dialect:athena, but: the value of templater can never be changed.

Let’s take a look at a simple file sql/example.sql.jinja2:

select
    "year", "month", "day"
from test.records
where
    "year" >= {{ year }}
    and "month" in ({{ months | join(',') }})

Trying to lint this with sqlfluff lint sql/example.sql.jinja2 fails, because we need to provide example values for the templating variables. sqlfluff simply interpolates the query with the example values to get a valid SQL query and then acts on that interpolated query. The sqlfluff docs always define example values for templating variables in configuration files, but this seems unintuitive to me, given that the configuration files apply to all sql files in a directory. Instead, I define them as top comments:

-- sqlfluff:templater:jinja:context:year:200
-- sqlfluff:templater:jinja:context:months:["100","200","300"]
select
    "year", "month", "day"
from test.records
where
    "year" >= {{ year }}
    and "month" in ({{ months | join(',') }})

sqlfluff render sql/example.sql.jinja2 shows us the interpolated version:

-- sqlfluff:templater:jinja:context:year:200
-- sqlfluff:templater:jinja:context:months:["100","200","300"]
select
    "year", "month", "day"
from test.records
where
    "year" >= 200
    and "month" in (100,200,300)

We can now use lint/format etc.

Further integration Link to heading

repos:
- repo: https://github.com/sqlfluff/sqlfluff
  rev: stable_version
  hooks:
    - id: sqlfluff-lint
    - id: sqlfluff-fix