dbt with expectations!
Hi 👋
Still in the same vibe as my last article (here), I just wanted to share a great tip that will save you a lot of time on your data tests 🥸.
You may already know great_expectations, a kick-ass open-source library for handling data quality. As a reminder, great_expectations is a shared, open standard for data quality: it helps data teams eliminate pipeline debt through data testing.
An “expectation” is a declarative statement about your data, with a self-describing name such as “expect_column_values_to_be_between”. You then run the expectation against your own data. The library already ships a lot of expectations, for example:
- expect_column_distinct_values_to_be_in_set
- expect_column_max_to_be_between
- expect_column_distinct_values_to_equal_set
- expect_column_min_to_be_between
- expect_column_sum_to_be_between
- and many others
If you work with data warehouses, data marts or business intelligence, you already know that data quality is at the center of everything. You can build the most beautiful report, with a ton of metrics and the smoothest UX ever created… but if your warehouse is not trusted and thoroughly tested, it's meaningless.
Where does dbt come in?
As you know, dbt is a great tool for handling your analytics engineering workflows. It does a lot of things, and it already gives you a way to manage your tests: singular tests (SQL queries that return the failing rows) and generic tests (reusable tests that you declare in YAML). All tests live directly in your dbt project, so it's really easy to jump from a data model to its tests.
One of the first best practices is to test the primary key of every model with the unique and not_null tests. It's a simple trick, and it catches many basic errors, such as duplicated or missing keys and rows multiplied by a bad join.
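As a minimal sketch of that practice, here is what a model YAML file could look like, assuming a hypothetical model named orders whose primary key is order_id (both names are made up for illustration). unique and not_null are generic tests that ship with dbt itself, so no package is needed for this part.

```yaml
version: 2

models:
  - name: orders            # hypothetical model name
    columns:
      - name: order_id      # hypothetical primary key column
        tests:
          - unique          # fails if any order_id appears more than once
          - not_null        # fails if any order_id is NULL
```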
dbt_expectations is an extension package for dbt, heavily inspired by great_expectations and developed by Calogica.com. To use it, you just have to add a packages.yml file to your dbt project with:
packages:
  - package: calogica/dbt_expectations
    version: 0.5.6
Run dbt deps to install the package, and that's it. Then you just have to start adding expectations to your model.yml files:
version: 2

models:
  - name: name_of_my_model
    columns:
      - name: column_name
        tests:
          - dbt_expectations.name_of_my_expectation
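For a more concrete picture, here is a sketch of the same structure with real dbt_expectations tests and their arguments. The model orders, its columns amount and status, and the bounds and values are all hypothetical, chosen only for illustration; check the package README for the exact arguments supported by the version you install.

```yaml
version: 2

models:
  - name: orders                      # hypothetical model name
    columns:
      - name: amount                  # hypothetical numeric column
        tests:
          # every row must have an amount between 0 and 10000 (illustrative bounds)
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000
      - name: status                  # hypothetical categorical column
        tests:
          # every status must belong to this (illustrative) set of values
          - dbt_expectations.expect_column_values_to_be_in_set:
              value_set: ['placed', 'shipped', 'returned']
```

Expectations that take arguments become a YAML mapping (test name, then its parameters indented below it), while argument-free tests such as unique stay a plain list item.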
You can try it out with the dbt test command and get an overview of your test results.
If you look at the details, you can see what dbt actually runs behind the scenes: each test is compiled into a SQL query.
Of course, these are fairly basic tests, but if you have a look at their GitHub repository, you will find more sophisticated expectations.
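To give an idea of what "more sophisticated" can mean, here is a sketch mixing a model-level test with a couple of column-level ones. The model and column names, the regex and the freshness window are hypothetical; the test names come from the dbt_expectations README, but available tests and arguments can change between releases, so double-check them against the version you pin.

```yaml
version: 2

models:
  - name: orders                        # hypothetical model name
    tests:
      # model-level test: the table must contain at least one row
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: customer_email            # hypothetical column
        tests:
          # values must look like an email address (deliberately simple pattern)
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[^@]+@[^@]+$"
      - name: created_at                # hypothetical timestamp column
        tests:
          # at least one row must have been created within the last day
          - dbt_expectations.expect_row_values_to_have_recent_data:
              datepart: day
              interval: 1
```

Regex syntax is evaluated by your warehouse, so a pattern that works on one database may need adjusting on another.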
Here we are! You can define some great tests directly in your dbt project and enjoy watching all of your tests fail (and start crying 😭) because of all the bad, bad data coming in.
Michaël