fail_calc
Test queries are written to return a set of failing records, ones not matching the expectation or assertion declared by that test: duplicate records, null values, etc.
Most often, this is the count of rows returned by the test query: the default value of fail_calc
is count(*)
. But it can also be a custom calculation, whether an aggregate calculation or simply the name of a column to be selected from the test query.
Most tests do not use the fail_calc
config, preferring to return a count of failing rows. For the tests that do, the most common place to set the fail_calc
config is right within a generic test block, alongside its query definition. All the same, fail_calc
can be set in all the same places as other configs.
For instance, you can configure a unique
test to return sum(n_records)
instead of count(*)
as the failure calculation: that is, the number of rows in the model containing a duplicated column value, rather than the number of distinct column values that are duplicated.
Beware using functions like sum()
for fail_calc
in any test that has the potential to return no rows at all.
If no rows are returned, the test won't pass or fail but will return the following error:
None is not of type 'integer'
Failed validating 'type' in schema['properties']['failures']:
{'type': 'integer'}
On instance['failures']:
None
To avoid this issue, use a case statement to ensure that 0
is returned when no rows exist:
fail_calc: "case when count(*) > 0 then sum(n_records) else 0 end"
- Specific test
- One-off test
- Generic test block
- Project level
Configure a specific instance of a generic (schema) test:
version: 2
models:
- name: my_model
columns:
- name: my_columns
tests:
- unique:
config:
fail_calc: "case when count(*) > 0 then sum(n_records) else 0 end"
Configure a one-off (data) test:
{{ config(fail_calc = "sum(total_revenue) - sum(revenue_accounted_for)") }}
select ...
Set the default for all instances of a generic (schema) test, by setting the config inside its test block (definition):
{% test <testname>(model, column_name) %}
{{ config(fail_calc = "missing_in_a + missing_in_b") }}
select ...
{% endtest %}
Set the default for all tests in a package or project:
tests:
+fail_calc: count(*) # all tests
<package_name>:
+fail_calc: count(distinct id) # tests in <package_name>