
Expectations

Fennel's type system maintains data integrity by rejecting data that does not conform to the declared types. In some cases, however, one may want to accept non-conforming data while still monitoring how often it fails to conform. For this, Fennel provides the ability to specify expectations on the data.
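The difference between rejecting and monitoring can be pictured with a small plain-Python sketch. It does not use Fennel at all: `enforce` and `monitor` are hypothetical helpers written only for illustration, and how Fennel actually reports non-conforming rows is not shown here.

```python
from typing import Any, Callable, Iterable, List, Tuple


def enforce(rows: Iterable[Any], is_valid: Callable[[Any], bool]) -> List[Any]:
    """Type-system style (hypothetical): fail on the first non-conforming row."""
    accepted = []
    for row in rows:
        if not is_valid(row):
            raise ValueError(f"rejected non-conforming row: {row!r}")
        accepted.append(row)
    return accepted


def monitor(
    rows: Iterable[Any], is_valid: Callable[[Any], bool]
) -> Tuple[List[Any], float]:
    """Expectation style (hypothetical): keep every row, report the conforming fraction."""
    accepted = list(rows)
    if not accepted:
        return accepted, 1.0
    conforming = sum(1 for row in accepted if is_valid(row))
    return accepted, conforming / len(accepted)


def is_valid_age(age: int) -> bool:
    return 0 <= age <= 120


ages = [34, 27, -1, 52]
kept, rate = monitor(ages, is_valid_age)
print(kept, rate)  # all four rows are kept; 3 of 4 conform
```

With `enforce`, the same input would stop at `-1`; with `monitor`, the row is accepted and only the conformance rate drops.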

Fennel internally relies on Great Expectations to help users easily specify data expectations. Fennel supports only a subset of the Great Expectations expectations, documented below, but the API for specifying them is the same.

Expectation Types


Single Column Expectations

The following expectations operate on a single column at a time.

  1. expect_column_values_to_not_be_null

    Expect the column values to not be null. To be counted as an exception, values must be explicitly null or missing, such as np.nan. Empty strings don't count as null unless they have been coerced to a null type.

    Parameters:

    • column (str) – The column name.
  2. expect_column_values_to_be_null

    Expect the column values to be null. It is the inverse of expect_column_values_to_not_be_null.

    Parameters:

    • column (str) – The column name.
  3. expect_column_values_to_be_of_type

    Expect a column to contain values of a specified data type.

    Parameters:

    • column (str) – The column name.
    • type_ (str) – The expected data type of the column values.
  4. expect_column_values_to_be_in_type_list

    Expect a column to contain values of one of several specified data types.

    Parameters:

    • column (str) – The column name.
    • type_list (list) – A list of expected data types of the column values.
  5. expect_column_values_to_be_in_set

    Expect each column value to be in a given set.

    Parameters:

    • column (str) – The column name.
    • value_set (list) – A set of objects used for comparison.
  6. expect_column_values_to_not_be_in_set

    Expect each column value to not be in a given set.

    Parameters:

    • column (str) – The column name.
    • value_set (list) – A set of objects used for comparison.
  7. expect_column_values_to_be_between

    Expect column values to be between a minimum value and a maximum value.

    Parameters:

    • column (str) – The column name.
    • min_value (int) – The minimum value for a column entry.
    • max_value (int) – The maximum value for a column entry.
    • strict_min (bool) – If True, the column values must be strictly larger than min_value.
    • strict_max (bool) – If True, the column values must be strictly smaller than max_value.
  8. expect_column_value_lengths_to_be_between

    Expect the lengths of column values to be between a minimum value and a maximum value.

    Parameters:

    • column (str) – The column name.
    • min_value (int) – The minimum value for a column entry length.
    • max_value (int) – The maximum value for a column entry length.
  9. expect_column_value_lengths_to_equal

    Expect the lengths of column values to equal a given value.

    Parameters:

    • column (str) – The column name.
    • value (int) – The expected length of column values.
  10. expect_column_values_to_match_regex

    Expect column entries to be strings that match a given regular expression.

    Parameters:

    • column (str) – The column name.
    • regex (str) – The regular expression that each column entry should match.
  11. expect_column_values_to_not_match_regex

    Expect column entries to be strings that do not match a given regular expression.

    Parameters:

    • column (str) – The column name.
    • regex (str) – The regular expression that each column entry should not match.
  12. expect_column_values_to_match_regex_list

    Expect column entries to be strings that match at least one of a list of regular expressions.

    Parameters:

    • column (str) – The column name.
    • regex_list (list) – The list of regular expressions that each column entry should match at least one of.
  13. expect_column_values_to_not_match_regex_list

    Expect column entries to be strings that do not match any of a list of regular expressions.

    Parameters:

    • column (str) – The column name.
    • regex_list (list) – The list of regular expressions that each column entry should not match any of.
  14. expect_column_values_to_match_strftime_format

    Expect column entries to be strings representing a date or time with a given format.

    Parameters:

    • column (str) – The column name.
    • strftime_format (str) – The strftime format that each column entry should match.
  15. expect_column_values_to_be_dateutil_parseable

    Expect column entries to be parseable using dateutil.

    Parameters:

    • column (str) – The column name.
  16. expect_column_values_to_be_json_parseable

    Expect column entries to be parseable as JSON.

    Parameters:

    • column (str) – The column name.
  17. expect_column_values_to_match_json_schema

    Expect column entries to match a given JSON schema.

    Parameters:

    • column (str) – The column name.
    • json_schema (dict) – The JSON schema that each column entry should match.
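To make the per-value semantics of a few of these expectations concrete, here is a minimal plain-Python sketch. The helper names are hypothetical and this is not how Fennel or Great Expectations implement the checks. In particular, treating a regex as matching anywhere in the string (`re.search`) is an assumption.

```python
import json
import re
from datetime import datetime


def is_between(value, min_value, max_value, strict_min=False, strict_max=False):
    """Per-value check in the spirit of expect_column_values_to_be_between."""
    above = value > min_value if strict_min else value >= min_value
    below = value < max_value if strict_max else value <= max_value
    return above and below


def matches_regex(value, regex):
    """Assumes 'match' means the pattern is found anywhere in the string."""
    return isinstance(value, str) and re.search(regex, value) is not None


def matches_strftime(value, strftime_format):
    """True if the string parses with the given strftime format."""
    try:
        datetime.strptime(value, strftime_format)
        return True
    except (TypeError, ValueError):
        return False


def is_json_parseable(value):
    """True if the string is valid JSON."""
    try:
        json.loads(value)
        return True
    except (TypeError, ValueError):
        return False


print(is_between(0, 0, 100))                         # True: bounds are inclusive
print(is_between(0, 0, 100, strict_min=True))        # False: 0 is not > 0
print(matches_regex("user_42", r"\d+"))              # True
print(matches_strftime("2024-01-31", "%Y-%m-%d"))    # True
print(matches_strftime("31/01/2024", "%Y-%m-%d"))    # False
print(is_json_parseable('{"a": 1}'))                 # True
print(is_json_parseable("{a: 1}"))                   # False
```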

Multi Column Expectations

The following expectations require two or more columns.

  1. expect_column_pair_values_to_be_equal

    Expect the values in a column to be the exact same as the values in another column.

    Parameters:

    • column_A (str) – The first column name.
    • column_B (str) – The second column name.
    • ignore_row_if (str) – Control how null values are handled. See ignore_row_if for details.
  2. expect_column_pair_values_A_to_be_greater_than_B

    Expect the values in column A to be greater than the values in column B.

    Parameters:

    • column_A (str) – The first column name.
    • column_B (str) – The second column name.
    • or_equal (bool) – If True, values in column A may equal the values in column B instead of being strictly greater.
  3. expect_column_pair_values_to_be_in_set

    Expect the paired values from column A and column B to belong to a given set of valid pairs.

    Parameters:

    • column_A (str) – The first column name.
    • column_B (str) – The second column name.
    • value_pairs_set (set) – A set of tuples describing acceptable pairs of values. Each tuple should have two elements, the first from column A and the second from column B.
  4. expect_multicolumn_sum_to_equal

    Expect the sum of multiple columns to equal a specified value.

    Parameters:

    • column_list (list) – The list of column names to be summed.
    • sum_total (int) – The expected sum of the columns.
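The row-level logic behind the multi-column expectations can be sketched in plain Python as follows. The helper names are hypothetical and this is not Fennel's implementation. The three `ignore_row_if` values used here are the ones documented by Great Expectations for column pair expectations; whether Fennel accepts all of them is an assumption.

```python
def row_is_ignored(a, b, ignore_row_if="both_values_are_missing"):
    """Decide whether a row is skipped because of missing (None) values."""
    if ignore_row_if == "both_values_are_missing":
        return a is None and b is None
    if ignore_row_if == "either_value_is_missing":
        return a is None or b is None
    if ignore_row_if == "neither":
        return False
    raise ValueError(f"unknown ignore_row_if: {ignore_row_if!r}")


def a_greater_than_b(a, b, or_equal=False):
    """Per-row check in the spirit of expect_column_pair_values_A_to_be_greater_than_B."""
    return a >= b if or_equal else a > b


def pair_in_set(a, b, value_pairs_set):
    """The (A, B) pair must be one of the allowed tuples."""
    return (a, b) in value_pairs_set


def sum_equals(row, column_list, sum_total):
    """The listed columns of a single row must add up to sum_total."""
    return sum(row[column] for column in column_list) == sum_total


row = {"price": 10, "cost": 7, "margin": 3}
print(a_greater_than_b(row["price"], row["cost"]))             # True
print(a_greater_than_b(7, 7))                                  # False
print(a_greater_than_b(7, 7, or_equal=True))                   # True
print(pair_in_set("US", "USD", {("US", "USD"), ("FR", "EUR")}))  # True
print(sum_equals(row, ["cost", "margin"], 10))                 # True
print(row_is_ignored(None, 5, "either_value_is_missing"))      # True
```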


Example

from datetime import datetime

from fennel.datasets import dataset
from fennel.lib import (
    expectations,
    expect_column_values_to_be_between,
    expect_column_values_to_be_in_set,
    expect_column_pair_values_A_to_be_greater_than_B,
)
from fennel.dtypes import between


@dataset
class Sample:
    passenger_count: between(int, 0, 100)
    gender: str
    age: between(int, 0, 100, strict_min=True)
    mothers_age: between(int, 0, 100, strict_min=True)
    timestamp: datetime

    # Expectations are declared in a method decorated with @expectations
    # that returns a list of expectation objects.
    @expectations
    def my_function(cls):
        return [
            # At least 95% of passenger counts should fall in [1, 6].
            expect_column_values_to_be_between(
                column=str(cls.passenger_count),
                min_value=1,
                max_value=6,
                mostly=0.95,
            ),
            # At least 99% of gender values should be one of the listed strings.
            expect_column_values_to_be_in_set(
                str(cls.gender), ["male", "female"], mostly=0.99
            ),
            # Pairwise expectation
            expect_column_pair_values_A_to_be_greater_than_B(
                column_A=str(cls.age), column_B=str(cls.mothers_age)
            ),
        ]

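The example passes a `mostly` argument that the parameter lists above do not describe. In Great Expectations, `mostly` is the minimum fraction of values that must satisfy the expectation for it to count as met. The sketch below illustrates that threshold in plain Python. `meets_expectation` is a hypothetical helper, and skipping `None` values before computing the fraction is an assumption rather than documented Fennel behavior.

```python
def meets_expectation(values, predicate, mostly=1.0):
    """Return True if at least `mostly` of the non-null values satisfy predicate."""
    checked = [v for v in values if v is not None]  # assumption: nulls are skipped
    if not checked:
        return True
    passing = sum(1 for v in checked if predicate(v))
    return passing / len(checked) >= mostly


def in_range(count):
    return 1 <= count <= 6


# Nine of these ten passenger counts fall in [1, 6]; the value 9 does not.
passenger_counts = [1, 2, 3, 4, 5, 6, 1, 2, 3, 9]

print(meets_expectation(passenger_counts, in_range, mostly=0.95))  # False: 90% < 95%
print(meets_expectation(passenger_counts, in_range, mostly=0.8))   # True: 90% >= 80%
print(meets_expectation(passenger_counts, in_range))               # False: not all conform
```

A lower `mostly` tolerates more outliers before the expectation is reported as unmet, which suits columns where occasional bad values are expected.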