API Reference

Binary Operations

Standard binary operations: arithmetic operations (+, -, *, /, %), relational operations (==, !=, >, >=, <, <=), and logical operations (&, |). Note that logical and is represented as & and logical or is represented as |. Bitwise operations aren't yet supported.
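
Python does not let and, or, and not be overloaded, which is the usual reason expression libraries rely on &, | and ~ instead. The sketch below uses a hypothetical toy class (ToyExpr is not part of Fennel) to show the mechanism, and why each comparison should be parenthesized when combined with & or |: those operators bind more tightly than > or ==.

```python
class ToyExpr:
    """Hypothetical stand-in for an expression node; not Fennel's Expr."""

    def __init__(self, text: str):
        self.text = text

    def __gt__(self, other):
        return ToyExpr(f"({self.text} > {_text(other)})")

    def __and__(self, other):
        return ToyExpr(f"({self.text} & {_text(other)})")

    def __or__(self, other):
        return ToyExpr(f"({self.text} | {_text(other)})")

    def __invert__(self):
        return ToyExpr(f"~{self.text}")


def _text(value) -> str:
    # Render nested toy expressions by their text, plain constants by repr.
    return value.text if isinstance(value, ToyExpr) else repr(value)


x, y = ToyExpr("x"), ToyExpr("y")

# Without the parentheses, x > 0 & y > 0 would parse as x > (0 & y) > 0.
both_positive = (x > 0) & (y > 0)
print(both_positive.text)  # ((x > 0) & (y > 0))
```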

Typing Rules

All arithmetic operations and all comparison operations (i.e. >, >=, <, <=) are only permitted on numerical data - ints/floats or options of ints/floats. In such cases, if even one of the operands is of type float, the whole expression is promoted to type float.
Logical operations &, | are only permitted on boolean data.
If even one of the operands is optional, the whole expression is promoted to optional type.
None, as in many SQL dialects, is interpreted as some unknown value. As a result, x + None is None for all x. None values are 'viral' in the sense that they usually make the value of the whole expression None. Some notable exceptions: False & None is still False and True | None is still True, irrespective of the unknown value that None represents.
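
The None rules can be pictured as three-valued logic. The plain-Python sketch below is only an illustration; the helper names and3, or3 and add are hypothetical and not part of Fennel. The cases True & None and False | None are assumed to stay None, following the same unknown-value reasoning.

```python
from typing import Optional


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # False wins no matter what the unknown operand turns out to be.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # True wins no matter what the unknown operand turns out to be.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # Arithmetic with an unknown operand is itself unknown.
    return None if a is None or b is None else a + b


print(and3(False, None))  # False
print(or3(True, None))    # True
print(add(1, None))       # None
```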

from typing import Optional

import pandas as pd
from fennel.expr import lit, col

expr = col("x") + col("y")
assert expr.typeof(schema={"x": int, "y": int}) == int
assert expr.typeof(schema={"x": int, "y": float}) == float
assert expr.typeof(schema={"x": float, "y": float}) == float
assert (
    expr.typeof(schema={"x": Optional[float], "y": int}) == Optional[float]
)

df = pd.DataFrame({"x": [1, 2, None]})
expr = lit(1) + col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [2, 3, pd.NA]

expr = lit(1) - col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [0, -1, pd.NA]

expr = lit(1) * col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [1, 2, pd.NA]

expr = lit(1) / col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [
    1,
    0.5,
    pd.NA,
]

Using binary expressions

Col

Function to reference existing columns in the dataframe.

Parameters

name:str

The name of the column being referenced. In the case of pipelines, this will typically be the name of the field and in the case of extractors, this will be the name of the feature.

import pytest
from fennel.expr import col

expr = col("x") + col("y")

# type of col("x") + col("y") changes based on the type of 'x' and 'y'
assert expr.typeof(schema={"x": int, "y": float}) == float

# okay if additional columns are provided
assert expr.typeof(schema={"x": int, "y": float, "z": str}) == float

# raises an error if a referenced column is missing from the schema
with pytest.raises(ValueError):
    expr.typeof(schema={})
with pytest.raises(ValueError):
    expr.typeof(schema={"x": int})
with pytest.raises(ValueError):
    expr.typeof(schema={"z": int, "y": float})

# can be evaluated with a dataframe
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]})
assert expr.eval(df, schema={"x": int, "y": float}).tolist() == [
    2.0,
    4.0,
    6.0,
]

Referencing columns of a dataframe using col

Returns

Expr

Returns an expression object denoting a reference to the column. The type of the resulting expression is the same as that of the referenced column. When evaluated in the context of a dataframe, the value of the expression is the same as the value of the dataframe column of that name.

Errors

Referenced column not provided:

Error during typeof or eval if the referenced column isn't present.

Datetime

Function to get a constant datetime object from its constituent parts.

Parameters

year:int

The year of the datetime. Note that this must be an integer, not an expression denoting an integer.

month:int

The month of the datetime. Note that this must be an integer, not an expression denoting an integer.

day:int

The day of the datetime. Note that this must be an integer, not an expression denoting an integer.

hour:int

Default: 0

The hour of the datetime. Note that this must be an integer, not an expression denoting an integer.

minute:int

Default: 0

The minute of the datetime. Note that this must be an integer, not an expression denoting an integer.

second:int

Default: 0

The second of the datetime. Note that this must be an integer, not an expression denoting an integer.

millisecond:int

Default: 0

The millisecond of the datetime. Note that this must be an integer, not an expression denoting an integer.

microsecond:int

Default: 0

The microsecond of the datetime. Note that this must be an integer, not an expression denoting an integer.

timezone:Optional[str]

Default: UTC

The timezone of the datetime. Note that this must be a string denoting a valid timezone, not an expression denoting a string.

Returns

Expr

Returns an expression object denoting the datetime object.

from datetime import datetime

import pandas as pd
from fennel.expr import datetime as dt

expr = dt(year=2024, month=1, day=1)

# a datetime constant needs no schema; its type is always datetime
assert expr.typeof() == datetime

# can be evaluated with a dataframe
df = pd.DataFrame({"dummy": [1, 2, 3]})
assert expr.eval(df, schema={"dummy": int}).tolist() == [
    pd.Timestamp("2024-01-01 00:00:00", tz="UTC"),
    pd.Timestamp("2024-01-01 00:00:00", tz="UTC"),
    pd.Timestamp("2024-01-01 00:00:00", tz="UTC"),
]

# can provide timezone
expr = dt(year=2024, month=1, day=1, timezone="US/Eastern")
assert expr.eval(df, schema={"dummy": int}).tolist() == [
    pd.Timestamp("2024-01-01 00:00:00", tz="US/Eastern"),
    pd.Timestamp("2024-01-01 00:00:00", tz="US/Eastern"),
    pd.Timestamp("2024-01-01 00:00:00", tz="US/Eastern"),
]

Getting a datetime from its constituent parts

Errors

Invalid datetime parts:

The month must be between 1 and 12, the day must be between 1 and 31, the hour must be between 0 and 23, the minute must be between 0 and 59, the second must be between 0 and 59, the millisecond must be between 0 and 999, and the microsecond must be between 0 and 999.

Timezone, if provided, must be a valid timezone string. Note that Fennel only supports area/location based timezones (e.g. "America/New_York"), not fixed offsets (e.g. "+05:30" or "UTC+05:30").
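
The range rules can be checked up front before building the expression. The following is a minimal sketch in plain Python; the BOUNDS table and the out_of_range helper are hypothetical names used for illustration, and only the bounds themselves come from the rule stated here.

```python
# Inclusive bounds for each part; the names are illustrative only.
BOUNDS = {
    "month": (1, 12),
    "day": (1, 31),
    "hour": (0, 23),
    "minute": (0, 59),
    "second": (0, 59),
    "millisecond": (0, 999),
    "microsecond": (0, 999),
}


def out_of_range(**parts: int) -> list:
    """Return the names of the parts that fall outside their bounds."""
    bad = []
    for name, value in parts.items():
        low, high = BOUNDS[name]
        if not low <= value <= high:
            bad.append(name)
    return bad


print(out_of_range(month=1, day=1))     # []
print(out_of_range(month=13, hour=24))  # ['month', 'hour']
```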

Eval

Helper function to evaluate the value of an expression in the context of a schema and a dataframe.

Parameters

input_df:pd.DataFrame

The dataframe for which the expression is evaluated - one value is produced for each row in the dataframe.

schema:Dict[str, Type]

The schema of the context under which the expression is to be evaluated. In the case of pipelines, this will be the schema of the input dataset and in the case of extractors, this will be the schema of the featureset.

import pandas as pd
import pytest
from fennel.expr import lit, col

expr = lit(1) + col("amount")
# value of 1 + col('amount') changes based on the type of 'amount'
df = pd.DataFrame({"amount": [1, 2, 3]})
assert expr.eval(df, schema={"amount": int}).tolist() == [2, 3, 4]

df = pd.DataFrame({"amount": [1.0, 2.0, 3.0]})
assert expr.eval(df, schema={"amount": float}).tolist() == [2.0, 3.0, 4.0]

# raises an error if the schema is not provided
with pytest.raises(TypeError):
    expr.eval(df)

# dataframe doesn't have the required column even though schema is provided
df = pd.DataFrame({"other": [1, 2, 3]})
with pytest.raises(Exception):
    expr.eval(df, schema={"amount": int})

Using eval on a dataframe

Returns

pd.Series

Returns a series object of the same length as the number of rows in the input dataframe.

Errors

Referenced column not provided:

All columns referenced by col expression must be present in both the dataframe and the schema.

Invalid expression:

The expression should be valid in the context of the given schema.

Runtime error:

Some expressions may produce a runtime error e.g. trying to parse an integer out of a string may throw an error if the string doesn't represent an integer.
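
As a rough mental model (not Fennel's implementation), eval can be pictured as applying the expression once per row. That is why the result has one value per input row, and why a missing column or a single unparseable value surfaces as an error. The eval_rows helper and the dict-of-lists stand-in for a dataframe are hypothetical.

```python
def eval_rows(columns: dict, fn) -> list:
    """Apply fn to each row of a dict-of-lists table; one result per row."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    return [
        fn({name: columns[name][i] for name in names}) for i in range(n_rows)
    ]


table = {"amount": [1, 2, 3]}
print(eval_rows(table, lambda row: 1 + row["amount"]))  # [2, 3, 4]

# A referenced column that is absent fails on lookup.
try:
    eval_rows({"other": [1, 2, 3]}, lambda row: 1 + row["amount"])
except KeyError as err:
    print("missing column:", err)

# Parsing can fail at runtime for particular rows only.
try:
    eval_rows({"s": ["1", "x"]}, lambda row: int(row["s"]))
except ValueError as err:
    print("runtime error:", err)
```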

From Epoch

Function to get a datetime object from a unix timestamp.

Parameters

duration:Expr

An expression denoting an integer: the duration since epoch, in the units specified by unit, that is converted to a datetime.

unit:str

Default: second

The unit of the duration parameter. Can be one of second, millisecond, or microsecond. Defaults to second.

Returns

Expr

Returns an expression object denoting the datetime object.
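
The unit handling can be illustrated with the standard library alone. This is a sketch under the assumption that the result is expressed in UTC; epoch_to_utc and MICROS_PER_UNIT are hypothetical names, not part of Fennel.

```python
from datetime import datetime, timedelta, timezone

# Microseconds per supported unit; unit names follow the parameter above.
MICROS_PER_UNIT = {"second": 1_000_000, "millisecond": 1_000, "microsecond": 1}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_utc(duration: int, unit: str = "second") -> datetime:
    """Hypothetical helper: convert an integer duration since epoch to UTC."""
    return EPOCH + timedelta(microseconds=duration * MICROS_PER_UNIT[unit])


print(epoch_to_utc(1714857600))                    # 2024-05-04 21:20:00+00:00
print(epoch_to_utc(1714857600000, "millisecond"))  # 2024-05-04 21:20:00+00:00
```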

from datetime import datetime
from typing import Optional

import pandas as pd
from fennel.expr import col, from_epoch

expr = from_epoch(col("x"), unit="second")

# from_epoch works for any int or optional int type
assert expr.typeof(schema={"x": int}) == datetime
assert expr.typeof(schema={"x": Optional[int]}) == Optional[datetime]

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [1714857600, 1714857601, 1714857602]})
schema = {"x": int}
expected = [
    pd.Timestamp("2024-05-04 21:20:00", tz="UTC"),
    pd.Timestamp("2024-05-04 21:20:01", tz="UTC"),
    pd.Timestamp("2024-05-04 21:20:02", tz="UTC"),
]
assert expr.eval(df, schema=schema).tolist() == expected

Getting a datetime from a unix timestamp

Is Null

Expression equivalent to: x is null

Parameters

expression:Expr

The expression that will be checked for nullness.

from typing import Optional

import pytest
from fennel.expr import col

expr = col("x").isnull()

# type of isnull is always boolean
assert expr.typeof(schema={"x": Optional[int]}) == bool

# also works for non-optional types, where it's always False
assert expr.typeof(schema={"x": float}) == bool

# raises an error if the schema is not provided
with pytest.raises(ValueError):
    expr.typeof(schema={})

# can be evaluated with a dataframe
import pandas as pd

df = pd.DataFrame({"x": pd.Series([1, 2, None], dtype=pd.Int64Dtype())})
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [
    False,
    False,
    True,
]

Using isnull expression

Returns

Expr

Returns an expression object denoting the output of isnull expression. Always evaluates to boolean.

Fill Null

The expression that is analogous to fillna in Pandas.

Parameters

expr:Expr

The expression that will be checked for nullness.

fill:Expr

The expression that will be substituted in case expr turns out to be null.

from typing import Optional

import pytest
from fennel.expr import col, lit

expr = col("x").fillnull(lit(10))

# type of fillnull depends both on type of 'x' and the literal 10
assert expr.typeof(schema={"x": Optional[int]}) == int
assert expr.typeof(schema={"x": float}) == float

# raises an error if the schema is not provided
with pytest.raises(ValueError):
    expr.typeof(schema={})

# can be evaluated with a dataframe
import pandas as pd

expr = col("x").fillnull(lit(10))
df = pd.DataFrame({"x": pd.Series([1, 2, None], dtype=pd.Int64Dtype())})
assert expr.eval(df, schema={"x": Optional[float]}).tolist() == [
    1.0,
    2.0,
    10.0,
]

Using fillnull expression

Returns

Expr

Returns an expression object denoting the output of fillnull expression. If the expr is of type Optional[T1] and the fill is of type T2, the type of the output expression is the smallest type that both T1 and T2 can be promoted into.

If the expr is not optional but is of type T, the output is trivially same as expr and hence is also of type T.
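
At the value level, the behavior amounts to replacing each null with the fill value and leaving everything else untouched. A minimal plain-Python sketch, where fill_nulls is a hypothetical helper that operates on lists rather than Fennel expressions:

```python
def fill_nulls(values: list, fill) -> list:
    """Replace None entries with fill; other entries pass through unchanged."""
    return [fill if value is None else value for value in values]


print(fill_nulls([1, 2, None], 10))  # [1, 2, 10]

# With no nulls present, the output is trivially the same as the input.
print(fill_nulls([1.5, 2.5], 10))    # [1.5, 2.5]
```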

Lit

Fennel's way of describing constants, similar to lit in Polars or Spark.

Parameters

const:Any

The literal/constant Python object that is to be used as an expression in Fennel. This can be used to construct literals of ints, floats, strings, boolean, lists, structs etc. Notably though, it's not possible to use lit to build datetime literals.

from typing import Optional

import pandas as pd
from fennel.expr import lit, col

expr = lit(1)

# lits don't need a schema to be evaluated
assert expr.typeof() == int

# can be evaluated with a dataframe
expr = col("x") + lit(1)
df = pd.DataFrame({"x": pd.Series([1, 2, None], dtype=pd.Int64Dtype())})
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [2, 3, pd.NA]

Using lit to describe constants

Returns

Expr

The expression that denotes the literal value.

Not

Logical not unary operator, invoked by ~ symbol.

import pandas as pd
from fennel.expr import lit

expr = ~lit(True)
assert expr.typeof() == bool

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [1, 2, 3]})
assert expr.eval(df, schema={"x": int}).tolist() == [False, False, False]

Using logical not unary operator

Now

Function to get current timestamp, similar to what datetime.now does in Python.

from datetime import datetime
from typing import Optional

import pandas as pd
from fennel.expr import now, col

expr = now().dt.since(col("birthdate"), "year")

assert (
    expr.typeof(schema={"birthdate": Optional[datetime]}) == Optional[int]
)

# can be evaluated with a dataframe
df = pd.DataFrame(
    {"birthdate": [datetime(1997, 12, 24), datetime(2001, 1, 21), None]}
)
# note: the expected ages depend on the date on which this runs
assert expr.eval(df, schema={"birthdate": Optional[datetime]}).tolist() == [
    27,
    24,
    pd.NA,
]

Using now to get age of a person

Returns

Expr

Returns an expression object denoting the current timestamp. The type of the resulting expression is datetime.

Repeat

Repeat an expression n times to create a list.

Parameters

value:Expr

The expression to repeat.

by:Expr

The number of times to repeat the value - can evaluate to a different count for each row.

from typing import List

import pandas as pd
from fennel.expr import repeat, col

expr = repeat(col("x"), col("y"))

assert expr.typeof(schema={"x": bool, "y": int}) == List[bool]

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [True, False, True], "y": [1, 2, 3]})
assert expr.eval(df, schema={"x": bool, "y": int}).tolist() == [
    [True],
    [False, False],
    [True, True, True],
]

Repeating booleans to create list

Returns

Expr

Returns an expression object denoting the result of the repeat expression.

Errors

Invalid input types:

An error is thrown if the by expression is not of type int. In addition, certain types (e.g. lists) are not supported as input for value.

Negative count:

An error is thrown if the by expression evaluates to a negative integer.
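
The row-wise behavior and the negative-count rule can be sketched in plain Python. The repeat_rows helper is hypothetical, and raising ValueError is an assumption made for illustration; the exception type Fennel actually raises is not stated here.

```python
def repeat_rows(values: list, counts: list) -> list:
    """Row-wise repeat: values[i] is repeated counts[i] times."""
    out = []
    for value, count in zip(values, counts):
        if count < 0:
            # Assumed error type, chosen only for this sketch.
            raise ValueError(f"negative repeat count: {count}")
        out.append([value] * count)
    return out


print(repeat_rows([True, False, True], [1, 2, 3]))
# [[True], [False, False], [True, True, True]]
```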

Typeof

Helper function to figure out the inferred type of any expression.

Parameters

schema:Dict[str, Type]

Default: None

The schema of the context under which the expression is to be analyzed. In the case of pipelines, this will be the schema of the input dataset and in the case of extractors, this will be the schema of the featureset.

Default value is set to None which represents an empty dictionary.

from typing import Optional

import pytest
from fennel.expr import lit, col

expr = lit(1) + col("amount")
# type of 1 + col('amount') changes based on the type of 'amount'
assert expr.typeof(schema={"amount": int}) == int
assert expr.typeof(schema={"amount": float}) == float
assert expr.typeof(schema={"amount": Optional[int]}) == Optional[int]
assert expr.typeof(schema={"amount": Optional[float]}) == Optional[float]

# typeof raises an error if type of 'amount' isn't provided
with pytest.raises(ValueError):
    expr.typeof()

# or when the expression won't be valid with the schema
with pytest.raises(ValueError):
    expr.typeof(schema={"amount": str})

# no need to provide schema if the expression is constant
const = lit(1)
assert const.typeof() == int

Using typeof to check validity and type of expressions

Returns

Type

Returns the inferred type of the expression, if any.

Errors

Type of a referenced column not provided:

All columns referenced by col expression must be present in the provided schema.

Invalid expression:

The expression should be valid in the context of the given schema.
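
The types that typeof reports for an addition such as lit(1) + col("amount") can be approximated in plain Python. This is only a sketch of the promotion rules for numeric operands, not Fennel's type checker; split_optional and add_type are hypothetical helpers.

```python
from typing import Optional, Union, get_args, get_origin


def split_optional(tp):
    """Return (inner_type, is_optional) for tp or Optional[tp]."""
    if get_origin(tp) is Union and type(None) in get_args(tp):
        inner = [arg for arg in get_args(tp) if arg is not type(None)]
        return inner[0], True
    return tp, False


def add_type(left, right):
    """Hypothetical helper: result type of left + right for numeric operands."""
    (lt, lt_opt), (rt, rt_opt) = split_optional(left), split_optional(right)
    if lt not in (int, float) or rt not in (int, float):
        raise ValueError(f"cannot add {left} and {right}")
    # One float operand promotes the result to float.
    result = float if float in (lt, rt) else int
    # One optional operand makes the result optional.
    return Optional[result] if (lt_opt or rt_opt) else result


schema = {"amount": Optional[float]}
print(add_type(int, schema["amount"]) == Optional[float])  # True
```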

When

Ternary expressions like 'if/else' or 'case' in SQL.

Parameters

when:Expr

The predicate expression for the ternary operator. Must evaluate to a boolean.

then:Expr

The expression that the whole when expression evaluates to if the predicate evaluates to True. then must always be called on the result of a when expression.

otherwise:Expr

Default: lit(None)

The equivalent of the else branch in the ternary expression - the whole expression evaluates to this branch when the predicate evaluates to False.

Defaults to lit(None) when not provided.

import pytest
from fennel.expr import when, col, InvalidExprException

expr = when(col("x")).then(1).otherwise(0)

# type depends on the type of the then and otherwise values
assert expr.typeof(schema={"x": bool}) == int

# raises an error if the schema is not provided
with pytest.raises(ValueError):
    expr.typeof(schema={})
# also when the predicate is not boolean
with pytest.raises(ValueError):
    expr.typeof(schema={"x": int})

# can be evaluated with a dataframe
import pandas as pd

df = pd.DataFrame({"x": [True, False, True]})
assert expr.eval(df, schema={"x": bool}).tolist() == [1, 0, 1]

# not valid if only when is provided
with pytest.raises(InvalidExprException):
    expr = when(col("x"))
    expr.typeof(schema={"x": bool})

# if otherwise is not provided, it defaults to None
expr = when(col("x")).then(1)
assert expr.typeof(schema={"x": bool}) == Optional[int]

Conditionals using when expressions

Returns

Expr

Returns an expression object denoting the result of the when/then/otherwise expression.
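
The row-wise meaning of when/then/otherwise, including the None default for a missing otherwise branch, can be sketched in plain Python. The when_then_otherwise helper is hypothetical and works on lists of booleans rather than Fennel expressions.

```python
def when_then_otherwise(predicates: list, then, otherwise=None) -> list:
    """Row-wise conditional; otherwise defaults to None when omitted."""
    return [then if predicate else otherwise for predicate in predicates]


print(when_then_otherwise([True, False, True], 1, 0))  # [1, 0, 1]

# Without an otherwise branch, False rows become None, which is why the
# resulting type becomes optional.
print(when_then_otherwise([True, False, True], 1))     # [1, None, 1]
```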

Errors

Referenced column not provided:

Error during typeof or eval if the referenced column isn't present.

Malformed expressions:

Valid when expressions must have an accompanying then clause. The otherwise clause is optional and defaults to lit(None) when omitted.

Zip

Zip two or more lists into a list of structs.

Parameters

struct:Struct

The struct to hold the zipped values. Unlike other top level expressions, zip is written as Struct.zip(kwarg1=expr1, kwarg2=expr2, ...).

kwargs:Dict[str, Expr]

A dictionary of key-value pairs where the key is the name of the field in the struct and the value is the expression to zip.

Expressions are expected to evaluate to lists of a type that can be converted to the corresponding field type in the struct.

from typing import List

import pandas as pd
from fennel.lib.schema import struct
from fennel.expr import col

@struct
class MyStruct:
    a: int
    b: float

expr = MyStruct.zip(a=col("x"), b=col("y"))

expected = List[MyStruct]
schema = {"x": List[int], "y": List[float]}
assert expr.matches_type(expected, schema)

# note that output is truncated to the length of the shortest list
df = pd.DataFrame(
    {"x": [[1, 2], [3, 4], []], "y": [[1.0, 2.0], [3.0], [4.0]]}
)
assert expr.eval(
    df, schema={"x": List[int], "y": List[float]}
).tolist() == [
    [MyStruct(a=1, b=1.0), MyStruct(a=2, b=2.0)],
    [MyStruct(a=3, b=3.0)],
    [],
]

Zipping two lists into a list of structs

Returns

Expr

Returns an expression object denoting the result of the zip expression.

Note

When zipping lists of unequal length, similar to Python's zip function, the resulting list will be truncated to the length of the shortest list, possibly zero.
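
The truncation rule matches Python's built-in zip, which the following plain-Python sketch uses directly. Pair is a hypothetical dataclass standing in for a Fennel struct, and zip_to_structs is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical struct with two fields, standing in for a Fennel struct."""

    a: int
    b: float


def zip_to_structs(xs: list, ys: list) -> list:
    # zip stops at the shorter input, so extra elements are dropped.
    return [Pair(a=x, b=y) for x, y in zip(xs, ys)]


print(zip_to_structs([1, 2], [1.0, 2.0]))  # [Pair(a=1, b=1.0), Pair(a=2, b=2.0)]
print(zip_to_structs([3, 4], [3.0]))       # [Pair(a=3, b=3.0)]
print(zip_to_structs([], [4.0]))           # []
```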

Errors

Mismatching types:

An error is thrown if the types of the lists to zip are not compatible with the field types in the struct.

Non-list types:

An error is thrown if the expressions to zip don't evaluate to lists.