Binary Operations
Standard binary operations: arithmetic operations (+, -, *, /, %), relational operations (==, !=, >, >=, <, <=) and logical operations (&, |). Note that logical and is represented as & and logical or is represented as |. Bitwise operations aren't yet supported.
Typing Rules
- All arithmetic operations and all comparison operations (i.e. >, >=, <, <=) are only permitted on numerical data: ints/floats or options of ints/floats. In such cases, if even one of the operands is of type float, the whole expression is promoted to type float.
- Logical operations & and | are only permitted on boolean data.
- If even one of the operands is optional, the whole expression is promoted to optional type.
- None, as in many SQL dialects, is interpreted as some unknown value. As a result, x + None is None for all x. None values are 'viral' in the sense that they usually make the value of the whole expression None. Some notable exceptions: False & None is still False and True | None is still True, irrespective of the unknown value that None represents.
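The None-handling rules above resemble SQL-style three-valued logic. The following is a minimal plain-Python sketch of those semantics, not Fennel's implementation; the helper names `add`, `logical_and` and `logical_or` are hypothetical, and the `True & None` case is assumed to follow the usual SQL convention of yielding None.

```python
from typing import Optional


def add(x: Optional[float], y: Optional[float]) -> Optional[float]:
    """Arithmetic is 'viral': any None operand makes the result None."""
    if x is None or y is None:
        return None
    return x + y


def logical_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """A False operand decides the result, whatever the unknown value is."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None  # assumption: unknown stays unknown, as in SQL
    return True


def logical_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """A True operand decides the result, whatever the unknown value is."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None  # assumption: unknown stays unknown, as in SQL
    return False


print(add(1, None))              # None
print(logical_and(False, None))  # False
print(logical_or(True, None))    # True
print(logical_and(True, None))   # None
```

The two short-circuit cases are the only ones where a None operand does not propagate.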
```python
from typing import Optional

import pandas as pd
from fennel.expr import lit, col

expr = col("x") + col("y")
assert expr.typeof(schema={"x": int, "y": int}) == int
assert expr.typeof(schema={"x": int, "y": float}) == float
assert expr.typeof(schema={"x": float, "y": float}) == float
assert (
    expr.typeof(schema={"x": Optional[float], "y": int}) == Optional[float]
)

# None in the input propagates to the output of every arithmetic operation
df = pd.DataFrame({"x": [1, 2, None]})
expr = lit(1) + col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [2, 3, pd.NA]

expr = lit(1) - col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [0, -1, pd.NA]

expr = lit(1) * col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [1, 2, pd.NA]

expr = lit(1) / col("x")
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [
    1,
    0.5,
    pd.NA,
]
```
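The type promotion rules for arithmetic can be sketched in plain Python as follows. This is an illustration of the stated rules only, under the assumption that they apply uniformly; it does not model any operator-specific behavior (for example division), and the helper names `split_optional` and `arithmetic_result_type` are hypothetical, not part of Fennel.

```python
from typing import Optional, Union, get_args, get_origin


def split_optional(tp):
    """Return (inner_type, is_optional) for either T or Optional[T]."""
    if get_origin(tp) is Union and type(None) in get_args(tp):
        inner = [a for a in get_args(tp) if a is not type(None)]
        if len(inner) == 1:
            return inner[0], True
    return tp, False


def arithmetic_result_type(left, right):
    """Apply the float and optional promotion rules to two operand types."""
    l_type, l_opt = split_optional(left)
    r_type, r_opt = split_optional(right)
    if l_type not in (int, float) or r_type not in (int, float):
        raise TypeError("arithmetic is only permitted on numerical data")
    # One float operand promotes the whole expression to float.
    inner = float if float in (l_type, r_type) else int
    # One optional operand promotes the whole expression to optional.
    return Optional[inner] if (l_opt or r_opt) else inner


print(arithmetic_result_type(int, int))
print(arithmetic_result_type(int, float))
print(arithmetic_result_type(Optional[float], int) == Optional[float])
```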
Col
Function to reference existing columns in the dataframe.
Parameters
The name of the column being referenced. In the case of pipelines, this will typically be the name of the field and in the case of extractors, this will be the name of the feature.
```python
import pandas as pd
import pytest
from fennel.expr import col

expr = col("x") + col("y")

# type of col("x") + col("y") changes based on the type of 'x' and 'y'
assert expr.typeof(schema={"x": int, "y": float}) == float

# okay if additional columns are provided
assert expr.typeof(schema={"x": int, "y": float, "z": str}) == float

# raises an error if the schema doesn't cover every referenced column
with pytest.raises(ValueError):
    expr.typeof(schema={})
with pytest.raises(ValueError):
    expr.typeof(schema={"x": int})
with pytest.raises(ValueError):
    expr.typeof(schema={"z": int, "y": float})

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]})
assert expr.eval(df, schema={"x": int, "y": float}).tolist() == [
    2.0,
    4.0,
    6.0,
]
```
Returns
Returns an expression object denoting a reference to the column. The type of the resulting expression is the same as that of the referenced column. When evaluated in the context of a dataframe, the value of the expression is the same as the value of the dataframe column of that name.
Errors
Error during typeof or eval if the referenced column isn't present.
Datetime
Function to get a constant datetime object from its constituent parts.
Parameters
Each numeric part must be an integer, not an expression denoting an integer.
- year: The year of the datetime.
- month: The month of the datetime.
- day: The day of the datetime.
- hour (default: 0): The hour of the datetime.
- minute (default: 0): The minute of the datetime.
- second (default: 0): The second of the datetime.
- millisecond (default: 0): The millisecond of the datetime.
- microsecond (default: 0): The microsecond of the datetime.
- timezone (default: UTC): The timezone of the datetime. Note that this must be a string denoting a valid timezone, not an expression denoting a string.
Returns
Returns an expression object denoting the datetime object.
```python
from datetime import datetime

import pandas as pd
from fennel.expr import datetime as dt

expr = dt(year=2024, month=1, day=1)

# a constant datetime needs no schema to infer its type
assert expr.typeof() == datetime

# can be evaluated with a dataframe
df = pd.DataFrame({"dummy": [1, 2, 3]})
assert expr.eval(df, schema={"dummy": int}).tolist() == [
    pd.Timestamp("2024-01-01 00:00:00", tz="UTC"),
    pd.Timestamp("2024-01-01 00:00:00", tz="UTC"),
    pd.Timestamp("2024-01-01 00:00:00", tz="UTC"),
]
# can provide timezone
expr = dt(year=2024, month=1, day=1, timezone="US/Eastern")
assert expr.eval(df, schema={"dummy": int}).tolist() == [
    pd.Timestamp("2024-01-01 00:00:00", tz="US/Eastern"),
    pd.Timestamp("2024-01-01 00:00:00", tz="US/Eastern"),
    pd.Timestamp("2024-01-01 00:00:00", tz="US/Eastern"),
]
```
Errors
The month must be between 1 and 12, the day must be between 1 and 31, the hour must be between 0 and 23, the minute must be between 0 and 59, the second must be between 0 and 59, the millisecond must be between 0 and 999, and the microsecond must be between 0 and 999.
Timezone, if provided, must be a valid timezone string. Note that Fennel only supports area/location based timezones (e.g. "America/New_York"), not fixed offsets (e.g. "+05:30" or "UTC+05:30").
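The range rules above can be checked up front before building a datetime expression. Below is a minimal plain-Python sketch of such a check; the helper `check_datetime_parts` and its choice of exception types are hypothetical, and Fennel may report these errors differently. Timezone validation is deliberately left out.

```python
# Inclusive (low, high) ranges taken from the rules stated above.
RANGES = {
    "month": (1, 12),
    "day": (1, 31),
    "hour": (0, 23),
    "minute": (0, 59),
    "second": (0, 59),
    "millisecond": (0, 999),
    "microsecond": (0, 999),
}


def check_datetime_parts(**parts: int) -> None:
    """Raise if any named part is not a plain int within its allowed range."""
    for name, value in parts.items():
        if name not in RANGES:
            raise KeyError(f"unknown datetime part: {name}")
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")


check_datetime_parts(month=1, day=1, hour=23)  # passes silently

try:
    check_datetime_parts(month=13)
except ValueError as err:
    print(err)  # month must be between 1 and 12
```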
Eval
Helper function to evaluate the value of an expression in the context of a schema and a dataframe.
Parameters
The dataframe for which the expression is evaluated - one value is produced for each row in the dataframe.
The schema of the context under which the expression is to be evaluated. In the case of pipelines, this will be the schema of the input dataset and in the case of extractors, this will be the schema of the featureset.
```python
import pandas as pd
import pytest
from fennel.expr import lit, col

expr = lit(1) + col("amount")
# value of 1 + col('amount') changes based on the type of 'amount'
df = pd.DataFrame({"amount": [1, 2, 3]})
assert expr.eval(df, schema={"amount": int}).tolist() == [2, 3, 4]

df = pd.DataFrame({"amount": [1.0, 2.0, 3.0]})
assert expr.eval(df, schema={"amount": float}).tolist() == [2.0, 3.0, 4.0]

# raises an error if the schema is not provided
with pytest.raises(TypeError):
    expr.eval(df)

# dataframe doesn't have the required column even though schema is provided
df = pd.DataFrame({"other": [1, 2, 3]})
with pytest.raises(Exception):
    expr.eval(df, schema={"amount": int})
```
Returns
Returns a series object of the same length as the number of rows in the input dataframe.
Errors
All columns referenced by col expressions must be present in both the dataframe and the schema.
The expression should be valid in the context of the given schema.
Some expressions may produce a runtime error e.g. trying to parse an integer out of a string may throw an error if the string doesn't represent an integer.
From Epoch
Function to get a datetime object from a unix timestamp.
Parameters
The duration (in units as specified by unit) since epoch to convert to a datetime, in the form of an expression denoting an integer.
Default: second
The unit of the duration parameter. Can be one of second, millisecond, or microsecond. Defaults to second.
Returns
Returns an expression object denoting the datetime object.
```python
from datetime import datetime
from typing import Optional

import pandas as pd
from fennel.expr import col, from_epoch

expr = from_epoch(col("x"), unit="second")

# from_epoch works for any int or optional int type
assert expr.typeof(schema={"x": int}) == datetime
assert expr.typeof(schema={"x": Optional[int]}) == Optional[datetime]

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [1714857600, 1714857601, 1714857602]})
schema = {"x": int}
expected = [
    pd.Timestamp("2024-05-04 21:20:00", tz="UTC"),
    pd.Timestamp("2024-05-04 21:20:01", tz="UTC"),
    pd.Timestamp("2024-05-04 21:20:02", tz="UTC"),
]
assert expr.eval(df, schema=schema).tolist() == expected
```
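For intuition, the same conversion can be written for a single value with Python's standard library. This is only a sketch of the semantics described above, not how Fennel computes it; the helper name `from_epoch_py` is hypothetical, and passing None through unchanged is an assumption based on the Optional[int] to Optional[datetime] typing shown in the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS = ("second", "millisecond", "microsecond")


def from_epoch_py(
    duration: Optional[int], unit: str = "second"
) -> Optional[datetime]:
    """Convert a duration since the Unix epoch into a UTC datetime."""
    if unit not in UNITS:
        raise ValueError(f"unit must be one of {UNITS}, got {unit!r}")
    if duration is None:
        return None  # assumption: None in, None out
    # timedelta accepts seconds=, milliseconds= and microseconds= keywords.
    return EPOCH + timedelta(**{unit + "s": duration})


print(from_epoch_py(1714857600))                        # 2024-05-04 21:20:00+00:00
print(from_epoch_py(1714857600000, unit="millisecond"))  # 2024-05-04 21:20:00+00:00
```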
Is Null
Expression equivalent to: x is null
Parameters
The expression that will be checked for nullness.
```python
from typing import Optional

import pandas as pd
import pytest
from fennel.expr import col

expr = col("x").isnull()

# type of isnull is always boolean
assert expr.typeof(schema={"x": Optional[int]}) == bool

# also works for non-optional types, where it's always False
assert expr.typeof(schema={"x": float}) == bool

# raises an error if the schema is not provided
with pytest.raises(ValueError):
    expr.typeof(schema={})

# can be evaluated with a dataframe
df = pd.DataFrame({"x": pd.Series([1, 2, None], dtype=pd.Int64Dtype())})
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [
    False,
    False,
    True,
]
```
Returns
Returns an expression object denoting the output of the isnull expression. Always evaluates to boolean.
Fill Null
The expression that is analogous to fillna in Pandas.
Parameters
The expression that will be checked for nullness.
The expression that will be substituted in case expr turns out to be null.
```python
from typing import Optional

import pandas as pd
import pytest
from fennel.expr import col, lit

expr = col("x").fillnull(lit(10))

# type of fillnull depends both on type of 'x' and the literal 10
assert expr.typeof(schema={"x": Optional[int]}) == int
assert expr.typeof(schema={"x": float}) == float

# raises an error if the schema is not provided
with pytest.raises(ValueError):
    expr.typeof(schema={})

# can be evaluated with a dataframe
expr = col("x").fillnull(lit(10))
df = pd.DataFrame({"x": pd.Series([1, 2, None], dtype=pd.Int64Dtype())})
assert expr.eval(df, schema={"x": Optional[float]}).tolist() == [
    1.0,
    2.0,
    10.0,
]
```
Returns
Returns an expression object denoting the output of the fillnull expression. If the expr is of type Optional[T1] and the fill is of type T2, the type of the output expression is the smallest type that both T1 and T2 can be promoted into.
If the expr is not optional but is of type T, the output is trivially the same as expr and hence is also of type T.
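The row-wise behavior of isnull and fillnull can be pictured with two small plain-Python functions. This is a sketch of the semantics only, with hypothetical helper names `is_null` and `fill_null`; it treats only None as null and does not model the type promotion described above.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def is_null(values: List[Optional[T]]) -> List[bool]:
    """One boolean per row: True where the value is missing."""
    return [v is None for v in values]


def fill_null(values: List[Optional[T]], fill: T) -> List[T]:
    """Replace missing values with `fill`, leaving the rest untouched."""
    return [fill if v is None else v for v in values]


print(is_null([1, 2, None]))             # [False, False, True]
print(fill_null([1.0, 2.0, None], 10.0))  # [1.0, 2.0, 10.0]
```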
Lit
Fennel's way of describing constants, similar to lit in Polars or Spark.
Parameters
The literal/constant Python object that is to be used as an expression in Fennel. This can be used to construct literals of ints, floats, strings, booleans, lists, structs etc. Notably though, it's not possible to use lit to build datetime literals.
```python
from typing import Optional

import pandas as pd
from fennel.expr import lit, col

expr = lit(1)

# lits don't need a schema for their type to be inferred
assert expr.typeof() == int

# can be evaluated with a dataframe
expr = col("x") + lit(1)
df = pd.DataFrame({"x": pd.Series([1, 2, None], dtype=pd.Int64Dtype())})
assert expr.eval(df, schema={"x": Optional[int]}).tolist() == [2, 3, pd.NA]
```
Returns
The expression that denotes the literal value.
Not
Logical not unary operator, invoked by the ~ symbol.
```python
import pandas as pd
from fennel.expr import lit

expr = ~lit(True)
assert expr.typeof() == bool

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [1, 2, 3]})
assert expr.eval(df, schema={"x": int}).tolist() == [False, False, False]
```
Now
Function to get the current timestamp, similar to what datetime.now does in Python.
```python
from datetime import datetime
from typing import Optional

import pandas as pd
from fennel.expr import now, col

expr = now().dt.since(col("birthdate"), "year")

assert (
    expr.typeof(schema={"birthdate": Optional[datetime]}) == Optional[int]
)

# can be evaluated with a dataframe
df = pd.DataFrame(
    {"birthdate": [datetime(1997, 12, 24), datetime(2001, 1, 21), None]}
)
# NOTE: the expected ages below depend on the date the example is run
assert expr.eval(df, schema={"birthdate": Optional[datetime]}).tolist() == [
    27,
    23,
    pd.NA,
]
```
Returns
Returns an expression object denoting the current timestamp. The type of the resulting expression is datetime.
Repeat
Repeat an expression n times to create a list.
Parameters
The expression to repeat.
The number of times to repeat the value - can evaluate to a different count for each row.
```python
from typing import List

import pandas as pd
from fennel.expr import repeat, col

expr = repeat(col("x"), col("y"))

assert expr.typeof(schema={"x": bool, "y": int}) == List[bool]

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [True, False, True], "y": [1, 2, 3]})
assert expr.eval(df, schema={"x": bool, "y": int}).tolist() == [
    [True],
    [False, False],
    [True, True, True],
]
```
Returns
Returns an expression object denoting the result of the repeat expression.
Errors
An error is thrown if the by expression is not of type int.
In addition, certain types (e.g. lists) are not supported as input for value.
An error is thrown if the by expression evaluates to a negative integer.
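A plain-Python picture of the per-row behavior, including the two error cases on the count, might look like the sketch below. The helper name `repeat_rows` and the specific exception types are hypothetical; the restriction on unsupported value types is not modelled.

```python
from typing import Any, List


def repeat_rows(values: List[Any], counts: List[int]) -> List[List[Any]]:
    """For each row, build a list holding `value` repeated `count` times."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    result = []
    for value, count in zip(values, counts):
        if isinstance(count, bool) or not isinstance(count, int):
            raise TypeError(f"count must be an int, got {count!r}")
        if count < 0:
            raise ValueError(f"count must be non-negative, got {count}")
        result.append([value] * count)
    return result


print(repeat_rows([True, False, True], [1, 2, 3]))
# [[True], [False, False], [True, True, True]]
```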
Typeof
Helper function to figure out the inferred type of any expression.
Parameters
Default: None
The schema of the context under which the expression is to be analyzed. In the case of pipelines, this will be the schema of the input dataset and in the case of extractors, this will be the schema of the featureset. The default value, None, represents an empty dictionary.
```python
from typing import Optional

import pytest
from fennel.expr import lit, col

expr = lit(1) + col("amount")
# type of 1 + col('amount') changes based on the type of 'amount'
assert expr.typeof(schema={"amount": int}) == int
assert expr.typeof(schema={"amount": float}) == float
assert expr.typeof(schema={"amount": Optional[int]}) == Optional[int]
assert expr.typeof(schema={"amount": Optional[float]}) == Optional[float]

# typeof raises an error if type of 'amount' isn't provided
with pytest.raises(ValueError):
    expr.typeof()

# or when the expression won't be valid with the schema
with pytest.raises(ValueError):
    expr.typeof(schema={"amount": str})

# no need to provide schema if the expression is constant
const = lit(1)
assert const.typeof() == int
```
Returns
Returns the inferred type of the expression, if any.
Errors
All columns referenced by col expressions must be present in the provided schema.
The expression should be valid in the context of the given schema.
When
Ternary expressions like 'if/else' or 'case' in SQL.
Parameters
The predicate expression for the ternary operator. Must evaluate to a boolean.
The expression that the whole when expression evaluates to if the predicate evaluates to True. then must always be called on the result of a when expression.
Default: lit(None)
The equivalent of the else branch in the ternary expression: the whole expression evaluates to this branch when the predicate evaluates to False. Defaults to lit(None) when not provided.
```python
import pandas as pd
import pytest
from fennel.expr import when, col, InvalidExprException

expr = when(col("x")).then(1).otherwise(0)

# type depends on the type of the then and otherwise values
assert expr.typeof(schema={"x": bool}) == int

# raises an error if the schema is not provided
with pytest.raises(ValueError):
    expr.typeof(schema={})
# also when the predicate is not boolean
with pytest.raises(ValueError):
    expr.typeof(schema={"x": int})

# can be evaluated with a dataframe
df = pd.DataFrame({"x": [True, False, True]})
assert expr.eval(df, schema={"x": bool}).tolist() == [1, 0, 1]

# not valid if only when is provided
with pytest.raises(InvalidExprException):
    expr = when(col("x"))
    expr.typeof(schema={"x": bool})

# if otherwise is not provided, it defaults to None
expr = when(col("x")).then(1)
assert expr.typeof(schema={"x": bool}) == Optional[int]
```
Returns
Returns an expression object denoting the result of the when/then/otherwise expression.
Errors
Error during typeof or eval if the referenced column isn't present.
Valid when expressions must have an accompanying then clause. The otherwise clause is optional and defaults to lit(None).
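The branch selection itself is simple to picture in plain Python. The sketch below uses a hypothetical helper, `when_rows`, that takes constant then/otherwise values for brevity; it is not Fennel's API, and it does not model a None predicate.

```python
from typing import Any, List


def when_rows(predicates: List[bool], then: Any, otherwise: Any = None) -> List[Any]:
    """Pick `then` where the predicate is True, `otherwise` where it is False."""
    result = []
    for predicate in predicates:
        if not isinstance(predicate, bool):
            raise TypeError("the predicate must evaluate to a boolean")
        result.append(then if predicate else otherwise)
    return result


print(when_rows([True, False, True], then=1, otherwise=0))  # [1, 0, 1]
print(when_rows([True, False], then=1))                     # [1, None]
```

The second call shows why the inferred type becomes optional when otherwise is omitted: some rows can evaluate to None.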
Zip
Zip two or more lists into a list of structs.
Parameters
The struct to hold the zipped values. Unlike other top level expressions, zip is written as Struct.zip(kwarg1=expr1, kwarg2=expr2, ...).
A dictionary of key-value pairs where the key is the name of the field in the struct and the value is the expression to zip. Expressions are expected to evaluate to lists of a type that can be converted to the corresponding field type in the struct.
```python
from typing import List

import pandas as pd
from fennel.lib.schema import struct
from fennel.expr import col

@struct
class MyStruct:
    a: int
    b: float

expr = MyStruct.zip(a=col("x"), b=col("y"))

expected = List[MyStruct]
schema = {"x": List[int], "y": List[float]}
assert expr.matches_type(expected, schema)

# note that output is truncated to the length of the shortest list
df = pd.DataFrame(
    {"x": [[1, 2], [3, 4], []], "y": [[1.0, 2.0], [3.0], [4.0]]}
)
assert expr.eval(
    df, schema={"x": List[int], "y": List[float]}
).tolist() == [
    [MyStruct(a=1, b=1.0), MyStruct(a=2, b=2.0)],
    [MyStruct(a=3, b=3.0)],
    [],
]
```
Returns
Returns an expression object denoting the result of the zip expression.
When zipping lists of unequal length, similar to Python's zip function, the resulting list will be truncated to the length of the shortest list, possibly zero.
Errors
An error is thrown if the types of the lists to zip are not compatible with the field types in the struct.
An error is thrown if the expressions to zip don't evaluate to lists.
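The truncation behavior mirrors Python's built-in zip, which the following plain-Python sketch makes explicit. The `Pair` dataclass and the `zip_rows` helper are hypothetical stand-ins for a Fennel struct and the zip expression; no type conversion or error handling is modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pair:
    a: int
    b: float


def zip_rows(xs: List[List[int]], ys: List[List[float]]) -> List[List[Pair]]:
    """Zip the two lists of each row, truncating to the shorter list."""
    return [
        [Pair(a=x, b=y) for x, y in zip(row_x, row_y)]
        for row_x, row_y in zip(xs, ys)
    ]


rows = zip_rows([[1, 2], [3, 4], []], [[1.0, 2.0], [3.0], [4.0]])
print(rows)
# [[Pair(a=1, b=1.0), Pair(a=2, b=2.0)], [Pair(a=3, b=3.0)], []]
```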