Data Types

Fennel supports the following data types, expressed as native Python type hints.

int, float

int is implemented as int64 and float is implemented as float64.

bool

Booleans

str

Arbitrary sequence of bytes.

List[T]

List of elements of type T. Unlike Python lists, all elements must have the same type.

dict[T]

Map from str to data of type T. Please let us know if your use case requires a dict with non-string keys.

Optional[T]

Same as Python's Optional: permits either None or values of type T.

Embedding[int]

Denotes a list of floats of the given fixed length, i.e. Embedding[32] describes a list of 32 floats. This is the same as list[float] but also enforces the list length, which is important for dot products and similar operations on embeddings.

datetime

Describes a timestamp, implemented as microseconds since Unix epoch (so minimum granularity is microseconds). Can be natively parsed from multiple formats.

struct {T1, T2, ..}

Describes the equivalent of a struct or dataclass - a container containing a fixed set of fields of fixed types.
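
To make the datetime representation above concrete, here is a minimal standard-library sketch of converting between a timezone-aware Python datetime and integer microseconds since the Unix epoch. The helper names to_micros and from_micros are hypothetical and are not part of Fennel; the sketch only illustrates the stated microsecond granularity.

```python
from datetime import datetime, timedelta, timezone

# Reference point: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Hypothetical helper: whole microseconds since the Unix epoch."""
    # Dividing one timedelta by another with // yields an exact integer,
    # which avoids the rounding that float seconds would introduce.
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_micros(micros: int) -> datetime:
    """Hypothetical helper: inverse of to_micros."""
    return EPOCH + timedelta(microseconds=micros)


ts = datetime(2023, 1, 1, 12, 30, 0, 123456, tzinfo=timezone.utc)
micros = to_micros(ts)

# The round trip is lossless because Python datetimes also stop at microseconds.
assert from_micros(micros) == ts
```

Anything finer than a microsecond (for example nanosecond timestamps from another system) would have to be truncated or rounded before it fits this representation.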

Note that types don't auto-typecast. For instance, if something was expected to be of type float but received an int, Fennel will declare that to be an error.
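
The no-typecast rule and the fixed-length rule for embeddings can be pictured with a small plain-Python sketch. The functions is_strict_float and is_embedding are hypothetical names invented for this illustration, and the checks are an assumption about what strictness means here, not Fennel's actual implementation.

```python
from typing import Any


def is_strict_float(value: Any) -> bool:
    """Hypothetical check: accept only real floats, never ints."""
    # isinstance(1, float) is already False in Python, but comparing the
    # exact type makes the "no implicit cast" intent explicit.
    return type(value) is float


def is_embedding(value: Any, dim: int) -> bool:
    """Hypothetical check for Embedding[dim]: a list of exactly `dim` floats."""
    return (
        isinstance(value, list)
        and len(value) == dim
        and all(is_strict_float(x) for x in value)
    )


print(is_strict_float(1.0))              # True
print(is_strict_float(1))                # False: an int is not cast to float
print(is_embedding([0.1, 0.2, 0.3], 3))  # True
print(is_embedding([0.1, 0.2], 3))       # False: wrong length
```

Under this reading, a producer that emits 1 where a float is expected would need to send 1.0 instead.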

Struct Type

Fennel natively supports a 'struct' type to represent a bag of fields, each with a fixed type. Here is how to define and use a struct type.

data-types.py
@struct  # like dataclass but verifies that all fields are valid Fennel types
class Address:
    street: str
    city: str
    state: str
    zip_code: str


@meta(owner="[email protected]")
@dataset
class Student:
    id: int = field(key=True)
    name: str
    age: int
    address: Address  # Address is now a valid Fennel type for datasets/features
    signup_time: datetime
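
The comment on the struct decorator describes it as "like dataclass" with an extra check on field types. One way to picture that idea with only the standard library is sketched below. The decorator checked_struct and the ALLOWED_TYPES set are hypothetical and deliberately tiny; they are assumptions for illustration and do not reflect how Fennel implements struct.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import get_type_hints

# Assumption: a small illustrative subset of permitted field types.
ALLOWED_TYPES = {int, float, bool, str, datetime}


def checked_struct(cls):
    """Hypothetical decorator: a dataclass plus a field-type check at definition time."""
    for name, hint in get_type_hints(cls).items():
        if hint not in ALLOWED_TYPES:
            raise TypeError(f"field {name!r} has unsupported type {hint!r}")
    # All annotations passed the check, so build a regular dataclass.
    return dataclass(cls)


@checked_struct
class Address:
    street: str
    city: str
    state: str
    zip_code: str


home = Address(street="1 Main St", city="Springfield", state="IL", zip_code="62701")
print(home.city)  # Springfield
```

Because the check runs when the class is defined, an unsupported field type fails immediately rather than when the first row is written.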

Type Restrictions

Imagine that you have a field that denotes a US zip code but is stored as a string. Not all strings are valid zip codes, only those that match a particular regex. A plain str type cannot express this constraint, which can lead to incorrect data being stored.

Fennel supports type restrictions -- these are additional constraints put on base types that restrict the set of valid values in some form. Here is a list of supported restrictions:

| Restricted Type | Base Type | Restriction |
|---|---|---|
| regex('<pattern>') | str | Permits only the strings matching the given regex pattern |
| between(T, low, high) | T where T can be int or float | Only permits values between low and high (both inclusive). Either bound can be made exclusive by setting min_strict or max_strict to True |
| oneof(T, [values...]) | T | Only one of the given values is accepted. For the restriction to be valid, values themselves should be of type T |

These restricted types act as regular types -- they can be mixed/matched to form complex composite types. For instance, the following are all valid Fennel types:

  • list[regex('^[0-9]{5}$')] - list of strings, each matching the US zip code pattern
  • oneof(Optional[int], [None, 0, 1]) - a nullable type that accepts only None, 0, or 1
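
The intended semantics of the two composite types listed above can be sketched as plain-Python predicates. The function names are hypothetical, and the use of a full-string match for the regex is an assumption made for this sketch; neither is taken from Fennel itself.

```python
import re
from typing import Any

# Five digits; fullmatch below requires the whole string to match.
ZIP_CODE = re.compile(r"[0-9]{5}")


def is_zip_code_list(value: Any) -> bool:
    """Hypothetical check: a list whose items are all five-digit zip code strings."""
    return isinstance(value, list) and all(
        isinstance(item, str) and ZIP_CODE.fullmatch(item) is not None
        for item in value
    )


def is_nullable_bit(value: Any) -> bool:
    """Hypothetical check mirroring oneof(Optional[int], [None, 0, 1])."""
    # Compare exact types so that True/False are not mistaken for 1/0.
    return value is None or (type(value) is int and value in (0, 1))


print(is_zip_code_list(["94107", "10001"]))  # True
print(is_zip_code_list(["94107", "1000"]))   # False: second item has four digits
print(is_nullable_bit(None))                 # True
print(is_nullable_bit(2))                    # False
```

The restriction applies per element for the list case and to the whole value for the nullable case, which is what lets restricted types nest inside other types.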

Note: data belonging to these restricted types is still stored / transmitted (e.g. in json encoding) as a regular base type. It's just that Fennel will reject data of base type that doesn't meet the restriction.

Example:

@meta(owner="[email protected]")
@source(webhook.endpoint("UserInfoDataset"))
@dataset
class UserInfoDataset:
    user_id: int = field(key=True)
    name: str
    age: between(int, 0, 100)
    gender: oneof(str, ["male", "female", "non-binary"])
    email: regex(r"[^@]+@[^@]+\.[^@]+")
    timestamp: datetime
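
Since restricted values travel as ordinary base types, a JSON row for this dataset looks like any other JSON object, and the restrictions only matter when the row is checked. The sketch below mimics that check in plain Python for the age, gender, and email fields. The function row_errors, the sample payload, and the choice of a full-string regex match are assumptions for illustration, not Fennel behavior.

```python
import json
import re
from typing import Any, Dict, List

EMAIL = re.compile(r"[^@]+@[^@]+\.[^@]+")
GENDERS = {"male", "female", "non-binary"}


def row_errors(row: Dict[str, Any]) -> List[str]:
    """Hypothetical validator: names of fields that violate their restriction."""
    errors = []

    # between(int, 0, 100): both bounds inclusive.
    age = row.get("age")
    if type(age) is not int or not (0 <= age <= 100):
        errors.append("age")

    # oneof(str, [...]): value must be one of the listed strings.
    gender = row.get("gender")
    if not isinstance(gender, str) or gender not in GENDERS:
        errors.append("gender")

    # regex(...): assumed here to require a match of the whole string.
    email = row.get("email")
    if not isinstance(email, str) or EMAIL.fullmatch(email) is None:
        errors.append("email")

    return errors


# On the wire, age is a plain JSON number and gender/email are plain strings.
payload = '{"user_id": 1, "name": "Ada", "age": 101, "gender": "female", "email": "ada@example.com"}'
print(row_errors(json.loads(payload)))  # ['age']
```

A row that reports no errors here would be stored exactly as sent; the restriction adds a gate, not a new encoding.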