Core Types
Fennel supports the following data types, expressed as native Python type hints.
int - Implemented as a signed 8-byte integer (int64).
float - Implemented as a signed 8-byte float with double precision.
Decimal[int] - Implemented as a signed 16-byte integer (int128), with the int value as the precision.
bool - Implemented as a standard 1-byte boolean.
str - Arbitrary sequence of UTF-8 characters. As in most programming languages, str does not support arbitrary binary bytes.
bytes - Arbitrary sequence of binary bytes. This is useful for storing binary data.
List[T] - List of elements of any other valid type T. Unlike Python lists, all elements must have the same type.
Dict[str, T] - Map from str to data of any valid type T.
Fennel does not support dictionaries with arbitrary types for keys - please reach out to Fennel support if you have use cases requiring that.
Optional[T] - Same as Python Optional - permits either None or values of type T.
Embedding[int] - Denotes a list of floats of the given fixed length, i.e. Embedding[32] describes a list of 32 floats. This is the same as list[float] but enforces the list length, which is important for dot product and other similar operations on embeddings.
datetime - Describes a timestamp, implemented as microseconds since the Unix epoch (so the minimum granularity is one microsecond). Can be natively parsed from multiple formats, though internally it is stored as an 8-byte signed integer holding the timestamp as microseconds since epoch in UTC.
date - Describes a date, implemented as days since the Unix epoch. Can be natively parsed from multiple formats, though internally it is stored as an 8-byte signed integer holding the date as days since epoch in UTC.
struct - Describes the equivalent of a struct or dataclass - a container holding a fixed set of fields of fixed types.
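To make the internal representation of timestamps and dates concrete, here is a minimal sketch using only the Python standard library. The helpers `to_epoch_micros` and `to_epoch_days` are hypothetical names introduced for illustration and are not part of Fennel's API; the sketch shows only the encoding described above, not how Fennel parses its input formats.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(ts: datetime) -> int:
    """Microseconds since the Unix epoch in UTC (hypothetical helper)."""
    if ts.tzinfo is None:
        # Assumption made for this sketch: naive datetimes are treated as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    # Dividing one timedelta by another with // yields an exact integer,
    # which avoids floating-point rounding.
    return (ts - EPOCH) // timedelta(microseconds=1)


def to_epoch_days(d: date) -> int:
    """Days since the Unix epoch (hypothetical helper)."""
    return (d - date(1970, 1, 1)).days


micros = to_epoch_micros(datetime(2024, 1, 1, tzinfo=timezone.utc))
days = to_epoch_days(date(2024, 1, 1))
```

Both results fit comfortably in an 8-byte signed integer, which is why a single int64 is enough for either type.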
Fennel uses a strong type system: after data ingestion, data is not auto-coerced across types. For instance, it is a compile-time or runtime error if a value was expected to be of type float but an int was received instead.
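The "no auto-coercion" rule can be illustrated with a small plain-Python check. This is only a sketch of the idea: `check_strict` is a hypothetical helper, not a Fennel function, and the exact error Fennel raises is not shown here.

```python
from typing import Any


def check_strict(value: Any, expected: type) -> Any:
    """Return value unchanged if it has exactly the expected type, else raise.

    Hypothetical helper for illustration only; not part of Fennel.
    """
    # Comparing type(value) directly (rather than using isinstance) is a
    # choice made for this sketch so that no subclass is silently accepted.
    if type(value) is not expected:
        raise TypeError(
            f"expected {expected.__name__}, got {type(value).__name__}: {value!r}"
        )
    return value


accepted = check_strict(1.0, float)  # a float where a float is expected

try:
    check_strict(1, float)  # an int is not coerced to float
except TypeError as err:
    message = str(err)
```

In practice this means converting values explicitly (for example `float(1)`) before they reach a field declared as float.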
```python
# imports for data types
from typing import List, Optional, Dict
from datetime import datetime
from fennel.dtypes import struct

# imports for datasets
from fennel.datasets import dataset, field
from fennel.lib import meta

@struct  # like dataclass but verifies that fields have valid Fennel types
class Address:
    street: str
    city: str
    state: str
    zip_code: Optional[str]

@meta(owner="[email protected]")
@dataset
class Student:
    id: int = field(key=True)
    name: str
    grades: Dict[str, float]
    honors: bool
    classes: List[str]
    address: Address  # Address is now a valid Fennel type
    signup_time: datetime
```
Type Restrictions
Fennel type restrictions allow you to put additional constraints on base types and restrict the set of valid values in some form.
regex - Restriction on the base type str. Permits only the strings matching the given regex pattern.
between - Restriction on the base type int or float. Permits only the numbers between low and high (both inclusive by default). Either bound can be made exclusive by setting strict_min or strict_max to True, respectively.
oneof - Restricts a type T to accept only one of the given values as valid. oneof can be thought of as a more general version of enum. For the restriction to be valid, all the values must themselves be of type T.
```python
# imports for data types
from datetime import datetime, timezone
from fennel.dtypes import oneof, between, regex

# imports for datasets
from fennel.datasets import dataset, field
from fennel.lib import meta
from fennel.connectors import source, Webhook

webhook = Webhook(name="fennel_webhook")

@meta(owner="[email protected]")
@source(webhook.endpoint("UserInfoDataset"), disorder="14d", cdc="upsert")
@dataset
class UserInfoDataset:
    user_id: int = field(key=True)
    name: str
    age: between(int, 0, 100, strict_min=True)
    gender: oneof(str, ["male", "female", "non-binary"])
    email: regex(r"[^@]+@[^@]+\.[^@]+")
    timestamp: datetime
```
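To show what each restriction accepts and rejects, the following sketch re-implements the three checks as plain Python predicates. The functions `regex_check`, `between_check` and `oneof_check` are hypothetical stand-ins written for this example; they are not Fennel's implementation. The sketch also assumes that a regex must match the whole string, which the text above does not state explicitly.

```python
import re
from typing import Any, Callable, Sequence

Validator = Callable[[Any], bool]


def regex_check(pattern: str) -> Validator:
    """Accept only strings matching the pattern (hypothetical helper)."""
    compiled = re.compile(pattern)
    # Assumption: the entire string has to match, hence fullmatch.
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None


def between_check(low, high, strict_min=False, strict_max=False) -> Validator:
    """Accept numbers in [low, high]; strict flags make a bound exclusive."""
    def check(v: Any) -> bool:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            return False
        above = v > low if strict_min else v >= low
        below = v < high if strict_max else v <= high
        return above and below
    return check


def oneof_check(values: Sequence[Any]) -> Validator:
    """Accept only values present in the given collection."""
    allowed = list(values)
    return lambda v: v in allowed


# Predicates mirroring the fields of UserInfoDataset above.
age_ok = between_check(0, 100, strict_min=True)
gender_ok = oneof_check(["male", "female", "non-binary"])
email_ok = regex_check(r"[^@]+@[^@]+\.[^@]+")
```

With `strict_min=True`, an age of 0 is rejected while 100 is still accepted, because only the lower bound became exclusive.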
Type Restriction Composition
These restricted types act as regular types -- they can be mixed/matched to form complex composite types. For instance, the following are all valid Fennel types:
list[regex('^[0-9]{5}$')] - list of strings matching the US zip code pattern
oneof(Optional[int], [None, 0, 1]) - a nullable type that only takes 0 or 1 as valid values
Data belonging to a restricted type is still stored and transmitted (e.g. in JSON encoding) as its regular base type. Fennel simply rejects data of the base type that doesn't meet the restriction.
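The sketch below illustrates both points: composite restricted types are checked element by element, and the data on the wire is ordinary JSON with no trace of the restriction. The validators `valid_zip_list` and `valid_nullable_bit` and the sample payload are hypothetical, written in plain Python for illustration rather than taken from Fennel.

```python
import json
import re
from typing import Any

# Five digits, checked with fullmatch so the whole string must match.
ZIP_RE = re.compile(r"[0-9]{5}")


def valid_zip_list(value: Any) -> bool:
    """Mimics a list of zip-code strings: every element is a 5-digit string."""
    return isinstance(value, list) and all(
        isinstance(item, str) and ZIP_RE.fullmatch(item) is not None
        for item in value
    )


def valid_nullable_bit(value: Any) -> bool:
    """Mimics oneof(Optional[int], [None, 0, 1])."""
    if value is None:
        return True
    # Exclude bool explicitly, since True == 1 in Python.
    return type(value) is int and value in (0, 1)


# The encoded data carries only base types: strings, ints and null.
payload = json.loads('{"zips": ["94107", "10001"], "flag": null}')
zips_ok = valid_zip_list(payload["zips"])
flag_ok = valid_nullable_bit(payload["flag"])
```

A payload such as `{"zips": ["9410"], "flag": 2}` is still valid JSON of the right base types, but both fields would fail these checks.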
Duration
Fennel lets you express durations in an easy-to-read, natural-language form, as described below:
Symbol | Unit |
---|---|
y | Year |
w | Week |
d | Day |
h | Hour |
m | Minute |
s | Second |
There is no shortcut for month because the length of a month varies: some months are 28 days, some are 30 days and some are 31 days. A common convention in ML is to use 4 weeks to describe a month.
A year is hardcoded to be exactly 365 days and doesn't take into account variance like leap years.
```text
"7h" -> 7 hours
"12d" -> 12 days
"2y" -> 2 years
"3h 20m 4s" -> 3 hours 20 minutes and 4 seconds
"2y 4w" -> 2 years and 4 weeks
```
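The duration grammar above is simple enough to sketch as a small parser. `parse_duration` is a hypothetical helper, not a Fennel function. It assumes that components are separated by whitespace, as in the examples, and it uses the fixed 365-day year described above.

```python
import re
from datetime import timedelta

# Seconds per unit, following the table above; a year is fixed at 365 days.
UNIT_SECONDS = {
    "y": 365 * 24 * 3600,
    "w": 7 * 24 * 3600,
    "d": 24 * 3600,
    "h": 3600,
    "m": 60,
    "s": 1,
}

# One component: an integer count followed by a single unit symbol.
_PART = re.compile(r"(\d+)([ywdhms])")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "3h 20m 4s" into a timedelta (hypothetical helper)."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total_seconds = 0
    for part in parts:
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"invalid duration component: {part!r}")
        total_seconds += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return timedelta(seconds=total_seconds)


example = parse_duration("3h 20m 4s")
```

Because there is no month symbol, a string like "1mo" raises `ValueError` in this sketch; a month would be written as "4w" under the convention mentioned above.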