Useful Tips


Local Development

Fennel ships with a standalone mock client for easier & faster local development. Here are a few tips for debugging issues encountered during local development:

Debugging Dataset Schema Issues

To debug dataset schema issues while writing pipelines, call the .schema() method on the dataset, which returns a regular dictionary mapping field names to their types, and then print it or inspect it in a debugger.

```python
@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    city: str
    signup_time: datetime


@dataset
class Processed:
    uid: int = field(key=True)
    city: str
    country: str
    signup_time: datetime

    @pipeline
    @inputs(User)
    def my_pipeline(cls, user: Dataset):
        ds = user.filter(lambda df: df["city"] != "London")
        # .schema() returns a dict from field name to type
        schema = ds.schema()
        print(schema)
        return ds.assign("country", str, lambda df: "US")
```
Calling .schema() on datasets in pipelines


Printing Full Datasets

You can also print the full contents of a Fennel dataset at any time by calling client.get_dataset_df(dataset_name) on the mock client and printing the resulting dataframe.



```python
from datetime import datetime

import pandas as pd

from fennel.datasets import dataset, field, pipeline, Dataset
from fennel.lib.schema import inputs
from fennel.connectors import source, Webhook

webhook = Webhook(name="webhook")


@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    country: str
    signup_time: datetime


@dataset
class USUsers:
    uid: int = field(key=True)
    country: str
    signup_time: datetime

    @pipeline
    @inputs(User)
    def my_pipeline(cls, user: Dataset):
        return user.filter(lambda df: df["country"] == "US")


client.commit(message="msg", datasets=[User, USUsers])
# log some rows to the dataset
client.log(
    "webhook",
    "User",
    pd.DataFrame(
        columns=["uid", "country", "signup_time"],
        data=[
            [1, "UK", "2021-01-01T00:00:00"],
            [2, "US", "2021-02-01T00:00:00"],
            [3, "US", "2021-03-01T00:00:00"],
        ],
    ),
)
df = client.get_dataset_df("USUsers")
print(df)
```
Obtaining full dataset from mock client



This debug functionality is only available in the mock client. Use the inspect APIs instead to debug data flow issues in production.

Explicitly Setting Pandas Types

The Fennel backend uses a strong typing system built in Rust. The mock client, however, keeps data in Pandas format, which is notorious for doing arbitrary type conversions (at least before Pandas 2.0). For instance, Pandas automatically typecasts an integer column with missing values into a float column.
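This coercion can be seen with plain Pandas, independent of Fennel (the column name below is just for illustration):

```python
import pandas as pd

# An integer column containing a missing value is silently
# upcast to float64 by Pandas...
df = pd.DataFrame({"height_cm": [180, 175, None]})
print(df["height_cm"].dtype)  # float64

# ...whereas Pandas' nullable Int64 dtype keeps the column
# integer-typed while still representing the missing value.
df["height_cm"] = df["height_cm"].astype("Int64")
print(df["height_cm"].dtype)  # Int64
```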

Sometimes this shows up as issues that are present only with the mock client and not with the real server. Other times, real issues with the real server get masked during local development with the mock client.

Hence, it's recommended to explicitly set data types, using the astype method, when working with Pandas and the mock client.

```python
@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    height_cm: Optional[int]
    signup_time: datetime


client.commit(message="msg", datasets=[User])
# log some rows to the dataset
df = pd.DataFrame(
    columns=["uid", "height_cm", "signup_time"],
    data=[
        [1, 180, "2021-01-01T00:00:00"],
        [2, 175, "2021-01-01T00:00:00"],
        [3, None, "2021-01-01T00:00:00"],
    ],
)
# cast to pandas' nullable Int64 so the column stays integer-typed
# instead of being silently upcast to float64
df["height_cm"] = df["height_cm"].astype("Int64")
client.log("webhook", "User", df)
```
Explicit type cast in pandas using astype


Data Integration

MySQL connection issues

Some users have reported being unable to connect to Amazon RDS MySQL or MariaDB. This can be diagnosed by the error message: Cannot create a PoolableConnectionFactory. To solve this issue, set jdbc_params to enabledTLSProtocols=TLSv1.2.
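As a sketch, the parameter can be passed when defining the MySQL source; the connector name, host, and credentials below are placeholders, not values from this document:

```python
from fennel.connectors import MySQL

# Hypothetical connection details -- substitute your own RDS endpoint
# and credentials. The key part is the jdbc_params argument, which
# forces TLSv1.2 and avoids the PoolableConnectionFactory error.
mysql = MySQL(
    name="my_mysql",
    host="my-db.us-west-2.rds.amazonaws.com",
    db_name="mydb",
    username="admin",
    password="password",
    jdbc_params="enabledTLSProtocols=TLSv1.2",
)
```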
