Useful Tips
Troubleshooting
Local Development
Fennel ships with a standalone mock client for easier and faster development in local mode. Here are a few tips for debugging issues encountered during local development:
Debugging Dataset Schema Issues
To debug dataset schema issues while writing pipelines, call the .schema() method on the dataset. It returns a regular dictionary mapping each field name to its type, which you can then print or inspect in a debugger.
```python
from datetime import datetime

from fennel.datasets import dataset, field, pipeline, Dataset
from fennel.lib.schema import inputs
from fennel.connectors import source, Webhook
from fennel.expr import lit

webhook = Webhook(name="webhook")

@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    city: str
    signup_time: datetime

@dataset(index=True)
class Processed:
    uid: int = field(key=True)
    city: str
    country: str
    signup_time: datetime

    @pipeline
    @inputs(User)
    def my_pipeline(cls, user: Dataset):
        ds = user.filter(lambda df: df["city"] != "London")
        schema = ds.schema()
        print(schema)
        return ds.assign(country=lit("US").astype(str))
```
Printing Full Datasets
You can also print the full contents of any Fennel dataset at any time by calling client.get_dataset_df(dataset_name) on the mock client and printing the resulting dataframe.
```python
from datetime import datetime

import pandas as pd

from fennel.datasets import dataset, field, pipeline, Dataset
from fennel.lib.schema import inputs
from fennel.connectors import source, Webhook

webhook = Webhook(name="webhook")

@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    country: str
    signup_time: datetime

@dataset
class USUsers:
    uid: int = field(key=True)
    country: str
    signup_time: datetime

    @pipeline
    @inputs(User)
    def my_pipeline(cls, user: Dataset):
        return user.filter(lambda df: df["country"] == "US")

# `client` here is the mock client
client.commit(message="msg", datasets=[User, USUsers])
# log some rows to the dataset
client.log(
    "webhook",
    "User",
    pd.DataFrame(
        columns=["uid", "country", "signup_time"],
        data=[
            [1, "UK", "2021-01-01T00:00:00"],
            [2, "US", "2021-02-01T00:00:00"],
            [3, "US", "2021-03-01T00:00:00"],
        ],
    ),
)
df = client.get_dataset_df("USUsers")
print(df)
```
Note that this debug functionality is only available in the mock client; in production, use the inspect APIs instead to debug data flow issues.
Explicitly Setting Pandas Types
Fennel's backend uses a strong typing system built in Rust. The mock client, however, keeps data in Pandas format, which is notorious for doing arbitrary type conversions (at least before Pandas 2.0). For instance, Pandas automatically typecasts an integer column with missing values into a float column.
Sometimes this surfaces as issues that appear only with the mock client and not with the real server. Other times, real issues with the real server get masked during local development with the mock client.
Hence, it's recommended to explicitly set data types when working with Pandas and the mock client, using the astype method.
```python
from datetime import datetime
from typing import Optional

import pandas as pd

from fennel.datasets import dataset, field
from fennel.connectors import source, Webhook

webhook = Webhook(name="webhook")

@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    height_cm: Optional[float]
    signup_time: datetime

# `client` here is the mock client
client.commit(message="msg", datasets=[User])
# log some rows to the dataset
df = pd.DataFrame(
    columns=["uid", "height_cm", "signup_time"],
    data=[
        [1, 180, "2021-01-01T00:00:00"],
        [2, 175, "2021-01-01T00:00:00"],
        [3, None, "2021-01-01T00:00:00"],
    ],
)
df["height_cm"] = df["height_cm"].astype("Int64")
client.log("webhook", "User", df)
```
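The underlying Pandas behavior can be seen in isolation, with no Fennel involved: a missing value silently changes an integer column's dtype to float, while the nullable Int64 dtype preserves integers.

```python
import pandas as pd

# An integer column with a missing value is silently upcast to float64.
s = pd.Series([1, 2, None])
print(s.dtype)  # float64

# The nullable Int64 dtype keeps the values as integers.
s = pd.Series([1, 2, None]).astype("Int64")
print(s.dtype)  # Int64
```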
Data Integration
MySQL connection issues
Some users have reported being unable to connect to Amazon RDS MySQL or MariaDB. This typically surfaces with the error message: Cannot create a PoolableConnectionFactory.
To solve this issue, set jdbc_params to enabledTLSProtocols=TLSv1.2.
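As a sketch, the setting is passed when declaring the MySQL connector. The connection values below (host, database, credentials) are illustrative placeholders; only jdbc_params is the fix described above, so check the connector reference for your Fennel version for the exact parameter set.

```python
from fennel.connectors import MySQL

# Hypothetical connection details for illustration; the jdbc_params
# argument carries the TLS setting that resolves the RDS error.
mysql = MySQL(
    name="my_mysql",
    host="my-db.us-west-2.rds.amazonaws.com",
    db_name="mydb",
    username="admin",
    password="password",
    jdbc_params="enabledTLSProtocols=TLSv1.2",
)
```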