Useful Tips
Troubleshooting
Local Development
Fennel ships with a standalone mock client for easier and faster development in local mode. Here are a few tips for debugging issues encountered during local development:
Debugging Dataset Schema Issues
To debug dataset schema issues while writing pipelines, call the .schema() method on the dataset. It returns a regular dictionary mapping each field name to its type, which you can then print or inspect in a debugger.
```python
from datetime import datetime

from fennel.datasets import dataset, field, pipeline, Dataset
from fennel.lib.schema import inputs
from fennel.connectors import source, Webhook
from fennel.expr import lit

webhook = Webhook(name="webhook")

@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    city: str
    signup_time: datetime

@dataset(index=True)
class Processed:
    uid: int = field(key=True)
    city: str
    country: str
    signup_time: datetime

    @pipeline
    @inputs(User)
    def my_pipeline(cls, user: Dataset):
        ds = user.filter(lambda df: df["city"] != "London")
        schema = ds.schema()
        print(schema)
        return ds.assign(country=lit("US").astype(str))
```
Printing Full Datasets
You can also print the full contents of any Fennel dataset at any time by calling client.get_dataset_df(dataset_name) on the mock client and printing the resulting dataframe.
```python
from datetime import datetime

import pandas as pd

from fennel.datasets import dataset, field, pipeline, Dataset
from fennel.lib.schema import inputs
from fennel.connectors import source, Webhook

webhook = Webhook(name="webhook")

@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    country: str
    signup_time: datetime

@dataset
class USUsers:
    uid: int = field(key=True)
    country: str
    signup_time: datetime

    @pipeline
    @inputs(User)
    def my_pipeline(cls, user: Dataset):
        return user.filter(lambda df: df["country"] == "US")

# `client` here is the mock client
client.commit(message="msg", datasets=[User, USUsers])
# log some rows to the dataset
client.log(
    "webhook",
    "User",
    pd.DataFrame(
        columns=["uid", "country", "signup_time"],
        data=[
            [1, "UK", "2021-01-01T00:00:00"],
            [2, "US", "2021-02-01T00:00:00"],
            [3, "US", "2021-03-01T00:00:00"],
        ],
    ),
)
df = client.get_dataset_df("USUsers")
print(df)
```
Note that this debug functionality is only available in the mock client; in production, use the inspect APIs instead to debug data flow issues.
Explicitly Setting Pandas Types
Fennel's backend uses a strong typing system built in Rust. The mock client, however, keeps data in Pandas format, which is notorious for doing arbitrary type conversions (at least before Pandas 2.0). For instance, Pandas automatically typecasts an integer column with missing values into a float column.
Sometimes this surfaces as issues that appear only with the mock client and not with the real server. Other times, real issues with the real server get masked during local development with the mock client.
Hence, it's recommended to explicitly set data types when working with Pandas and the mock client, using the astype method.
```python
from datetime import datetime
from typing import Optional

import pandas as pd

from fennel.datasets import dataset, field
from fennel.connectors import source, Webhook

webhook = Webhook(name="webhook")

@source(webhook.endpoint("User"), disorder="14d", cdc="upsert")
@dataset
class User:
    uid: int = field(key=True)
    height_cm: Optional[float]
    signup_time: datetime

# `client` here is the mock client
client.commit(message="msg", datasets=[User])
# log some rows to the dataset
df = pd.DataFrame(
    columns=["uid", "height_cm", "signup_time"],
    data=[
        [1, 180, "2021-01-01T00:00:00"],
        [2, 175, "2021-01-01T00:00:00"],
        [3, None, "2021-01-01T00:00:00"],
    ],
)
df["height_cm"] = df["height_cm"].astype("Int64")
client.log("webhook", "User", df)
```
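The underlying Pandas behavior can be seen in isolation, with no Fennel involved: a missing value silently changes an integer column's dtype to float, while the nullable Int64 dtype preserves integers.

```python
import pandas as pd

# An integer column with a missing value is silently upcast to float64.
s = pd.Series([1, 2, None])
print(s.dtype)  # float64

# The nullable Int64 dtype keeps the values as integers.
s = pd.Series([1, 2, None]).astype("Int64")
print(s.dtype)  # Int64
```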
Data Integration
MySQL connection issues
Some users have reported being unable to connect to Amazon RDS MySQL or MariaDB. This typically surfaces with the error message: Cannot create a PoolableConnectionFactory.
To solve this issue, set jdbc_params to enabledTLSProtocols=TLSv1.2.
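As a sketch, the setting is passed when declaring the MySQL connector. The connection values below (host, database, credentials) are illustrative placeholders; only jdbc_params is the fix described above, so check the connector reference for your Fennel version for the exact parameter set.

```python
from fennel.connectors import MySQL

# Hypothetical connection details for illustration; the jdbc_params
# argument carries the TLS setting that resolves the RDS error.
mysql = MySQL(
    name="my_mysql",
    host="my-db.us-west-2.rds.amazonaws.com",
    db_name="mydb",
    username="admin",
    password="password",
    jdbc_params="enabledTLSProtocols=TLSv1.2",
)
```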