Useful Tips
Code Organization
Fennel code can be flexibly organized in a variety of ways - it's plain Python after all. That said, here are some tips that have been seen to work well across various teams using Fennel - feel free to use or adapt these to your needs.
You can also checkout end-to-end example projects here for inspiration.
Module Organization
The simplest of the projects may have only four separate modules - one each for datasets, featuresets, commit script, and tests.
1fennel-project/
2 | datasets.py
3 | featuresets.py
4 | commit.py
5 | test.py
bash
With this structure, as the names imply, all source & dataset definitions go
in datasets.py
, all featureset definitions go in featuresets.py
, tests go in
test.py
and the script to instantiate Fennel client and make the commit request
goes under the if __name__ == 'main'
block in commit.py
.
As the complexity grows, these modules may need to be factorized. One natural
approach is to convert datasets
and featuresets
modules to be directories
and organize sub-modules under them based on the domain. Since the sources
(e.g. s3 credentials) are needed for each dataset sub-module, they can be further
factorized in their own module. Tests could be moved to a top-level /tests
or /tests
under both datasets/featuresets or unbundled as test_x.py
files -
really up to your personal taste.
Overall, the structure could look something like this:
1fennel-project/
2 | connectors.py
3 | datasets/
4 | __init__.py
5 | user.py
6 | product.py
7 | ...
8 | featuresets/
9 | __init__.py
10 | user.py
11 | click_history.py
12 | ...
13 | commit.py
14 | tests /
15 | test_user.py
16 | ...
bash
From this point onwards, the structure can scale and grow like any other Python project.
Unit Tests With Data Files
Fennel has a strong typing system which makes it easier to detect & catch data quality issues. However, this also adds some overhead in getting the type of each dataset field right.
A common pattern is to checkin a sample data file for each sourced dataset and log the contents of the file into the dataset as part of a unit test. This ensures that the unit test passes if and only if the dataset field types all match the contents of the sample file.
With this, the directory structure may look like this:
1fennel-project/
2 | connectors.py
3 | data/
4 | user_signups.csv
5 | transactions.parquet
6 | ...
7 | datasets/
8 | ...
9 | featuresets/
10 | ...
11 | commit.py
12 | tests/
13 | ...
bash
Checkout this example data directory and this test directory to see tests that use this pattern.
Organizing Featuresets
Fennel featuresets are just a collection of some features. It's entirely possible to create a single featureset with all the features in it. It's also possible to create thousands of featuresets each with a single feature. There is no right or wrong way of grouping features.
That said, here are some useful ways to organize your featuresets:
- Featureset per entity: featuresets could be mapped 1:1 to entities,
for instance, by having one featureset for
User
features, a separate featureset for allProduct
features, a third featureset forUserSeller
features that involve user's interaction with the seller in the past and so on. - Featureset per domain: featureset could be mapped to business domains, for instance in a bank, having one featureset for credit card balance features, one for checkin account features, one for mortgage account and so on.
- Featureset per request type: almost always, special featuresets are needed that map 1:1 with the information contained in the inference requests - for instance such a featureset will have one feature for uid, one for time of request, one for IP from which user is logging in and so on. These featuresets have no extractors so all these features need to be provided as inputs.