Derived Extractors
For certain common extractor patterns, Fennel can derive the extractor directly from the feature definition. These derived extractors carry metadata that allows the Fennel backend to significantly improve performance: for these extractors, Fennel avoids calling into the Python runtime that is generally needed to execute extractor logic.
A feature can have at most one extractor, counting both Python-based and derived extractors.
The following extractor types are supported:
- Dataset lookup extractors. These extractors perform a lookup on a single field of a dataset, optionally supply a default value for missing rows, and assign the output to a single feature. Here's an example of a manually written extractor of this form:
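    from datetime import datetime

    import pandas as pd

    # meta, source, webhook, dataset, field, featureset, feature, extractor,
    # inputs and outputs are assumed to be imported from the Fennel client library.


    @meta(owner="[email protected]")
    @source(webhook.endpoint("User"))
    @dataset
    class User:
        uid: int = field(key=True)
        name: str
        timestamp: datetime


    @meta(owner="[email protected]")
    @featureset
    class UserFeatures:
        uid: int = feature(id=1)
        name: str = feature(id=2)

        @extractor(depends_on=[User])
        @inputs(uid)
        @outputs(name)
        def func(cls, ts: pd.Series, uids: pd.Series):
            # Look up User rows for each uid; `found` flags the keys that matched.
            names, found = User.lookup(ts, uid=uids)
            # Substitute the default for users with no matching row.
            names.fillna("Unknown", inplace=True)
            return names[["name"]]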
- Aliases. These extractors unidirectionally map an input feature to an output feature.
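
To make the two patterns concrete, here is a minimal plain-Python sketch of what each one computes. It is not Fennel's implementation: the helper names `lookup_with_default` and `alias` and the in-memory `users` dictionary are hypothetical, and the sketch ignores timestamps entirely, which a real dataset lookup takes into account.

```python
from typing import Any, Dict, Hashable, List, Optional


def lookup_with_default(
    dataset: Dict[Hashable, Dict[str, Any]],
    keys: List[Hashable],
    field: str,
    default: Optional[Any] = None,
) -> List[Any]:
    """Return dataset[key][field] for each key, or `default` when the key has no row."""
    out = []
    for key in keys:
        row = dataset.get(key)
        out.append(row[field] if row is not None else default)
    return out


def alias(values: List[Any]) -> List[Any]:
    """Copy the input feature's values to the output feature unchanged."""
    return list(values)


# Hypothetical stand-in for the User dataset, keyed by uid.
users = {1: {"name": "Ada"}, 2: {"name": "Grace"}}

request_user_ids = [1, 3, 2]
uids = alias(request_user_ids)  # alias: output feature mirrors the input feature
names = lookup_with_default(users, uids, field="name", default="Unknown")
```

Because both operations are fully described by a few parameters (a field, a default, an input feature), a backend can run them without executing user-written Python, which is the property the derived extractors rely on.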
Examples
These extractors are derived by the feature.extract() function. Here is an example:
    @meta(owner="[email protected]")
    @featureset
    class Request:
        user_id: int = feature(id=1)


    @meta(owner="[email protected]")
    @featureset
    class UserFeaturesDerived:
        uid: int = feature(id=1).extract(feature=Request.user_id)
        name: str = feature(id=2).extract(field=User.name, default="Unknown")
In this example, UserFeaturesDerived.uid is an alias to Request.user_id. Aliasing is specified via the feature kwarg. UserFeaturesDerived.name specifies a lookup extractor with the same functionality as the extractor func defined above.
The lookup extractor uses the following arguments:
- field: The dataset field to do a lookup on.
- default: An optional default value for rows not found.
- provider: A featureset that provides the values matching the keys of the dataset to look up. The input feature name must match the field name. If not provided, as in the above example, the current featureset is assumed to be the provider. If the input feature that the extractor needs does not match the name of the field, an alias extractor can be defined, as is the case with UserFeaturesDerived.uid in the above example.
Here is an example where the input provider is a different featureset:
    @meta(owner="[email protected]")
    @featureset
    class Request2:
        uid: int = feature(id=1)


    @meta(owner="[email protected]")
    @featureset
    class UserFeaturesDerived2:
        name: str = feature(id=1).extract(
            field=User.name, provider=Request2, default="Unknown"
        )
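
The provider rule described above can be modeled in a few lines of plain Python. This is only an illustrative sketch of the naming rule as stated in the text, not Fennel code: `resolve_input_feature` is a hypothetical helper, and featuresets are represented by their names as strings.

```python
from typing import Optional


def resolve_input_feature(
    key_field: str, current_featureset: str, provider: Optional[str] = None
) -> str:
    """Name the feature that supplies the lookup key for a derived lookup extractor.

    The input feature shares its name with the dataset key field and lives on
    the provider featureset, falling back to the current featureset.
    """
    source = provider if provider is not None else current_featureset
    return f"{source}.{key_field}"


# No provider given: the key comes from the featureset being defined.
implicit = resolve_input_feature("uid", "UserFeaturesDerived")

# Explicit provider: the key comes from another featureset.
explicit = resolve_input_feature("uid", "UserFeaturesDerived2", provider="Request2")
```

Under this model, UserFeaturesDerived.name reads its key from UserFeaturesDerived.uid, while UserFeaturesDerived2.name reads it from Request2.uid, matching the two examples.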