Concepts

Featureset

A featureset is a group of logically related features, where each feature is backed by a Python function that knows how to extract it. A featureset is written as a Python class annotated with the @featureset decorator. A single application will typically have many featuresets. Let's see an example:

import pandas as pd

from fennel.featuresets import feature, featureset, extractor
from fennel.lib.schema import inputs, outputs

@featureset
class Movie:
    duration: int = feature(id=1)
    over_2hrs: bool = feature(id=2)

    @extractor
    @inputs(duration)
    @outputs(over_2hrs)
    def my_extractor(cls, ts: pd.Series, durations: pd.Series) -> pd.Series:
        # durations are in seconds; 2 * 3600 seconds is two hours
        return pd.Series(name="over_2hrs", data=durations > 2 * 3600)

The example above defines a featureset called Movie with two features: duration and over_2hrs. Each feature has a type and is given a monotonically increasing id that is unique within the featureset. This featureset has one extractor, my_extractor, which knows how to extract the over_2hrs feature when given the duration feature.

Extractors

Extractors are stateless Python functions in a featureset that are annotated with the @extractor decorator. Each extractor accepts zero or more inputs (declared in the @inputs decorator) and produces one or more features (declared in the @outputs decorator).

In the above example, if the value of the feature duration is known, my_extractor can extract the value of over_2hrs. When you ask for the values of some features, Fennel locates the extractors responsible for those features, verifies that their inputs are available, and runs their code to obtain the feature values.

API of Extractors

Conceptually, an extractor could be thought of as getting a table of timestamped input features and producing a few output features for each row. Something like this (assuming 3 input features and 2 output features):

Timestamp               Input 1   Input 2    Input 3   Output 1   Output 2
Jan 11, 2022, 11:00am   123       'hello'    True      ?          ?
Jan 12, 2022, 8:30am    456       'world'    True      ?          ?
Jan 13, 2022, 10:15am   789       'fennel'   False     ?          ?

Each output is the value of that feature as of the given timestamp, assuming the input features have the values given in that row.

An extractor is a classmethod, so its first argument is always cls. The second argument is always a series of timestamps (as shown above), followed by one series for each input feature. The output is a named series or a dataframe, depending on the number of output features, and has the same length as the input series.
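To make this calling convention concrete, here is a minimal sketch of two extractor bodies written as plain functions, without Fennel's decorators or the cls argument, so that it runs with only pandas installed. The function names and sample values are made up for illustration.

```python
import pandas as pd

def over_2hrs(ts: pd.Series, durations: pd.Series) -> pd.Series:
    # One output feature: return a named Series with one value per input row.
    return pd.Series(name="over_2hrs", data=durations > 2 * 3600)

def over_2hrs_and_3hrs(ts: pd.Series, durations: pd.Series) -> pd.DataFrame:
    # Several output features: return a DataFrame with one column per feature.
    return pd.DataFrame({
        "over_2hrs": durations > 2 * 3600,
        "over_3hrs": durations > 3 * 3600,
    })

# Three rows: a timestamp and a duration (in seconds) for each.
ts = pd.Series(pd.to_datetime(["2022-01-11 11:00", "2022-01-12 08:30", "2022-01-13 10:15"]))
durations = pd.Series([5400, 9000, 12600])

single = over_2hrs(ts, durations)
multi = over_2hrs_and_3hrs(ts, durations)
```

Both results have exactly one row per timestamp, which is the property the API description above asks of every extractor.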

Rules of Extractors

  1. A featureset can have zero or more extractors.
  2. An extractor can have zero or more inputs but must have at least one output.
  3. Input features of an extractor can belong to any number of featuresets but all output features must be from the same featureset as the extractor.
  4. For any feature, there can be at most one extractor where the feature appears in the output list.
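The rules above can be checked mechanically. The following is a rough sketch of such a check, not Fennel's own validation: the dictionary layout, the "Featureset.feature" naming, and the function check_extractor_rules are all hypothetical, and the sketch does not model extractor tiers.

```python
from collections import Counter

def check_extractor_rules(featureset, extractors):
    """Return a list of rule violations for one featureset.

    `extractors` is a list of dicts with "name", "inputs" and "outputs";
    features are written as "Featureset.feature" strings. This layout is a
    hypothetical stand-in, not Fennel's internal representation.
    """
    errors = []
    for ex in extractors:
        # Rule 2: inputs may be empty, outputs may not.
        if not ex["outputs"]:
            errors.append(f"{ex['name']}: needs at least one output")
        # Rule 3: every output must belong to the extractor's own featureset.
        for out in ex["outputs"]:
            if out.split(".")[0] != featureset:
                errors.append(f"{ex['name']}: output {out} is outside {featureset}")
    # Rule 4: a feature may appear in the outputs of at most one extractor.
    counts = Counter(out for ex in extractors for out in ex["outputs"])
    for out, n in counts.items():
        if n > 1:
            errors.append(f"{out}: produced by {n} extractors")
    return errors

valid = [
    {"name": "e1", "inputs": ["Movies.duration"], "outputs": ["Movies.over_2hrs"]},
    {"name": "e2", "inputs": ["Length.limit_secs"], "outputs": ["Movies.over_limit"]},
]
invalid = [
    {"name": "e1", "inputs": ["Movies.duration"], "outputs": ["Movies.over_3hrs"]},
    {"name": "e2", "inputs": ["Movies.duration"], "outputs": ["Movies.over_3hrs"]},
    {"name": "e3", "inputs": ["Movies.duration"], "outputs": ["Request.too_long"]},
]

valid_errors = check_extractor_rules("Movies", valid)      # no violations
invalid_errors = check_extractor_rules("Movies", invalid)  # rule 3 and rule 4
```

Rule 1 needs no code: an empty list of extractors simply produces no violations.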

With this, let's look at a few valid and invalid examples:

from fennel.featuresets import feature, featureset, extractor

@featureset
class MoviesZeroExtractors:
    duration: int = feature(id=1)
    over_2hrs: bool = feature(id=2)

A featureset can have zero extractors

A featureset having a feature without an extractor simply means that Fennel doesn't know how to compute that feature and hence that feature must always be provided as an input to the resolution process.

import pandas as pd

from fennel.featuresets import feature, featureset, extractor
from fennel.lib.schema import inputs, outputs

@featureset
class MoviesManyExtractors:
    duration: int = feature(id=1)
    over_2hrs: bool = feature(id=2)
    over_3hrs: bool = feature(id=3)

    # e1 and e2 share an input but produce different output features
    @extractor
    @inputs(duration)
    @outputs(over_2hrs)
    def e1(cls, ts: pd.Series, durations: pd.Series) -> pd.Series:
        return pd.Series(name="over_2hrs", data=durations > 2 * 3600)

    @extractor
    @inputs(duration)
    @outputs(over_3hrs)
    def e2(cls, ts: pd.Series, durations: pd.Series) -> pd.Series:
        return pd.Series(name="over_3hrs", data=durations > 3 * 3600)

A featureset can have multiple extractors for different features

import pandas as pd

from fennel.featuresets import feature, featureset, extractor
from fennel.lib.schema import inputs, outputs
from fennel.lib.metadata import meta

@meta(owner="[email protected]")
@featureset
class Movies:
    duration: int = feature(id=1)
    over_2hrs: bool = feature(id=2)
    # invalid: both e1 & e2 output `over_3hrs`
    over_3hrs: bool = feature(id=3)

    @extractor(tier=["default"])
    @inputs(duration)
    @outputs(over_2hrs, over_3hrs)
    def e1(cls, ts: pd.Series, durations: pd.Series) -> pd.DataFrame:
        two_hrs = durations > 2 * 3600
        three_hrs = durations > 3 * 3600
        # two output features, so return a dataframe with one column each
        return pd.DataFrame({"over_2hrs": two_hrs, "over_3hrs": three_hrs})

    @extractor(tier=["non-default"])
    @inputs(duration)
    @outputs(over_3hrs)
    def e2(cls, ts: pd.Series, durations: pd.Series) -> pd.Series:
        return pd.Series(name="over_3hrs", data=durations > 3 * 3600)

Multiple extractors extracting the same feature over_3hrs

import pandas as pd

from fennel.featuresets import feature, featureset, extractor
from fennel.lib.schema import inputs, outputs

@featureset
class Length:
    limit_secs: int = feature(id=1)

@featureset
class MoviesForeignFeatureInput:
    duration: int = feature(id=1)
    over_limit: bool = feature(id=2)

    # Length.limit_secs is an input from a different featureset
    @extractor
    @inputs(Length.limit_secs, duration)
    @outputs(over_limit)
    def e(cls, ts: pd.Series, limits: pd.Series, durations: pd.Series) -> pd.Series:
        return pd.Series(name="over_limit", data=durations > limits)

An input feature can come from another featureset

Invalid - output feature of extractor from another featureset

import pandas as pd

from fennel.featuresets import feature, featureset, extractor
from fennel.lib.schema import inputs, outputs

@featureset
class Request:
    too_long: bool = feature(id=1)

@featureset
class Movies:
    duration: int = feature(id=1)
    limit_secs: int = feature(id=2)

    # invalid: Request.too_long does not belong to the Movies featureset
    @extractor
    @inputs(limit_secs, duration)
    @outputs(Request.too_long)
    def e(cls, ts: pd.Series, limits: pd.Series, durations: pd.Series):
        return pd.Series(name="movie_too_long", data=durations > limits)

Extractor outputs a feature from another featureset

Extractor Resolution

Suppose you have an extractor A that takes feature f1 as input and outputs f2, and another extractor B that takes f2 as input and returns f3 as output. Further, suppose that the value of f1 is available and you're interested in computing the value of f3.

Fennel can automatically deduce that in order to go from f1 to f3, it must first run extractor A to get to f2 and then run B on f2 to get f3.

More generally, Fennel is able to recursively resolve feature extractors and find a path via extractors from a set of input features to a set of output features. This allows you to reuse feature logic and not have every feature depend on root-level inputs like uid.
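As an illustration of the idea, here is a small sketch of recursive resolution over a hypothetical description of extractors. It is not Fennel's actual algorithm: the resolve function, the dictionary layout, and the error messages are assumptions made for this example, and the sketch relies on each feature being produced by at most one extractor.

```python
def resolve(extractors, available, wanted):
    """Return names of extractors to run, in order, to compute `wanted`.

    `extractors` maps a name to (inputs, outputs). Raises ValueError when a
    needed feature is neither available nor produced by any extractor.
    """
    # Each feature has at most one producing extractor.
    producer = {out: name for name, (_, outs) in extractors.items() for out in outs}
    known = set(available)
    plan = []
    visiting = set()

    def visit(feature):
        if feature in known:
            return
        if feature not in producer:
            raise ValueError(f"no extractor produces {feature!r}; provide it as input")
        name = producer[feature]
        if name in visiting:
            raise ValueError(f"cyclic dependency at extractor {name!r}")
        visiting.add(name)
        inputs, outputs = extractors[name]
        for dep in inputs:
            visit(dep)           # resolve this extractor's inputs first
        visiting.discard(name)
        plan.append(name)
        known.update(outputs)    # every output of this extractor is now known

    for feature in wanted:
        visit(feature)
    return plan

extractors = {
    "A": (["f1"], ["f2"]),
    "B": (["f2"], ["f3"]),
}
plan = resolve(extractors, available=["f1"], wanted=["f3"])
print(plan)  # ['A', 'B']
```

If f1 were not supplied, the sketch would raise an error, which mirrors the earlier point that a feature without an extractor must be provided as an input to the resolution process.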

Dataset Lookups

A large fraction of real world ML features are built on top of stored data. However, featuresets don't have any storage of their own and are completely stateless. Instead, they are able to do random lookups on datasets and use that for the feature computation. Let's see an example:

from datetime import datetime

import pandas as pd

from fennel.datasets import dataset, field
from fennel.sources import source, Webhook
from fennel.featuresets import featureset, extractor, feature
from fennel.lib.schema import inputs, outputs

webhook = Webhook(name="fennel_webhook")


@source(webhook.endpoint("User"))
@dataset
class User:
    uid: int = field(key=True)
    name: str
    timestamp: datetime


@featureset
class UserFeatures:
    uid: int = feature(id=1)
    name: str = feature(id=2)

    @extractor(depends_on=[User])
    @inputs(uid)
    @outputs(name)
    def func(cls, ts: pd.Series, uids: pd.Series):
        # look up User rows by key, as of the extractor's timestamps
        names, found = User.lookup(ts, uid=uids)
        # rows with no match get a default name
        names.fillna("Unknown", inplace=True)
        return names[["name"]]

In this example, the extractor just looks up the value of name for the given uid in the User dataset and returns it as the feature value. Note a couple of things:

  • The extractor has to explicitly declare that it depends on the User dataset. This helps Fennel build an explicit lineage between features and the datasets they depend on.
  • Inside the body of the extractor, the lookup function is invoked on the dataset. This function also takes a series of timestamps as its first argument; you'd almost always pass the extractor's timestamp series to it as is. In addition, all the key fields of the dataset become keyword arguments to the lookup function.
  • It's not possible to do lookups on a dataset without keys.
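To illustrate why lookup takes timestamps as well as keys, here is a toy, pure-Python stand-in. It assumes that a lookup returns, for each row, the latest dataset value at or before the given timestamp; this matches the "as of" semantics described for extractors but is an assumption about lookup itself. The lookup function below and its list-based arguments are hypothetical and are not Fennel's implementation.

```python
from bisect import bisect_right
from datetime import datetime

def lookup(rows, ts, **keys):
    """Toy point-in-time lookup.

    `rows` is a list of dicts with a "timestamp" field; `ts` and every
    keyword argument are lists of equal length. Returns (values, found).
    """
    key_names = list(keys)
    if not key_names:
        raise ValueError("lookup needs at least one key field")
    # Group row versions by key, oldest first.
    history = {}
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        history.setdefault(tuple(row[k] for k in key_names), []).append(row)
    values, found = [], []
    for i, t in enumerate(ts):
        versions = history.get(tuple(keys[k][i] for k in key_names), [])
        # Number of versions with timestamp <= t; the last of them is the answer.
        pos = bisect_right([v["timestamp"] for v in versions], t)
        found.append(pos > 0)
        values.append(versions[pos - 1] if pos > 0 else None)
    return values, found

rows = [
    {"uid": 1, "name": "Ada", "timestamp": datetime(2022, 1, 10)},
    {"uid": 1, "name": "Ada L.", "timestamp": datetime(2022, 1, 12)},
    {"uid": 2, "name": "Bob", "timestamp": datetime(2022, 1, 11)},
]
ts = [datetime(2022, 1, 11), datetime(2022, 1, 13), datetime(2022, 1, 10)]
values, found = lookup(rows, ts, uid=[1, 1, 2])
names = [v["name"] if ok else "Unknown" for v, ok in zip(values, found)]
print(names)  # ['Ada', 'Ada L.', 'Unknown']
```

The third row is not found because uid 2 only appears in the data after the requested timestamp, which is the case the "Unknown" default in the extractor above is meant to cover.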
