Architecture

Storage Model

With the basics of Fennel Architecture and Fennel's Core Engine clarified, we can now talk about the storage model of Fennel:

All Fennel datasets correspond to an internal partitioned Kafka topic. The retention of this topic can be configured by user (though is usually long).
Fennel creates several other internal Kafka topics - though these are usually for short durations.
Data for Kafka topics is tiered to S3. This way, majority of Kafka data lives in S3 at any point and only a small subset of it lives on local SSDs of Kafka brokers.
Pipelines get mapped to partitioned jobs which may have job specific local state. This state lives in a RocksDB instance running on local SSD of the engines with snapshots living in S3.
In addition to the RocksDB and S3, metadata about this state also lives in a special internal Kafka topic (called replaylog), which again, is tiered to S3.
Indices are also mapped to partitioned jobs which also have their own local state (on RocksDB, snapshotted on S3 and metadata in Kafka) - the only difference is that their local state is exposed for querying behind gRPC endpoints.
All metadata (e.g. graph of all datasets, pipelines, indices, features etc.) as well as job registry lives in a centrally shared Postgres.