A proof-of-concept recommendation engine that produces, on demand, a list of the top-k items most similar to a given item.
The following design decisions influenced the implementation of this engine:
- A one-time batch job is run to generate an inverted index of attributes to items[1]. A streaming job updates that inverted index whenever a new item is added to the main database[2]. The inverted index reduces the time complexity of each recommendation request from `O(n)` to `O(m)` (where `n` is the total number of items, and `m` is total attributes x matching items[3]).
- A probabilistic data structure (namely, Stream Summary[4]) is used to aggregate the top-k items. This reduces the space complexity of each recommendation request from `O(n)` to `O(1)` (where `n` is the total number of items), at the cost of exact accuracy.
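
The two design decisions above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the item shape (`sku`, `attributes`), the function names `buildIndex` and `topK`, and the `SpaceSaving` class are all assumptions made for the example. The counter is a simplified form of the Space-Saving algorithm described in [4]; it scans for the minimum on eviction instead of using the Stream Summary linked structure, so it keeps the bounded-memory property but not the constant-time update.

```javascript
// Build a nested Map: attribute -> attribute value -> array of SKUs.
// Hypothetical item shape: { sku: 'A', attributes: { color: 'red' } }
function buildIndex(items) {
  const index = new Map();
  for (const item of items) {
    for (const attr of Object.keys(item.attributes)) {
      const value = item.attributes[attr];
      if (!index.has(attr)) index.set(attr, new Map());
      const byValue = index.get(attr);
      if (!byValue.has(value)) byValue.set(value, []);
      byValue.get(value).push(item.sku);
    }
  }
  return index;
}

// Simplified Space-Saving counter: never tracks more than `capacity` SKUs,
// so memory stays constant with respect to the total number of items.
class SpaceSaving {
  constructor(capacity) {
    this.capacity = capacity;
    this.counts = new Map(); // sku -> estimated match count
  }

  offer(sku) {
    if (this.counts.has(sku)) {
      this.counts.set(sku, this.counts.get(sku) + 1);
    } else if (this.counts.size < this.capacity) {
      this.counts.set(sku, 1);
    } else {
      // Evict the SKU with the smallest count; the newcomer takes over
      // that count plus one, which is where the approximation comes from.
      let minSku = null;
      let minCount = Infinity;
      for (const [key, count] of this.counts) {
        if (count < minCount) { minSku = key; minCount = count; }
      }
      this.counts.delete(minSku);
      this.counts.set(sku, minCount + 1);
    }
  }

  top(k) {
    return Array.from(this.counts.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, k)
      .map(([sku, count]) => ({ sku, count }));
  }
}

// Visit only the items that share an attribute value with `item`,
// counting how many attribute values each candidate has in common.
function topK(index, item, k, capacity) {
  const counter = new SpaceSaving(capacity);
  for (const attr of Object.keys(item.attributes)) {
    const byValue = index.get(attr);
    const matches = (byValue && byValue.get(item.attributes[attr])) || [];
    for (const sku of matches) {
      if (sku !== item.sku) counter.offer(sku);
    }
  }
  return counter.top(k);
}

const items = [
  { sku: 'A', attributes: { color: 'red', size: 'M', brand: 'x' } },
  { sku: 'B', attributes: { color: 'red', size: 'M', brand: 'y' } },
  { sku: 'C', attributes: { color: 'red', size: 'L', brand: 'z' } },
  { sku: 'D', attributes: { color: 'blue', size: 'S', brand: 'z' } },
];
const index = buildIndex(items);
const result = topK(index, items[0], 2, 10);
console.log(result);
// [ { sku: 'B', count: 2 }, { sku: 'C', count: 1 } ]
```

With a `capacity` at least as large as the number of candidates, the counts are exact; a smaller `capacity` bounds memory but can overestimate the counts of SKUs that entered through an eviction.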
- Node.js >= 6.10.0
- Yarn >= 0.21.3
Run `make bootstrap`.
- Run `node ./bin index <absolute path of JSON data file>` to generate the inverted index.
- Run `node ./bin topk <sku code> -f <absolute path of JSON data file>` to get the top 10 similar items.
- Run `node ./bin help`.
- See any error displayed on the console.
- Check the `./logs/recken-error.log` file.
- [1] Included as a command in the engine.
- [2] Not included in the engine; it should be part of the platform.
- [3] Since the structure of the inverted index is `{ attributes -> attribute-values -> items }`.
- [4] Efficient Computation of Frequent and Top-k Elements in Data Streams