Posted to dev@s2graph.apache.org by "DOYUNG YOON (JIRA)" <ji...@apache.org> on 2018/04/11 23:34:00 UTC

[jira] [Created] (S2GRAPH-206) Generalize machine learning model serving.

DOYUNG YOON created S2GRAPH-206:
-----------------------------------

Summary: Generalize machine learning model serving.
Key: S2GRAPH-206
URL: https://issues.apache.org/jira/browse/S2GRAPH-206
Project: S2Graph
Issue Type: New Feature
Components: s2core
Reporter: DOYUNG YOON

One of the top use cases of an OLTP graph database is, arguably, recommendation.

Let's see how item-based collaborative filtering (item-based CF) can be served as a graph query.
# fetch user's history as the edges of clicked items.
# fetch each item's similar items.
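
As a minimal sketch of the two steps above, the following uses plain dictionaries in place of stored edges. The labels `clicked` and `similar_to`, the sample data, and the `recommend` helper are all hypothetical and are not S2Graph's API; summing similarity scores is just one possible way to merge results.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two edge labels (not S2Graph's API).
clicked = {"user_1": ["item_a", "item_b"]}
similar_to = {
    "item_a": [("item_c", 0.9), ("item_d", 0.4)],
    "item_b": [("item_c", 0.5), ("item_e", 0.7)],
}

def recommend(user, top_k=3):
    # Step 1: fetch the user's history as the edges of clicked items.
    history = clicked.get(user, [])
    # Step 2: fetch each item's similar items and sum the scores per candidate.
    scores = defaultdict(float)
    for item in history:
        for neighbor, score in similar_to.get(item, []):
            if neighbor not in history:
                scores[neighbor] += score
    # Highest score first; ties broken by item name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]

print(recommend("user_1"))
```

Note that every entry in `similar_to` has to be precomputed and stored as an edge before this query can run.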

There are a few problems with the naive approach above, since we need to insert many item pairs as edges (up to N^2, where N is the total number of items; 1 million items means up to 10^12 pairs).

Even though bulk load can update a large number of edges in a stable manner, the user still needs to generate the similarity matrix, which is often very large.

Also, the above approach does not generalize to other model-based approaches.

For example, if the user wants to use matrix factorization, they need to work through the following steps.
# dump user's history in raw records.
# convert user history to a matrix by creating a dictionary that maps each raw value to a sequence number.
# factorize user history, usually using alternating least squares (ALS), which yields the factorized model U, I.
# run k-nearest-neighbor search for each item on I, which yields an array of similar item sequences for each item sequence.
# convert each item sequence and its array of similar item sequences back to an item and its array of similar items by using the dictionary created in step 2.
# bulk load item-item similarity as edges.
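
To make steps 2, 4, and 5 concrete, here is a rough sketch in pure Python. It assumes step 3 has already produced the item factor matrix I; the factor values, the item names, the helper names, and the choice of cosine similarity with brute-force search are all assumptions made for illustration.

```python
import math

def build_dictionary(raw_items):
    """Step 2: map each raw item value to a dense sequence number and back."""
    item_to_seq = {}
    for raw in raw_items:
        item_to_seq.setdefault(raw, len(item_to_seq))
    seq_to_item = {seq: raw for raw, seq in item_to_seq.items()}
    return item_to_seq, seq_to_item

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn(item_factors, seq, k):
    """Step 4: brute-force k nearest neighbors of one item sequence on I."""
    scored = [
        (cosine(item_factors[seq], vec), other)
        for other, vec in enumerate(item_factors)
        if other != seq
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]

def similar_items(raw_item, item_factors, item_to_seq, seq_to_item, k=2):
    """Steps 4 and 5: search in sequence space, then translate back to raw items."""
    neighbors = knn(item_factors, item_to_seq[raw_item], k)
    return [seq_to_item[s] for s in neighbors]

# Made-up item factors standing in for the I matrix that ALS (step 3) would yield.
item_to_seq, seq_to_item = build_dictionary(["shoes", "socks", "laptop", "socks"])
I = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(similar_items("shoes", I, item_to_seq, seq_to_item, k=1))
```

In the workflow described above, the output of `similar_items` for every item would then be written out and bulk loaded as edges (step 6).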

Note that these steps become tedious.

I think the above steps can be simplified if S2Graph provides a more general way to serve machine learning models.

Steps 1, 2, and 3 are inevitably done by whoever focuses on building better models, but steps 4, 5, and 6 can be automated.

To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote location and to integrate the pre-loaded ML model into the graph query structure.

So logically, the original query should be changed into the following.
# fetch user's history as the edges of clicked items.
# convert clicked items into item sequences.
# run the k-nearest-neighbor search on the pre-loaded ML model and get an array of similar item sequences.
# convert the array of similar item sequences into an array of similar items using the pre-loaded ML model's dictionary.
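
The four query steps above could look roughly like the sketch below. `PreloadedModel`, `recommend_with_model`, and the sample factors are hypothetical names and data used only to illustrate the flow; they do not describe an existing S2Graph interface, and the brute-force dot-product search is just a placeholder for whatever nearest-neighbor index a real model would ship with.

```python
class PreloadedModel:
    """Hypothetical holder for item factors plus the raw-value dictionary."""

    def __init__(self, items, factors):
        self.item_to_seq = {item: seq for seq, item in enumerate(items)}
        self.seq_to_item = list(items)
        self.factors = factors

    def to_seq(self, item):
        return self.item_to_seq[item]

    def to_item(self, seq):
        return self.seq_to_item[seq]

    def nearest(self, seq, k):
        # Brute-force dot-product search over all other item factors.
        query = self.factors[seq]
        scored = [
            (sum(a * b for a, b in zip(query, vec)), other)
            for other, vec in enumerate(self.factors)
            if other != seq
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [other for _, other in scored[:k]]

def recommend_with_model(clicked_edges, model, user, k=1):
    history = clicked_edges.get(user, [])         # 1. stored edges
    result = []
    for item in history:
        seq = model.to_seq(item)                  # 2. item -> item sequence
        for neighbor in model.nearest(seq, k):    # 3. kNN on the loaded model
            candidate = model.to_item(neighbor)   # 4. item sequence -> item
            if candidate not in history and candidate not in result:
                result.append(candidate)
    return result

model = PreloadedModel(
    ["shoes", "socks", "laptop", "mouse"],
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
)
clicked_edges = {"user_1": ["shoes", "laptop"]}
print(recommend_with_model(clicked_edges, model, "user_1"))
```

Only the click history is read from stored edges here; the similar items are generated from the model at query time, so no item-item edges need to be bulk loaded.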

One might argue that supporting machine learning serving is not S2Graph's focus.

The reason behind this suggestion is that I believe a unified interface for traversing not only pre-stored data as vertices/edges, but also model-generated data produced on the fly as vertices/edges, can be very useful (and not only for collaborative filtering use cases).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)