Posted to dev@s2graph.apache.org by "DOYUNG YOON (JIRA)" <ji...@apache.org> on 2018/04/11 23:34:00 UTC

[jira] [Assigned] (S2GRAPH-206) Generalize machine learning model serving.

     [ https://issues.apache.org/jira/browse/S2GRAPH-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DOYUNG YOON reassigned S2GRAPH-206:
-----------------------------------

    Assignee: DOYUNG YOON

> Generalize machine learning model serving.
> ------------------------------------------
>
>                 Key: S2GRAPH-206
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-206
>             Project: S2Graph
>          Issue Type: New Feature
>          Components: s2core
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Arguably, one of the top use cases of an OLTP graph database is recommendation.
> Let's see how item-based collaborative filtering (item-based CF) can be served as a graph query:
>  # fetch the user's history as edges to clicked items.
>  # fetch each clicked item's similar items.
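The naive two-step query can be sketched as a two-hop traversal over an in-memory edge store. This is a minimal sketch, assuming summed similarity scores as the ranking rule. The labels `clicked` and `similar_to` and the helpers `neighbors` and `recommend` are hypothetical and are not S2Graph APIs.

```python
from collections import Counter

# Hypothetical in-memory edge store: (source vertex, label) -> [(target vertex, score)].
# The labels "clicked" and "similar_to" are illustrative, not real S2Graph labels.
EDGES = {
    ("user:alice", "clicked"): [("item:a", 1.0), ("item:b", 1.0)],
    ("item:a", "similar_to"): [("item:c", 0.9), ("item:d", 0.4)],
    ("item:b", "similar_to"): [("item:c", 0.5), ("item:e", 0.7)],
}


def neighbors(vertex, label):
    """Return outgoing (vertex, score) pairs for a label, or [] if there are none."""
    return EDGES.get((vertex, label), [])


def recommend(user, limit=3):
    # Step 1: fetch the user's history as edges to clicked items.
    clicked = [item for item, _ in neighbors(user, "clicked")]
    seen = set(clicked)
    # Step 2: fetch each clicked item's similar items and sum their scores.
    scores = Counter()
    for item in clicked:
        for similar, score in neighbors(item, "similar_to"):
            if similar not in seen:
                scores[similar] += score
    return [item for item, _ in scores.most_common(limit)]


print(recommend("user:alice"))  # ['item:c', 'item:e', 'item:d']
```

Every `similar_to` edge has to be stored in advance, which is the cost discussed next.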
> There are a few problems with the above naive approach, since we need to insert many item pairs as edges (on the order of N^2, where N is the total number of items).
> Even though bulk load can update a large number of edges in a stable manner, the user needs to generate the similarity matrix, which is often very large.
> Also, the above approach does not generalize to other model-based approaches.
> For example, a user who wants to use matrix factorization needs to work through the following steps:
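The following minimal sketch makes the size of the similarity matrix concrete. It assumes binary click data and cosine similarity, neither of which the issue specifies, and the name `item_similarity_edges` is hypothetical. A dense matrix over N items yields N(N-1) ordered item pairs. For N = 1,000,000 that is roughly 10^12 candidate edges.

```python
import math
from itertools import permutations


def item_similarity_edges(history):
    """Materialize a dense item-item similarity matrix as directed edges.

    history maps user -> set of clicked items (binary implicit feedback).
    Returns one (item_i, item_j, cosine) edge per ordered item pair,
    including pairs whose similarity is zero.
    """
    users_by_item = {}
    for user, items in history.items():
        for item in items:
            users_by_item.setdefault(item, set()).add(user)
    edges = []
    for i, j in permutations(sorted(users_by_item), 2):
        overlap = len(users_by_item[i] & users_by_item[j])
        norm = math.sqrt(len(users_by_item[i]) * len(users_by_item[j]))
        edges.append((i, j, overlap / norm))
    return edges


history = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c"}}
edges = item_similarity_edges(history)
print(len(edges))  # 6, i.e. N * (N - 1) with N = 3 items
```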
>  # dump the users' history as raw records.
>  # convert the user history into a matrix by creating a dictionary that maps between raw values and sequence numbers.
>  # factorize the user history, usually with Alternating Least Squares (ALS), which yields the factor matrices U (users) and I (items).
>  # run a k-nearest-neighbor search for each item on I, which yields an array of similar item sequences per item sequence.
>  # convert each (item sequence, array of similar item sequences) pair back to (item, array of similar items) using the dictionary created in step 2.
>  # bulk load item-item similarity as edges.
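Steps 2, 4, 5, and 6 can be sketched roughly as follows. Step 3 (ALS itself) is not shown. The factor vectors below are made up and stand in for the item matrix I that ALS would produce. The brute-force cosine kNN and all function names are illustrative assumptions, not part of S2Graph.

```python
import math


def build_dictionary(raw_items):
    """Step 2: map each raw item id to a dense sequence number, and back."""
    to_seq = {item: seq for seq, item in enumerate(sorted(set(raw_items)))}
    to_raw = {seq: item for item, seq in to_seq.items()}
    return to_seq, to_raw


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def knn(item_factors, k):
    """Step 4: brute-force k nearest neighbors (by cosine) for every item sequence on I."""
    result = {}
    for i, vi in enumerate(item_factors):
        scored = [(cosine(vi, vj), j) for j, vj in enumerate(item_factors) if j != i]
        scored.sort(reverse=True)
        result[i] = [j for _, j in scored[:k]]
    return result


def to_edges(neighbors, to_raw):
    """Steps 5-6: decode sequences back to raw ids as (src, dst) rows for bulk load."""
    return [(to_raw[i], to_raw[j]) for i, js in neighbors.items() for j in js]


to_seq, to_raw = build_dictionary(["shoe", "sock", "hat", "shoe"])
# Hypothetical rows of I, indexed by sequence: 0 = hat, 1 = shoe, 2 = sock.
item_factors = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.2]]
print(to_edges(knn(item_factors, k=1), to_raw))
# [('hat', 'sock'), ('shoe', 'sock'), ('sock', 'shoe')]
```

Each of these steps is mechanical once I and the dictionary exist, which is why they are candidates for automation.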
> Note that these steps quickly become tedious.
> I think the above steps could be simplified as follows if S2Graph supported a more generalized way of serving machine learning models.
> Steps 1, 2, and 3 are inevitably done by whoever focuses on building better models, but steps 4, 5, and 6 can be automated.
> To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote location and to integrate pre-loaded ML models into the graph query structure.
> So, logically, the original query would change into the following:
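Loading a model from a remote location might look like the following minimal sketch. The JSON artifact layout (`items` listed in sequence order plus one `factors` vector per item) and the names `ItemModel`, `parse_model`, and `load_model` are hypothetical. The issue does not specify S2Graph's actual loading mechanism. Only the standard library's `urllib.request.urlopen` is used for fetching.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class ItemModel:
    """Hypothetical pre-loaded model: raw item <-> sequence dictionary plus item factors I."""
    to_seq: dict
    to_raw: dict
    item_factors: list


def parse_model(raw: bytes) -> ItemModel:
    # Assumed artifact layout: {"items": [raw ids in sequence order], "factors": [[...], ...]}
    doc = json.loads(raw)
    items, factors = doc["items"], doc["factors"]
    if len(items) != len(factors):
        raise ValueError("each item needs exactly one factor vector")
    if len(set(items)) != len(items):
        raise ValueError("item ids must be unique")
    to_seq = {item: seq for seq, item in enumerate(items)}
    return ItemModel(to_seq=to_seq, to_raw=dict(enumerate(items)), item_factors=factors)


def load_model(url: str, timeout: float = 10.0) -> ItemModel:
    """Fetch a model artifact from a remote location (http://, https://, or file://)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_model(response.read())


model = parse_model(b'{"items": ["hat", "shoe"], "factors": [[0.0, 1.0], [1.0, 0.1]]}')
print(model.to_seq)  # {'hat': 0, 'shoe': 1}
```

Keeping parsing separate from fetching lets the same artifact come from HDFS, S3, or HTTP behind one loader, though which storage backends S2Graph would support is left open here.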
>  # fetch the user's history as edges to clicked items.
>  # convert the clicked items into item sequences.
>  # run a k-nearest-neighbor search on the pre-loaded ML model to get an array of similar item sequences.
>  # convert the array of similar item sequences into an array of similar items using the pre-loaded ML model's dictionary.
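The proposed query can be sketched as follows. Only the user's clicks are stored as edges, and similar items are computed on the fly from a pre-loaded model. The model contents, the brute-force cosine kNN, and the names `recommend`, `ITEMS`, `TO_SEQ`, `FACTORS`, and `CLICKED` are all hypothetical stand-ins.

```python
import math

# Hypothetical pre-loaded model: sequence -> raw item, its inverse, and item factors I.
# The vectors are made up and stand in for a model trained offline (e.g. by ALS).
ITEMS = ["hat", "shoe", "sock"]
TO_SEQ = {item: seq for seq, item in enumerate(ITEMS)}
FACTORS = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.2]]

# Hypothetical stored edges: user -> clicked items (the only data kept as edges).
CLICKED = {"alice": ["shoe", "umbrella"]}


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def recommend(user, k=1):
    """Return {clicked item: [similar items]} computed on the fly from the model."""
    result = {}
    # 1. fetch the user's history as edges to clicked items.
    for item in CLICKED.get(user, []):
        # 2. convert the clicked item into its item sequence; skip items unknown to the model.
        seq = TO_SEQ.get(item)
        if seq is None:
            continue
        # 3. k-nearest-neighbor search on the pre-loaded item factors (brute force).
        scored = sorted(
            ((cosine(FACTORS[seq], vec), other) for other, vec in enumerate(FACTORS) if other != seq),
            reverse=True,
        )
        # 4. convert similar item sequences back to items via the model's dictionary.
        result[item] = [ITEMS[other] for _, other in scored[:k]]
    return result


print(recommend("alice"))  # {'shoe': ['sock']}
```

No item-item edges are bulk loaded. Swapping in a new model only requires replacing the dictionary and factors.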
>  
> One might argue that supporting machine learning model serving is not S2Graph's focus.
> The reason behind this suggestion is that I believe a unified interface that traverses not only data pre-stored as vertices/edges but also model-generated data, produced on the fly as vertices/edges, can be very useful (and not only for collaborative filtering use cases).
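The idea of a unified interface can be illustrated with one edge-fetching contract that both stored edges and model-generated edges satisfy, so that a traversal does not care where each step's edges come from. This is a minimal sketch only. `EdgeSource`, `StoredEdges`, `ModelEdges`, and `traverse` are hypothetical names, not S2Graph classes.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Edge = Tuple[str, str, float]  # (from vertex, to vertex, score)


class EdgeSource(Protocol):
    def edges(self, vertex: str) -> List[Edge]: ...


class StoredEdges:
    """Edges written ahead of time, e.g. user -> clicked item."""

    def __init__(self, table: Dict[str, List[Tuple[str, float]]]):
        self.table = table

    def edges(self, vertex: str) -> List[Edge]:
        return [(vertex, dst, score) for dst, score in self.table.get(vertex, [])]


class ModelEdges:
    """Edges generated on the fly by a model instead of being stored."""

    def __init__(self, predict: Callable[[str], List[Tuple[str, float]]]):
        self.predict = predict

    def edges(self, vertex: str) -> List[Edge]:
        return [(vertex, dst, score) for dst, score in self.predict(vertex)]


def traverse(start: str, steps: List[EdgeSource]) -> List[str]:
    """Multi-step traversal; each step may be backed by storage or by a model."""
    frontier = [start]
    for source in steps:
        frontier = [dst for v in frontier for _, dst, _ in source.edges(v)]
    return frontier


clicked = StoredEdges({"alice": [("shoe", 1.0)]})
similar = ModelEdges(lambda item: [("sock", 0.99)] if item == "shoe" else [])
print(traverse("alice", [clicked, similar]))  # ['sock']
```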
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)