Posted to dev@s2graph.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/26 05:32:00 UTC

[jira] [Commented] (S2GRAPH-206) Generalize machine learning model serving.

    [ https://issues.apache.org/jira/browse/S2GRAPH-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453523#comment-16453523 ] 

ASF GitHub Bot commented on S2GRAPH-206:
----------------------------------------

GitHub user SteamShon opened a pull request:

    https://github.com/apache/incubator-s2graph/pull/162

    [S2GRAPH-206]: Generalize machine learning model serving.

    - abstract traversing edges as Fetcher interface.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SteamShon/incubator-s2graph S2GRAPH-206

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-s2graph/pull/162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #162
    
----
commit 72c35a39e9f739d6df941d86db546811c9cb8a2a
Author: DO YUNG YOON <st...@...>
Date:   2018-04-26T05:26:06Z

    - abstract traversing edges as Fetcher interface.

----


> Generalize machine learning model serving.
> ------------------------------------------
>
>                 Key: S2GRAPH-206
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-206
>             Project: S2Graph
>          Issue Type: New Feature
>          Components: s2core
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Recommendation is arguably one of the top use cases for an OLTP graph database.
> Let's see how item-based collaborative filtering (item-based CF) can be served as a graph query.
>  # fetch the user's history as edges to the clicked items.
>  # fetch each item's similar items.
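
A minimal sketch of the two-step query above, assuming both edge types are already stored. The dictionaries `clicked` and `similar_to`, the function `recommend`, and the choice to rank candidates by summed similarity score are hypothetical stand-ins for illustration, not S2Graph's API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two stored edge labels.
clicked = {"user_1": ["item_a", "item_b"]}
similar_to = {
    "item_a": [("item_c", 0.9), ("item_d", 0.4)],
    "item_b": [("item_c", 0.5), ("item_e", 0.7)],
}


def recommend(user, clicked, similar_to, limit=10):
    """Two-step traversal: user -> clicked items -> similar items."""
    history = clicked.get(user, [])
    seen = set(history)
    scores = defaultdict(float)
    for item in history:
        for candidate, score in similar_to.get(item, []):
            # Skip items the user has already clicked.
            if candidate not in seen:
                # Assumption: candidates reached via several items sum their scores.
                scores[candidate] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]


recommendations = recommend("user_1", clicked, similar_to)
```

The second step only works if every item already has its similar-item edges stored, which is the cost the description turns to next.
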
> There are a few problems with the naive approach above, since we need to insert many item pairs as edges (on the order of N^2, where N is the total number of items).
> Even though bulk load can update a large number of edges in a stable manner, the user still needs to generate the similarity matrix, which is often very large.
> Also, the approach above does not generalize to other model-based approaches.
> For example, a user who wants to use matrix factorization needs to work through the following steps.
>  # dump the users' history as raw records.
>  # convert the user history into a matrix by creating a dictionary that maps raw values to sequence numbers.
>  # factorize the user history, usually with alternating least squares (ALS), which yields the factorized model U, I.
>  # run k-nearest-neighbor search for each item on I, which yields an array of item sequences for each item sequence.
>  # convert each item sequence and its array of similar item sequences back into an item and an array of similar items, using the dictionary created in step 2.
>  # bulk load the item-item similarities as edges.
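
A rough sketch of steps 2, 4, and 5 from the list above, in plain Python. Step 3 (the factorization itself) is not implemented: `item_factors` is a made-up placeholder for I, and the sample records, the helper names, and the use of cosine similarity for the neighbor search are all assumptions for illustration.

```python
import math

# Step 1 (assumed input): raw click records as (user, item) pairs.
records = [("u1", "apple"), ("u1", "banana"), ("u2", "banana"), ("u2", "cherry")]


def build_dictionary(values):
    """Step 2: map each raw value to a dense sequence number, in first-seen order."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return mapping


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def k_nearest(item_factors, k):
    """Step 4: brute-force k-nearest-neighbor search over the rows of I."""
    result = {}
    for i, vec in enumerate(item_factors):
        scored = [(cosine(vec, other), j)
                  for j, other in enumerate(item_factors) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))
        result[i] = [j for _, j in scored[:k]]
    return result


item_to_seq = build_dictionary(item for _, item in records)
seq_to_item = {seq: item for item, seq in item_to_seq.items()}

# Step 3 is out of scope here: placeholder item factors, one row per item sequence.
item_factors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

neighbors = k_nearest(item_factors, k=1)

# Step 5: translate sequences back to raw items, giving the edges to bulk load in step 6.
edges = [(seq_to_item[i], seq_to_item[j])
         for i, js in sorted(neighbors.items()) for j in js]
```

The brute-force search compares every item with every other item, so a real pipeline would likely swap in an approximate nearest-neighbor index. The dictionary round trip in steps 2 and 5 stays the same either way.
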
> Note that these steps become tedious.
> I think the steps above can be simplified as follows if S2Graph provides a more general way to serve machine learning models.
> Steps 1, 2, and 3 are inevitably done by whoever focuses on building better models, but steps 4, 5, and 6 can be automated.
> To automate steps 4, 5, and 6, we need to provide ways to load ML models from a remote location and integrate a pre-loaded ML model into the graph query structure.
> So logically, the original query should be changed into the following.
>  # fetch the user's history as edges to the clicked items.
>  # convert the clicked items into item sequences.
>  # run the k-nearest-neighbor search on the pre-loaded ML model and get an array of similar item sequences.
>  # convert the array of similar item sequences into an array of similar items using the pre-loaded ML model's dictionary.
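
The four query-time steps above might look roughly like this. `PreloadedModel` and `similar_items` are hypothetical names, the factors are made-up values, and loading the model from a remote location is left out. This is a sketch under those assumptions, not the implementation in the pull request.

```python
import math


class PreloadedModel:
    """Hypothetical model held in memory: item factors plus the item dictionary."""

    def __init__(self, item_to_seq, item_factors):
        self.item_to_seq = dict(item_to_seq)
        self.seq_to_item = {seq: item for item, seq in item_to_seq.items()}
        self.item_factors = item_factors

    def nearest(self, seq, k):
        """Brute-force k-nearest-neighbor search by cosine similarity."""
        query = self.item_factors[seq]
        scored = []
        for other_seq, vec in enumerate(self.item_factors):
            if other_seq == seq:
                continue
            dot = sum(x * y for x, y in zip(query, vec))
            norm = (math.sqrt(sum(x * x for x in query))
                    * math.sqrt(sum(y * y for y in vec)))
            scored.append((dot / norm if norm else 0.0, other_seq))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [other_seq for _, other_seq in scored[:k]]


def similar_items(user, history, model, k):
    """Run the four query steps for one user."""
    result = {}
    for item in history.get(user, []):          # 1. fetch the user's clicked items
        seq = model.item_to_seq.get(item)       # 2. item -> item sequence
        if seq is None:
            continue                            # item unknown to the model
        neighbor_seqs = model.nearest(seq, k)   # 3. kNN on the pre-loaded model
        result[item] = [model.seq_to_item[s] for s in neighbor_seqs]  # 4. back to items
    return result


model = PreloadedModel(
    {"apple": 0, "banana": 1, "cherry": 2},
    [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
)
history = {"u1": ["apple", "durian"]}
similar = similar_items("u1", history, model, k=2)
```

No item-item edges are stored in this version: the similar items are computed when the query runs, so updating the model does not require another bulk load.
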
>  
> One might argue that supporting machine learning serving is not S2Graph's focus.
> The reason behind this suggestion is that I believe a unified interface for traversing not only pre-stored data as vertices/edges, but also model-generated data produced on the fly as vertices/edges, can be very useful (not only for collaborative filtering use cases).
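
One way to picture such a unified interface is sketched below. The name `Fetcher` is borrowed from the pull request summary, but the method signature, the two implementations, and `traverse` are assumptions made for illustration and do not reflect the actual code in the pull request.

```python
from abc import ABC, abstractmethod


class Fetcher(ABC):
    """Hypothetical interface: given a source vertex, return (target, score) edges."""

    @abstractmethod
    def fetch(self, vertex):
        ...


class StoredEdgeFetcher(Fetcher):
    """Serves edges that were stored ahead of time."""

    def __init__(self, adjacency):
        self.adjacency = adjacency

    def fetch(self, vertex):
        return list(self.adjacency.get(vertex, []))


class ModelFetcher(Fetcher):
    """Serves edges generated on the fly by a model's predict function."""

    def __init__(self, predict):
        self.predict = predict

    def fetch(self, vertex):
        return list(self.predict(vertex))


def traverse(start, steps):
    """Apply one fetcher per step; the traversal does not care where edges come from."""
    frontier = [start]
    for fetcher in steps:
        next_frontier = []
        for vertex in frontier:
            for target, _score in fetcher.fetch(vertex):
                if target not in next_frontier:
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier


clicks = StoredEdgeFetcher({"u1": [("apple", 1.0)]})
model = ModelFetcher(lambda item: [("banana", 0.8)] if item == "apple" else [])
result = traverse("u1", [clicks, model])
```

Because `traverse` depends only on `fetch`, a stored-edge step and a model-backed step can be mixed in the same query.
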
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)