Posted to dev@s2graph.apache.org by "DOYUNG YOON (JIRA)" <ji...@apache.org> on 2016/11/16 12:08:58 UTC

[jira] [Commented] (S2GRAPH-123) Support different index on out/in direction.

    [ https://issues.apache.org/jira/browse/S2GRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670281#comment-15670281 ] 

DOYUNG YOON commented on S2GRAPH-123:
-------------------------------------

Here is my first attempt at implementing this.
Note that users can now provide advanced options to control what is actually stored per ({{LabelIndex}}, {{Direction}}) pair.

{noformat}
{
  "label": "movie_user_rate",
  "srcServiceName": "movie",
  "srcColumnName": "user_id",
  "srcColumnType": "string",
  "tgtServiceName": "movie",
  "tgtColumnName": "movie_id",
  "tgtColumnType": "long",
  "indices": [
    {
      "name": "_PK",
      "propNames": [
        "_timestamp"
      ], 
      "direction": "out", // [both/in/out, default both]
      "options": {
        "method": "hash_sample", // [drop, sample, hash_sample]
        "totalModular": 100, 
        "rate": 0.1, 
        "degree": true
      }
    }
  ],
  "props": [
    {
      "name": "rating",
      "defaultValue": 0,
      "dataType": "integer"
    }
  ],
  "serviceName": "movie",
  "consistencyLevel": "strong",
  "hTableName": "s2graph-alpha",
  "isDirected": "true", 
  "options": {

  }
}

{noformat}
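
To make the {{options}} block above concrete, here is a minimal sketch of how a writer might decide whether to store an index edge. The semantics are assumptions for illustration, not the actual S2Graph implementation: {{drop}} never stores, {{sample}} keeps a random fraction {{rate}}, and {{hash_sample}} hashes the edge key into {{totalModular}} buckets and keeps the lowest {{rate * totalModular}} of them, so the same key always gets the same decision. The function name {{should_store_index_edge}} and the key format are hypothetical, and MD5 is only a stand-in for whatever hash the storage layer would use.

```python
import hashlib
import random


def should_store_index_edge(edge_key: str, options: dict) -> bool:
    """Return True if the index edge should be written (assumed semantics)."""
    method = options.get("method")
    if method is None:
        # No method configured: store everything, matching the default behavior.
        return True
    if method == "drop":
        return False
    if method == "sample":
        # Non-deterministic: keep roughly `rate` of all writes.
        return random.random() < options["rate"]
    if method == "hash_sample":
        # Deterministic: the same key always lands in the same bucket.
        total = options["totalModular"]
        digest = hashlib.md5(edge_key.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % total
        return bucket < options["rate"] * total
    raise ValueError(f"unknown method: {method}")


# Options copied from the label definition above.
opts = {"method": "hash_sample", "totalModular": 100, "rate": 0.1, "degree": True}
kept = sum(should_store_index_edge(f"user_{i}:movie_1", opts) for i in range(10000))
print(f"kept {kept} of 10000 index edges")
```

Under these assumptions, {{hash_sample}} has the advantage that retries and duplicate writes for the same edge make the same keep-or-drop decision, which plain {{sample}} does not guarantee.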

> Support different index on out/in direction.
> --------------------------------------------
>
>                 Key: S2GRAPH-123
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-123
>             Project: S2Graph
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>             Fix For: 0.2.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> In some situations, a user might want different behavior depending on the `direction` of an edge.
> In my experience deploying and operating S2Graph on users' news article click activity, it is extremely common for a few articles to receive most of the clicks.
> To describe the problem more formally, say we have a `user_article_click` label where each edge has a `user_id` and an `article_id` as its source and target vertex.
> In this case, 'out' direction edges are spread out evenly because we prepend a murmur hash to the beginning of the row key, and there are very few edges per source vertex (`user_id`), since no individual can click a million articles.
> The 'in' direction, which holds all edges connecting every `user_id` to each `article_id`, is a different story. Only a few `article_id`s receive clicks from millions of users, and these quickly become `super node`s. This causes excessive region server resource usage, and keeping a million edges on a single source vertex is not reasonable anyway, because sending a million edges to a client would time out.
> Currently there is no way to control how edges are processed per direction, but the case above can be avoided if we provide options.
> I suggest a new feature that provides a separate index with write options for each `direction`.
> Possible write options are the following (based on our write transaction steps).
> # `IndexEdge`: dropAll/sampling/storeAll(default)
> # `SnapshotEdge`: drop/store(default)
> # `Degree`: ignore/update(default)
> By enabling or disabling each element of the write transaction, users can decide what to store based on what they know about their data.
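
The per-direction write options listed in the description could be modeled roughly as follows. This is only a sketch: the names {{WriteOptions}} and {{plan_write}} are hypothetical, the option values are taken from the list above, and the order of the returned steps is illustrative rather than the real order of the write transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOptions:
    index_edge: str = "storeAll"   # dropAll / sampling / storeAll (default)
    snapshot_edge: str = "store"   # drop / store (default)
    degree: str = "update"         # ignore / update (default)


def plan_write(direction: str, options_by_direction: dict) -> list:
    """List which elements of the write transaction run for one direction."""
    # Directions without explicit options fall back to the defaults.
    opts = options_by_direction.get(direction, WriteOptions())
    steps = []
    if opts.snapshot_edge == "store":
        steps.append("SnapshotEdge")
    if opts.index_edge == "storeAll":
        steps.append("IndexEdge")
    elif opts.index_edge == "sampling":
        steps.append("IndexEdge(sampled)")
    if opts.degree == "update":
        steps.append("Degree")
    return steps


# Hypothetical setup for `user_article_click`: keep everything on 'out',
# but skip index edges on 'in' so popular articles do not become super nodes.
click_options = {"in": WriteOptions(index_edge="dropAll")}
print(plan_write("out", click_options))
print(plan_write("in", click_options))
```

With this setup the 'in' direction still updates the degree, so the click count per article stays available even though the individual index edges are not stored.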



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)