Posted to dev@s2graph.apache.org by "DOYUNG YOON (JIRA)" <ji...@apache.org> on 2016/03/09 02:33:40 UTC

[jira] [Created] (S2GRAPH-60) Add "divide" operation to "scorePropagateOp"

DOYUNG YOON created S2GRAPH-60:
----------------------------------

             Summary: Add "divide" operation to "scorePropagateOp"
                 Key: S2GRAPH-60
                 URL: https://issues.apache.org/jira/browse/S2GRAPH-60
             Project: S2Graph
          Issue Type: New Feature
            Reporter: DOYUNG YOON
            Assignee: Junki Kim
            Priority: Trivial


Ratios are a common metric in service analysis. A ratio is usually calculated by dividing one count or aggregated value by another. S2Graph queries already support counting and aggregating values stored in S2Graph, so a client can compute a ratio by running those queries and dividing the results itself. That works, but it could be simpler: the calculation could happen inside the S2Graph web application, with just one RPC (a single graph query call).
This issue proposes a query for calculating such ratios.
Suppose we have two labels, an impression feedback label and a click feedback label. From them we can get the number of impressions and the number of clicks for a user, and from those two values we can calculate the CTR (click-through rate). Today this takes the two count queries below.

Impression query

{noformat}
{
  "srcVertices": [{
    "serviceName": "some_service",
    "columnName": "user_id",
    "id": "user_a"
  }],
  "steps": [{
    "step": [{
      "label": "impression_feedback_label",
      "direction": "out",
      "offset": 0,
      "limit": 100
    }]
  }]
}
{noformat}

Click query

{noformat}
{
  "srcVertices": [{
    "serviceName": "some_service",
    "columnName": "user_id",
    "id": "user_a"
  }],
  "steps": [{
    "step": [{
      "label": "click_feedback_label",
      "direction": "out",
      "offset": 0,
      "limit": 100
    }]
  }]
}
{noformat}

After fetching the results of both queries above, the client can calculate the CTR by dividing the number of clicks by the number of impressions.
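The client-side approach can be sketched in Python for comparison. The edge records and their `from` field below are hypothetical stand-ins for whatever the two query responses actually contain. The sketch only illustrates that the client has to fetch both result sets and join them by user before dividing.

```python
from collections import Counter


def count_by_user(edges):
    """Count edges per source vertex (the user)."""
    return Counter(edge["from"] for edge in edges)


def client_side_ctr(impression_edges, click_edges):
    """Join the two result sets by user and divide clicks by impressions.

    Every user taken from the impression counts has at least one
    impression, so the division never hits zero.
    """
    impressions = count_by_user(impression_edges)
    clicks = count_by_user(click_edges)
    return {user: clicks.get(user, 0) / shown for user, shown in impressions.items()}


# Hypothetical results of the impression and click queries for user_a.
impression_edges = [{"from": "user_a", "to": f"item_{i}"} for i in range(10)]
click_edges = [{"from": "user_a", "to": f"item_{i}"} for i in range(9)]
print(client_side_ctr(impression_edges, click_edges))  # {'user_a': 0.9}
```

This approach costs two round trips plus a join on the client. Removing that overhead is the motivation for the proposal.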

Instead, adding a `divide` operation to `scorePropagateOp` lets us get the same result with a single query:

{noformat}
{
  "limit" : 10,
  "groupBy" : [ "from" ],
  "duplicate" : "sum",
  "srcVertices" : [ {
    "serviceName" : "some_service",
    "columnName" : "user_id",
    "id" : "user_a"
  } ],
  "steps" : [ {
    "step" : [ {
      "label" : "impression_feedback_label",
      "direction" : "out",
      "offset" : 0,
      "limit" : 10,
      "groupBy" : [ "from" ],
      "duplicate" : "countSum",
      "transform" : [ [ "_from" ] ]
    } ]
  }, {
    "step" : [ {
      "label": "click_feedback_label",
      "direction" : "out",
      "offset" : 0,
      "limit" : 10,
      "scorePropagateOp" : "divide",
      "scorePropagateShrinkage" : 500
    } ]
  } ]
}
{noformat}
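One way to read the single query is sketched below in Python. It rests on assumptions this issue does not confirm:

- Step 1 collapses the user's impression edges onto the user vertex (`transform` with `_from`, `groupBy` on `from`, `duplicate: countSum`), so the score carried into step 2 is the impression count.
- In step 2, `divide` divides each click edge's score (assumed to be 1.0) by that propagated score plus `scorePropagateShrinkage`.
- The top-level `duplicate: sum` then adds up the per-edge scores for each user.

The `propagate` function and its `multiply` branch are hypothetical illustrations, not S2Graph's implementation.

```python
def propagate(prev_score, edge_score, op, shrinkage=0.0):
    """Combine the previous step's score with the current edge score.

    Hypothetical model: "multiply" stands in for existing behaviour and
    "divide" for the proposed one, with shrinkage added to the denominator.
    A non-positive denominator yields 0.0 instead of raising.
    """
    if op == "multiply":
        return prev_score * edge_score
    if op == "divide":
        denominator = prev_score + shrinkage
        return edge_score / denominator if denominator > 0 else 0.0
    raise ValueError(f"unsupported scorePropagateOp: {op}")


def single_query_ratio(num_impressions, num_clicks, shrinkage=0.0):
    # Step 1 (assumed): countSum over impression edges grouped by "from"
    # leaves one score per user equal to the impression count.
    prev_score = float(num_impressions)
    # Step 2 (assumed): every click edge, with score 1.0, is divided by it.
    per_edge = [propagate(prev_score, 1.0, "divide", shrinkage)
                for _ in range(num_clicks)]
    # Top-level "duplicate": "sum" adds the per-edge scores for the user.
    return sum(per_edge)


print(round(single_query_ratio(10, 9), 4))       # 0.9
print(round(single_query_ratio(10, 9, 500), 4))  # 0.0176
```

Under these assumptions, summing the per-edge quotients gives clicks / (impressions + shrinkage), which matches the formula used in the shrinkage example.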

The query also introduces another option key, `scorePropagateShrinkage`, which is used to normalize the results. Results are sorted by the ratio alone, but a raw ratio is unreliable when the counts behind it are small. A ratio of 1.0 from 1/1 ranks above a ratio of 0.9 from 9/10, even though the latter is backed by more data. For this reason, `scorePropagateShrinkage` adds a sufficiently large value to the denominator. With a shrinkage of 500, the two ratios become 1 / (1 + 500) ≈ 0.001996 and 9 / (10 + 500) ≈ 0.017647, so the latter now ranks higher.
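The ranking flip can be checked in a few lines of Python. The `shrunk_ratio` and `rank` helpers are hypothetical, not S2Graph code. They simply apply numerator / (denominator + shrinkage) as in the worked example above.

```python
def shrunk_ratio(clicks, impressions, shrinkage=0):
    """Ratio with a shrinkage constant added to the denominator."""
    return clicks / (impressions + shrinkage)


def rank(candidates, shrinkage=0):
    """Return candidate names sorted by shrunk ratio, highest first."""
    return sorted(
        candidates,
        key=lambda name: shrunk_ratio(*candidates[name], shrinkage),
        reverse=True,
    )


# (clicks, impressions) pairs from the example in the text.
candidates = {"1/1": (1, 1), "9/10": (9, 10)}

for shrinkage in (0, 500):
    print(shrinkage, rank(candidates, shrinkage))
# 0 ['1/1', '9/10']
# 500 ['9/10', '1/1']
```

Shrinkage scales every value down, so the shrunk result works as a ranking score rather than a true CTR.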




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)