Posted to dev@s2graph.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/17 10:14:33 UTC

[jira] [Commented] (S2GRAPH-60) Add divide operation to scorePropagateOp

    [ https://issues.apache.org/jira/browse/S2GRAPH-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199186#comment-15199186 ] 

ASF GitHub Bot commented on S2GRAPH-60:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-s2graph/pull/43


> Add divide operation to scorePropagateOp
> ----------------------------------------
>
>                 Key: S2GRAPH-60
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-60
>             Project: S2Graph
>          Issue Type: New Feature
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Trivial
>              Labels: newbie, query, score
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Ratios are a common metric in service analysis. A ratio is usually calculated by dividing one count or aggregate value by another. S2Graph queries already support counting and aggregating values within S2Graph storage, so a client can fetch those values and divide them itself. That works, but there is a simpler way: perform the division inside the S2Graph web application, so that the ratio is returned with a single RPC, that is, one graph query call.
> This issue proposes such a ratio calculation query.
> Suppose we have two labels, an impression feedback label and a click feedback label. We can then get the number of impressions and the number of clicks for a user, and from those two values calculate the CTR (click-through rate) using the two count queries below.
> Impression query
> {noformat}
> {
>   "srcVertices": [{
>     "serviceName": "some_service",
>     "columnName": "user_id",
>     "id": "user_a"
>   }],
>   "steps": [{
>     "step": [{
>       "label": "impression_feedback_label",
>       "direction": "out",
>       "offset": 0,
>       "limit": 100
>     }]
>   }]
> }
> {noformat}
> Click query
> {noformat}
> {
>   "srcVertices": [{
>     "serviceName": "some_service",
>     "columnName": "user_id",
>     "id": "user_a"
>   }],
>   "steps": [{
>     "step": [{
>       "label": "click_feedback_label",
>       "direction": "out",
>       "offset": 0,
>       "limit": 100
>     }]
>   }]
> }
> {noformat}
> After fetching the results of the two queries above, we can calculate the CTR on the client.
> However, we can do the same with a single query by adding a `divide` operation to `scorePropagateOp`.
> {noformat}
> {
>   "limit" : 10,
>   "groupBy" : [ "from" ],
>   "duplicate" : "sum",
>   "srcVertices" : [ {
>     "serviceName" : "some_service",
>     "columnName" : "user_id",
>     "id" : "user_a"
>   } ],
>   "steps" : [ {
>     "step" : [ {
>       "label" : "impression_feedback_label",
>       "direction" : "out",
>       "offset" : 0,
>       "limit" : 10,
>       "groupBy" : [ "from" ],
>       "duplicate" : "countSum",
>       "transform" : [ [ "_from" ] ]
>     } ]
>   }, {
>     "step" : [ {
>       "label": "click_feedback_label",
>       "direction" : "out",
>       "offset" : 0,
>       "limit" : 10,
>       "scorePropagateOp" : "divide",
>       "scorePropagateShrinkage" : 500
>     } ]
>   } ]
> }
> {noformat}
> There is another query parameter, `scorePropagateShrinkage`, which is used to normalize the results. We sort the results by the ratio value alone, but a raw ratio can be misleading when the counts are small: a ratio of 1.0 from 1/1 ranks above 0.9 from 9/10, even though the latter is backed by more data. To compensate, we add a sufficiently large `scorePropagateShrinkage` value to the denominator. The scores then become 1 / (1 + 500) = 0.00199600798403 and 9 / (10 + 500) = 0.01764705882353, so the latter ranks higher.
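
The effect of the shrinkage term on ranking can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not S2Graph code: the function name `shrunk_ratio`, the candidate names, and the counts are hypothetical, and it assumes the shrinkage value is added to each item's own denominator.

```python
def shrunk_ratio(numerator: float, denominator: float, shrinkage: float = 0.0) -> float:
    """Return numerator / (denominator + shrinkage).

    Hypothetical helper; it assumes the shrinkage value is simply
    added to the denominator before dividing.
    """
    total = denominator + shrinkage
    if total == 0:
        raise ValueError("denominator + shrinkage must be non-zero")
    return numerator / total


def rank(candidates: dict, shrinkage: float) -> list:
    """Sort candidate keys by their (shrunk) ratio, highest first."""
    return sorted(
        candidates,
        key=lambda name: shrunk_ratio(*candidates[name], shrinkage),
        reverse=True,
    )


# Made-up (clicks, impressions) pairs: 1/1 and 9/10.
candidates = {"item_a": (1, 1), "item_b": (9, 10)}

plain_order = rank(candidates, shrinkage=0.0)     # raw ratios: 1.0 vs 0.9
shrunk_order = rank(candidates, shrinkage=500.0)  # 1/501 vs 9/510

print(plain_order)
print(shrunk_order)
```

Without shrinkage the 1/1 item sorts first; with a shrinkage of 500 the item backed by more data moves ahead.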


