Posted to dev@tinkerpop.apache.org by "zjxian (Jira)" <ji...@apache.org> on 2020/06/02 10:39:00 UTC

[jira] [Commented] (TINKERPOP-2376) Probability distribution controlled by weight when using sample step

    [ https://issues.apache.org/jira/browse/TINKERPOP-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123617#comment-17123617 ] 

zjxian commented on TINKERPOP-2376:
-----------------------------------

[^SampleGlobalStep.java] 

 

Here is a modified SampleGlobalStep file. I'm not familiar with the original algorithm, but the modified version seems to work.
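
For illustration only, here is a minimal standalone sketch of the difference between the two behaviours. It is not the attached file and not the TinkerPop source; the class and method names (WeightedSampleSketch, sampleOne, sampleOneFreshDraws, tally) are hypothetical, and the "fresh draw per element" variant is only an assumption about what could produce the skew reported in the issue.

```java
import java.util.Arrays;
import java.util.Random;

public class WeightedSampleSketch {

    // Picks one index with probability weights[i] / sum(weights),
    // using a single random draw compared against the running total.
    static int sampleOne(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double target = rng.nextDouble() * total;
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            if (target < running) {
                return i;
            }
        }
        return weights.length - 1; // fallback for floating-point rounding
    }

    // ASSUMED skewed scheme (hypothetical, for comparison only):
    // a fresh random number is drawn for every element and the element
    // is accepted when the draw is <= cumulative weight / total weight.
    static int sampleOneFreshDraws(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            if (rng.nextDouble() <= running / total) {
                return i;
            }
        }
        return weights.length - 1;
    }

    // Counts how often each index is picked over the given number of runs.
    static int[] tally(boolean freshDraws, double[] weights, int runs, Random rng) {
        int[] counts = new int[weights.length];
        for (int r = 0; r < runs; r++) {
            int picked = freshDraws
                    ? sampleOneFreshDraws(weights, rng)
                    : sampleOne(weights, rng);
            counts[picked]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0, 1.0}; // three edges of equal weight
        Random rng = new Random();
        System.out.println("single draw: " + Arrays.toString(tally(false, weights, 10000, rng)));
        System.out.println("fresh draws: " + Arrays.toString(tally(true, weights, 10000, rng)));
    }
}
```

With three equal weights, the fresh-draw variant picks the elements with probabilities 1/3, (2/3)(2/3) = 4/9 and (2/3)(1/3) = 2/9, which is the ratio described in the issue, while the single-draw variant gives each element a probability of 1/3.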

> Probability distribution controlled by weight when using sample step
> --------------------------------------------------------------------
>
>                 Key: TINKERPOP-2376
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2376
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.4.6
>         Environment: Gremlin-Tinkerpop 3.4.6 on Fedora 32
>            Reporter: zjxian
>            Priority: Critical
>         Attachments: SampleGlobalStep.java, out.csv
>
>
> Create a simple graph with 1 central node and 3 surrounding nodes.
> Add 3 edges with equal weight (1) to form a star graph.
> Traverse from the center (v[0]) to the other 3 nodes, sample(1), and record the destination node.
> Repeat that 10000 times.
> Expected probability distribution:
> v[1]:v[2]:v[3] = 3333:3333:3333 (1:1:1)
> What I got:
> v[1]:v[2]:v[3] = 3320:4439:2241
> I've checked some of the source files, such as ([https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/SampleGlobalStep.java]). Based on that code, the resulting probability distribution should be about 1/3:4/9:2/9, which is very close to the results I got.
> I think some improvements are needed here to make "random walk" in TinkerPop really useful.
> The script I used:
> {code:java}
> // code placeholder
> conf = new BaseConfiguration()
> conf.setProperty("gremlin.tinkergraph.vertexIdManager","LONG")
> conf.setProperty("gremlin.tinkergraph.edgeIdManager","LONG")
> conf.setProperty("gremlin.tinkergraph.vertexPropertyIdManager","LONG");
> graph = TinkerGraph.open(conf)
> g = graph.traversal()
> for(i=0;i<=3;i++){    
>   g.addV().iterate()
> }
> for(i=1;i<=3;i++){
>  g.V(0).addE("connect").property("weight",1).to(g.V(i)).iterate()
> }
> ["bash", "-c", "rm -f out.csv"].execute().waitFor()
> file = new File("out.csv")
> file.append("id\r\n")
> for(i=0;i<10000;i++){
>  g.V(0).outE().sample(1).by("weight").otherV().map{file.append it.get().id()+"\r\n"}.iterate()
> }
> {code}
> See the results in the attached out.csv.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)