You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by Charlie Harrison <ch...@gmail.com> on 2016/04/21 20:48:08 UTC

Re: TinkerPop 3.2.0-SNAPSHOT and OLAP/OLTP Traversal Compilation

Thanks for this Marko, it sounds like it's going to be excellent.  

In the 3.1.1 incarnation of the two VertexPrograms, is it possible to 
supply a custom traversal for propagating the energies (or cluster votes)?

Cheers,

Charlie


On Friday, 19 February 2016 20:22:36 UTC, Marko A. Rodriguez wrote:
>
> Hello,
>
> This morning, I just merged into TinkerPop 3.2.0-SNAPSHOT a large body of 
> work that allows us to now compile a single traversal into an arbitrary 
> number of OLAP and OLTP jobs. Along with this, I have taken the two 
> algorithmic VertexPrograms we have (PeerPressureVertexProgram and 
> PageRankVertexProgram) and made them available through GraphTraversal. That 
> is, for most users, no longer will you have to do 
> graph.compute().program(PageRankVertexProgram.build()…).submit().get(). 
> Instead, you can simply do:
>
> g.V().pageRank()
>
>
> *It gets better… Let me take you though a Gremlin Console session and show 
> you what you can do now:*
>
> gremlin> graph = TinkerFactory.createModern()
> ==>tinkergraph[vertices:6 edges:6]
> gremlin> g = graph.traversal().withComputer()
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], 
> tinkergraphcomputer]
>
>
> *First, "withComputer()" is new that allows us to know configure the 
> GraphComputer used for the TraversalSource to a finer degree. Thats another 
> discussion, but just know thats better now too!*
>
> gremlin> g.V().pageRank()
> ==>v[6]
> ==>v[1]
> ==>v[3]
> ==>v[2]
> ==>v[4]
> ==>v[5]
>
>
> *pageRank() (and all other VertexProgramSteps) are sideEffect steps. The 
> results of the computation are on the vertices themselves.*
>
> gremlin> g.V().pageRank().valueMap()
> ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], 
> name:[vadas], gremlin.pageRankVertexProgram.edgeCount:[0.0], age:[27]]
> ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], 
> name:[marko], gremlin.pageRankVertexProgram.edgeCount:[3.0], age:[29]]
> ==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop], 
> gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
> ==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], 
> name:[josh], gremlin.pageRankVertexProgram.edgeCount:[2.0], age:[32]]
> ==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], 
> name:[peter], gremlin.pageRankVertexProgram.edgeCount:[1.0], age:[35]]
> ==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003], 
> name:[ripple], gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
>
>
> *Another ticket for 3.2.0 will be to have the concept of "transient 
> compute keys" so the edgeCount scratch data will be dropped automagically. 
> But still, perhaps you want to use your own custom property.*
>
> gremlin> g.V().pageRank().by('rank').valueMap()
> ==>[name:[vadas], rank:[0.19250000000000003], 
> gremlin.pageRankVertexProgram.edgeCount:[0.0], age:[27]]
> ==>[name:[marko], rank:[0.15000000000000002], 
> gremlin.pageRankVertexProgram.edgeCount:[3.0], age:[29]]
> ==>[name:[lop], rank:[0.4018125], 
> gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
> ==>[name:[josh], rank:[0.19250000000000003], 
> gremlin.pageRankVertexProgram.edgeCount:[2.0], age:[32]]
> ==>[name:[peter], rank:[0.15000000000000002], 
> gremlin.pageRankVertexProgram.edgeCount:[1.0], age:[35]]
> ==>[name:[ripple], rank:[0.23181250000000003], 
> gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
>
>
> *How about your own custom edge traversal for propagating the energy.*
>
> gremlin> g.V().pageRank().
>                 by(outE('knows')).
>                 by('friendRank').
>            valueMap('name','friendRank')
> ==>[friendRank:[0.15000000000000002], name:[marko]]
> ==>[friendRank:[0.21375000000000002], name:[vadas]]
> ==>[friendRank:[0.15000000000000002], name:[lop]]
> ==>[friendRank:[0.21375000000000002], name:[josh]]
> ==>[friendRank:[0.15000000000000002], name:[peter]]
> ==>[friendRank:[0.15000000000000002], name:[ripple]]
>
>
> *What about a more complex job?*
>
> gremlin> g.V().pageRank().
>                 by(outE('knows')).
>                 by('friendRank').
>            order().by('friendRank',decr).
>            valueMap('name','friendRank')
> ==>[friendRank:[0.21375000000000002], name:[vadas]]
> ==>[friendRank:[0.21375000000000002], name:[josh]]
> ==>[friendRank:[0.15000000000000002], name:[ripple]]
> ==>[friendRank:[0.15000000000000002], name:[marko]]
> ==>[friendRank:[0.15000000000000002], name:[lop]]
> ==>[friendRank:[0.15000000000000002], name:[peter]]
>
>
> *What about clustering?*
>
> gremlin> g.V().peerPressure().by('cluster').
>            valueMap('name','cluster')
> ==>[cluster:[1], name:[vadas]]
> ==>[cluster:[1], name:[marko]]
> ==>[cluster:[1], name:[lop]]
> ==>[cluster:[1], name:[ripple]]
> ==>[cluster:[6], name:[peter]]
> ==>[cluster:[1], name:[josh]]
>
>
> *What about clustering and ranking?*
>
> gremlin> g.V().peerPressure().by('cluster').
>                pageRank().by('rank').
>            group().by('cluster').by(values('rank').sum())
> ==>[1:1.1686250000000002, 6:0.15000000000000002]
>
>
> *What about OLAP/OLTP?*
>
> gremlin> g.V().peerPressure().by('cluster').
>                pageRank().by('rank').
>            group().by('cluster').by().
>              order(local).by(values,{a,b -> a.size() <=> b.size()}).
>                limit(local,1).select(values).unfold().unfold().
>            valueMap('name','rank','cluster')
> ==>[cluster:[6], name:[peter], rank:[0.15000000000000002]]
>
>
> *That compilation goes OLAP->OLAP->OLAP->OLTP.*
>
> gremlin> 
> g.V().peerPressure().by('cluster').pageRank().by('rank').group().by('cluster').by().order(local).by(values,{a,b 
> -> a.size() <=> 
> b.size()}).limit(local,1).select(values).unfold().unfold().valueMap('name','rank','cluster').explain()
> ==>Traversal Explanation
>
> ====================================================================================================================================================================================================================================================================================================================================================================================================================================================
> Original Traversal                 [GraphStep([],vertex), 
> PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep]), 
> OrderLocalStep(lambda), RangeLocalStep(0,1), LambdaMapStep(values), 
> UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, cluster],value)]
>
> ConnectiveStrategy           [D]   [GraphStep([],vertex), 
> PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep]), 
> OrderLocalStep(lambda), RangeLocalStep(0,1), LambdaMapStep(values), 
> UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, cluster],value)]
> VertexProgramStrategy        [D]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> IdentityRemovalStrategy      [O]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> IncidentToAdjacentStrategy   [O]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> AdjacentToIncidentStrategy   [O]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> FilterRankingStrategy        [O]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> MatchPredicateStrategy       [O]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> RangeByIsCountStrategy       [O]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> TinkerGraphStepStrategy      [P]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> ProfileStrategy              [F]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> StandardVerificationStrategy [V]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
> ComputerVerificationStrategy [V]   
> [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
>
> Final Traversal                   
>  [PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30), 
> PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30), 
> TraversalVertexProgramStep([GraphStep([],vertex), 
> GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]), 
> ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1), 
> LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank, 
> cluster],value)]
>
>
> There are a few more goodies to show like biasing the initial pageRank 
> energy for doing spreading activation-based algorithms, but we'll save that 
> for later….
>
> *Finally, because this all compiles to the Gremlin traversal machine, this 
> works for any TinkerPop-enabled graph system.*
>
> Enjoy,
> Marko.
>
> http://markorodriguez.com
>
>