Posted to dev@tinkerpop.apache.org by Dan LaRocque <da...@hopcount.org> on 2015/05/02 04:03:18 UTC

Re: [TinkerPop] Vendors can easily leverage Hadoop-Gremlin for OLAP processing their graph system

On Fri, May 1, 2015, at 18:08, Marko Rodriguez wrote:
> Hi,
> 
> I was working with Dan LaRocque (Titan) today to see about getting SparkGraphComputer working over Titan. We realized a few "doh!" aspects of the current implementation of Spark/GiraphGraphComputer, but have since rectified them in SNAPSHOT. Moreover, our updates make it extremely easy for any TP3 graph vendor to leverage Hadoop-Gremlin for their OLAP requirements. It is as simple as providing a vendor-specific InputFormat and OutputFormat, where the vendor-specific OutputFormat is only needed if the vendor wishes to allow OLAP operations to update the original graph with computed results. Really, it's that trivial...
> 
> I added a new section to the Hadoop-Gremlin docs that explains how easy it is.
> 
> http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#_hadoop_gremlin_for_vendors
> 
> Enjoy!,
> Marko.

Hi,

To expand on Marko's point...

This has opened up simpler routes to some worthwhile use cases.  For
instance, with a Titan-specific Hadoop InputFormat and OutputFormat, we
can run PageRank on a standard GraphComputer, such as
SparkGraphComputer, and then write the computed ranks back to Titan as
vertex properties.  In other words, we can go from this
Titan-hosted graph

==>[name:[tartarus]]
==>[name:[hydra]]
==>[name:[jupiter]]
==>[name:[cerberus]]
==>[name:[neptune]]
==>[name:[pluto]]
==>[name:[nemean]]
==>[name:[alcmene]]
==>[name:[hercules]]
==>[name:[sky]]
==>[name:[sea]]
==>[name:[saturn]]

to this one

==>[gremlin.pageRankVertexProgram.pageRank:[0.41599886933191116], name:[tartarus]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.17550000000000002], name:[hydra]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.31819816247447563], name:[jupiter]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.23864803741939838], name:[cerberus]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.2807651470037452], name:[neptune]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.29716723463942407], name:[pluto]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.17550000000000002], name:[nemean]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.17550000000000002], name:[alcmene]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[hercules]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.21761710958434682], name:[sky]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.2295501250550773], name:[sea]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.21761710958434682], name:[saturn]]

using only the Hadoop I/O formats; no additional supersteps or TP3
MapReduce jobs are needed.
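To make the division of labor concrete, here is a minimal, self-contained sketch of the round trip in plain Java. It is not Hadoop or TinkerPop code: VertexSource and VertexSink are hypothetical stand-ins for a vendor's InputFormat and OutputFormat, and PageRankRoundTrip, pageRank and run are illustrative names. The update rule rank(v) = (1 - alpha) + alpha * sum(rank(u) / outDegree(u)) with alpha = 0.85 is an assumption, though it appears consistent with the output above, where hercules (no incoming edges) sits at 0.15. The starting value of 1.0, the 30 iterations and the toy adjacency list are also assumed for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageRankRoundTrip {

    // HYPOTHETICAL stand-in for a vendor-specific InputFormat: hands the
    // computation every vertex id with its outgoing neighbor ids.
    // This is not a Hadoop or TinkerPop interface.
    public interface VertexSource {
        Map<String, List<String>> readAdjacency();
    }

    // HYPOTHETICAL stand-in for a vendor-specific OutputFormat: persists one
    // computed property per vertex back into the original graph.
    public interface VertexSink {
        void writeProperty(String vertexId, String key, double value);
    }

    // Property key copied from the console output above.
    public static final String RANK_KEY = "gremlin.pageRankVertexProgram.pageRank";

    // Vendor-agnostic PageRank. Assumed update rule:
    //   rank(v) = (1 - alpha) + alpha * sum over in-neighbors u of rank(u) / outDegree(u)
    // Every vertex must appear as a key; vertices without outgoing edges map to an empty list.
    public static Map<String, Double> pageRank(Map<String, List<String>> out,
                                               double alpha, int iterations) {
        Map<String, Double> rank = new HashMap<>();
        for (String v : out.keySet()) rank.put(v, 1.0); // assumed starting value
        for (int i = 0; i < iterations; i++) {
            // Collect the rank each vertex receives from its in-neighbors.
            Map<String, Double> incoming = new HashMap<>();
            for (String v : out.keySet()) incoming.put(v, 0.0);
            for (Map.Entry<String, List<String>> e : out.entrySet()) {
                List<String> targets = e.getValue();
                if (targets.isEmpty()) continue;
                double share = rank.get(e.getKey()) / targets.size();
                for (String t : targets) incoming.merge(t, share, Double::sum);
            }
            // Apply the damped update to every vertex.
            Map<String, Double> next = new HashMap<>();
            for (String v : out.keySet()) next.put(v, (1.0 - alpha) + alpha * incoming.get(v));
            rank = next;
        }
        return rank;
    }

    // The vendor supplies only the source and the sink; the computation in between is generic.
    public static void run(VertexSource source, VertexSink sink) {
        Map<String, Double> ranks = pageRank(source.readAdjacency(), 0.85, 30);
        ranks.forEach((v, r) -> sink.writeProperty(v, RANK_KEY, r));
    }

    public static void main(String[] args) {
        // Toy adjacency list (assumed): hercules points at five vertices with no outgoing edges.
        Map<String, List<String>> adjacency = new HashMap<>();
        List<String> targets = List.of("jupiter", "alcmene", "nemean", "hydra", "cerberus");
        adjacency.put("hercules", targets);
        for (String t : targets) adjacency.put(t, List.of());

        // In-memory "graph store" that the sink writes computed properties into.
        Map<String, Map<String, Double>> store = new HashMap<>();
        run(() -> adjacency,
            (v, k, value) -> store.computeIfAbsent(v, x -> new HashMap<>()).put(k, value));
        store.forEach((v, props) -> System.out.println(v + " " + props));
    }
}
```

On this toy graph hercules settles at about 0.15 and each of its five targets at roughly 0.15 + 0.85 * (0.15 / 5) = 0.1755. The point of the split is that swapping graph backends only means swapping the source and sink; pageRank itself never changes.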

thanks,
Dan