Posted to commits@tinkerpop.apache.org by dk...@apache.org on 2015/06/04 12:15:56 UTC

[42/43] incubator-tinkerpop git commit: removed static code samples and added dynamic Spark sample

removed static code samples and added dynamic Spark sample


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/44b49491
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/44b49491
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/44b49491

Branch: refs/heads/preprocessor
Commit: 44b494914f6cebe8e5c774d89833c6ddbd9b6f8a
Parents: a08b06f
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Thu Jun 4 00:27:15 2015 +0200
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Thu Jun 4 00:27:15 2015 +0200

----------------------------------------------------------------------
 docs/src/implementations.asciidoc | 117 +++++----------------------------
 1 file changed, 15 insertions(+), 102 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/44b49491/docs/src/implementations.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/implementations.asciidoc b/docs/src/implementations.asciidoc
index b25cf82..138a981 100644
--- a/docs/src/implementations.asciidoc
+++ b/docs/src/implementations.asciidoc
@@ -574,21 +574,6 @@ The Gremlin-Console session below demonstrates Neo4j indices. For more informati
 * Manipulating indices with link:http://docs.neo4j.org/chunked/stable/query-schema-index.html[Cypher].
 * Manipulating indices with the Neo4j link:http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-new-index.html[Java API].
 
-[source,groovy]
-gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
-==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
-gremlin> graph.cypher('CREATE INDEX ON :person(name)')
-gremlin> graph.tx().commit() // schema mutations must happen in a different tx than graph mutations
-==>null
-gremlin> graph.addVertex(label,'person','name','marko')
-==>v[0]
-gremlin> graph.addVertex(label,'dog','name','puppy')
-==>v[1]
-gremlin> g = graph.traversal(standard())
-==>graphtraversalsource[neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]], standard]
-gremlin> g.V().hasLabel('person').has('name','marko').values('name')
-==>marko
-
 [gremlin-groovy]
 ----
 graph = Neo4jGraph.open('/tmp/neo4j')
@@ -605,30 +590,6 @@ graph.close()
 
 The session below demonstrates the runtime benefit of indices and shows that, even when no index is defined (only vertex labels), a linear scan of the vertex-label partition is still faster than a linear scan of all vertices.
 
-[source,groovy]
-gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
-==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
-gremlin> g = graph.traversal(standard())
-==>graphtraversalsource[neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]], standard]
-gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')
-==>null
-gremlin> graph.tx().commit()
-==>null
-gremlin> graph.cypher('CREATE INDEX ON :artist(name)') <1>
-gremlin> graph.tx().commit()
-==>null
-gremlin> clock(1000){g.V().hasLabel('artist').has('name','Garcia').iterate()}  <2>
-==>0.038828967
-gremlin> clock(1000){g.V().has('name','Garcia').iterate()} <3>
-==>0.6623919649999999
-gremlin> graph.cypher('DROP INDEX ON :artist(name)') <4>
-gremlin> g.tx().commit()
-==>null
-gremlin> clock(1000){g.V().hasLabel('artist').has('name','Garcia').iterate()} <5>
-==>0.29597517599999995
-gremlin> clock(1000){g.V().has('name','Garcia').iterate()} <6>
-==>0.6685323479999999
-
 [gremlin-groovy]
 ----
 graph = Neo4jGraph.open('/tmp/neo4j')
@@ -661,16 +622,14 @@ image::gremlin-loves-cypher.png[width=400]
 
 NeoTechnology is the creator of the graph pattern-match query language link:http://www.neo4j.org/learn/cypher[Cypher]. It is possible to leverage Cypher from within Gremlin by using the `Neo4jGraph.cypher()` graph traversal method.
 
-[source,groovy]
-gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
-==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
-gremlin> graph.io(gryo()).readGraph('data/tinkerpop-modern.kryo')
-==>null
-gremlin> graph.cypher('MATCH (a {name:"marko"}) RETURN a')
-==>[a:v[0]]
-gremlin> graph.cypher('MATCH (a {name:"marko"}) RETURN a').select('a').out('knows').values('name')
-==>vadas
-==>josh
+[gremlin-groovy]
+----
+graph = Neo4jGraph.open('/tmp/neo4j')
+graph.io(gryo()).readGraph('data/tinkerpop-modern.kryo')
+graph.cypher('MATCH (a {name:"marko"}) RETURN a')
+graph.cypher('MATCH (a {name:"marko"}) RETURN a').select('a').out('knows').values('name')
+graph.close()
+----
 
 Thus, like <<match-step,`match()`>>-step in Gremlin, it is possible to do a declarative pattern match and then move back into imperative Gremlin.
 
@@ -817,26 +776,6 @@ image:hadoop-pipes.png[width=180,float=left] It is possible to execute OLTP oper
 
 CAUTION: OLTP operations on `HadoopGraph` are not efficient. They require linear scans to execute and are unreasonable for large graphs. In such large graph situations, make use of <<traversalvertexprogram,TraversalVertexProgram>> which is the OLAP implementation of the Gremlin language. Hadoop-Gremlin provides various `GraphComputer` implementations to execute OLAP computations over a `HadoopGraph`.
 
-[source,text]
-gremlin> hdfs.copyFromLocal('data/tinkerpop-modern-vertices.kryo', 'tinkerpop-modern-vertices.kryo')
-==>null
-gremlin> hdfs.ls()
-==>rw-r--r-- marko supergroup 1439 tinkerpop-modern-vertices.kryo
-gremlin> graph = GraphFactory.open('../../../hadoop-gremlin/conf/hadoop-gryo.properties')
-==>hadoopgraph[gryoinputformat->gryooutputformat]
-gremlin> g = graph.traversal(standard())
-==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], standard]
-gremlin> g.V().count()
-==>6
-gremlin> g.V().out().out().values('name')
-==>ripple
-==>lop
-gremlin> g.V().group().by{it.value('name')[1]}.by('name').next()
-==>a={marko=1, vadas=1}
-==>e={peter=1}
-==>i={ripple=1}
-==>o={lop=1, josh=1}
-
 [gremlin-groovy]
 ----
 hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
@@ -878,31 +817,6 @@ WARNING: Giraph uses a large number of Hadoop counters. The default for Hadoop i
 
 WARNING: The maximum number of workers can be no larger than the number of map-slots in the Hadoop cluster minus 1. For example, if the Hadoop cluster has 4 map slots, then `giraph.maxWorkers` cannot be larger than 3. One map-slot is reserved for the master compute node and all other slots can be allocated as workers to execute the VertexPrograms on the vertices of the graph.
 
-[source,text]
-gremlin> g = graph.traversal(computer()) // GiraphGraphComputer is the default graph computer when no class is specified
-==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], giraphgraphcomputer]
-gremlin> g.V().count()
-INFO  org.apache.tinkerpop.gremlin.hadoop.process.computer.giraph.GiraphGraphComputer  - HadoopGremlin(Giraph): TraversalVertexProgram[GraphStep(vertex), CountGlobalStep, ComputerResultStep]
-INFO  org.apache.hadoop.mapred.JobClient  - Running job: job_201407281259_0037
-INFO  org.apache.hadoop.mapred.JobClient  -  map 0% reduce 0%
-...
-INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph  - HadoopGremlin: CountGlobalMapReduce
-INFO  org.apache.hadoop.mapred.JobClient  - Running job: job_201407281259_0038
-INFO  org.apache.hadoop.mapred.JobClient  -  map 0% reduce 0%
-...
-==>6
-gremlin> g.V().out().out().values('name')
-INFO  org.apache.tinkerpop.gremlin.hadoop.process.computer.giraph.GiraphGraphComputer  - HadoopGremlin(Giraph): TraversalVertexProgram[GraphStep(vertex), VertexStep(OUT,vertex), VertexStep(OUT,vertex), PropertiesStep([name],value), ComputerResultStep]
-INFO  org.apache.hadoop.mapred.JobClient  - Running job: job_201407281259_0031
-INFO  org.apache.hadoop.mapred.JobClient  -  map 0% reduce 0%
-...
-INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph  - HadoopGremlin: TraverserMapReduce
-INFO  org.apache.hadoop.mapred.JobClient  - Running job: job_201407281259_0032
-INFO  org.apache.hadoop.mapred.JobClient  -  map 0% reduce 0%
-...
-==>ripple
-==>lop
-
 [gremlin-groovy]
 ----
 graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
@@ -947,14 +861,13 @@ SparkGraphComputer
 
 image:spark-logo.png[width=175,float=left] link:http://spark.apache.org[Spark] is an Apache Software Foundation project focused on general-purpose OLAP data processing. Spark provides a hybrid in-memory/disk-based distributed computing model that is similar to Hadoop's MapReduce model. Spark maintains a fluent function chaining DSL that is arguably easier for developers to work with than native Hadoop MapReduce. While Spark has a shorter startup time between "jobs" (a scatter/gather-step), the actual message passing algorithm (as designed by TinkerPop) is less efficient than that of Giraph. For small graphs, Spark will typically be much faster than Giraph, but as the graph becomes larger, the Hadoop MapReduce startup time incurred by Giraph will amortize as more time is spent passing messages (i.e. traversers) between the vertices of the graph.
 
-[source,text]
-gremlin> g = graph.traversal(computer(SparkGraphComputer))
-==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
-gremlin> g.V().count()
-==>6
-gremlin> g.V().out().out().values('name')
-==>lop
-==>ripple
+[gremlin-groovy]
+----
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+g = graph.traversal(computer(SparkGraphComputer))
+g.V().count()
+g.V().out().out().values('name')
+----
 
 To use lambdas in Gremlin-Groovy, simply provide `:remote connect` with a `TraversalSource` that leverages SparkGraphComputer.
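
A rough console sketch of that workflow (not part of this commit's diff): the properties file and the lambda-bearing traversal are reused from the samples above, while the `tinkerpop.hadoop` remote name is an assumption about the Hadoop-Gremlin console plugin. `:remote connect` binds the graph and traversal source for remote submission, and `:>` submits the lambda-bearing traversal for OLAP execution.

[source,text]
----
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
gremlin> g = graph.traversal(computer(SparkGraphComputer))  // TraversalSource backed by SparkGraphComputer
gremlin> :remote connect tinkerpop.hadoop graph g           // assumed remote name for the Hadoop-Gremlin plugin
gremlin> :> g.V().group().by{it.value('name')[1]}.by('name')
----

Presumably the traversal is submitted as a script so that its lambdas can be compiled on the remote workers rather than serialized from the console JVM.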