Posted to commits@tinkerpop.apache.org by ok...@apache.org on 2017/11/06 16:26:44 UTC
[06/14] tinkerpop git commit: Moved gremlin.spark.persistContext back to recipe
Moved gremlin.spark.persistContext back to recipe
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/db859fb5
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/db859fb5
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/db859fb5
Branch: refs/heads/TINKERPOP-1802
Commit: db859fb51cc0c37e28747f68b50a11dfb3799413
Parents: b0b087e
Author: HadoopMarc <vt...@xs4all.nl>
Authored: Tue Oct 3 21:20:16 2017 +0200
Committer: HadoopMarc <vt...@xs4all.nl>
Committed: Thu Oct 19 16:11:57 2017 +0200
----------------------------------------------------------------------
docs/src/recipes/olap-spark-yarn.asciidoc | 6 ++++++
hadoop-gremlin/conf/hadoop-gryo.properties | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/db859fb5/docs/src/recipes/olap-spark-yarn.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/olap-spark-yarn.asciidoc b/docs/src/recipes/olap-spark-yarn.asciidoc
index 464470f..f55edaa 100644
--- a/docs/src/recipes/olap-spark-yarn.asciidoc
+++ b/docs/src/recipes/olap-spark-yarn.asciidoc
@@ -104,6 +104,7 @@ conf.setProperty('spark.yarn.appMasterEnv.CLASSPATH', "./$archive/*:$hadoopConfD
conf.setProperty('spark.executor.extraClassPath', "./$archive/*:$hadoopConfDir")
conf.setProperty('spark.driver.extraLibraryPath', "$hadoop/lib/native:$hadoop/lib/native/Linux-amd64-64")
conf.setProperty('spark.executor.extraLibraryPath', "$hadoop/lib/native:$hadoop/lib/native/Linux-amd64-64")
+conf.setProperty('gremlin.spark.persistContext', 'true')
graph = GraphFactory.open(conf)
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().group().by(values('name')).by(both().count())
@@ -125,9 +126,14 @@ as the directory named `spark-gremlin.zip` in the Yarn containers. The `spark.ex
`spark.yarn.appMasterEnv.CLASSPATH` properties point to the files inside this archive.
This is why they contain the `./spark-gremlin.zip/*` item. Just because a Spark executor got the archive with
jars loaded into its container, does not mean it knows how to access them.
+
Also the `HADOOP_GREMLIN_LIBS` mechanism is not used because it can not work for Spark on Yarn as implemented (jars
added to the `SparkContext` are not available to the Yarn application master).
+The `gremlin.spark.persistContext` property is explained in the reference documentation of
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]: it helps in getting
+follow-up OLAP queries answered faster, because you skip the overhead for getting resources from Yarn.
+
Additional configuration options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This recipe does most of the graph configuration in the gremlin console so that environment variables can be used and
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/db859fb5/hadoop-gremlin/conf/hadoop-gryo.properties
----------------------------------------------------------------------
diff --git a/hadoop-gremlin/conf/hadoop-gryo.properties b/hadoop-gremlin/conf/hadoop-gryo.properties
index 7990431..aaab24d 100644
--- a/hadoop-gremlin/conf/hadoop-gryo.properties
+++ b/hadoop-gremlin/conf/hadoop-gryo.properties
@@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
-gremlin.spark.persistContext=true
# gremlin.spark.graphStorageLevel=MEMORY_AND_DISK
+# gremlin.spark.persistContext=true
# gremlin.spark.graphWriter=org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD
# gremlin.spark.persistStorageLevel=DISK_ONLY
# spark.kryo.registrationRequired=true