Posted to commits@tinkerpop.apache.org by ok...@apache.org on 2017/11/01 18:07:43 UTC
[03/14] tinkerpop git commit: Moved gremlin.spark.persistContext back to recipe
Moved gremlin.spark.persistContext back to recipe
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/df73338f
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/df73338f
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/df73338f
Branch: refs/heads/master
Commit: df73338fe29a8ba1faa1c873a6ed8ff597607b60
Parents: 16f3ee7
Author: HadoopMarc <vt...@xs4all.nl>
Authored: Tue Oct 3 21:20:16 2017 +0200
Committer: HadoopMarc <vt...@xs4all.nl>
Committed: Thu Oct 12 21:55:28 2017 +0200
----------------------------------------------------------------------
docs/src/recipes/olap-spark-yarn.asciidoc | 6 ++++++
hadoop-gremlin/conf/hadoop-gryo.properties | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/df73338f/docs/src/recipes/olap-spark-yarn.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/olap-spark-yarn.asciidoc b/docs/src/recipes/olap-spark-yarn.asciidoc
index 01aedcb..1bcc443 100644
--- a/docs/src/recipes/olap-spark-yarn.asciidoc
+++ b/docs/src/recipes/olap-spark-yarn.asciidoc
@@ -104,6 +104,7 @@ conf.setProperty('spark.yarn.appMasterEnv.CLASSPATH', "./__spark_libs__/*:$hadoo
conf.setProperty('spark.executor.extraClassPath', "./__spark_libs__/*:$hadoopConfDir")
conf.setProperty('spark.driver.extraLibraryPath', "$hadoop/lib/native:$hadoop/lib/native/Linux-amd64-64")
conf.setProperty('spark.executor.extraLibraryPath', "$hadoop/lib/native:$hadoop/lib/native/Linux-amd64-64")
+conf.setProperty('gremlin.spark.persistContext', 'true')
graph = GraphFactory.open(conf)
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().group().by(values('name')).by(both().count())
@@ -125,9 +126,14 @@ as the directory named `+__spark_libs__+` in the Yarn containers. The `spark.exe
`spark.yarn.appMasterEnv.CLASSPATH` properties point to the jars inside this directory.
This is why they contain the `+./__spark_libs__/*+` item. Just because a Spark executor got the archive with
jars loaded into its container does not mean it knows how to access them.
+
Also, the `HADOOP_GREMLIN_LIBS` mechanism is not used because it cannot work for Spark on Yarn as implemented (jars
added to the `SparkContext` are not available to the Yarn application master).
+The `gremlin.spark.persistContext` property is explained in the reference documentation of
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]: it speeds up
+follow-up OLAP queries because the overhead of requesting resources from Yarn is skipped.
+
Additional configuration options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This recipe does most of the graph configuration in the gremlin console so that environment variables can be used and
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/df73338f/hadoop-gremlin/conf/hadoop-gryo.properties
----------------------------------------------------------------------
diff --git a/hadoop-gremlin/conf/hadoop-gryo.properties b/hadoop-gremlin/conf/hadoop-gryo.properties
index ec56abc..c156a98 100644
--- a/hadoop-gremlin/conf/hadoop-gryo.properties
+++ b/hadoop-gremlin/conf/hadoop-gryo.properties
@@ -29,11 +29,11 @@ gremlin.hadoop.outputLocation=output
####################################
spark.master=local[4]
spark.executor.memory=1g
-gremlin.spark.persistContext=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
# spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer (3.2.x model)
# gremlin.spark.graphStorageLevel=MEMORY_AND_DISK
+# gremlin.spark.persistContext=true
# gremlin.spark.graphWriter=org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD
# gremlin.spark.persistStorageLevel=DISK_ONLY
# spark.kryo.registrationRequired=true
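
For context: after this commit, `gremlin.spark.persistContext` is no longer enabled by default in `hadoop-gryo.properties`; it survives there only as a commented-out hint, while the Spark-on-Yarn recipe sets it explicitly via `conf.setProperty(...)`. A user who wants the old default behaviour outside the recipe can simply re-enable the line. A minimal sketch of the relevant section of the file after this change (keys and values taken from the second hunk above; only the last line is un-commented by the user):

```
# hadoop-gremlin/conf/hadoop-gryo.properties (excerpt, post-commit)
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
# re-enabled by hand to keep the SparkContext alive between OLAP jobs:
gremlin.spark.persistContext=true
```

Keeping the context alive avoids re-negotiating Yarn resources for each traversal, at the cost of holding the executors' resources between jobs.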