Posted to dev@tinkerpop.apache.org by graben1437 <gi...@git.apache.org> on 2015/07/01 21:32:30 UTC
[GitHub] incubator-tinkerpop pull request: TINKERPOP-714 - remove jsr305 fr...
Github user graben1437 commented on the pull request:
https://github.com/apache/incubator-tinkerpop/pull/74#issuecomment-117801830
For the SparkGraphComputer testing:
Set up Spark 1.2.1 (prebuilt) in standalone cluster mode with 1 master and 2 workers.
Started the master and the workers and verified they were running via the command line and the Spark UI.
On the same node, I built the latest (as of 7/1) TinkerPop3 with the JSR patch in this pull request in place. Verified that the jsr305 jar was not found under the distribution:
/home/..../tinkerpop3/incubator-tinkerpop/hadoop-gremlin
find . -name 'jsr305*'
<<nothing returned>>
Without the fix in place the following are the results of the find command:
./hadoop-gremlin/target/hadoop-gremlin-3.0.0-SNAPSHOT-standalone/lib/jsr305-1.3.9.jar
Also grepped the jar files to verify that jsr305 is not packaged:
grep jsr305 *.jar
<< no output>>
The following is the output when jsr305 is present:
./incubator-tinkerpop/hadoop-gremlin/target
grep jsr305 *.jar
Binary file hadoop-gremlin-3.0.0-SNAPSHOT-job.jar matches
At the very end, after testing, I went to the spark-1.2.1/work directory
and ran the following command to verify that jsr305 was not in the "jar loads"
being sent to Spark:
find . -name '*.jar' -exec grep -H jsr305 {} \;
<< returned nothing>>
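For anyone repeating these checks, the find and grep steps above could be folded into one helper, roughly as sketched below. This is only a sketch: check_no_jsr305 is a hypothetical name, not something shipped with TinkerPop, and the directory argument is whatever build or Spark work directory you want to inspect. It quotes the glob patterns so the shell cannot expand them before find sees them, and it relies on the same observation as the grep above, namely that a bundled jar's file name shows up as a plain string inside the enclosing jar.

```shell
#!/bin/sh
# Sketch only: check_no_jsr305 is a hypothetical helper, not part of TinkerPop.
# Returns 0 when no trace of jsr305 is found under the given directory,
# 1 otherwise (and prints the offending paths).
check_no_jsr305() {
    dir="$1"
    # Files whose own name starts with jsr305 (pattern quoted on purpose).
    by_name=$(find "$dir" -name 'jsr305*')
    # Jars whose bytes contain the string jsr305, e.g. as a bundled entry name.
    # "|| true" keeps a no-match exit status from aborting under "set -e".
    by_content=$(find "$dir" -name '*.jar' -exec grep -l jsr305 {} + || true)
    if [ -n "$by_name" ] || [ -n "$by_content" ]; then
        printf '%s\n' "$by_name" "$by_content" | sed '/^$/d'
        return 1
    fi
    return 0
}
```

It could be called as check_no_jsr305 . from the hadoop-gremlin directory, or from the Spark work directory after a test run.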
Next:
Under gremlin-console/target I unzipped apache-gremlin-console-3.0.0-SNAPSHOT-distribution.zip
cd apache-gremlin-console-3.0.0-SNAPSHOT
vi conf/hadoop/hadoop-gryo.properties
In that file change:
#spark.master=local[4]
spark.master=spark://machine1.xx.xxx.xxx.com:7077
which is the Spark master indicated by the 1.2.1 master started above.
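The same properties change can be scripted instead of made in vi. The following is a minimal sketch under stated assumptions: set_spark_master is a hypothetical helper name, and the file path and master URL are passed in by the caller. It comments out any active spark.master line and appends the new value, writing through a temporary file rather than depending on a particular sed -i flavor.

```shell
#!/bin/sh
# Sketch only: set_spark_master is a hypothetical helper, not part of TinkerPop.
# Usage: set_spark_master <properties-file> <spark-master-url>
set_spark_master() {
    file="$1"
    master="$2"
    tmp="$file.tmp"
    # Comment out any currently active spark.master line.
    sed -e 's|^spark\.master=|#spark.master=|' "$file" > "$tmp"
    # Append the new master setting as the only active one.
    printf 'spark.master=%s\n' "$master" >> "$tmp"
    mv "$tmp" "$file"
}
```

For the setup described here, the second argument would be the spark://...:7077 URL reported by the running master.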
I also copied ./ext/hadoop-gremlin/lib jar files over the ./lib files to eliminate Spark errors about class serialization.
I ran the following queries to validate that the output was correct, and checked the Spark master and worker logs to make sure no exceptions were thrown:
bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/ext/hadoop-gremlin/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin> graph = GraphFactory.open('/home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/conf/hadoop/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g=graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>6
gremlin> g.V().group().by(bothE().count())
==>[1:[v[6], v[5], v[2]], 3:[v[4], v[1], v[3]]]
gremlin> g.V().groupCount('a').by(label).cap('a')
==>[software:2, person:4]
gremlin> g.V().range(0,3)
WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
==>v[4]
==>v[1]
==>v[6]
Based on this small sample of queries run against a standalone Spark cluster, removing jsr305.jar from the standalone and/or distribution jar does not appear to adversely affect SparkGraphComputer functionality.
I will test the GiraphGraphComputer next, assuming this all looks correct here.