Posted to dev@tinkerpop.apache.org by graben1437 <gi...@git.apache.org> on 2015/07/01 21:32:30 UTC

[GitHub] incubator-tinkerpop pull request: TINKERPOP-714 - remove jsr305 fr...

Github user graben1437 commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/74#issuecomment-117801830
  
    For the SparkGraphComputer testing:
    
    Set up Spark 1.2.1 (prebuilt) in standalone cluster mode with 1 master and 2 workers.
    Started the master and the workers and verified they were running via the command line and the Spark UI.
    
    On the same node, I built the latest TinkerPop3 (as of 7/1) with the jsr305 patch from this pull request applied, then verified that no jsr305 jar was present under the distribution:
    /home/..../tinkerpop3/incubator-tinkerpop/hadoop-gremlin
    find . -name 'jsr305*'
    <<nothing returned>>
    Without the fix in place, the same find command returns:
    ./hadoop-gremlin/target/hadoop-gremlin-3.0.0-SNAPSHOT-standalone/lib/jsr305-1.3.9.jar
    
    Also grepped the jar files to verify that jsr305 is not packaged:
    grep jsr305 *.jar
    << no output>>
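    The grep check works because a jar is a zip archive and zip entry names are stored uncompressed, so "Binary file ... matches" means some entry path (or other uncompressed bytes) contains the string jsr305. Listing the matching entry names is a little more informative. A minimal Python sketch, assuming a heuristic that is not part of the TinkerPop build: an entry counts as jsr305 content if its path mentions jsr305 (for example Maven metadata) or it is one of the annotation classes the jsr305 jar ships. `jsr305_entries` and `JSR305_CLASS_MARKERS` are hypothetical names.

```python
import zipfile
from typing import BinaryIO, List, Union

# Heuristic markers (an assumption, not a TinkerPop convention): an entry is
# treated as jsr305 content if its path mentions "jsr305" (e.g. Maven metadata)
# or if it is one of the annotation classes that the jsr305 jar ships.
JSR305_CLASS_MARKERS = (
    "javax/annotation/Nonnull.class",
    "javax/annotation/CheckForNull.class",
)


def jsr305_entries(jar: Union[str, BinaryIO]) -> List[str]:
    """Return the entry names in a jar (a zip archive) that look like jsr305 content."""
    with zipfile.ZipFile(jar) as zf:
        return [
            name
            for name in zf.namelist()
            if "jsr305" in name or name in JSR305_CLASS_MARKERS
        ]
```

    Calling jsr305_entries on each jar in the target directory and getting an empty list for all of them corresponds to the empty grep output above.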
    
    The following is the output when jsr305 is present:
    ./incubator-tinkerpop/hadoop-gremlin/target
    grep jsr305 *.jar
    Binary file hadoop-gremlin-3.0.0-SNAPSHOT-job.jar matches
    
    At the very end, after testing, I went to the spark-1.2.1/work directory
    and ran the following command to verify that jsr305 was not in any of the jars
    shipped to Spark:
    find . -name '*.jar' -exec grep -H jsr305 {} \;
    << returned nothing>>
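    The find/grep check over the Spark work directory can also be written as a single pass that walks the tree and opens each jar as a zip archive. A minimal Python sketch, assuming a jar counts as carrying jsr305 when its file name or any entry path contains the string jsr305; `find_jsr305_jars` is a hypothetical helper name.

```python
import os
import zipfile
from typing import List


def find_jsr305_jars(root: str) -> List[str]:
    """Walk `root` and return paths of jars that appear to carry jsr305.

    Assumption: a jar is flagged if its file name or any entry path
    contains "jsr305".
    """
    flagged: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".jar"):
                continue
            path = os.path.join(dirpath, filename)
            # A jar whose own file name mentions jsr305 is flagged directly.
            if "jsr305" in filename:
                flagged.append(path)
                continue
            try:
                with zipfile.ZipFile(path) as zf:
                    if any("jsr305" in name for name in zf.namelist()):
                        flagged.append(path)
            except zipfile.BadZipFile:
                # Skip files with a .jar suffix that are not valid archives.
                continue
    return flagged
```

    Called on the spark-1.2.1/work directory, an empty result corresponds to the empty find output above.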
    
    Next:
    Under gremlin-console/target I unzipped apache-gremlin-console-3.0.0-SNAPSHOT-distribution.zip
    cd apache-gremlin-console-3.0.0-SNAPSHOT
    vi conf/hadoop/hadoop-gryo.properties
    In that file, comment out the local master and point spark.master at the standalone cluster:
    #spark.master=local[4]
    spark.master=spark://machine1.xx.xxx.xxx.com:7077 
    which is the URL of the Spark 1.2.1 master started above.
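    The same properties edit can be scripted. A minimal Python sketch, assuming plain one-line key=value entries (Java properties files also allow ':' separators and continuation lines, which this does not handle). `set_spark_master` is a hypothetical helper, and spark://master.example.com:7077 is a placeholder, not the redacted host above.

```python
from typing import List


def set_spark_master(properties_text: str, master_url: str) -> str:
    """Comment out any active spark.master line and append the new master URL.

    Assumes simple one-line `key=value` entries (no line continuations).
    """
    out: List[str] = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "spark.master":
            out.append("#" + line)  # keep the old value visible, but disabled
        else:
            out.append(line)
    out.append(f"spark.master={master_url}")
    return "\n".join(out) + "\n"
```

    Lines that are already commented out (such as #spark.master=local[4]) are left untouched, so running the helper twice only appends a second active spark.master line if the URL changes.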
    
    I also copied the jar files from ./ext/hadoop-gremlin/lib over those in ./lib to eliminate Spark class-serialization errors.
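    The jar overlay step is equivalent to copying ext/hadoop-gremlin/lib/*.jar into lib/ and overwriting same-named files. A minimal Python sketch of that step; `overlay_jars` is a hypothetical helper, and the directory names are taken from the console layout described above.

```python
import shutil
from pathlib import Path
from typing import List


def overlay_jars(src_dir: str, dst_dir: str) -> List[str]:
    """Copy every *.jar in src_dir into dst_dir, overwriting same-named jars.

    Non-jar files are ignored. Returns the sorted names of the copied jars.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for jar in sorted(src.glob("*.jar")):
        shutil.copy2(jar, dst / jar.name)  # copy2 also preserves timestamps
        copied.append(jar.name)
    return copied
```

    From the apache-gremlin-console-3.0.0-SNAPSHOT directory this would be called as overlay_jars("ext/hadoop-gremlin/lib", "lib"). Jars that exist only in lib/ are kept.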
    
    I ran the following queries to validate the output, and checked the Spark master and worker logs to make sure no exceptions were thrown:
    
    bin/gremlin.sh
    
             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities
    INFO  org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph  - HADOOP_GREMLIN_LIBS is set to: /home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/ext/hadoop-gremlin/lib
    plugin activated: tinkerpop.hadoop
    plugin activated: tinkerpop.tinkergraph
    gremlin> graph = GraphFactory.open('/home/..../tinkerpop3/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.0.0-SNAPSHOT/conf/hadoop/hadoop-gryo.properties')
    ==>hadoopgraph[gryoinputformat->gryooutputformat]
    gremlin> g=graph.traversal(computer(SparkGraphComputer))
    ==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
    gremlin> g.V().count()
    ==>6
    gremlin> g.V().group().by(bothE().count()) 
    ==>[1:[v[6], v[5], v[2]], 3:[v[4], v[1], v[3]]]
    gremlin> g.V().groupCount('a').by(label).cap('a')
    ==>[software:2, person:4]
    gremlin> g.V().range(0,3)
    WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy  - Snappy native library not loaded
    ==>v[4]
    ==>v[1]
    ==>v[6]
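    As a cross-check, the first three results can be recomputed without Spark. The sketch below assumes the input is TinkerPop's "modern" toy graph (six vertices labeled person or software, consistent with the counts above); its vertices and edges are hard-coded for illustration, and the function names are hypothetical. The order of vertices inside each group is not something to rely on in an OLAP run, so the sketch lists them by id.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed input: TinkerPop's "modern" toy graph, hard-coded for illustration.
VERTEX_LABELS: Dict[int, str] = {
    1: "person", 2: "person", 3: "software",
    4: "person", 5: "software", 6: "person",
}
EDGES: List[Tuple[int, int]] = [  # (out vertex, in vertex)
    (1, 2), (1, 4), (1, 3),  # marko knows vadas and josh, created lop
    (4, 5), (4, 3),          # josh created ripple and lop
    (6, 3),                  # peter created lop
]


def vertex_count() -> int:
    # Mirrors g.V().count()
    return len(VERTEX_LABELS)


def group_by_degree() -> Dict[int, List[int]]:
    # Mirrors g.V().group().by(bothE().count()): vertex ids keyed by total degree
    degree: Counter = Counter()
    for out_v, in_v in EDGES:
        degree[out_v] += 1
        degree[in_v] += 1
    groups: Dict[int, List[int]] = defaultdict(list)
    for v in sorted(VERTEX_LABELS):
        groups[degree[v]].append(v)
    return dict(groups)


def count_by_label() -> Dict[str, int]:
    # Mirrors g.V().groupCount('a').by(label).cap('a')
    return dict(Counter(VERTEX_LABELS.values()))
```

    g.V().range(0,3) is not modeled, because which three vertices come back depends on how the input is partitioned.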
    
    Based on this small sample of queries run against a stand-alone Spark cluster, it appears that removing jsr305 from the standalone distribution and from the job jar does not adversely affect SparkGraphComputer functionality.
    
    I will test the GiraphGraphComputer next, assuming this all looks correct here.


