Posted to dev@tinkerpop.apache.org by okram <gi...@git.apache.org> on 2015/12/02 23:57:37 UTC

[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168

    TINKERPOP3-1011: HadoopGraph can't re-attach when the InputFormat is not a FileInputFormat

    https://issues.apache.org/jira/browse/TINKERPOP3-1011
    
    I really half-assed our `InputRDD` work in 3.1.0. Well, it works, but some providers have to resort to stupid workarounds. The reason I never caught the problem was that I didn't have a robust test infrastructure for it. I have since rectified the situation and more. The `SparkGraphComputer` integration tests now choose between Gryo, GraphSON, or `InputRDD` as the source data. Thus, we are able to test `SparkGraphComputer` without any communication with Hadoop HDFS.
    
    I also solved a long-standing serialization issue with `WrappedArray` in `GryoSerializer`. This means we can now ALWAYS use `GryoSerializer`. I have changed all the respective tests to use `GryoSerializer` and TADA happy happy Kryo.
    
    Finally, `HadoopElementIterator` was always bound to use `FileInputFormat`. Why the hell did I leave it like that for so long?! Now ANY `InputFormat` can be streamed into Hadoop OLTP including `InputRDDs` via `InputRDDFormat`. Ballin'.
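
The idea in the paragraph above, one iterator that can stream from any kind of source because non-file sources are wrapped in an adapter, can be sketched in plain Java. This is only an illustration under stated assumptions, not the TinkerPop implementation: `RecordSource`, `InMemoryDataset`, `DatasetSource`, and `ElementIterator` are hypothetical names standing in for `InputFormat`, an RDD, `InputRDDFormat`, and `HadoopElementIterator` respectively, and no Hadoop or Spark API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class AnySourceIteratorSketch {

    /** Hypothetical stand-in for an InputFormat: anything that can open a stream of records. */
    interface RecordSource<T> {
        Iterator<T> open();
    }

    /** Hypothetical stand-in for an in-memory dataset such as an RDD. */
    static final class InMemoryDataset<T> {
        private final List<T> items;

        InMemoryDataset(List<T> items) {
            this.items = new ArrayList<>(items);
        }

        List<T> collect() {
            return Collections.unmodifiableList(items);
        }
    }

    /** Adapter that exposes an InMemoryDataset through the RecordSource interface. */
    static final class DatasetSource<T> implements RecordSource<T> {
        private final InMemoryDataset<T> dataset;

        DatasetSource(InMemoryDataset<T> dataset) {
            this.dataset = dataset;
        }

        @Override
        public Iterator<T> open() {
            return dataset.collect().iterator();
        }
    }

    /** Iterates over the records of every source in order, without knowing any source's concrete type. */
    static final class ElementIterator<T> implements Iterator<T> {
        private final Iterator<RecordSource<T>> sources;
        private Iterator<T> current = Collections.emptyIterator();

        ElementIterator(List<RecordSource<T>> sources) {
            this.sources = sources.iterator();
        }

        @Override
        public boolean hasNext() {
            // Advance to the next source until one has a record or all are exhausted.
            while (!current.hasNext()) {
                if (!sources.hasNext()) {
                    return false;
                }
                current = sources.next().open();
            }
            return true;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Streams a file-like source and a wrapped in-memory source through the same iterator. */
    static List<String> demo() {
        RecordSource<String> fileLike = () -> Arrays.asList("v1", "v2").iterator();
        InMemoryDataset<String> dataset = new InMemoryDataset<>(Arrays.asList("v3"));
        RecordSource<String> rddLike = new DatasetSource<>(dataset);
        List<RecordSource<String>> sources = Arrays.asList(fileLike, rddLike);

        Iterator<String> elements = new ElementIterator<>(sources);
        List<String> seen = new ArrayList<>();
        while (elements.hasNext()) {
            seen.add(elements.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [v1, v2, v3]
    }
}
```

The point of the design is that the iterator depends only on the source interface, so supporting a new kind of input means writing an adapter rather than changing the iterator.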
    
    Finally finally, small tweak to `BulkLoaderVertexProgramTest` to use `target/test-output` as its data storage location (not `/tmp`).
    
    Ran `mvn clean install` and did integration testing on `spark-gremlin/`. All is golden.
    
    VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP3-1011

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #168
    
----
commit e954408b6f30f33d17316551c6d014cd63ab83c2
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-01T16:25:14Z

    HadoopElementIterator now supports ANY InputFormat, not just FileInputFormat. Sweet. Also, if you are using an RDD in Spark (and thus, not really doing Hadoop InputFormat stuffs), we have InputRDDFormat which wraps an RDD in an InputFormat so HadoopElementIterator works as well. This solves the HadoopGraph OLTP problem for ALL InputFormats and it allows ComputerResultStep to Attach elements for more than just FileInputFormats. Good stuff.

commit c374f4ecb1c2c82f911b5fb0a19a66ed663eea60
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T19:02:09Z

    I have the SparkIntegrationTestSuite now testing either from Gryo FileInputFormat, GraphSON FileInputFormat, or an InputRDD. This gives us super coverage and proves that InputRDD (bypassing Hadoop) is working as expected. I also fixed up some other tests that used KryoSerializer instead of GryoSerializer as I learned how to deal with Scala's WrappedArray class. It was insane. This is really good stuff.

commit 311e8abe733995f970d2b1aeb8153584b6ba3024
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T19:11:53Z

    Merge branch 'master' into TINKERPOP3-1011

commit e20ff91995f83efd4ebbe9388b0aec1535669ecb
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T20:11:29Z

    some organization and clean up. Stuff is lookin SOLID. Time to run full integration tests.

commit 91efb28df23fdc0dc99c78bd72159f0478614df1
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T22:44:50Z

    GroovyProcessComputerSuite was missing GroovyFlatMapTest. Added it. Added HadoopPool registration to ToyGraphInputRDD so it doesn't give a WARN message. Also I tweaked BulkLoaderVertexProgramTest to use target/test-output/ for its intermediary data.

commit 796fe24ae5a4a858b6b301ffa86efc63a2d98758
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T22:49:42Z

    GroovyProcessStandard was missing GroovyFlatMapTest. Added.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by dkuppitz <gi...@git.apache.org>.
Github user dkuppitz commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168#issuecomment-161724665
  
    Ok, I think we can ignore the Travis errors for now. Worked like a charm on my local machine.
    
    VOTE: +1



[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by okram <gi...@git.apache.org>.
Github user okram commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168#issuecomment-161658402
  
    @dkuppitz just realized why the Travis CI breaks.
    
    Every time you do `HadoopGraph` OLTP, you get:
    
    ```
    [INFO] FileInputFormat - Total input paths to process : 1
    ```
    
    And we do it a lot in all our tests. It doesn't show up in the GremlinConsole, only in our tests, and I don't know how to turn it off. @dkuppitz said we can probably set the logging in Travis to only be [WARN]. Dunno... I don't understand Java logging and with each passing year, I understand less.
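
One possible way to quiet that line, sketched as a config fragment rather than a confirmed fix: raise the level of the logger that emits it. The assumptions here are that the test run picks up a log4j 1.x properties file, and that the message comes from the new-API class in the `org.apache.hadoop.mapreduce.lib.input` package (the log line above only shows the short name `FileInputFormat`).

```
# Assumption: log4j 1.x properties syntax, new-API Hadoop FileInputFormat.
# Hides the per-call INFO line while leaving WARN and ERROR visible.
log4j.logger.org.apache.hadoop.mapreduce.lib.input.FileInputFormat=WARN
```

Scoping the change to a single logger keeps other INFO output in the test logs, which is less drastic than setting the root logger to WARN.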



[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by spmallette <gi...@git.apache.org>.
Github user spmallette commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168#issuecomment-161753302
  
    VOTE: +1



[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by spmallette <gi...@git.apache.org>.
Github user spmallette commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168#issuecomment-161755316
  
    @okram don't worry about the branch conflicts with master - they're probably in the `.travis.yml` file. You should keep the changes from your branch and use them in place of the version in master when you merge.



[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by okram <gi...@git.apache.org>.
Github user okram commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168#issuecomment-161767688
  
    This was merged.



[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by okram <gi...@git.apache.org>.
Github user okram commented on the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168#issuecomment-161510914
  
    Over the course of this evening, I ran the full integration tests (Giraph, Spark, and Neo4j). They all passed.
    
    ```
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Apache TinkerPop .................................. SUCCESS [6.068s]
    [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.772s]
    [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [40.853s]
    [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [12.714s]
    [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [34.257s]
    [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [7.461s]
    [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [4:16.902s]
    [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [6:53.244s]
    [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [6:02.920s]
    [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:00:55.196s]
    [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [20:51.515s]
    [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [12.510s]
    [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [10:28.470s]
    [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:16.914s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 2:52:42.398s
    [INFO] Finished at: Wed Dec 02 21:12:39 MST 2015
    [INFO] Final Memory: 100M/738M
    [INFO] ------------------------------------------------------------------------
    ~/software/tinkerpop/tinkerpop3$
    ```



[GitHub] incubator-tinkerpop pull request: TINKERPOP3-1011: HadoopGraph can...

Posted by okram <gi...@git.apache.org>.
Github user okram closed the pull request at:

    https://github.com/apache/incubator-tinkerpop/pull/168

