Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/12/02 23:58:11 UTC
[jira] [Commented] (TINKERPOP-1011) HadoopGraph can't re-attach when the InputFormat is not a FileInputFormat

    [ https://issues.apache.org/jira/browse/TINKERPOP-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036826#comment-15036826 ] 

ASF GitHub Bot commented on TINKERPOP-1011:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/168

    TINKERPOP3-1011: HadoopGraph can't re-attach when the InputFormat is not a FileInputFormat

    https://issues.apache.org/jira/browse/TINKERPOP3-1011
    
    I really half-assed our `InputRDD` work in 3.1.0. Well, it works, but some providers have to resort to stupid workarounds. The reason I never caught the problem was that I didn't have a robust test infrastructure for it. I have since rectified the situation and more. The `SparkGraphComputer` integration tests now choose between using Gryo, GraphSON, or InputRDD as the source data. Thus, we are able to test `SparkGraphComputer` without any communication with Hadoop HDFS.
    
    I also solved a long-standing serialization issue with `WrappedArray` in `GryoSerializer`. This makes it so that we can now ALWAYS use `GryoSerializer`. I have changed all the respective tests to now use `GryoSerializer` and TADA happy happy Kryo.
    
    Finally, `HadoopElementIterator` was always bound to use `FileInputFormat`. Why the hell did I leave it like that for so long?! Now ANY `InputFormat` can be streamed into Hadoop OLTP including `InputRDDs` via `InputRDDFormat`. Ballin'.
    
    Finally finally, small tweak to `BulkLoaderVertexProgramTest` to use `target/test-output` as its data storage location (not `/tmp`).
    
    Ran `mvn clean install` and did integration testing on `spark-gremlin/`. All is golden.
    
    VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP3-1011

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #168
    
----
commit e954408b6f30f33d17316551c6d014cd63ab83c2
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-01T16:25:14Z

    HadoopElementIterator now supports ANY InputFormat, not just FileInputFormat. Sweet. Also, if you are using an RDD in Spark (and thus, not really doing Hadoop InputFormat stuffs), we have InputRDDFormat which wraps an RDD in an InputFormat so HadoopElementIterator works as well. This solves the HadoopGraph OLTP problem for ALL InputFormats and it allows ComputerResultStep to Attach elements for more than just FileInputFormats. Good stuff.

commit c374f4ecb1c2c82f911b5fb0a19a66ed663eea60
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T19:02:09Z

    I have the SparkIntegrationTestSuite now testing either from Gryo FileInputFormat, GraphSON FileInputFormat, or an InputRDD. This gives us super coverage and proves that InputRDD (bypassing Hadoop) is working as expected. I also fixed up some other tests that used KryoSerializer instead of GryoSerializer as I learned how to deal with Scala's WrappedArray class. It was insane. This is really good stuff.

commit 311e8abe733995f970d2b1aeb8153584b6ba3024
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T19:11:53Z

    Merge branch 'master' into TINKERPOP3-1011

commit e20ff91995f83efd4ebbe9388b0aec1535669ecb
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T20:11:29Z

    some organization and clean up. Stuff is lookin SOLID. Time to run full integration tests.

commit 91efb28df23fdc0dc99c78bd72159f0478614df1
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T22:44:50Z

    GroovyProcessComputerSuite was missing GroovyFlatMapTest. Added it. Added HadoopPool registration to ToyGraphInputRDD so it doesn't give a WARN message. Also I tweaked BulkLoaderVertexProgramTest to use target/test-output/ for its intermediary data.

commit 796fe24ae5a4a858b6b301ffa86efc63a2d98758
Author: Marko A. Rodriguez <ok...@gmail.com>
Date:   2015-12-02T22:49:42Z

    GroovyProcessStandard was missing GroovyFlatMapTest. Added.

----


> HadoopGraph can't re-attach when the InputFormat is not a FileInputFormat
> -------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1011
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1011
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.1-incubating
>
>
> We need to generalize {{HadoopElementIterator}} to support non-file-based {{InputFormats}}.
> https://github.com/apache/incubator-tinkerpop/blob/master/hadoop-gremlin/src/main/java/org/apache/tinkerpop/gremlin/hadoop/structure/hdfs/HadoopElementIterator.java
> A bug is showing up with a provider that only uses {{InputRDD}} and thus, no HDFS interactions.
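
For readers following the thread, the generalization requested in this issue can be illustrated with a small, self-contained Java sketch. It is not the actual patch and uses no TinkerPop, Hadoop, or Spark classes. Every name in it (`RecordSource`, `FileLikeSource`, `InMemoryAdapter`, `ElementIterator`, `drain`) is hypothetical and chosen only for illustration. The idea shown is that the iterator depends on an abstract record source rather than on files, and that an in-memory dataset (standing in for an RDD) can be wrapped in an adapter so the same iterator consumes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative sketch only: none of these types are TinkerPop or Hadoop classes.
final class AnySourceIteratorSketch {

    /** Hypothetical stand-in for an input format: it exposes one reader per split. */
    interface RecordSource<T> {
        List<Iterator<T>> openSplits();
    }

    /** File-style source: each inner list plays the role of one file's records. */
    static final class FileLikeSource<T> implements RecordSource<T> {
        private final List<List<T>> files;

        FileLikeSource(List<List<T>> files) {
            this.files = files;
        }

        @Override
        public List<Iterator<T>> openSplits() {
            List<Iterator<T>> readers = new ArrayList<>();
            for (List<T> file : files) {
                readers.add(file.iterator());
            }
            return readers;
        }
    }

    /** Adapter: presents an in-memory dataset (standing in for an RDD) as a single split. */
    static final class InMemoryAdapter<T> implements RecordSource<T> {
        private final Iterable<T> dataset;

        InMemoryAdapter(Iterable<T> dataset) {
            this.dataset = dataset;
        }

        @Override
        public List<Iterator<T>> openSplits() {
            return Collections.singletonList(dataset.iterator());
        }
    }

    /** Iterator that depends only on RecordSource, so it never assumes files exist. */
    static final class ElementIterator<T> implements Iterator<T> {
        private final Iterator<Iterator<T>> splits;
        private Iterator<T> current = Collections.emptyIterator();

        ElementIterator(RecordSource<T> source) {
            this.splits = source.openSplits().iterator();
        }

        @Override
        public boolean hasNext() {
            // Advance past exhausted (or empty) splits until a record is available.
            while (!current.hasNext()) {
                if (!splits.hasNext()) {
                    return false;
                }
                current = splits.next();
            }
            return true;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Collects every record from a source through the shared iterator. */
    static <T> List<T> drain(RecordSource<T> source) {
        List<T> out = new ArrayList<>();
        new ElementIterator<>(source).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        RecordSource<String> files = new FileLikeSource<>(Arrays.asList(
                Arrays.asList("v1", "v2"),
                Collections.<String>emptyList(),
                Arrays.asList("v3")));
        RecordSource<String> memory = new InMemoryAdapter<>(Arrays.asList("v1", "v2", "v3"));

        // Both sources yield the same records through the same iterator.
        System.out.println(drain(files));
        System.out.println(drain(memory));
    }
}
```

In this sketch the file-style source and the wrapped in-memory dataset go through the same `ElementIterator`, which mirrors the role the pull request describes for `InputRDDFormat`: it lets a consumer written against one abstraction read data that never touches HDFS.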



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)