Posted to dev@tinkerpop.apache.org by "stephen mallette (JIRA)" <ji...@apache.org> on 2016/02/03 12:20:39 UTC

[jira] [Updated] (TINKERPOP-1117) InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation, breaking InputFormats (Cassandra, HBase) that don't need one

     [ https://issues.apache.org/jira/browse/TINKERPOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stephen mallette updated TINKERPOP-1117:
----------------------------------------
    Component/s: hadoop

> InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation, breaking InputFormats (Cassandra, HBase) that don't need one
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1117
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1117
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.2.0-incubating
>            Reporter: Dylan Bethune-Waddell
>            Priority: Minor
>             Fix For: 3.2.0-incubating
>
>
> On line 43, the call to Constants.getSearchGraphLocation returns Optional.empty() when gremlin.hadoop.inputLocation=none, which is the value advised for Titan's CassandraInputFormat and HBaseInputFormat. Changing the readGraphRDD method to call .isPresent() and only set the storage location in the config when a location is actually resolved allows SparkGraphComputer from the 3.2.0-SNAPSHOT branch to work with Titan via CassandraInputFormat in a traversal source:
> {code}
> // Imports
> import java.util.Optional;
> @Override
> public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
>     final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
>     // previously this expression was passed straight into hadoopConfiguration.set(...), which throws when the Optional is empty
>     final Optional<String> searchGraph = Constants.getSearchGraphLocation(configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), FileSystemStorage.open(hadoopConfiguration));
>     if (searchGraph.isPresent()) {
>         hadoopConfiguration.set(Constants.MAPREDUCE_INPUT_FILEINPUTFORMAT_INPUTDIR, searchGraph.get());
>     }
>     return sparkContext.newAPIHadoopRDD(hadoopConfiguration, (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
>         NullWritable.class,
>         VertexWritable.class)
>         .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
> }
> {code}
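> With the above change, the setup I'm testing against looks roughly like the following (the CassandraInputFormat class name is from memory, and the extra Cassandra connection properties that Titan also needs are omitted, so treat this as a sketch rather than a complete configuration):
> {code}
> import org.apache.commons.configuration.BaseConfiguration;
> import org.apache.tinkerpop.gremlin.hadoop.Constants;
> import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
> import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
> import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
> import org.apache.tinkerpop.gremlin.structure.Graph;
> import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
>
> // HadoopGraph backed by Titan's CassandraInputFormat; there is no HDFS input
> // directory to read from, hence gremlin.hadoop.inputLocation=none
> final BaseConfiguration conf = new BaseConfiguration();
> conf.setProperty("gremlin.graph", HadoopGraph.class.getName());
> conf.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, "com.thinkaurelius.titan.hadoop.formats.cassandra.CassandraInputFormat");
> conf.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, "none");
> conf.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, "output");
> conf.setProperty("spark.master", "local[4]");
> conf.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
>
> // OLAP traversal over Spark - this is the code path that calls InputFormatRDD.readGraphRDD()
> final Graph graph = GraphFactory.open(conf);
> final GraphTraversalSource g = graph.traversal().withComputer(SparkGraphComputer.class);
> final long vertexCount = g.V().count().next();
> {code}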
> I don't really understand the intended behaviour, so this is probably not the right thing to do. Would it work to add a configuration variable such as "gremlin.hadoop.inputLocationRequired" that defaults to true and can be set to false for input formats like these that don't read from an input location?
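> A rough sketch of what I have in mind, just to illustrate the idea (the constant GREMLIN_HADOOP_INPUT_LOCATION_REQUIRED and its property string are made-up names, not anything that exists today):
> {code}
> // hypothetical addition to Constants:
> // public static final String GREMLIN_HADOOP_INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";
>
> // inside InputFormatRDD.readGraphRDD(...):
> final boolean inputLocationRequired = configuration.getBoolean(Constants.GREMLIN_HADOOP_INPUT_LOCATION_REQUIRED, true);
> final Optional<String> searchGraph = Constants.getSearchGraphLocation(
>         configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION), FileSystemStorage.open(hadoopConfiguration));
> if (inputLocationRequired && !searchGraph.isPresent())
>     throw new IllegalArgumentException("A valid " + Constants.GREMLIN_HADOOP_INPUT_LOCATION + " is required by this InputFormat");
> if (searchGraph.isPresent())
>     hadoopConfiguration.set(Constants.MAPREDUCE_INPUT_FILEINPUTFORMAT_INPUTDIR, searchGraph.get());
> {code}
> That way file-based formats like GryoInputFormat keep the current fail-fast behaviour, while CassandraInputFormat and HBaseInputFormat can opt out by setting the flag to false.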



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)