Posted to dev@tinkerpop.apache.org by Jason Plurad <pl...@gmail.com> on 2015/12/23 16:26:00 UTC

HdfsTinkerGraph

I've been experimenting with the branches linked below to enable writing a
graph out to HDFS. I haven't tested it at scale, but it seems to work for the
basic BulkLoaderVertexProgram (BLVP) write-graph scenario with the Grateful
Dead graph.

https://github.com/pluradj/incubator-tinkerpop/commits/hdfstinkergraph
https://github.com/pluradj/titan/tree/titan11-hdfstinkergraph

Does this approach make sense? Would it scale?

I'd appreciate any comments or feedback. Thanks!

-- Jason

Re: HdfsTinkerGraph

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi Jason,

From what I can tell, you are trying to load HDFS data into TinkerGraph. You do this by creating an HdfsTinkerGraph that wraps a TinkerGraph and adds some HDFS data-loading capabilities (e.g. loadGraph()).

I don't think this is a good idea. If you want to load HDFS data into TinkerGraph, I would do it like this:

	HadoopGraph --BulkLoaderVertexProgram--> TinkerGraph.

*** Kuppitz' BulkLoaderVertexProgramTest demonstrates how to do this.

With the BulkLoaderVertexProgram model, you get a few benefits:

	1. HadoopGraph can load any InputFormat, not just HDFS files. For example, it could load Spark RDDs, CassandraInputFormat, etc.
	2. We don't have "yet another graph implementation" to maintain, explain, and document.
	3. HdfsTinkerGraph will only scale to the size of a single machine's RAM and thus is a bit of a "toy implementation."
		- If you really want to use TinkerGraph for Hadoop data, it's a "corner case" that can be solved using BulkLoaderVertexProgram.
	4. If you don't want to use BulkLoaderVertexProgram, then just do hadoopGraph.io().writeGraph(), tinkerGraph.io().readGraph().
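
For the BulkLoaderVertexProgram route, the target TinkerGraph would typically be described by a properties file that the program is pointed at as its write graph. The following is only a rough sketch, assuming the TinkerPop 3.1-era TinkerGraph property names; the file name tinkergraph-gryo.properties and the output path are hypothetical:

```
# tinkergraph-gryo.properties (hypothetical file name)
# Target graph implementation that BulkLoaderVertexProgram writes into.
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
# Assumed 3.1-era settings: persist the in-memory graph as Gryo when it is closed.
gremlin.tinkergraph.graphFormat=gryo
# Hypothetical output location; adjust to your environment.
gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
```

Under these assumptions, the loader would be built against this file and submitted through a GraphComputer running over the HadoopGraph, so the HDFS-specific reading stays in HadoopGraph's InputFormat rather than in a new graph implementation.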

Thoughts?
Marko.

http://markorodriguez.com
