Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/12/02 21:22:52 UTC

[DISCUSS] DefaultInputRDD and DefaultInputFormat

Hello,

It is possible for us to provide a DefaultInputRDD and DefaultInputFormat to allow any OLTP graph system to easily load the data into Giraph/Spark/etc.

	https://issues.apache.org/jira/browse/TINKERPOP3-1015

This is "quick and dirty" since it is single-threaded, with no splits. It uses Graph.vertices() to stream in the vertices one at a time.

Would people be interested in this feature? It would allow you to, for example, use Spark with Neo4j. Also, another thing we could do to make this efficient is:

	List<Iterator<Vertex>> Graph.vertexSplits(int numberOfSplits)

Then each graph provider can specify how to do parallel reads. The default implementation would be:
	
	List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
	splits.add(this.vertices());
	return splits;
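
A minimal, self-contained sketch of the idea, for illustration only. The Vertex and Graph interfaces below are reduced stand-ins, not the real TinkerPop types, vertexSplits is only the method proposed above and not an existing API, and ListGraph is a hypothetical provider that overrides the default to hand out contiguous partitions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for TinkerPop's Vertex, reduced to an id so the sketch compiles alone.
interface Vertex {
    Object id();
}

// Hypothetical stand-in for Graph; vertexSplits is the method proposed in this thread.
interface Graph {
    Iterator<Vertex> vertices();

    // Proposed default: ignore the hint and return a single split that streams every vertex.
    default List<Iterator<Vertex>> vertexSplits(int numberOfSplits) {
        List<Iterator<Vertex>> splits = new ArrayList<>(1);
        splits.add(this.vertices());
        return splits;
    }
}

// Toy in-memory provider showing how a graph system could offer parallel reads.
class ListGraph implements Graph {
    private final List<Vertex> vertices = new ArrayList<>();

    ListGraph(Object... ids) {
        for (Object id : ids) {
            vertices.add(() -> id);
        }
    }

    @Override
    public Iterator<Vertex> vertices() {
        return vertices.iterator();
    }

    @Override
    public List<Iterator<Vertex>> vertexSplits(int numberOfSplits) {
        int n = Math.max(1, numberOfSplits);
        int chunk = (vertices.size() + n - 1) / n; // ceiling division
        List<Iterator<Vertex>> splits = new ArrayList<>(n);
        for (int from = 0; from < vertices.size(); from += chunk) {
            int to = Math.min(from + chunk, vertices.size());
            splits.add(vertices.subList(from, to).iterator());
        }
        return splits;
    }
}

class VertexSplitsSketch {
    public static void main(String[] args) {
        Graph partitioned = new ListGraph(1, 2, 3, 4, 5);
        // A provider that only supplies vertices() falls back to the single-split default.
        Graph single = partitioned::vertices;
        System.out.println("default splits: " + single.vertexSplits(4).size());
        System.out.println("provider splits: " + partitioned.vertexSplits(2).size());
    }
}
```

The requested number of splits is treated as a hint here: the default always returns one iterator, and a provider may return fewer splits than requested when it has little data.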

Anywho…. random idea as I was doing some Spark InputRDD test suite stuff.

Take care,
Marko.

http://markorodriguez.com

Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat

Posted by Stephen Mallette <sp...@gmail.com>.

I like the sound of where this is going - seems like a good idea to me.

On Thu, Dec 3, 2015 at 8:20 PM, Ran Magen <rm...@gmail.com> wrote:

> After digging some more in the code, I retract my ill-informed question.
>
> Apologies,
> Ran
>
>
> On Thu, 3 Dec 2015 at 23:11 Ran Magen <rm...@gmail.com> wrote:
>
> > This would be great for me.
> > In Unipop we want to enable running heavy queries in a distributed manner.
> > We figured we could implement some kind of UnipopSparkComputer that
> > utilizes the current Spark implementation, but from a quick check we didn't
> > find an obvious way to do that.
> >
> > Might DefaultInputRDD be a good solution for us?
> >
> > Cheers,
> > Ran

Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat

Posted by Ran Magen <rm...@gmail.com>.

After digging some more in the code, I retract my ill-informed question.

Apologies,
Ran


On Thu, 3 Dec 2015 at 23:11 Ran Magen <rm...@gmail.com> wrote:

> This would be great for me.
> In Unipop we want to enable running heavy queries in a distributed manner.
> We figured we could implement some kind of UnipopSparkComputer that
> utilizes the current Spark implementation, but from a quick check we didn't
> find an obvious way to do that.
>
> Might DefaultInputRDD be a good solution for us?
>
> Cheers,
> Ran

Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat

Posted by Ran Magen <rm...@gmail.com>.

This would be great for me.
In Unipop we want to enable running heavy queries in a distributed manner.
We figured we could implement some kind of UnipopSparkComputer that
utilizes the current Spark implementation, but from a quick check we didn't
find an obvious way to do that.

Might DefaultInputRDD be a good solution for us?

Cheers,
Ran
