Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/12/02 21:22:52 UTC
[DISCUSS] DefaultInputRDD and DefaultInputFormat
Hello,
It is possible for us to provide a DefaultInputRDD and DefaultInputFormat to allow any OLTP graph system to easily load the data into Giraph/Spark/etc.
https://issues.apache.org/jira/browse/TINKERPOP3-1015
This is "quick and dirty" as it's single-threaded -- no splits. It uses Graph.vertices() to stream in the vertices one at a time.
Would people be interested in this feature? It would allow you to, for example, use Spark with Neo4j. Also, another thing we could do to make this efficient is:
List<Iterator<Vertex>> Graph.vertexSplits(int numberOfSplits)
Then each graph provider can specify how to do parallel reads. The default implementation would be:
List<Iterator<Vertex>> splits = new ArrayList<>(numberOfSplits);
splits.add(this.vertices());
return splits;
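A minimal, self-contained sketch of the split idea follows. It does not use TinkerPop itself: SplittableGraph, ListGraph, PartitionedListGraph and drain are hypothetical names invented for illustration, and plain strings stand in for Vertex. It only shows the shape of the proposal: a default that returns a single split wrapping vertices(), and a provider override that hands back several iterators for parallel reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class VertexSplitsSketch {

    /** Hypothetical stand-in for a graph; V plays the role of Vertex. */
    public interface SplittableGraph<V> {
        Iterator<V> vertices();

        /** Default: a single split that streams every vertex (no parallelism). */
        default List<Iterator<V>> vertexSplits(int numberOfSplits) {
            List<Iterator<V>> splits = new ArrayList<>(numberOfSplits);
            splits.add(this.vertices());
            return splits;
        }
    }

    /** Toy provider that keeps the default single-split behaviour. */
    public static class ListGraph implements SplittableGraph<String> {
        protected final List<String> data;

        public ListGraph(List<String> data) {
            this.data = new ArrayList<>(data);
        }

        @Override
        public Iterator<String> vertices() {
            return data.iterator();
        }
    }

    /** Toy provider that overrides the default with contiguous slices. */
    public static class PartitionedListGraph extends ListGraph {
        public PartitionedListGraph(List<String> data) {
            super(data);
        }

        @Override
        public List<Iterator<String>> vertexSplits(int numberOfSplits) {
            if (numberOfSplits < 1) {
                throw new IllegalArgumentException("numberOfSplits must be >= 1");
            }
            int n = data.size();
            List<Iterator<String>> splits = new ArrayList<>(numberOfSplits);
            for (int i = 0; i < numberOfSplits; i++) {
                // Slice boundaries; long arithmetic avoids int overflow.
                int from = (int) ((long) i * n / numberOfSplits);
                int to = (int) ((long) (i + 1) * n / numberOfSplits);
                splits.add(data.subList(from, to).iterator());
            }
            return splits;
        }
    }

    /** Collects everything an iterator yields into a list. */
    public static <V> List<V> drain(Iterator<V> it) {
        List<V> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e");
        // The default ignores the requested count and returns one split.
        System.out.println(new ListGraph(ids).vertexSplits(4).size());
        // The override returns one iterator per requested split.
        for (Iterator<String> split : new PartitionedListGraph(ids).vertexSplits(2)) {
            System.out.println(drain(split));
        }
    }
}
```

In this sketch a caller always gets back a list of iterators and can hand each one to a separate reader; whether a real provider can slice its vertex set this cheaply is an assumption that depends on the underlying store.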
Anywho…. random idea as I was doing some Spark InputRDD test suite stuff.
Take care,
Marko.
http://markorodriguez.com
Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat
Posted by Stephen Mallette <sp...@gmail.com>.
I like the sound of where this is going - seems like a good idea to me.
On Thu, Dec 3, 2015 at 8:20 PM, Ran Magen <rm...@gmail.com> wrote:
> After digging some more in the code, I retract my ill-informed question.
>
> Apologies,
> Ran
>
>
> On Thu, 3 Dec 2015 at 23:11 Ran Magen <rm...@gmail.com> wrote:
>
> > This would be great for me.
> > In Unipop we want to enable running heavy queries in a distributed manner.
> > We figured we could implement some kind of UnipopSparkComputer that
> > utilizes the current Spark implementation, but from a quick check we
> > didn't find an obvious way to do that.
> >
> > Might DefaultInputRDD be a good solution for us?
> >
> > Cheers,
> > Ran
Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat
Posted by Ran Magen <rm...@gmail.com>.
After digging some more in the code, I retract my ill-informed question.
Apologies,
Ran
On Thu, 3 Dec 2015 at 23:11 Ran Magen <rm...@gmail.com> wrote:
> This would be great for me.
> In Unipop we want to enable running heavy queries in a distributed manner.
> We figured we could implement some kind of UnipopSparkComputer that
> utilizes the current Spark implementation, but from a quick check we didn't
> find an obvious way to do that.
>
> Might DefaultInputRDD be a good solution for us?
>
> Cheers,
> Ran
Re: [DISCUSS] DefaultInputRDD and DefaultInputFormat
Posted by Ran Magen <rm...@gmail.com>.
This would be great for me.
In Unipop we want to enable running heavy queries in a distributed manner.
We figured we could implement some kind of UnipopSparkComputer that
utilizes the current Spark implementation, but from a quick check we didn't
find an obvious way to do that.
Might DefaultInputRDD be a good solution for us?
Cheers,
Ran