Posted to dev@tinkerpop.apache.org by James Thornton <ja...@jamesthornton.com> on 2015/12/04 11:28:05 UTC

Re: Gremlin on Flink & Gelly?

Vasia Kalavri: Gelly: Large-scale graph analysis with Apache Flink

https://youtu.be/-tFzG2dzJXw

On Nov 30, 2015 12:49 PM, "Marko Rodriguez" <ok...@gmail.com> wrote:

> Hi Vasia (everyone),
>
> Does Flink have a graph query language? If not, then with a
> FlinkGraphComputer implementation, Flink could ship with Gremlin support.
>
> If you have the time, please read the following blog post as it will help
> explain our approach and how Flink could benefit from it:
>
> http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
>
> In short, if Flink provides a FlinkGraphComputer implementation, then the
> Gremlin virtual machine will work over Flink and any language that compiles
> to the Gremlin virtual machine will thus work over Flink.
>
> If you would like to see a demo of TinkerPop with, for example, Spark or
> Giraph, I'd be more than happy to do a Google Hangout session with you
> (< 1 hour) so you can better understand the breadth of the work we are
> doing and how it can benefit your efforts.
>
> Thanks Vasia,
> Marko.
>
> http://markorodriguez.com
>
> On Nov 27, 2015, at 5:27 AM, Stephen Mallette <sp...@gmail.com>
> wrote:
>
> > Hi Vasia, I had started tinkering on it in my spare time in a separate
> > repo.  There really isn't much to collaborate on at this point.  I was
> > mostly trying to understand the parallels between Flink and Spark so
> > that I could understand how a FlinkGraphComputer could be implemented
> > given what I'd seen of the Spark implementation Marko did.  I had
> > expected to contribute the work to Flink (rather than keep it here on
> > the TinkerPop side).  Anyway, not much else to offer - Marko can
> > probably get you running much faster than I can, as that area is where
> > he holds the most expertise.  You should probably keep an eye out for
> > his comments.
> >
> >
> >
> > On Wed, Nov 25, 2015 at 11:38 AM, Vasiliki Kalavri <va...@apache.org>
> > wrote:
> >
> >> Hi James and TinkerPop community,
> >>
> >> thanks a lot for starting this discussion!
> >> I am Vasia, an Apache Flink PMC member and core Gelly developer. Nice
> >> to meet you ;)
> >>
> >> I'm only starting to get familiar with the TinkerPop project, but it
> >> seems that it can play well with Flink.
> >> As you already noticed, a FlinkGraphComputer should be straightforward
> >> to implement. Gelly has a vertex-centric API that is similar to the
> >> scatter-gather model [1] and a gather-sum-apply API [2] that is closer
> >> to the PowerGraph model. These are built on top of Flink's delta
> >> iteration operators [3], which are more generic and could also be used
> >> directly for the FlinkGraphComputer, if the existing Gelly abstractions
> >> won't work.
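
The scatter-gather idea mentioned above can be sketched in a few lines of
plain Python. This is only a toy model, not Gelly's API: the function name,
the graph encoding, and the choice of single-source shortest paths are all
assumptions made for illustration. The "active" set loosely plays the role
that the workset plays in a delta iteration, since only vertices whose value
changed take part in the next superstep.

```python
# Toy scatter-gather iteration (single-source shortest paths) in plain Python.
# All names here are hypothetical; this is not Gelly's API.

def scatter_gather_sssp(edges, source, max_supersteps=100):
    """edges: dict mapping vertex -> list of (neighbor, weight) pairs."""
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(target for target, _ in targets)

    dist = {v: float("inf") for v in vertices}
    dist[source] = 0.0
    active = {source}  # only vertices whose value changed send messages

    for _ in range(max_supersteps):
        if not active:
            break
        # Scatter: each active vertex sends a candidate distance along
        # its outgoing edges.
        inbox = {}
        for v in active:
            for target, weight in edges.get(v, []):
                inbox.setdefault(target, []).append(dist[v] + weight)
        # Gather: each vertex that received messages keeps the best one
        # and stays active only if its value improved.
        active = set()
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                active.add(v)
    return dist


graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": [("d", 1.0)]}
distances = scatter_gather_sssp(graph, "a")
```

The loop stops as soon as no vertex changes, which is the usual convergence
condition for this style of iteration.
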
> >>
> >> Regarding the difference between stream and batch in Flink: Flink is a
> >> streaming dataflow engine, on top of which you can run both streaming
> >> and batch jobs. A batch job is simply seen by Flink as a job operating
> >> on a finite stream. Accordingly, Flink has a stream API and a batch
> >> API. Gelly is currently built on top of the batch API, i.e. the
> >> DataSet API.
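
A rough illustration of "batch as a finite stream", using Python generators
rather than anything from Flink: the same operator pipeline is applied once
to a finite source and once to an unbounded one. The operator names and the
sample records are made up for the sketch.

```python
import itertools

def word_lengths(stream):
    """Streaming operator: consumes records one at a time, lazily."""
    for record in stream:
        yield len(record)

def running_total(stream):
    """Streaming operator: emits the running sum of everything seen so far."""
    total = 0
    for value in stream:
        total += value
        yield total

# "Batch": a finite stream. The pipeline terminates by itself and the last
# emitted value is the final result.
batch = ["gremlin", "flink", "gelly"]
batch_result = list(running_total(word_lengths(batch)))

# "Streaming": an unbounded source. The same pipeline never finishes, so
# only a prefix of its output can be observed.
unbounded = itertools.cycle(batch)
stream_prefix = list(itertools.islice(running_total(word_lengths(unbounded)), 5))
```

Nothing in the two operators knows whether its input is bounded; only the
source differs.
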
> >>
> >> James mentioned on the Flink mailing list that someone has already
> >> started working on a FlinkGraphComputer. Is there a JIRA for this? Let
> >> me know if you have questions or you think I can help in some way!
> >>
> >> Cheers,
> >> -Vasia.
> >>
> >> [1]:
> >> https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations
> >> [2]:
> >> https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#gather-sum-apply-iterations
> >> [3]:
> >> https://ci.apache.org/projects/flink/flink-docs-master/apis/iterations.html#delta-iterate-operator
> >>
> >> On 25 November 2015 at 17:05, James Thornton <ja...@jamesthornton.com>
> >> wrote:
> >>
> >>> Hi Vasia -
> >>>
> >>> Welcome to TinkerPop (linking you into the Flink thread as
> >>> requested)...
> >>>
> >>> - James
> >>>
> >>> On Mon, Nov 23, 2015 at 10:01 AM, Marko Rodriguez <okrammarko@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi James,
> >>>>
> >>>> Thank you for always having an ear to the tech pulse. If it wasn't
> >>>> for you, I would still be excited about XMPP and would be programming
> >>>> in Tcl/Tk.
> >>>>
> >>>> Given my 20 minute review of their docs... It would be cool if, like
> >>>> the "Table API," they also had a "Graph API" that was just TinkerPop
> >>>> Graph/Vertex/Edge. That could be super intrusive, so as a simple step
> >>>> -- they already have a "vertex-centric" API and thus, having a
> >>>> FlinkGraphComputer implementation seems "easy." Then from there,
> >>>> Gremlin should just work. I don't really understand the difference
> >>>> between stream and batch unless they are talking about the difference
> >>>> between "Storm" and "MapReduce"? Would be cool to see how TinkerPop
> >>>> fits into the stream-scene.
> >>>>
> >>>> Next, their fluent API is similar to Spark's and I would argue that
> >>>> Gremlin's API is much nicer than just low-level primitives like map(),
> >>>> flatMap(), etc. Thus, they could really benefit from having a full
> >>>> graph query language already available for their users. (As a side
> >>>> note, it's really nice to see more and more systems use
> >>>> functional/fluent APIs as this really trains the next generation to
> >>>> think like this, which is important as Gremlin is purely this!
> >>>> Hopefully the SQL model of querying starts to look odd to people in
> >>>> comparison.)
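
To make the contrast between low-level primitives and a graph traversal
concrete, here is a small Python sketch of the same two-hop query written
both ways. The graph data and the Traversal class are hypothetical; only the
step names out() and values() are borrowed from Gremlin, and the class is in
no way the real Gremlin implementation.

```python
from itertools import chain

# Toy graph (made-up data): outgoing adjacency lists plus one property map.
OUT = {1: [2, 3], 2: [4], 3: [4, 5], 4: [], 5: []}
PROPS = {"name": {1: "alice", 2: "bob", 3: "carol", 4: "dave", 5: "erin"}}

def flat_map(fn, items):
    """Apply fn to every item and concatenate the resulting lists."""
    return list(chain.from_iterable(fn(item) for item in items))

# Low-level style: the query is spelled out as map/flatMap over collections.
def names_two_hops_primitive(start):
    hop1 = flat_map(lambda v: OUT[v], [start])
    hop2 = flat_map(lambda v: OUT[v], hop1)
    return list(map(lambda v: PROPS["name"][v], hop2))

class Traversal:
    """Hypothetical mini traversal with Gremlin-like step names."""

    def __init__(self, vertices):
        self._vertices = list(vertices)

    def out(self):
        # Move every traverser to its outgoing neighbors.
        return Traversal(flat_map(lambda v: OUT[v], self._vertices))

    def values(self, key):
        # Read one property from every vertex the traversal is currently at.
        return [PROPS[key][v] for v in self._vertices]

# Traversal style: the query reads as a path through the graph.
def names_two_hops_fluent(start):
    return Traversal([start]).out().out().values("name")
```

Both functions compute the same result; the second one simply states the
path (out, out, name) and leaves the collection plumbing to the traversal.
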
> >>>>
> >>>> I just sent out this tweet:
> >>>>        https://twitter.com/apachetinkerpop/status/668820458599530497
> >>>>
> >>>> If they seem positive, I can detail in JIRA what would be required for
> >>>> them to have TinkerPop-support.
> >>>>
> >>>> Thanks again James,
> >>>> Marko.
> >>>>
> >>>> http://markorodriguez.com
> >>>>
> >>>> On Nov 19, 2015, at 12:19 PM, James Thornton <james@jamesthornton.com>
> >>>> wrote:
> >>>>
> >>>>> Hi -
> >>>>>
> >>>>> Apache Flink has a graph API named Gelly...
> >>>>>
> >>>>> https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
> >>>>>
> >>>>> ...and Flink's "dedicated support for iterative operations" should
> >>>>> pair well with Gremlin:
> >>>>>
> >>>>> https://flink.apache.org/features.html
> >>>>>
> >>>>> Has anyone dug into this yet?
> >>>>>
> >>>>> - James
> >>>>>
> >>>>>
> >>>>> --
> >>>>> James Thornton, http://electricspeed.com
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> James Thornton, http://electricspeed.com
> >>>
> >>
>
>

Re: Gremlin on Flink & Gelly?

Posted by Vasiliki Kalavri <va...@gmail.com>.
Hi,


On 9 December 2015 at 17:06, Marko Rodriguez <ok...@gmail.com> wrote:

> Hi Vasia,
>
> > Flink doesn't have a graph query language yet, so Gremlin support would
> > be a really nice contribution.
> > I have read the blog post and also the Gremlin paper. There are some
> > really great ideas in there!
>
> Great. Glad you are excited about Gremlin.
>
> > I'm currently quite busy with several projects, so I don't see myself
> > working on a FlinkGraphComputer soon. If someone from the TinkerPop
> > community would like to take this on, I (and the rest of the Flink
> > community) would of course be more than happy to provide feedback and
> > help with Flink-related issues. Otherwise, I'll get back to you once my
> > load levels decrease a bit :)
>
> In the past, TinkerPop used to be a "dumping ground" for all
> implementations, but we decided for TinkerPop3 that we would only have
> "reference implementations" so users can play, system providers can learn,
> and ultimately, system providers would provide TinkerPop support in their
> distribution. As such, we would like to have FlinkGraphComputer distributed
> with Flink. If that sounds like something your project would be comfortable
> with, I think we can provide a JIRA/PR for FlinkGraphComputer (as well as
> any necessary documentation). We can start with a JIRA ticket to get things
> going. Thoughts?
>

I see. This makes sense.
It sounds like a good idea to me! Let me sync with the Flink community, so
we make sure we're all on the same page.
I'll cc dev@tinkerpop, so both communities can provide feedback.

Thanks!
-Vasia.



>
> Besides some I/O stuff (InputFormats, RDDs, etc.), this is the beef of the
> SparkGraphComputer implementation:
>
> https://github.com/apache/incubator-tinkerpop/tree/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer
>
> > Keep up the great work!
>
> Thanks, you too.
>
> Marko.
>
> http://markorodriguez.com

Re: Gremlin on Flink & Gelly?

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi Vasia,

> Flink doesn't have a graph query language yet, so Gremlin support would be
> a really nice contribution.
> I have read the blog post and also the Gremlin paper. There are some really
> great ideas in there!

Great. Glad you are excited about Gremlin.

> I'm currently quite busy with several projects, so I don't see myself
> working on a FlinkGraphComputer soon. If someone from the TinkerPop
> community would like to take this on, I (and the rest of the Flink
> community) would of course be more than happy to provide feedback and help
> with Flink-related issues. Otherwise, I'll get back to you once my load
> levels decrease a bit :)

In the past, TinkerPop used to be a "dumping ground" for all implementations, but we decided for TinkerPop3 that we would only have "reference implementations" so users can play, system providers can learn, and ultimately, system providers would provide TinkerPop support in their distribution. As such, we would like to have FlinkGraphComputer distributed with Flink. If that sounds like something your project would be comfortable with, I think we can provide a JIRA/PR for FlinkGraphComputer (as well as any necessary documentation). We can start with a JIRA ticket to get things going. Thoughts?

Besides some I/O stuff (InputFormats, RDDs, etc.), this is the beef of the SparkGraphComputer implementation:
	https://github.com/apache/incubator-tinkerpop/tree/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer
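
For readers who have not looked at that code: a graph computer is, roughly,
the engine-side loop that runs a vertex program in bulk-synchronous
supersteps. The Python sketch below is a toy model under stated assumptions.
The method names setup / execute / terminate loosely mirror TinkerPop's
VertexProgram, but the signatures are simplified (the real execute() takes a
vertex, a messenger, and a memory object), and the run() function, the graph
encoding, and the example program are all hypothetical.

```python
class InDegreeProgram:
    """Hypothetical vertex program: computes each vertex's in-degree."""

    def setup(self, memory):
        memory["iteration"] = 0

    def execute(self, vertex, incoming, send, memory):
        if memory["iteration"] == 0:
            # First superstep: send one message along every outgoing edge.
            for neighbor in vertex["out"]:
                send(neighbor, 1)
        else:
            # Second superstep: count the messages that arrived.
            vertex["in_degree"] = sum(incoming)

    def terminate(self, memory):
        return memory["iteration"] >= 1


def run(graph, program):
    """Toy bulk-synchronous executor: the part an engine would supply."""
    memory = {}
    program.setup(memory)
    inbox = {vid: [] for vid in graph}
    while True:
        outbox = {vid: [] for vid in graph}

        def send(target, message):
            outbox[target].append(message)

        for vid, vertex in graph.items():
            program.execute(vertex, inbox[vid], send, memory)
        # Barrier: messages sent in this superstep are delivered in the next.
        inbox = outbox
        if program.terminate(memory):
            return graph
        memory["iteration"] += 1


toy_graph = {
    "a": {"out": ["b", "c"]},
    "b": {"out": ["c"]},
    "c": {"out": []},
}
result = run(toy_graph, InDegreeProgram())
```

In this picture, porting to a new engine means re-implementing run() on top
of that engine's iteration and messaging primitives while the vertex program
stays unchanged.
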

> Keep up the great work!

Thanks, you too. 

Marko.

http://markorodriguez.com





Re: Gremlin on Flink & Gelly?

Posted by Vasiliki Kalavri <va...@apache.org>.
Hi all,

thank you for your replies and sorry for the long silence.

Flink doesn't have a graph query language yet, so Gremlin support would be
a really nice contribution.
I have read the blog post and also the Gremlin paper. There are some really
great ideas in there!

I'm currently quite busy with several projects, so I don't see myself
working on a FlinkGraphComputer soon. If someone from the TinkerPop
community would like to take this on, I (and the rest of the Flink
community) would of course be more than happy to provide feedback and help
with Flink-related issues. Otherwise, I'll get back to you once my load
levels decrease a bit :)

Keep up the great work!

Best,
-Vasia.
