Posted to user@spark.apache.org by enzo <en...@smartinsightsfromdata.com> on 2017/03/09 17:42:54 UTC

Question on Spark's graph libraries

I am a bit confused by the current roadmap for graph and graph analytics in Apache Spark.

I understand that we have had for some time two libraries (the following is my understanding - please amend as appropriate!):

. GraphX, part of the Spark project.  This library is based on RDDs and is only accessible via Scala.  It doesn’t look like this library has been enhanced recently.
. GraphFrames, an independent (at the moment?) library for Spark.  This library is based on Spark DataFrames and is accessible from Scala & Python. The last commit on GitHub was 2 months ago.

GraphFrames came about with the promise that it would at some point be integrated into Apache Spark.

I can see other projects coming up with interesting libraries and ideas (e.g. Graphulo on Accumulo, a new project with the goal of implementing the GraphBLAS building blocks for graph algorithms on top of Accumulo).

Where is Apache Spark going?

Where are graph libraries in the roadmap?



Thanks for any clarity brought to this matter.

Enzo

Re: Question on Spark's graph libraries roadmap

Posted by Joseph Bradley <jo...@databricks.com>.
I'll try to answer some questions which I do not see answered above.

> GraphFrame is just a Graph Analytics/Query Engine, not a Graph Engine
> which GraphX used to be.

GraphFrames supports all of the algorithms which GraphX supports.  We've
also taken first steps towards providing primitives for people to implement
their own.  The main blocker here is what Tim mentioned above: some
improvements in SQL/Catalyst will be needed in order to make it even easier
to write scalable implementations of iterative algorithms for graphs.  I'd
recommend checking out Ankur's talk which Tim linked; it gives more of an
overview of the library's capabilities.

> future for GraphFrames

Same as Tim: It's still active, though of course less so than Spark.
 (Thanks Felix and others for many PRs.)  Some further improvements are
needed to make a 1.x release, such as the improvements Tim mentioned +
figuring out our take on DataFrames vs Datasets for graphs.

If people have specific feedback on GraphFrames, it'd be great to hear it
on the GitHub issues there.

More generally, I do hope GraphFrames is a vision for the future of graph
analytics on Spark.  And once there have been sufficient improvements +
community discussion, we'd like to propose merging GraphFrames into Spark
itself.  That will take time, but more community feedback would be great to
accelerate the process.



-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

[image: http://databricks.com] <http://databricks.com/>

Re: Question on Spark's graph libraries roadmap

Posted by Andy <an...@gmail.com>.
GraphFrame is just a Graph Analytics/Query Engine, not a Graph Engine which
GraphX used to be.

And I'm sorry to say that, in fact, it doesn’t fit most scenarios at all.

Enzo, I don’t think there is any roadmap of Graph libraries for Spark for
now.

*Andy*



Re: Question on Spark's graph libraries roadmap

Posted by Tim Hunter <ti...@databricks.com>.
Hello Enzo,

since this question is also relevant to Spark, I will answer it here. The
goal of GraphFrames is to provide graph capabilities along with excellent
integration to the rest of the Spark ecosystem (using modern APIs such as
DataFrames). As you seem to be well aware, a large number of graph
algorithms can be implemented in terms of a small subset of graph
primitives. These graph primitives can be translated to Spark operations,
but we feel that some important low-level optimizations should be added to
the Catalyst engine in order to realize the true potential of GraphFrames.
You can find a flavor of this work in this presentation by Ankur Dave [1].
This is still an area of collaboration with the Spark core team, and we
would like to merge GraphFrames into Spark 2.x eventually.
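
To make the point about graph primitives concrete, here is a minimal
sketch in plain Python. It is not GraphFrames or Spark code. The names
aggregate_messages, in_degrees and connected_components are hypothetical
and chosen only for this illustration. It assumes a graph given as a list
of (src, dst) pairs, and shows two algorithms built on a single
"send a message along each edge, then merge per vertex" primitive.

```python
def aggregate_messages(edges, send, merge):
    """Hypothetical primitive: send messages along every edge, then
    merge all messages that arrive at the same vertex."""
    inbox = {}
    for src, dst in edges:
        # send() returns (target_vertex, message) pairs for this edge
        for target, msg in send(src, dst):
            if target in inbox:
                inbox[target] = merge(inbox[target], msg)
            else:
                inbox[target] = msg
    return inbox


def in_degrees(edges):
    """In-degree: every edge sends a 1 to its destination; merge by sum."""
    return aggregate_messages(
        edges,
        send=lambda src, dst: [(dst, 1)],
        merge=lambda a, b: a + b,
    )


def connected_components(vertices, edges):
    """Label propagation: each vertex repeatedly adopts the smallest
    label seen among itself and its neighbours, until nothing changes."""
    label = {v: v for v in vertices}
    while True:
        # Each edge offers both endpoints the other endpoint's label.
        proposed = aggregate_messages(
            edges,
            send=lambda src, dst: [(dst, label[src]), (src, label[dst])],
            merge=min,
        )
        changed = False
        for vertex, candidate in proposed.items():
            if candidate < label[vertex]:
                label[vertex] = candidate
                changed = True
        if not changed:
            return label


if __name__ == "__main__":
    vertices = ["a", "b", "c", "d", "e"]
    edges = [("a", "b"), ("b", "c"), ("d", "e")]
    print(in_degrees(edges))
    print(connected_components(vertices, edges))
```

On a distributed engine the per-edge loop would presumably become a join
and the per-vertex merge a group-by aggregation, which is why the
efficiency of iterative joins and aggregations matters so much here.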

Where does it leave us for the time being? GraphFrames is actively
supported, and we implemented a highly scalable version of GraphFrames in
November. As you mentioned, there are a number of distributed Graph
frameworks out there, but to my knowledge they are not as easy to integrate
with Spark. The current approach has been to reach parity with GraphX first
and then add new algorithms based on popular demand. Along these lines,
GraphBLAS could be added on top of it if someone is willing to step up.

Tim

[1]
https://spark-summit.org/east-2016/events/graphframes-graph-queries-in-spark-sql/


Re: Question on Spark's graph libraries roadmap

Posted by Nicholas Chammas <ni...@gmail.com>.
Since GraphFrames is not part of the Spark project, your
GraphFrames-specific questions are probably better directed at the
GraphFrames issue tracker:

https://github.com/graphframes/graphframes/issues

As far as I know, GraphFrames is an active project, though not as active as
Spark of course. There will be lulls in development since the people
driving that project forward also have major commitments to other projects.
This is natural.

If you post on GitHub I would wager someone there (maybe Joseph or Tim
<https://github.com/graphframes/graphframes/graphs/contributors>?) should
be able to answer your questions about GraphFrames.


   1. The page you linked refers to a *plan* to move GraphFrames to the
   standard Spark release cycle. Is this *plan* publicly available /
   visible?

I didn’t see any such reference to a plan in the page I linked you to.
Rather, the page says <http://graphframes.github.io/#what-are-graphframes>:

The current plan is to keep GraphFrames separate from core Apache Spark for
the time being.

Nick

On Mon, Mar 13, 2017 at 5:46 PM enzo <en...@smartinsightsfromdata.com> wrote:

> Nick
>
> Thanks for the quick answer :)
>
> Sadly, the comment in the page doesn’t answer my questions. More
> specifically:
>
> 1. GraphFrames' last activity on GitHub was 2 months ago.  The last release
> was on 12 Nov 2016.  Until recently, 2 months was close to a Spark release
> cycle.  Why has there been no major development since mid-November?
>
> 2. The page you linked refers to a *plan* to move GraphFrames to the
> standard Spark release cycle.  Is this *plan* publicly available / visible?
>
> 3. I couldn’t find any statement of intent to preserve one or the other
> of the APIs, or just merge them: in other words, there seems to be no
> overarching plan for a cohesive & comprehensive graph API (I apologise in
> advance if I’m wrong).
>
> 4. I was initially impressed by the GraphFrames syntax, in places similar to
> Neo4j Cypher (now open source), but later I understood it was an incomplete
> lightweight experiment (with no intention to move to full compatibility,
> perhaps for good reasons).  To me it sort of gave the wrong message.
>
> 5. In the meantime the world of graphs is changing. The GraphBLAS forum seems
> to be gaining some traction: a library based on GraphBLAS has been made
> available on Accumulo (Graphulo).  Assuming that Spark is NOT going to adopt
> similar lines, nor to follow DataStax with TinkerPop and Gremlin, again, what
> is the new, cohesive & comprehensive API that Spark is going to deliver?
>
>
> Sadly, the API uncertainty may push developers toward more stable APIs,
> platforms & roadmaps.
>
>
>
> Thanks Enzo
>
> On 13 Mar 2017, at 22:09, Nicholas Chammas <ni...@gmail.com>
> wrote:
>
> Your question is answered here under "Will GraphFrames be part of Apache
> Spark?", no?
>
> http://graphframes.github.io/#what-are-graphframes
>
> Nick
>
> On Mon, Mar 13, 2017 at 4:56 PM enzo <en...@smartinsightsfromdata.com>
> wrote:
>
> Please see this email  trail:  no answer so far on the user@spark board.
> Trying the developer board for better luck
>
> The question:
>
> I am a bit confused by the current roadmap for graph and graph analytics
> in Apache Spark.
>
> I understand that we have had for some time two libraries (the following
> is my understanding - please amend as appropriate!):
>
> . GraphX, part of the Spark project.  This library is based on RDDs and is
> only accessible via Scala.  It doesn’t look like this library has been
> enhanced recently.
> . GraphFrames, an independent (at the moment?) library for Spark.  This
> library is based on Spark DataFrames and is accessible from Scala & Python.
> The last commit on GitHub was 2 months ago.
>
> GraphFrames came about with the promise that it would at some point be
> integrated into Apache Spark.
>
> I can see other projects coming up with interesting libraries and ideas
> (e.g. Graphulo on Accumulo, a new project with the goal of implementing
> the GraphBLAS building blocks for graph algorithms on top of Accumulo).
>
> Where is Apache Spark going?
>
> Where are graph libraries in the roadmap?
>
>
>
> Thanks for any clarity brought to this matter.
>
> Thanks Enzo
>
> Begin forwarded message:
>
> *From: *"Md. Rezaul Karim" <re...@insight-centre.org>
> *Subject: **Re: Question on Spark's graph libraries*
> *Date: *10 March 2017 at 13:13:15 CET
> *To: *Robin East <ro...@xense.co.uk>
> *Cc: *enzo <en...@smartinsightsfromdata.com>, spark users <
> user@spark.apache.org>
>
> +1
>
> Regards,
> _________________________________
> *Md. Rezaul Karim*, BSc, MSc
> PhD Researcher, INSIGHT Centre for Data Analytics
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html
> <http://139.59.184.114/index.html>
>
> On 10 March 2017 at 12:10, Robin East <ro...@xense.co.uk> wrote:
>
> I would love to know the answer to that too.
>
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
>
>
>
>
> On 9 Mar 2017, at 17:42, enzo <en...@smartinsightsfromdata.com> wrote:
>
> I am a bit confused by the current roadmap for graph and graph analytics
> in Apache Spark.
>
> I understand that we have had for some time two libraries (the following
> is my understanding - please amend as appropriate!):
>
> . GraphX, part of Spark project.  This library is based on RDD and it is
> only accessible via Scala.  It doesn’t look that this library has been
> enhanced recently.
> . GraphFrames, independent (at the moment?) library for Spark.  This
> library is based on Spark DataFrames and accessible by Scala & Python. Last
> commit on GitHub was 2 months ago.
>
> GraphFrames cam about with the promise at some point to be integrated in
> Apache Spark.
>
> I can see other projects coming up with interesting libraries and ideas
> (e.g. Graphulo on Accumulo, a new project with the goal of implementing
> the GraphBlas building blocks for graph algorithms on top of Accumulo).
>
> Where is Apache Spark going?
>
> Where are graph libraries in the roadmap?
>
>
>
> Thanks for any clarity brought to this matter.
>
> Enzo
>
>
>
>
>
>

Re: Question on Spark's graph libraries roadmap

Posted by enzo <en...@smartinsightsfromdata.com>.
Nick

Thanks for the quick answer :)

Sadly, the comment on the page doesn’t answer my questions. More specifically:

1. The last GraphFrames activity on GitHub was 2 months ago, and the last release was on 12 Nov 2016.  Until recently, 2 months was close to a full Spark release cycle.  Why has there been no major development since mid-November?

2. The page you linked refers to a *plan* to move GraphFrames to the standard Spark release cycle.  Is this *plan* publicly available / visible?

3. I couldn’t find any statement of intent to preserve either one or the other API, or just merge them: in other words, there seems to be no overarching plan for a cohesive & comprehensive graph API (I apologise in advance if I’m wrong).

4. I was initially impressed by the GraphFrames syntax, in places similar to Neo4j Cypher (now open source), but later I understood it was an incomplete, lightweight experiment (with no intention to move to full compatibility, perhaps for good reasons).  To me it sort of gave the wrong message.

5. In the meantime the world of graphs is changing. The GraphBLAS Forum seems to be gaining some traction: a library based on GraphBLAS has been made available on Accumulo (Graphulo).  Assuming that Spark is NOT going to adopt similar lines, nor follow DataStax with TinkerPop and Gremlin, again, what is the new, cohesive & comprehensive API that Spark is going to deliver?


Sadly, the API uncertainty may push developers toward more stable APIs / platforms & roadmaps.



Thanks Enzo

> On 13 Mar 2017, at 22:09, Nicholas Chammas <ni...@gmail.com> wrote:
> 
> Your question is answered here under "Will GraphFrames be part of Apache Spark?", no?
> 
> http://graphframes.github.io/#what-are-graphframes
> 
> Nick


Re: Question on Spark's graph libraries roadmap

Posted by Nicholas Chammas <ni...@gmail.com>.
Your question is answered here under "Will GraphFrames be part of Apache
Spark?", no?

http://graphframes.github.io/#what-are-graphframes

Nick


Fwd: Question on Spark's graph libraries roadmap

Posted by enzo <en...@smartinsightsfromdata.com>.
Please see this email trail: no answer so far on the user@spark board.  Trying the developer board for better luck.

The question:

I am a bit confused by the current roadmap for graph and graph analytics in Apache Spark.

I understand that we have had for some time two libraries (the following is my understanding - please amend as appropriate!):

. GraphX, part of the Spark project.  This library is based on RDDs and is only accessible via Scala.  It doesn’t look like this library has been enhanced recently.
. GraphFrames, an independent (at the moment?) library for Spark.  This library is based on Spark DataFrames and is accessible from Scala & Python. The last commit on GitHub was 2 months ago.

GraphFrames came about with the promise of being integrated into Apache Spark at some point.

I can see other projects coming up with interesting libraries and ideas (e.g. Graphulo on Accumulo, a new project with the goal of implementing the GraphBLAS building blocks for graph algorithms on top of Accumulo).

Where is Apache Spark going?

Where are graph libraries in the roadmap?



Thanks for any clarity brought to this matter.

Thanks Enzo

> Begin forwarded message:
> 
> From: "Md. Rezaul Karim" <re...@insight-centre.org>
> Subject: Re: Question on Spark's graph libraries
> Date: 10 March 2017 at 13:13:15 CET
> To: Robin East <ro...@xense.co.uk>
> Cc: enzo <en...@smartinsightsfromdata.com>, spark users <us...@spark.apache.org>
> 
> +1
> 
> Regards,
> _________________________________
> Md. Rezaul Karim, BSc, MSc
> PhD Researcher, INSIGHT Centre for Data Analytics 
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html <http://139.59.184.114/index.html>
> 
> On 10 March 2017 at 12:10, Robin East <robin.east@xense.co.uk> wrote:
> I would love to know the answer to that too.
> -------------------------------------------------------------------------------
> Robin East
> Spark GraphX in Action Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action


Re: Question on Spark's graph libraries

Posted by "Md. Rezaul Karim" <re...@insight-centre.org>.
+1

Regards,
_________________________________
*Md. Rezaul Karim*, BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
<http://139.59.184.114/index.html>

On 10 March 2017 at 12:10, Robin East <ro...@xense.co.uk> wrote:

> I would love to know the answer to that too.

Re: Question on Spark's graph libraries

Posted by Robin East <ro...@xense.co.uk>.
I would love to know the answer to that too.
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action




