Posted to dev@spark.apache.org by kant kodali <ka...@gmail.com> on 2020/02/14 07:12:07 UTC

Re: SparkGraph review process

Any update on this? Is SparkGraph going to make it into Spark or not?

On Mon, Oct 14, 2019 at 12:26 PM Holden Karau <ho...@pigscanfly.ca> wrote:

> Maybe let’s ask the folks from Lightbend who helped with the previous
> Scala upgrade for their thoughts?
>
> On Mon, Oct 14, 2019 at 8:24 PM Xiao Li <ga...@gmail.com> wrote:
>
>>> 1. On the technical side, my main concern is the runtime dependency on
>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>> came up with the solution to shade a few Scala libraries to avoid
>>> pollution. However, I'm not super confident that the approach is
>>> sustainable for two reasons: a) there are no proper shading libraries
>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>> great if some Scala experts can help review the current implementation and
>>> help assess the risk.
>>
>>
>> This concern is valid. I think we should start a vote to ensure the
>> whole community is aware of the risk and takes responsibility for
>> maintaining this in the long term.
>>
>> Cheers,
>>
>> Xiao
>>
>>
>> On Fri, Oct 4, 2019 at 12:27 PM Xiangrui Meng <me...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I want to clarify my role first to avoid misunderstanding. I'm an
>>> individual contributor here. My work on the graph SPIP, as well as on other
>>> Spark features I contributed to, is not associated with my employer. It
>>> became quite challenging for me to keep track of the graph SPIP work due to
>>> less available time at home.
>>>
>>> In retrospect, we should have involved more Spark devs and committers
>>> early on so there is no single point of failure, i.e., me. Hopefully it is
>>> not too late to fix. I summarize my thoughts here to help onboard other
>>> reviewers:
>>>
>>> 1. On the technical side, my main concern is the runtime dependency on
>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>> came up with the solution to shade a few Scala libraries to avoid
>>> pollution. However, I'm not super confident that the approach is
>>> sustainable for two reasons: a) there are no proper shading libraries
>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>> great if some Scala experts can help review the current implementation and
>>> help assess the risk.
>>>
>>> 2. Overloading helper methods. MLlib used to have several overloaded
>>> helper methods for each algorithm, which later became a major maintenance
>>> burden. Builders and setters/getters are more maintainable. I will comment
>>> again on the PR.
>>>
>>> 3. The proposed API partitions the graph into sub-graphs, as described in
>>> the property graph model. It is unclear to me how it would affect query
>>> performance because it requires the SQL optimizer to correctly recognize data
>>> from the same source and make execution efficient.
>>>
>>> 4. The feature, although originally targeted for Spark 3.0, should not
>>> be a Spark 3.0 release blocker because it doesn't require breaking changes.
>>> If we miss the code freeze deadline, we can introduce a build flag to
>>> exclude the module from the official release/distribution, and then make it
>>> default once the module is ready.
>>>
>>> 5. If unfortunately we still don't see sufficient committer reviews, I
>>> think the best option would be submitting the work to Apache Incubator
>>> instead to unblock the work. But maybe it is too early to discuss this
>>> option.
>>>
>>> It would be great if other committers can offer help on the review!
>>> Really appreciated!
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Fri, Oct 4, 2019 at 1:32 AM Mats Rydberg <ma...@neo4j.org.invalid>
>>> wrote:
>>>
>>>> Hello dear Spark community
>>>>
>>>> We are the developers behind the SparkGraph SPIP, which is a project
>>>> created out of our work on openCypher Morpheus (
>>>> https://github.com/opencypher/morpheus). During this year we have
>>>> collaborated with mainly Xiangrui Meng of Databricks to define and develop
>>>> a new SparkGraph module based on our experience from working on Morpheus.
>>>> Morpheus - formerly known as "Cypher for Apache Spark" - has been in
>>>> development for over 3 years and matured in its API and implementation.
>>>>
>>>> The SPIP work has been on hold for a period of time now, as priorities
>>>> at Databricks have changed which has occupied Xiangrui's time (as well as
>>>> other happenings). As you may know, the latest API PR (
>>>> https://github.com/apache/spark/pull/24851) is blocking us from moving
>>>> forward with the implementation.
>>>>
>>>> In an attempt to not lose track of this project we now reach out to you
>>>> to ask whether there are any Spark committers in the community who would be
>>>> prepared to commit to helping us review and merge our code contributions to
>>>> Apache Spark? We are not asking for lots of direct development support, as
>>>> we believe we have the implementation more or less completed already since
>>>> early this year. There is a proof-of-concept PR (
>>>> https://github.com/apache/spark/pull/24297) which contains the
>>>> functionality.
>>>>
>>>> If you could offer such aid it would be greatly appreciated. None of us
>>>> are Spark committers, which is hindering our ability to deliver this
>>>> project in time for Spark 3.0.
>>>>
>>>> Sincerely
>>>> the Neo4j Graph Analytics team
>>>> Mats, Martin, Max, Sören, Jonatan
>>>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

Re: SparkGraph review process

Posted by kant kodali <ka...@gmail.com>.
Hi Sean,

In that case, can we have GraphFrames as part of a Spark release? A separate
release would also be fine. Currently, I don't see any GraphFrames releases.

Thanks


On Fri, Feb 14, 2020 at 9:06 AM Sean Owen <sr...@gmail.com> wrote:

> This will not be Spark 3.0, no.

Re: SparkGraph review process

Posted by Sean Owen <sr...@gmail.com>.
This will not be Spark 3.0, no.

On Fri, Feb 14, 2020 at 1:12 AM kant kodali <ka...@gmail.com> wrote:
>
> Any update on this? Is SparkGraph going to make it into Spark or not?
