Posted to dev@spark.apache.org by Luciano Resende <lu...@gmail.com> on 2016/04/15 18:01:55 UTC

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

After some collaboration with other community members, we have created an
initial draft for Spark Extras, which is available for review at

https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing

We would like to invite other community members to participate in the
project, particularly the Spark Committers and PMC (feel free to express
interest and I will update the proposal). Another option here is just to
give ALL Spark committers write access to "Spark Extras".


We also have a couple of asks for the Spark PMC:

- Permission to use "Spark Extras" as the project name. We already checked
this with Apache Brand Management, and the recommendation was to discuss
and reach consensus with the Spark PMC.

- We would also like to check whether, in the case of successful creation of
"Spark Extras", the Spark PMC would be willing to continue the development of
the remaining connectors that stayed in the Spark 2.0 codebase in the "Spark
Extras" project.


Thanks in advance, and we welcome any feedback on this proposal before we
present it to the Apache Board for consideration.



On Sat, Mar 26, 2016 at 10:07 AM, Luciano Resende <lu...@gmail.com>
wrote:

> I believe some of this has been resolved in the context of some parties that
> had interest in one particular connector, but we still have a few connectors
> removed, and, as you mentioned, we still don't have a simple way or
> willingness to manage and stay current on new packages like Kafka. And based
> on the fact that this thread is still alive, I believe that other community
> members might have other concerns as well.
>
> After some thought, I believe having a separate project (what was
> mentioned here as Spark Extras) to handle Spark Connectors and Spark
> add-ons in general could be very beneficial to Spark and the overall Spark
> community, which would have a central place in Apache to collaborate around
> related Spark components.
>
> Some of the benefits of this approach:
>
> - Enables maintaining the connectors inside Apache, following the Apache
> governance and release rules, while allowing Spark proper to focus on the
> core runtime.
> - Provides more flexibility in controlling the direction (currency) of the
> existing connectors (e.g. a willingness to find a solution for maintaining
> multiple versions of the same connector, like Kafka 0.8.x and 0.9.x)
> - Becomes a home for other types of Spark-related connectors, helping to
> expand the community around Spark (e.g. Zeppelin sees most of its current
> contributions around new/enhanced connectors)
>
> What are some requirements for Spark Extras to be successful:
>
> - Be up to date with Spark trunk APIs (based on daily CI runs against
> SNAPSHOT builds; see the illustrative build fragment after this list)
> - Adhere to Spark release cycles (with only a very small window after each
> Spark release)
> - Be more open and flexible about the set of connectors it will accept and
> maintain (e.g. also handle multiple versions, like the Kafka 0.9 issue we
> have today)
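>
> As a minimal sketch of what "daily CI against SNAPSHOT" could mean (assuming
> sbt; the repository URL, artifact, and version below are only illustrative,
> not part of the proposal), a connector module could track Spark trunk with:
>
>     // hypothetical build.sbt fragment for a daily CI build against Spark trunk
>     resolvers += "Apache Snapshots" at "https://repository.apache.org/content/repositories/snapshots/"
>     libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0-SNAPSHOT" % "provided"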
>
> Where to start Spark Extras
>
> Depending on the interest here, we could follow the steps of Apache Arrow
> and start this directly as a TLP, or start as an Incubator project. I would
> favor the first option.
>
> Who would participate
>
> I have thought about this for a bit, and if we go in the direction of a TLP,
> I would say Spark committers and Apache Members can request to participate as
> PMC members, while other committers can request to become committers.
> Non-committers would be added based on meritocracy after the start of the
> project.
>
> Project Name
>
> It would be ideal if we could have a project name that shows close ties to
> Spark (e.g. Spark Extras or Spark Connectors), but we will need permission
> and support from whoever is going to evaluate the project proposal (e.g. the
> Apache Board).
>
>
> Thoughts?
>
> Does anyone have any big disagreement or objection to moving in this
> direction?
>
> Otherwise, who would be interested in joining the project, so I can start
> working on a concrete proposal?
>
>
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Cody Koeninger <co...@koeninger.org>.
100% agree with Sean & Reynold's comments on this.

Adding this as a TLP would just cause more confusion as to "official"
endorsement.



On Fri, Apr 15, 2016 at 11:50 AM, Sean Owen <so...@cloudera.com> wrote:
> On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende <lu...@gmail.com> wrote:
>> I know the name might be confusing, but I also think that the projects have
>> a very big synergy, more like sibling projects, where "Spark Extras" extends
>> the Spark community and develop/maintain components for, and pretty much
>> only for, Apache Spark.  Based on your comment above, if making the project
>> "Spark-Extras" a more acceptable name, I believe this is ok as well.
>
> This also grants special status to a third-party project. It's not
> clear this should be *the* official unofficial third-party Spark
> project over some other one. If something's to be blessed, it should
> be in the Spark project.
>
> And why isn't it in the Spark project? the argument was that these
> bits were not used and pretty de minimis as code. It's not up to me or
> anyone else to tell you code X isn't useful to you. But arguing X
> should be a TLP asserts it is substantial and of broad interest, since
> there's non-zero effort for volunteers to deal with it. I am not sure
> I've heard anyone argue that -- or did I miss it? because removing
> bits of unused code happens all the time and isn't a bad precedent or
> even unusual.
>
> It doesn't actually enable any more cooperation than is already
> possible with any other project (like Kafka, Mesos, etc). You can run
> the same governance model anywhere you like. I realize literally being
> operated under the ASF banner is something different.
>
> What I hear here is a proposal to make an unofficial official Spark
> project as a TLP, that begins with these fairly inconsequential
> extras. I question the value of that on its face. Example: what goes
> into this project? deleted Spark code only? or is this a glorified
> "contrib" folder with a lower and somehow different bar determined by
> different people?
>
> And at that stage... is it really helping to give that special status?
>



Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Yeah, so it’s the *Apache Spark* project. Just to clarify.
Not once did you say Apache Spark below.






On 4/15/16, 9:50 AM, "Sean Owen" <so...@cloudera.com> wrote:

>On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende <lu...@gmail.com> wrote:
>> I know the name might be confusing, but I also think that the projects have
>> a very big synergy, more like sibling projects, where "Spark Extras" extends
>> the Spark community and develop/maintain components for, and pretty much
>> only for, Apache Spark.  Based on your comment above, if making the project
>> "Spark-Extras" a more acceptable name, I believe this is ok as well.
>
>This also grants special status to a third-party project. It's not
>clear this should be *the* official unofficial third-party Spark
>project over some other one. If something's to be blessed, it should
>be in the Spark project.
>
>And why isn't it in the Spark project? the argument was that these
>bits were not used and pretty de minimis as code. It's not up to me or
>anyone else to tell you code X isn't useful to you. But arguing X
>should be a TLP asserts it is substantial and of broad interest, since
>there's non-zero effort for volunteers to deal with it. I am not sure
>I've heard anyone argue that -- or did I miss it? because removing
>bits of unused code happens all the time and isn't a bad precedent or
>even unusual.
>
>It doesn't actually enable any more cooperation than is already
>possible with any other project (like Kafka, Mesos, etc). You can run
>the same governance model anywhere you like. I realize literally being
>operated under the ASF banner is something different.
>
>What I hear here is a proposal to make an unofficial official Spark
>project as a TLP, that begins with these fairly inconsequential
>extras. I question the value of that on its face. Example: what goes
>into this project? deleted Spark code only? or is this a glorified
>"contrib" folder with a lower and somehow different bar determined by
>different people?
>
>And at that stage... is it really helping to give that special status?



Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Sean Owen <so...@cloudera.com>.
On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende <lu...@gmail.com> wrote:
> I know the name might be confusing, but I also think that the projects have
> a very big synergy, more like sibling projects, where "Spark Extras" extends
> the Spark community and develop/maintain components for, and pretty much
> only for, Apache Spark.  Based on your comment above, if making the project
> "Spark-Extras" a more acceptable name, I believe this is ok as well.

This also grants special status to a third-party project. It's not
clear this should be *the* official unofficial third-party Spark
project over some other one. If something's to be blessed, it should
be in the Spark project.

And why isn't it in the Spark project? the argument was that these
bits were not used and pretty de minimis as code. It's not up to me or
anyone else to tell you code X isn't useful to you. But arguing X
should be a TLP asserts it is substantial and of broad interest, since
there's non-zero effort for volunteers to deal with it. I am not sure
I've heard anyone argue that -- or did I miss it? because removing
bits of unused code happens all the time and isn't a bad precedent or
even unusual.

It doesn't actually enable any more cooperation than is already
possible with any other project (like Kafka, Mesos, etc). You can run
the same governance model anywhere you like. I realize literally being
operated under the ASF banner is something different.

What I hear here is a proposal to make an unofficial official Spark
project as a TLP, that begins with these fairly inconsequential
extras. I question the value of that on its face. Example: what goes
into this project? deleted Spark code only? or is this a glorified
"contrib" folder with a lower and somehow different bar determined by
different people?

And at that stage... is it really helping to give that special status?



Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Luciano Resende <lu...@gmail.com>.
On Fri, Apr 15, 2016 at 9:18 AM, Sean Owen <so...@cloudera.com> wrote:

> Why would this need to be an ASF project of its own? I don't think
> it's possible to have a yet another separate "Spark Extras" TLP (?)
>
> There is already a project to manage these bits of code on Github. How
> about all of the interested parties manage the code there, under the
> same process, under the same license, etc?
>

This whole discussion started when some of the connectors were moved from
Apache to GitHub. That move in itself makes a statement that the "Spark
governance" of those bits is something highly valued by the community,
consumers, and other companies that consume open source code. Being an Apache
project also allows the project to use and share the Apache infrastructure to
run the project.


>
> I'm not against calling it Spark Extras myself but I wonder if that
> needlessly confuses the situation. They aren't part of the Spark TLP
> on purpose, so trying to give it some special middle-ground status
> might just be confusing. The thing that comes to mind immediately is
> "Connectors for Apache Spark", spark-connectors, etc.
>
>
I know the name might be confusing, but I also think that the projects have
a very strong synergy, more like sibling projects, where "Spark Extras"
extends the Spark community and develops/maintains components for, and pretty
much only for, Apache Spark.  Based on your comment above, if "Spark-Extras"
is a more acceptable name for the project, I believe that is OK as well.

I also understand that the Spark PMC might have concerns with branding, and
that's why we are inviting all members of the Spark PMC to join the new
project and help oversee and manage it.



>
> On Fri, Apr 15, 2016 at 5:01 PM, Luciano Resende <lu...@gmail.com>
> wrote:
> > After some collaboration with other community members, we have created a
> > initial draft for Spark Extras which is available for review at
> >
> >
> https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing
> >
> > We would like to invite other community members to participate in the
> > project, particularly the Spark Committers and PMC (feel free to express
> > interest and I will update the proposal). Another option here is just to
> > give ALL Spark committers write access to "Spark Extras".
> >
> >
> > We also have couple asks from the Spark PMC :
> >
> > - Permission to use "Spark Extras" as the project name. We already
> checked
> > this with Apache Brand Management, and the recommendation was to discuss
> and
> > reach consensus with the Spark PMC.
> >
> > - We would also want to check with the Spark PMC that, in case of
> > successfully creation of  "Spark Extras", if the PMC would be willing to
> > continue the development of the remaining connectors that stayed in Spark
> > 2.0 codebase in the "Spark Extras" project.
> >
> >
> > Thanks in advance, and we welcome any feedback around this proposal
> before
> > we present to the Apache Board for consideration.
> >
> >
> >
> > On Sat, Mar 26, 2016 at 10:07 AM, Luciano Resende <lu...@gmail.com>
> > wrote:
> >>
> >> I believe some of this has been resolved in the context of some parts
> that
> >> had interest in one extra connector, but we still have a few removed,
> and as
> >> you mentioned, we still don't have a simple way or willingness to
> manage and
> >> be current on new packages like kafka. And based on the fact that this
> >> thread is still alive, I believe that other community members might have
> >> other concerns as well.
> >>
> >> After some thought, I believe having a separate project (what was
> >> mentioned here as Spark Extras) to handle Spark Connectors and Spark
> add-ons
> >> in general could be very beneficial to Spark and the overall Spark
> >> community, which would have a central place in Apache to collaborate
> around
> >> related Spark components.
> >>
> >> Some of the benefits on this approach
> >>
> >> - Enables maintaining the connectors inside Apache, following the Apache
> >> governance and release rules, while allowing Spark proper to focus on
> the
> >> core runtime.
> >> - Provides more flexibility in controlling the direction (currency) of
> the
> >> existing connectors (e.g. willing to find a solution and maintain
> multiple
> >> versions of same connectors like kafka 0.8x and 0.9x)
> >> - Becomes a home for other types of Spark related connectors helping
> >> expanding the community around Spark (e.g. Zeppelin see most of it's
> current
> >> contribution around new/enhanced connectors)
> >>
> >> What are some requirements for Spark Extras to be successful:
> >>
> >> - Be up to date with Spark Trunk APIs (based on daily CIs against
> >> SNAPSHOT)
> >> - Adhere to Spark release cycles (have a very little window compared to
> >> Spark release)
> >> - Be more open and flexible to the set of connectors it will accept and
> >> maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
> >> have today)
> >>
> >> Where to start Spark Extras
> >>
> >> Depending on the interest here, we could follow the steps of (Apache
> >> Arrow) and start this directly as a TLP, or start as an incubator
> project. I
> >> would consider the first option first.
> >>
> >> Who would participate
> >>
> >> Have thought about this for a bit, and if we go to the direction of
> TLP, I
> >> would say Spark Committers and Apache Members can request to
> participate as
> >> PMC members, while other committers can request to become committers.
> Non
> >> committers would be added based on meritocracy after the start of the
> >> project.
> >>
> >> Project Name
> >>
> >> It would be ideal if we could have a project name that shows close ties
> to
> >> Spark (e.g. Spark Extras or Spark Connectors) but we will need
> permission
> >> and support from whoever is going to evaluate the project proposal (e.g.
> >> Apache Board)
> >>
> >>
> >> Thoughts ?
> >>
> >> Does anyone have any big disagreement or objection to moving into this
> >> direction ?
> >>
> >> Otherwise, who would be interested in joining the project, so I can
> start
> >> working on some concrete proposal ?
> >>
> >>
> >
> >
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Sean Owen <so...@cloudera.com>.
I think this is meant to be understood as a community site, and as a
directory listing pointers to third-party projects. It's not a project of its
own, it's not part of Spark itself, and it has no special status. At least, I
think that's how it should be presented, and it pretty much comes across that
way.

On Fri, Apr 15, 2016 at 5:33 PM, Chris Fregly <ch...@fregly.com> wrote:
> and how does this all relate to the existing 1-and-a-half-class citizen
> known as spark-packages.org?
>
> support for this citizen is buried deep in the Spark source (which was
> always a bit odd, in my opinion):
>
> https://github.com/apache/spark/search?utf8=%E2%9C%93&q=spark-packages
>



Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Chris Fregly <ch...@fregly.com>.
and how does this all relate to the existing 1-and-a-half-class citizen
known as spark-packages.org?

support for this citizen is buried deep in the Spark source (which was
always a bit odd, in my opinion):

https://github.com/apache/spark/search?utf8=%E2%9C%93&q=spark-packages
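
for context, the user-facing side of that integration is spark-submit's
package resolution; a minimal example (the package coordinates below are
purely illustrative, not an endorsement) looks like:

    spark-submit --packages com.databricks:spark-csv_2.10:1.4.0 my_app.py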


On Fri, Apr 15, 2016 at 12:18 PM, Sean Owen <so...@cloudera.com> wrote:

> Why would this need to be an ASF project of its own? I don't think
> it's possible to have a yet another separate "Spark Extras" TLP (?)
>
> There is already a project to manage these bits of code on Github. How
> about all of the interested parties manage the code there, under the
> same process, under the same license, etc?
>
> I'm not against calling it Spark Extras myself but I wonder if that
> needlessly confuses the situation. They aren't part of the Spark TLP
> on purpose, so trying to give it some special middle-ground status
> might just be confusing. The thing that comes to mind immediately is
> "Connectors for Apache Spark", spark-connectors, etc.
>
>
> On Fri, Apr 15, 2016 at 5:01 PM, Luciano Resende <lu...@gmail.com>
> wrote:
> > After some collaboration with other community members, we have created a
> > initial draft for Spark Extras which is available for review at
> >
> >
> https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing
> >
> > We would like to invite other community members to participate in the
> > project, particularly the Spark Committers and PMC (feel free to express
> > interest and I will update the proposal). Another option here is just to
> > give ALL Spark committers write access to "Spark Extras".
> >
> >
> > We also have couple asks from the Spark PMC :
> >
> > - Permission to use "Spark Extras" as the project name. We already
> checked
> > this with Apache Brand Management, and the recommendation was to discuss
> and
> > reach consensus with the Spark PMC.
> >
> > - We would also want to check with the Spark PMC that, in case of
> > successfully creation of  "Spark Extras", if the PMC would be willing to
> > continue the development of the remaining connectors that stayed in Spark
> > 2.0 codebase in the "Spark Extras" project.
> >
> >
> > Thanks in advance, and we welcome any feedback around this proposal
> before
> > we present to the Apache Board for consideration.
> >
> >
> >
> > On Sat, Mar 26, 2016 at 10:07 AM, Luciano Resende <lu...@gmail.com>
> > wrote:
> >>
> >> I believe some of this has been resolved in the context of some parts
> that
> >> had interest in one extra connector, but we still have a few removed,
> and as
> >> you mentioned, we still don't have a simple way or willingness to
> manage and
> >> be current on new packages like kafka. And based on the fact that this
> >> thread is still alive, I believe that other community members might have
> >> other concerns as well.
> >>
> >> After some thought, I believe having a separate project (what was
> >> mentioned here as Spark Extras) to handle Spark Connectors and Spark
> add-ons
> >> in general could be very beneficial to Spark and the overall Spark
> >> community, which would have a central place in Apache to collaborate
> around
> >> related Spark components.
> >>
> >> Some of the benefits on this approach
> >>
> >> - Enables maintaining the connectors inside Apache, following the Apache
> >> governance and release rules, while allowing Spark proper to focus on
> the
> >> core runtime.
> >> - Provides more flexibility in controlling the direction (currency) of
> the
> >> existing connectors (e.g. willing to find a solution and maintain
> multiple
> >> versions of same connectors like kafka 0.8x and 0.9x)
> >> - Becomes a home for other types of Spark related connectors helping
> >> expanding the community around Spark (e.g. Zeppelin see most of it's
> current
> >> contribution around new/enhanced connectors)
> >>
> >> What are some requirements for Spark Extras to be successful:
> >>
> >> - Be up to date with Spark Trunk APIs (based on daily CIs against
> >> SNAPSHOT)
> >> - Adhere to Spark release cycles (have a very little window compared to
> >> Spark release)
> >> - Be more open and flexible to the set of connectors it will accept and
> >> maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
> >> have today)
> >>
> >> Where to start Spark Extras
> >>
> >> Depending on the interest here, we could follow the steps of (Apache
> >> Arrow) and start this directly as a TLP, or start as an incubator
> project. I
> >> would consider the first option first.
> >>
> >> Who would participate
> >>
> >> Have thought about this for a bit, and if we go to the direction of
> TLP, I
> >> would say Spark Committers and Apache Members can request to
> participate as
> >> PMC members, while other committers can request to become committers.
> Non
> >> committers would be added based on meritocracy after the start of the
> >> project.
> >>
> >> Project Name
> >>
> >> It would be ideal if we could have a project name that shows close ties
> to
> >> Spark (e.g. Spark Extras or Spark Connectors) but we will need
> permission
> >> and support from whoever is going to evaluate the project proposal (e.g.
> >> Apache Board)
> >>
> >>
> >> Thoughts ?
> >>
> >> Does anyone have any big disagreement or objection to moving into this
> >> direction ?
> >>
> >> Otherwise, who would be interested in joining the project, so I can
> start
> >> working on some concrete proposal ?
> >>
> >>
> >
> >
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
>
>


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Mridul Muralidharan <mr...@gmail.com>.
On Friday, April 15, 2016, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Yeah in support of this statement I think that my primary interest in
> this Spark Extras and the good work by Luciano here is that anytime we
> take bits out of a code base and “move it to GitHub” I see a bad precedent
> being set.


Can't agree more !



>
> Creating this project at the ASF creates a synergy between *Apache Spark*
> which is *at the ASF*.


In addition, this will give all the "goodness" of being an Apache project
from a user/consumer point of view, compared to a generic GitHub project.




>
> We welcome comments and as Luciano said, this is meant to invite and be
> open to those in the Apache Spark PMC to join and help.
>
>
This would definitely be something worthwhile to explore.
+1

Regards
Mridul



> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov <javascript:;>
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
>
>
>
>
> On 4/15/16, 9:39 AM, "Luciano Resende" <luckbr1975@gmail.com
> <javascript:;>> wrote:
>
> >
> >
> >On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger
> ><cody@koeninger.org <javascript:;>> wrote:
> >
> >Given that not all of the connectors were removed, I think this
> >creates a weird / confusing three tier system
> >
> >1. connectors in the official project's spark/extras or spark/external
> >2. connectors in "Spark Extras"
> >3. connectors in some random organization's github
> >
> >
> >
> >
> >
> >
> >
> >Agree Cody, and I think this is one of the goals of "Spark Extras",
> centralize the development of these connectors under one central place at
> Apache, and that's why one of our asks is to invite the Spark PMC to
> continue developing the remaining connectors
> > that stayed in Spark proper, in "Spark Extras". We will also discuss
> some process policies on enabling lowering the bar to allow proposal of
> these other github extensions to be part of "Spark Extras" while also
> considering a way to move code to a maintenance
> > mode location.
> >
> >
> >
> >
> >--
> >Luciano Resende
> >http://twitter.com/lresende1975
> >http://lresende.blogspot.com/
> >
> >
> >
> >
>

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hey Reynold,

Thanks. Getting to the heart of this, I think that this project would
be successful if the Apache Spark PMC decided to participate and there
was some overlap. As much as I think it would be great to stand up another
project, the goal here from Luciano and crew (myself included) is to
suggest that it’s just as easy to start an Apache Incubator project to
manage the “extra” pieces of Apache Spark code outside of the Spark release
cycle, for the other stated reasons that made it sensible to move this code
out of the code base. This isn’t a competing effort with the code on GitHub
that was moved out of Apache source control from Apache Spark - it’s meant
to be an enabler, to suggest that the code could be managed here just as
easily (see the difference?).

Let me know what you think. Thanks, Reynold.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++









On 4/15/16, 9:47 AM, "Reynold Xin" <rx...@databricks.com> wrote:

>
>
>
>Anybody is free and welcomed to create another ASF project, but I don't think "Spark extras" is a good name. It unnecessarily creates another tier of code that ASF is "endorsing".
>On Friday, April 15, 2016, Mattmann, Chris A (3980) <ch...@jpl.nasa.gov> wrote:
>
>Yeah in support of this statement I think that my primary interest in
>this Spark Extras and the good work by Luciano here is that anytime we
>take bits out of a code base and “move it to GitHub” I see a bad precedent
>being set.
>
>Creating this project at the ASF creates a synergy between *Apache Spark*
>which is *at the ASF*.
>
>We welcome comments and as Luciano said, this is meant to invite and be
>open to those in the Apache Spark PMC to join and help.
>
>Cheers,
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: 
>chris.a.mattmann@nasa.gov <javascript:;>
>WWW:  http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Director, Information Retrieval and Data Science Group (IRDS)
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>WWW: http://irds.usc.edu/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
>
>
>
>
>On 4/15/16, 9:39 AM, "Luciano Resende" <luckbr1975@gmail.com <javascript:;>> wrote:
>
>>
>>
>>On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger
>><cody@koeninger.org <javascript:;>> wrote:
>>
>>Given that not all of the connectors were removed, I think this
>>creates a weird / confusing three tier system
>>
>>1. connectors in the official project's spark/extras or spark/external
>>2. connectors in "Spark Extras"
>>3. connectors in some random organization's github
>>
>>
>>
>>
>>
>>
>>
>>Agree Cody, and I think this is one of the goals of "Spark Extras", centralize the development of these connectors under one central place at Apache, and that's why one of our asks is to invite the Spark PMC to continue developing the remaining connectors
>> that stayed in Spark proper, in "Spark Extras". We will also discuss some process policies on enabling lowering the bar to allow proposal of these other github extensions to be part of "Spark Extras" while also considering a way to move code to a maintenance
>> mode location.
>>
>>
>>
>>
>>--
>>Luciano Resende
>>http://twitter.com/lresende1975
>>http://lresende.blogspot.com/
>>
>>
>>
>>
>
>
>

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Reynold Xin <rx...@databricks.com>.
Anybody is free and welcome to create another ASF project, but I don't
think "Spark extras" is a good name. It unnecessarily creates another tier
of code that the ASF is "endorsing".

On Friday, April 15, 2016, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Yeah in support of this statement I think that my primary interest in
> this Spark Extras and the good work by Luciano here is that anytime we
> take bits out of a code base and “move it to GitHub” I see a bad precedent
> being set.
>
> Creating this project at the ASF creates a synergy between *Apache Spark*
> which is *at the ASF*.
>
> We welcome comments and as Luciano said, this is meant to invite and be
> open to those in the Apache Spark PMC to join and help.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov <javascript:;>
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
>
>
>
>
> On 4/15/16, 9:39 AM, "Luciano Resende" <luckbr1975@gmail.com
> <javascript:;>> wrote:
>
> >
> >
> >On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger
> ><cody@koeninger.org <javascript:;>> wrote:
> >
> >Given that not all of the connectors were removed, I think this
> >creates a weird / confusing three tier system
> >
> >1. connectors in the official project's spark/extras or spark/external
> >2. connectors in "Spark Extras"
> >3. connectors in some random organization's github
> >
> >
> >
> >
> >
> >
> >
> >Agree Cody, and I think this is one of the goals of "Spark Extras",
> centralize the development of these connectors under one central place at
> Apache, and that's why one of our asks is to invite the Spark PMC to
> continue developing the remaining connectors
> > that stayed in Spark proper, in "Spark Extras". We will also discuss
> some process policies on enabling lowering the bar to allow proposal of
> these other github extensions to be part of "Spark Extras" while also
> considering a way to move code to a maintenance
> > mode location.
> >
> >
> >
> >
> >--
> >Luciano Resende
> >http://twitter.com/lresende1975
> >http://lresende.blogspot.com/
> >
> >
> >
> >
>

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Luciano Resende <lu...@gmail.com>.
Just want to provide a quick update: we have submitted the "Spark
Extras" proposal for review by the Apache board (see the link below for the
contents).

https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing

Note that we are now searching for a project name that does not have "Spark"
in it, and we will provide an update here when we find a suitable name.
Suggestions are welcome (please send them directly to my inbox to avoid
flooding the mailing list).

Thanks


On Sun, Apr 17, 2016 at 9:16 AM, Luciano Resende <lu...@gmail.com>
wrote:

>
>
> On Sat, Apr 16, 2016 at 11:12 PM, Reynold Xin <rx...@apache.org> wrote:
>
>> First, really thank you for leading the discussion.
>>
>> I am concerned that it'd hurt Spark more than it helps. As many others
>> have pointed out, this unnecessarily creates a new tier of connectors or
>> 3rd party libraries appearing to be endorsed by the Spark PMC or the ASF.
>> We can alleviate this concern by not having "Spark" in the name, and the
>> project proposal and documentation should label clearly that this is not
>> affiliated with Spark.
>>
>
> I really thought we could use the Spark name (e.g. similar to
> spark-packages) as this project is really aligned and dedicated to curating
> extensions to Apache Spark and that's why we were inviting Spark PMC
> members to join the new project PMC so that Apache Spark has the necessary
> oversight and influence on the project direction. I understand folks have
> concerns with the name, and thus we will start looking into name
> alternatives unless there is any way I could address the community concerns
> around this.
>
>
>>
>> Also Luciano - assuming you are interested in creating a project like
>> this and find a home for the connectors that were removed, I find it
>> surprising that few of the initially proposed PMC members have actually
>> contributed much to the connectors, and people that have contributed a lot
>> were left out. I am sure that is just an oversight.
>>
>>
> Reynold, thanks for your concern, we are not leaving anyone out, we took
> the following criteria to identify initial PMC/Committers list as described
> on the first e-mail on this thread:
>
>    - Spark Committers and Apache Members can request to participate as PMC
> members
>    - All active spark committers (committed on the last one year) will
> have write access to the project (committer access)
>    - Other committers can request to become committers.
>    - Non committers would be added based on meritocracy after the start of
> the project.
>
> Based on this criteria, all people that have expressed interest in joining
> the project PMC has been added to it, but I don't feel comfortable adding
> names to it at my will. And I have updated the list of committers and
> currently we have the following on the draft proposal:
>
>
> Initial PMC
>
>
>    - Luciano Resende (lresende AT apache DOT org) (Apache Member)
>    - Chris Mattmann (mattmann AT apache DOT org) (Apache Member, Apache
>      board member)
>    - Steve Loughran (stevel AT apache DOT org) (Apache Member)
>    - Jean-Baptiste Onofré (jbonofre AT apache DOT org) (Apache Member)
>    - Marcelo Masiero Vanzin (vanzin AT apache DOT org) (Apache Spark
>      committer)
>    - Sean R. Owen (srowen AT apache DOT org) (Apache Member and Spark PMC)
>    - Mridul Muralidharan (mridulm80 AT apache DOT org) (Apache Spark PMC)
>
>
> Initial Committers (write access to active Spark committers that have
> committed in the last one year)
>
>
>    - Andy Konwinski (andrew AT apache DOT org) (Apache Spark)
>    - Andrew Or (andrewor14 AT apache DOT org) (Apache Spark)
>    - Ankur Dave (ankurdave AT apache DOT org) (Apache Spark)
>    - Davies Liu (davies AT apache DOT org) (Apache Spark)
>    - DB Tsai (dbtsai AT apache DOT org) (Apache Spark)
>    - Haoyuan Li (haoyuan AT apache DOT org) (Apache Spark)
>    - Ram Sriharsha (harsha AT apache DOT org) (Apache Spark)
>    - Herman van Hövell (hvanhovell AT apache DOT org) (Apache Spark)
>    - Imran Rashid (irashid AT apache DOT org) (Apache Spark)
>    - Joseph Kurata Bradley (jkbradley AT apache DOT org) (Apache Spark)
>    - Josh Rosen (joshrosen AT apache DOT org) (Apache Spark)
>    - Kay Ousterhout (kayousterhout AT apache DOT org) (Apache Spark)
>    - Cheng Lian (lian AT apache DOT org) (Apache Spark)
>    - Mark Hamstra (markhamstra AT apache DOT org) (Apache Spark)
>    - Michael Armbrust (marmbrus AT apache DOT org) (Apache Spark)
>    - Matei Alexandru Zaharia (matei AT apache DOT org) (Apache Spark)
>    - Xiangrui Meng (meng AT apache DOT org) (Apache Spark)
>    - Prashant Sharma (prashant AT apache DOT org) (Apache Spark)
>    - Patrick Wendell (pwendell AT apache DOT org) (Apache Spark)
>    - Reynold Xin (rxin AT apache DOT org) (Apache Spark)
>    - Sanford Ryza (sandy AT apache DOT org) (Apache Spark)
>    - Kousuke Saruta (sarutak AT apache DOT org) (Apache Spark)
>    - Shivaram Venkataraman (shivaram AT apache DOT org) (Apache Spark)
>    - Tathagata Das (tdas AT apache DOT org) (Apache Spark)
>    - Thomas Graves (tgraves AT apache DOT org) (Apache Spark)
>    - Wenchen Fan (wenchen AT apache DOT org) (Apache Spark)
>    - Yin Huai (yhuai AT apache DOT org) (Apache Spark)
>    - Shixiong Zhu (zsxwing AT apache DOT org) (Apache Spark)
>
>
>
> BTW, It would be really good to have you on the PMC as well, and any
> others that volunteer based on the criteria above. May I add you as PMC to
> the new project proposal ?
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Luciano Resende <lu...@gmail.com>.
Evan,

As long as you meet the criteria we discussed on this thread, you are
welcome to join.

Having said that, I have already seen other contributors who are very
active on some of the connectors but are not Apache committers yet, and I
wanted to be fair, and also to avoid using the project as an avenue to bring
new committers to Apache.


On Sun, Apr 17, 2016 at 10:07 PM, Evan Chan <ve...@gmail.com> wrote:

> Hi Luciano,
>
> I see that you are inviting all the Spark committers to this new project.
> What about the chief maintainers of important Spark ecosystem projects,
> which are not on the Spark PMC?
>
> For example, I am the chief maintainer of the Spark Job Server, which is
> one of the most active projects in the larger Spark ecosystem.  Would
> projects like this be part of your vision?   If so, it would be a good step
> of faith to reach out to us that maintain the active ecosystem projects.
>  (I’m not saying you should put me in :)  but rather suggesting that if
> this is your aim, it would be good to reach out beyond just the Spark PMC
> members.
>
> thanks,
> Evan
>
> On Apr 17, 2016, at 9:16 AM, Luciano Resende <lu...@gmail.com> wrote:
>
>
>
> On Sat, Apr 16, 2016 at 11:12 PM, Reynold Xin <rx...@apache.org> wrote:
>
>> First, really thank you for leading the discussion.
>>
>> I am concerned that it'd hurt Spark more than it helps. As many others
>> have pointed out, this unnecessarily creates a new tier of connectors or
>> 3rd party libraries appearing to be endorsed by the Spark PMC or the ASF.
>> We can alleviate this concern by not having "Spark" in the name, and the
>> project proposal and documentation should label clearly that this is not
>> affiliated with Spark.
>>
>
> I really thought we could use the Spark name (e.g. similar to
> spark-packages) as this project is really aligned and dedicated to curating
> extensions to Apache Spark and that's why we were inviting Spark PMC
> members to join the new project PMC so that Apache Spark has the necessary
> oversight and influence on the project direction. I understand folks have
> concerns with the name, and thus we will start looking into name
> alternatives unless there is any way I could address the community concerns
> around this.
>
>
>>
>> Also Luciano - assuming you are interested in creating a project like
>> this and find a home for the connectors that were removed, I find it
>> surprising that few of the initially proposed PMC members have actually
>> contributed much to the connectors, and people that have contributed a lot
>> were left out. I am sure that is just an oversight.
>>
>>
> Reynold, thanks for your concern, we are not leaving anyone out, we took
> the following criteria to identify initial PMC/Committers list as described
> on the first e-mail on this thread:
>
>    - Spark Committers and Apache Members can request to participate as PMC
> members
>    - All active spark committers (committed on the last one year) will
> have write access to the project (committer access)
>    - Other committers can request to become committers.
>    - Non committers would be added based on meritocracy after the start of
> the project.
>
> Based on this criteria, all people that have expressed interest in joining
> the project PMC has been added to it, but I don't feel comfortable adding
> names to it at my will. And I have updated the list of committers and
> currently we have the following on the draft proposal:
>
>
> Initial PMC
>
>
>    - Luciano Resende (lresende AT apache DOT org) (Apache Member)
>    - Chris Mattmann (mattmann  AT apache DOT org) (Apache Member, Apache
>    board member)
>    - Steve Loughran (stevel AT apache DOT org) (Apache Member)
>    - Jean-Baptiste Onofré (jbonofre  AT apache DOT org) (Apache Member)
>    - Marcelo Masiero Vanzin (vanzin AT apache DOT org) (Apache Spark
>    committer)
>    - Sean R. Owen (srowen AT apache DOT org) (Apache Member and Spark PMC)
>    - Mridul Muralidharan (mridulm80 AT apache DOT org) (Apache Spark PMC)
>
>
> Initial Committers (write access to active Spark committers that have
> committed in the last one year)
>
>
>    - Andy Konwinski (andrew AT apache DOT org) (Apache Spark)
>    - Andrew Or (andrewor14 AT apache DOT org) (Apache Spark)
>    - Ankur Dave (ankurdave AT apache DOT org) (Apache Spark)
>    - Davies Liu (davies AT apache DOT org) (Apache Spark)
>    - DB Tsai (dbtsai AT apache DOT org) (Apache Spark)
>    - Haoyuan Li (haoyuan AT apache DOT org) (Apache Spark)
>    - Ram Sriharsha (harsha AT apache DOT org) (Apache Spark)
>    - Herman van Hövell (hvanhovell AT apache DOT org) (Apache Spark)
>    - Imran Rashid (irashid AT apache DOT org) (Apache Spark)
>    - Joseph Kurata Bradley (jkbradley AT apache DOT org) (Apache Spark)
>    - Josh Rosen (joshrosen AT apache DOT org) (Apache Spark)
>    - Kay Ousterhout (kayousterhout AT apache DOT org) (Apache Spark)
>    - Cheng Lian (lian AT apache DOT org) (Apache Spark)
>    - Mark Hamstra (markhamstra AT apache DOT org) (Apache Spark)
>    - Michael Armbrust (marmbrus AT apache DOT org) (Apache Spark)
>    - Matei Alexandru Zaharia (matei AT apache DOT org) (Apache Spark)
>    - Xiangrui Meng (meng AT apache DOT org) (Apache Spark)
>    - Prashant Sharma (prashant AT apache DOT org) (Apache Spark)
>    - Patrick Wendell (pwendell AT apache DOT org) (Apache Spark)
>    - Reynold Xin (rxin AT apache DOT org) (Apache Spark)
>    - Sanford Ryza (sandy AT apache DOT org) (Apache Spark)
>    - Kousuke Saruta (sarutak AT apache DOT org) (Apache Spark)
>    - Shivaram Venkataraman (shivaram AT apache DOT org) (Apache Spark)
>    - Tathagata Das (tdas AT apache DOT org) (Apache Spark)
>    - Thomas Graves  (tgraves AT apache DOT org) (Apache Spark)
>    - Wenchen Fan (wenchen AT apache DOT org) (Apache Spark)
>    - Yin Huai (yhuai AT apache DOT org) (Apache Spark)
>    - Shixiong Zhu (zsxwing AT apache DOT org) (Apache Spark)
>
>
>
> BTW, It would be really good to have you on the PMC as well, and any
> others that volunteer based on the criteria above. May I add you as PMC to
> the new project proposal ?
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Luciano Resende <lu...@gmail.com>.
On Sat, Apr 16, 2016 at 11:12 PM, Reynold Xin <rx...@apache.org> wrote:

> First, really thank you for leading the discussion.
>
> I am concerned that it'd hurt Spark more than it helps. As many others
> have pointed out, this unnecessarily creates a new tier of connectors or
> 3rd party libraries appearing to be endorsed by the Spark PMC or the ASF.
> We can alleviate this concern by not having "Spark" in the name, and the
> project proposal and documentation should label clearly that this is not
> affiliated with Spark.
>

I really thought we could use the Spark name (similar to spark-packages, for
example), as this project is aligned with and dedicated to curating extensions
to Apache Spark; that's why we were inviting Spark PMC members to join the new
project PMC, so that Apache Spark has the necessary oversight and influence on
the project direction. I understand folks have concerns with the name, so we
will start looking into name alternatives unless there is some way I can
address the community's concerns around this.


>
> Also Luciano - assuming you are interested in creating a project like this
> and find a home for the connectors that were removed, I find it surprising
> that few of the initially proposed PMC members have actually contributed
> much to the connectors, and people that have contributed a lot were left
> out. I am sure that is just an oversight.
>
>
Reynold, thanks for your concern; we are not leaving anyone out. We used the
following criteria to identify the initial PMC/committers list, as described
in the first e-mail on this thread:

   - Spark committers and Apache Members can request to participate as PMC
     members
   - All active Spark committers (committed in the last year) will have
     write access to the project (committer access)
   - Other committers can request to become committers.
   - Non-committers would be added based on meritocracy after the start of
     the project.

Based on these criteria, all people who have expressed interest in joining
the project PMC have been added to it, but I don't feel comfortable adding
names to it at my own will. I have also updated the list of committers, and
currently we have the following on the draft proposal:


Initial PMC


   - Luciano Resende (lresende AT apache DOT org) (Apache Member)
   - Chris Mattmann (mattmann AT apache DOT org) (Apache Member, Apache
     board member)
   - Steve Loughran (stevel AT apache DOT org) (Apache Member)
   - Jean-Baptiste Onofré (jbonofre AT apache DOT org) (Apache Member)
   - Marcelo Masiero Vanzin (vanzin AT apache DOT org) (Apache Spark
     committer)
   - Sean R. Owen (srowen AT apache DOT org) (Apache Member and Spark PMC)
   - Mridul Muralidharan (mridulm80 AT apache DOT org) (Apache Spark PMC)


Initial Committers (write access for active Spark committers who have
committed in the last year)


   - Andy Konwinski (andrew AT apache DOT org) (Apache Spark)
   - Andrew Or (andrewor14 AT apache DOT org) (Apache Spark)
   - Ankur Dave (ankurdave AT apache DOT org) (Apache Spark)
   - Davies Liu (davies AT apache DOT org) (Apache Spark)
   - DB Tsai (dbtsai AT apache DOT org) (Apache Spark)
   - Haoyuan Li (haoyuan AT apache DOT org) (Apache Spark)
   - Ram Sriharsha (harsha AT apache DOT org) (Apache Spark)
   - Herman van Hövell (hvanhovell AT apache DOT org) (Apache Spark)
   - Imran Rashid (irashid AT apache DOT org) (Apache Spark)
   - Joseph Kurata Bradley (jkbradley AT apache DOT org) (Apache Spark)
   - Josh Rosen (joshrosen AT apache DOT org) (Apache Spark)
   - Kay Ousterhout (kayousterhout AT apache DOT org) (Apache Spark)
   - Cheng Lian (lian AT apache DOT org) (Apache Spark)
   - Mark Hamstra (markhamstra AT apache DOT org) (Apache Spark)
   - Michael Armbrust (marmbrus AT apache DOT org) (Apache Spark)
   - Matei Alexandru Zaharia (matei AT apache DOT org) (Apache Spark)
   - Xiangrui Meng (meng AT apache DOT org) (Apache Spark)
   - Prashant Sharma (prashant AT apache DOT org) (Apache Spark)
   - Patrick Wendell (pwendell AT apache DOT org) (Apache Spark)
   - Reynold Xin (rxin AT apache DOT org) (Apache Spark)
   - Sanford Ryza (sandy AT apache DOT org) (Apache Spark)
   - Kousuke Saruta (sarutak AT apache DOT org) (Apache Spark)
   - Shivaram Venkataraman (shivaram AT apache DOT org) (Apache Spark)
   - Tathagata Das (tdas AT apache DOT org) (Apache Spark)
   - Thomas Graves (tgraves AT apache DOT org) (Apache Spark)
   - Wenchen Fan (wenchen AT apache DOT org) (Apache Spark)
   - Yin Huai (yhuai AT apache DOT org) (Apache Spark)
   - Shixiong Zhu (zsxwing AT apache DOT org) (Apache Spark)



BTW, it would be really good to have you on the PMC as well, along with any
others who volunteer based on the criteria above. May I add you as a PMC
member to the new project proposal?



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Reynold Xin <rx...@apache.org>.
First, really thank you for leading the discussion.

I am concerned that it'd hurt Spark more than it helps. As many others have
pointed out, this unnecessarily creates a new tier of connectors or 3rd
party libraries appearing to be endorsed by the Spark PMC or the ASF. We
can alleviate this concern by not having "Spark" in the name, and the
project proposal and documentation should label clearly that this is not
affiliated with Spark.

Also, Luciano - assuming you are interested in creating a project like this
and finding a home for the connectors that were removed, I find it surprising
that few of the initially proposed PMC members have actually contributed
much to the connectors, and people who have contributed a lot were left
out. I am sure that is just an oversight.



On Sat, Apr 16, 2016 at 10:42 PM, Luciano Resende <lu...@gmail.com>
wrote:

>
>
> On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan <ve...@gmail.com>
> wrote:
>
>> Hi folks,
>>
>> Sorry to join the discussion late.  I had a look at the design doc
>> earlier in this thread, and it was not mentioned what types of
>> projects are the targets of this new "spark extras" ASF umbrella....
>>
>> Is the desire to have a maintained set of spark-related projects that
>> keep pace with the main Spark development schedule?  Is it just for
>> streaming connectors?  what about data sources, and other important
>> projects in the Spark ecosystem?
>>
>
> The proposal draft below has some more details on what type of projects,
> but in summary, "Spark-Extras" would be a good place for any of these
> components you mentioned.
>
>
> https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing
>
>
>>
>> I'm worried that this would relegate spark-packages to third tier
>> status,
>
>
> Owen answered a similar question about spark-packages earlier on this
> thread, but while "Spark-Extras" would a place in Apache for collaboration
> on the development of these extensions, they might still be published to
> spark-packages as they existing streaming connectors are today.
>
>
>> and the promotion of a select set of committers, and the
>> project itself, to top level ASF status (a la Arrow) would create a
>> further split in the community.
>>
>>
> As for the select set of committers, we have invited all Spark committers
> to be committers on the project, and I have updated the project proposal
> with the existing set of active Spark committers ( that have committed in
> the last one year)
>
>
>>
>> -Evan
>>
>> On Sat, Apr 16, 2016 at 4:46 AM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> >
>> >
>> >
>> >
>> >
>> > On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>> >
>> >>Yeah in support of this statement I think that my primary interest in
>> >>this Spark Extras and the good work by Luciano here is that anytime we
>> >>take bits out of a code base and “move it to GitHub” I see a bad
>> precedent
>> >>being set.
>> >>
>> >>Creating this project at the ASF creates a synergy between *Apache
>> Spark*
>> >>which is *at the ASF*.
>> >>
>> >>We welcome comments and as Luciano said, this is meant to invite and be
>> >>open to those in the Apache Spark PMC to join and help.
>> >>
>> >>Cheers,
>> >>Chris
>> >
>> > As one of the people named, here's my rationale:
>> >
>> > Throwing stuff into github creates that world of branches, and its no
>> longer something that could be managed through the ASF, where managed is:
>> governance, participation and a release process that includes auditing
>> dependencies, code-signoff, etc,
>> >
>> >
>> > As an example, there's a mutant hive JAR which spark uses, that's
>> something which currently evolved between my repo and Patrick Wendell's;
>> now that Josh Rosen has taken on the bold task of "trying to move spark and
>> twill to Kryo 3", he's going to own that code, and now the reference branch
>> will move somewhere else.
>> >
>> > In contrast, if there was an ASF location for this, then it'd be
>> something anyone with commit rights could maintain and publish
>> >
>> > (actually, I've just realised life is hard here as the hive is a fork
>> of ASF hive —really the spark branch should be a separate branch in Hive's
>> own repo ... But the concept is the same: those bits of the codebase which
>> are core parts of the spark project should really live in or near it)
>> >
>> >
>> > If everyone on the spark commit list gets write access to this extras
>> repo, moving things is straightforward. Release-wise, things could/should
>> be in sync.
>> >
>> > If there's a risk, it's the eternal problem of the contrib/ dir ....
>> Stuff ends up there that never gets maintained. I don't see that being any
>> worse than if things were thrown to the wind of a thousand github repos: at
>> least now there'd be a central issue tracking location.
>>
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Luciano Resende <lu...@gmail.com>.
On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan <ve...@gmail.com> wrote:

> Hi folks,
>
> Sorry to join the discussion late.  I had a look at the design doc
> earlier in this thread, and it was not mentioned what types of
> projects are the targets of this new "spark extras" ASF umbrella....
>
> Is the desire to have a maintained set of spark-related projects that
> keep pace with the main Spark development schedule?  Is it just for
> streaming connectors?  what about data sources, and other important
> projects in the Spark ecosystem?
>

The proposal draft below has some more details on what types of projects
are in scope, but in summary, "Spark-Extras" would be a good place for any
of these components you mentioned.

https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing


>
> I'm worried that this would relegate spark-packages to third tier
> status,


Sean Owen answered a similar question about spark-packages earlier on this
thread, but while "Spark-Extras" would be a place in Apache for collaboration
on the development of these extensions, they might still be published to
spark-packages, as the existing streaming connectors are today.
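
To make this concrete, here is a rough sketch (illustrative only, not part
of the proposal) of the user-facing flow, assuming a connector is published
under a Maven coordinate similar to today's spark-streaming-kafka artifact;
the coordinate, broker address, and topic name below are placeholders:

  // Pull the connector in at launch time via its published coordinate
  // (placeholder version shown):
  //   spark-shell --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1

  import kafka.serializer.StringDecoder
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  // Reuse the shell's SparkContext (sc) with a 10 second batch interval.
  val ssc = new StreamingContext(sc, Seconds(10))

  // Direct stream against a Kafka 0.8.x broker; broker and topic are placeholders.
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("events"))

  // Count words arriving on the topic, just to exercise the stream.
  stream.map(_._2).flatMap(_.split(" ")).map(word => (word, 1L)).reduceByKey(_ + _).print()

  ssc.start()
  ssc.awaitTermination()

Nothing in that flow would need to change if the connector source moved to
"Spark Extras"; only the published coordinate might differ.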


> and the promotion of a select set of committers, and the
> project itself, to top level ASF status (a la Arrow) would create a
> further split in the community.
>
>
As for the select set of committers, we have invited all Spark committers
to be committers on the project, and I have updated the project proposal
with the existing set of active Spark committers (those who have committed
in the last year).


>
> -Evan
>
> On Sat, Apr 16, 2016 at 4:46 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
> >
> >
> >
> >
> >
> > On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <
> chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >>Yeah in support of this statement I think that my primary interest in
> >>this Spark Extras and the good work by Luciano here is that anytime we
> >>take bits out of a code base and “move it to GitHub” I see a bad
> precedent
> >>being set.
> >>
> >>Creating this project at the ASF creates a synergy between *Apache Spark*
> >>which is *at the ASF*.
> >>
> >>We welcome comments and as Luciano said, this is meant to invite and be
> >>open to those in the Apache Spark PMC to join and help.
> >>
> >>Cheers,
> >>Chris
> >
> > As one of the people named, here's my rationale:
> >
> > Throwing stuff into github creates that world of branches, and it's no
> longer something that could be managed through the ASF, where managed is:
> governance, participation and a release process that includes auditing
> dependencies, code-signoff, etc,
> >
> >
> > As an example, there's a mutant hive JAR which spark uses, that's
> something which currently evolved between my repo and Patrick Wendell's;
> now that Josh Rosen has taken on the bold task of "trying to move spark and
> twill to Kryo 3", he's going to own that code, and now the reference branch
> will move somewhere else.
> >
> > In contrast, if there was an ASF location for this, then it'd be
> something anyone with commit rights could maintain and publish
> >
> > (actually, I've just realised life is hard here as the hive is a fork of
> ASF hive —really the spark branch should be a separate branch in Hive's own
> repo ... But the concept is the same: those bits of the codebase which are
> core parts of the spark project should really live in or near it)
> >
> >
> > If everyone on the spark commit list gets write access to this extras
> repo, moving things is straightforward. Release-wise, things could/should
> be in sync.
> >
> > If there's a risk, it's the eternal problem of the contrib/ dir ....
> Stuff ends up there that never gets maintained. I don't see that being any
> worse than if things were thrown to the wind of a thousand github repos: at
> least now there'd be a central issue tracking location.
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Evan Chan <ve...@gmail.com>.
Hi folks,

Sorry to join the discussion late.  I had a look at the design doc
earlier in this thread, and it did not mention what types of
projects are the targets of this new "spark extras" ASF umbrella...

Is the desire to have a maintained set of spark-related projects that
keep pace with the main Spark development schedule?  Is it just for
streaming connectors?  What about data sources and other important
projects in the Spark ecosystem?

I'm worried that this would relegate spark-packages to third-tier
status, and that the promotion of a select set of committers, and the
project itself, to top-level ASF status (a la Arrow) would create a
further split in the community.

-Evan

On Sat, Apr 16, 2016 at 4:46 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
>
>
>
>
> On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> wrote:
>
>>Yeah in support of this statement I think that my primary interest in
>>this Spark Extras and the good work by Luciano here is that anytime we
>>take bits out of a code base and “move it to GitHub” I see a bad precedent
>>being set.
>>
>>Creating this project at the ASF creates a synergy between *Apache Spark*
>>which is *at the ASF*.
>>
>>We welcome comments and as Luciano said, this is meant to invite and be
>>open to those in the Apache Spark PMC to join and help.
>>
>>Cheers,
>>Chris
>
> As one of the people named, here's my rationale:
>
> Throwing stuff into github creates that world of branches, and it's no longer something that could be managed through the ASF, where managed is: governance, participation and a release process that includes auditing dependencies, code-signoff, etc.
>
>
> As an example, there's a mutant hive JAR which spark uses, that's something which currently evolved between my repo and Patrick Wendell's; now that Josh Rosen has taken on the bold task of "trying to move spark and twill to Kryo 3", he's going to own that code, and now the reference branch will move somewhere else.
>
> In contrast, if there was an ASF location for this, then it'd be something anyone with commit rights could maintain and publish
>
> (actually, I've just realised life is hard here as the hive is a fork of ASF hive —really the spark branch should be a separate branch in Hive's own repo ... But the concept is the same: those bits of the codebase which are core parts of the spark project should really live in or near it)
>
>
> If everyone on the spark commit list gets write access to this extras repo, moving things is straightforward. Release-wise, things could/should be in sync.
>
> If there's a risk, it's the eternal problem of the contrib/ dir .... Stuff ends up there that never gets maintained. I don't see that being any worse than if things were thrown to the wind of a thousand github repos: at least now there'd be a central issue tracking location.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Steve Loughran <st...@hortonworks.com>.




On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> wrote:

>Yeah in support of this statement I think that my primary interest in
>this Spark Extras and the good work by Luciano here is that anytime we
>take bits out of a code base and “move it to GitHub” I see a bad precedent
>being set.
>
>Creating this project at the ASF creates a synergy between *Apache Spark*
>which is *at the ASF*.
>
>We welcome comments and as Luciano said, this is meant to invite and be
>open to those in the Apache Spark PMC to join and help.
>
>Cheers,
>Chris

As one of the people named, here's my rationale:

Throwing stuff into github creates that world of branches, and it's no longer something that can be managed through the ASF, where "managed" means: governance, participation, and a release process that includes auditing dependencies, code sign-off, etc.


As an example, there's a mutant hive JAR which spark uses; it has so far evolved between my repo and Patrick Wendell's. Now that Josh Rosen has taken on the bold task of "trying to move spark and twill to Kryo 3", he's going to own that code, and the reference branch will move somewhere else.

In contrast, if there were an ASF location for this, then it'd be something anyone with commit rights could maintain and publish.

(actually, I've just realised life is hard here, as that hive JAR is a fork of ASF Hive; really, the spark branch should be a separate branch in Hive's own repo ... but the concept is the same: those bits of the codebase which are core parts of the spark project should really live in or near it)


If everyone on the spark commit list gets write access to this extras repo, moving things is straightforward. Release-wise, things could/should be in sync.

If there's a risk, it's the eternal problem of the contrib/ dir: stuff ends up there that never gets maintained. I don't see that being any worse than if things were thrown to the wind of a thousand github repos: at least this way there'd be a central issue-tracking location.

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1

Regards
JB

On 04/15/2016 06:41 PM, Mattmann, Chris A (3980) wrote:
> Yeah in support of this statement I think that my primary interest in
> this Spark Extras and the good work by Luciano here is that anytime we
> take bits out of a code base and “move it to GitHub” I see a bad precedent
> being set.
>
> Creating this project at the ASF creates a synergy between *Apache Spark*
> which is *at the ASF*.
>
> We welcome comments and as Luciano said, this is meant to invite and be
> open to those in the Apache Spark PMC to join and help.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
>
>
>
>
> On 4/15/16, 9:39 AM, "Luciano Resende" <lu...@gmail.com> wrote:
>
>>
>>
>> On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger
>> <co...@koeninger.org> wrote:
>>
>> Given that not all of the connectors were removed, I think this
>> creates a weird / confusing three tier system
>>
>> 1. connectors in the official project's spark/extras or spark/external
>> 2. connectors in "Spark Extras"
>> 3. connectors in some random organization's github
>>
>>
>>
>>
>>
>>
>>
>> Agree Cody, and I think this is one of the goals of "Spark Extras", centralize the development of these connectors under one central place at Apache, and that's why one of our asks is to invite the Spark PMC to continue developing the remaining connectors
>> that stayed in Spark proper, in "Spark Extras". We will also discuss some process policies on enabling lowering the bar to allow proposal of these other github extensions to be part of "Spark Extras" while also considering a way to move code to a maintenance
>> mode location.
>>
>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Yeah in support of this statement I think that my primary interest in
this Spark Extras and the good work by Luciano here is that anytime we
take bits out of a code base and “move it to GitHub” I see a bad precedent
being set.

Creating this project at the ASF creates a synergy with *Apache Spark*,
which is *at the ASF*.

We welcome comments and as Luciano said, this is meant to invite and be
open to those in the Apache Spark PMC to join and help.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++










On 4/15/16, 9:39 AM, "Luciano Resende" <lu...@gmail.com> wrote:

>
>
>On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger 
><co...@koeninger.org> wrote:
>
>Given that not all of the connectors were removed, I think this
>creates a weird / confusing three tier system
>
>1. connectors in the official project's spark/extras or spark/external
>2. connectors in "Spark Extras"
>3. connectors in some random organization's github
>
>
>
>
>
>
>
>Agree Cody, and I think this is one of the goals of "Spark Extras", centralize the development of these connectors under one central place at Apache, and that's why one of our asks is to invite the Spark PMC to continue developing the remaining connectors
> that stayed in Spark proper, in "Spark Extras". We will also discuss some process policies on enabling lowering the bar to allow proposal of these other github extensions to be part of "Spark Extras" while also considering a way to move code to a maintenance
> mode location.
>
> 
>
>
>-- 
>Luciano Resende
>http://twitter.com/lresende1975
>http://lresende.blogspot.com/
>
>
>
>

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Luciano Resende <lu...@gmail.com>.
On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger <co...@koeninger.org> wrote:

> Given that not all of the connectors were removed, I think this
> creates a weird / confusing three tier system
>
> 1. connectors in the official project's spark/extras or spark/external
> 2. connectors in "Spark Extras"
> 3. connectors in some random organization's github
>
>
Agreed, Cody, and I think this is one of the goals of "Spark Extras": to
centralize the development of these connectors in one place at Apache.
That's why one of our asks is to invite the Spark PMC to continue
developing, in "Spark Extras", the remaining connectors that stayed in
Spark proper. We will also discuss process policies that lower the bar for
proposing these other github extensions as part of "Spark Extras", while
also considering a way to move code to a maintenance-mode location.


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Cody Koeninger <co...@koeninger.org>.
Given that not all of the connectors were removed, I think this
creates a weird / confusing three-tier system

1. connectors in the official project's spark/extras or spark/external
2. connectors in "Spark Extras"
3. connectors in some random organization's github



On Fri, Apr 15, 2016 at 11:18 AM, Sean Owen <so...@cloudera.com> wrote:
> Why would this need to be an ASF project of its own? I don't think
> it's possible to have yet another separate "Spark Extras" TLP (?)
>
> There is already a project to manage these bits of code on Github. How
> about all of the interested parties manage the code there, under the
> same process, under the same license, etc?
>
> I'm not against calling it Spark Extras myself but I wonder if that
> needlessly confuses the situation. They aren't part of the Spark TLP
> on purpose, so trying to give it some special middle-ground status
> might just be confusing. The thing that comes to mind immediately is
> "Connectors for Apache Spark", spark-connectors, etc.
>
>
> On Fri, Apr 15, 2016 at 5:01 PM, Luciano Resende <lu...@gmail.com> wrote:
>> After some collaboration with other community members, we have created a
>> initial draft for Spark Extras which is available for review at
>>
>> https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing
>>
>> We would like to invite other community members to participate in the
>> project, particularly the Spark Committers and PMC (feel free to express
>> interest and I will update the proposal). Another option here is just to
>> give ALL Spark committers write access to "Spark Extras".
>>
>>
>> We also have couple asks from the Spark PMC :
>>
>> - Permission to use "Spark Extras" as the project name. We already checked
>> this with Apache Brand Management, and the recommendation was to discuss and
>> reach consensus with the Spark PMC.
>>
>> - We would also want to check with the Spark PMC that, in case of
>> successfully creation of  "Spark Extras", if the PMC would be willing to
>> continue the development of the remaining connectors that stayed in Spark
>> 2.0 codebase in the "Spark Extras" project.
>>
>>
>> Thanks in advance, and we welcome any feedback around this proposal before
>> we present to the Apache Board for consideration.
>>
>>
>>
>> On Sat, Mar 26, 2016 at 10:07 AM, Luciano Resende <lu...@gmail.com>
>> wrote:
>>>
>>> I believe some of this has been resolved in the context of some parts that
>>> had interest in one extra connector, but we still have a few removed, and as
>>> you mentioned, we still don't have a simple way or willingness to manage and
>>> be current on new packages like kafka. And based on the fact that this
>>> thread is still alive, I believe that other community members might have
>>> other concerns as well.
>>>
>>> After some thought, I believe having a separate project (what was
>>> mentioned here as Spark Extras) to handle Spark Connectors and Spark add-ons
>>> in general could be very beneficial to Spark and the overall Spark
>>> community, which would have a central place in Apache to collaborate around
>>> related Spark components.
>>>
>>> Some of the benefits on this approach
>>>
>>> - Enables maintaining the connectors inside Apache, following the Apache
>>> governance and release rules, while allowing Spark proper to focus on the
>>> core runtime.
>>> - Provides more flexibility in controlling the direction (currency) of the
>>> existing connectors (e.g. willing to find a solution and maintain multiple
>>> versions of same connectors like kafka 0.8x and 0.9x)
>>> - Becomes a home for other types of Spark related connectors helping
>>> expanding the community around Spark (e.g. Zeppelin see most of it's current
>>> contribution around new/enhanced connectors)
>>>
>>> What are some requirements for Spark Extras to be successful:
>>>
>>> - Be up to date with Spark Trunk APIs (based on daily CIs against
>>> SNAPSHOT)
>>> - Adhere to Spark release cycles (have a very little window compared to
>>> Spark release)
>>> - Be more open and flexible to the set of connectors it will accept and
>>> maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
>>> have today)
>>>
>>> Where to start Spark Extras
>>>
>>> Depending on the interest here, we could follow the steps of (Apache
>>> Arrow) and start this directly as a TLP, or start as an incubator project. I
>>> would consider the first option first.
>>>
>>> Who would participate
>>>
>>> Have thought about this for a bit, and if we go to the direction of TLP, I
>>> would say Spark Committers and Apache Members can request to participate as
>>> PMC members, while other committers can request to become committers. Non
>>> committers would be added based on meritocracy after the start of the
>>> project.
>>>
>>> Project Name
>>>
>>> It would be ideal if we could have a project name that shows close ties to
>>> Spark (e.g. Spark Extras or Spark Connectors) but we will need permission
>>> and support from whoever is going to evaluate the project proposal (e.g.
>>> Apache Board)
>>>
>>>
>>> Thoughts ?
>>>
>>> Does anyone have any big disagreement or objection to moving into this
>>> direction ?
>>>
>>> Otherwise, who would be interested in joining the project, so I can start
>>> working on some concrete proposal ?
>>>
>>>
>>
>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Posted by Sean Owen <so...@cloudera.com>.
Why would this need to be an ASF project of its own? I don't think
it's possible to have yet another separate "Spark Extras" TLP (?)

There is already a project to manage these bits of code on Github. How
about all of the interested parties manage the code there, under the
same process, under the same license, etc?

I'm not against calling it Spark Extras myself but I wonder if that
needlessly confuses the situation. They aren't part of the Spark TLP
on purpose, so trying to give it some special middle-ground status
might just be confusing. The thing that comes to mind immediately is
"Connectors for Apache Spark", spark-connectors, etc.


On Fri, Apr 15, 2016 at 5:01 PM, Luciano Resende <lu...@gmail.com> wrote:
> After some collaboration with other community members, we have created a
> initial draft for Spark Extras which is available for review at
>
> https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing
>
> We would like to invite other community members to participate in the
> project, particularly the Spark Committers and PMC (feel free to express
> interest and I will update the proposal). Another option here is just to
> give ALL Spark committers write access to "Spark Extras".
>
>
> We also have couple asks from the Spark PMC :
>
> - Permission to use "Spark Extras" as the project name. We already checked
> this with Apache Brand Management, and the recommendation was to discuss and
> reach consensus with the Spark PMC.
>
> - We would also want to check with the Spark PMC that, in case of
> successfully creation of  "Spark Extras", if the PMC would be willing to
> continue the development of the remaining connectors that stayed in Spark
> 2.0 codebase in the "Spark Extras" project.
>
>
> Thanks in advance, and we welcome any feedback around this proposal before
> we present to the Apache Board for consideration.
>
>
>
> On Sat, Mar 26, 2016 at 10:07 AM, Luciano Resende <lu...@gmail.com>
> wrote:
>>
>> I believe some of this has been resolved in the context of some parts that
>> had interest in one extra connector, but we still have a few removed, and as
>> you mentioned, we still don't have a simple way or willingness to manage and
>> be current on new packages like kafka. And based on the fact that this
>> thread is still alive, I believe that other community members might have
>> other concerns as well.
>>
>> After some thought, I believe having a separate project (what was
>> mentioned here as Spark Extras) to handle Spark Connectors and Spark add-ons
>> in general could be very beneficial to Spark and the overall Spark
>> community, which would have a central place in Apache to collaborate around
>> related Spark components.
>>
>> Some of the benefits on this approach
>>
>> - Enables maintaining the connectors inside Apache, following the Apache
>> governance and release rules, while allowing Spark proper to focus on the
>> core runtime.
>> - Provides more flexibility in controlling the direction (currency) of the
>> existing connectors (e.g. willing to find a solution and maintain multiple
>> versions of same connectors like kafka 0.8x and 0.9x)
>> - Becomes a home for other types of Spark related connectors helping
>> expanding the community around Spark (e.g. Zeppelin see most of it's current
>> contribution around new/enhanced connectors)
>>
>> What are some requirements for Spark Extras to be successful:
>>
>> - Be up to date with Spark Trunk APIs (based on daily CIs against
>> SNAPSHOT)
>> - Adhere to Spark release cycles (have a very little window compared to
>> Spark release)
>> - Be more open and flexible to the set of connectors it will accept and
>> maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
>> have today)
>>
>> Where to start Spark Extras
>>
>> Depending on the interest here, we could follow the steps of (Apache
>> Arrow) and start this directly as a TLP, or start as an incubator project. I
>> would consider the first option first.
>>
>> Who would participate
>>
>> Have thought about this for a bit, and if we go to the direction of TLP, I
>> would say Spark Committers and Apache Members can request to participate as
>> PMC members, while other committers can request to become committers. Non
>> committers would be added based on meritocracy after the start of the
>> project.
>>
>> Project Name
>>
>> It would be ideal if we could have a project name that shows close ties to
>> Spark (e.g. Spark Extras or Spark Connectors) but we will need permission
>> and support from whoever is going to evaluate the project proposal (e.g.
>> Apache Board)
>>
>>
>> Thoughts ?
>>
>> Does anyone have any big disagreement or objection to moving into this
>> direction ?
>>
>> Otherwise, who would be interested in joining the project, so I can start
>> working on some concrete proposal ?
>>
>>
>
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org