Posted to dev@spark.apache.org by Matei Zaharia <ma...@gmail.com> on 2014/11/06 02:31:58 UTC

[VOTE] Designating maintainers for some Spark components

Hi all,

I wanted to share a discussion we've been having on the PMC list, as well as call for an official vote on it on a public list. Basically, as the Spark project scales up, we need to define a model to make sure there is still great oversight of key components (in particular internal architecture and public APIs), and to this end I've proposed implementing a maintainer model for some of these components, similar to other large projects.

As background on this, Spark has grown a lot since joining Apache. We've had over 80 contributors/month for the past 3 months, which I believe makes us the most active project in contributors/month at Apache, as well as over 500 patches/month. The codebase has also grown significantly, with new libraries for SQL, ML, graphs and more.

In this kind of large project, one common way to scale development is to assign "maintainers" to oversee key components, where each patch to that component needs to get sign-off from at least one of its maintainers. Most existing large projects do this -- at Apache, some large ones with this model are CloudStack (the second-most active project overall), Subversion, and Kafka, and other examples include Linux and Python. This is also by-and-large how Spark operates today -- most components have a de-facto maintainer.

IMO, adopting this model would have two benefits:

1) Consistent oversight of design for that component, especially regarding architecture and API. This process would ensure that the component's maintainers see all proposed changes and make sure they fit together well.

2) More structure for new contributors and committers -- in particular, it would be easy to look up who’s responsible for each module and ask them for reviews, etc, rather than having patches slip between the cracks.

We'd like to start in a lightweight manner, where the model only applies to certain key components (e.g. scheduler, shuffle) and user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it if we deem it useful. The specific mechanics would be as follows:

- Some components in Spark will have maintainers assigned to them, where one of the maintainers needs to sign off on each patch to the component.
- Each component with maintainers will have at least 2 maintainers.
- Maintainers will be assigned from the most active and knowledgeable committers on that component by the PMC. The PMC can vote to add / remove maintainers, and maintained components, through consensus.
- Maintainers are expected to be active in responding to patches for their components, though they do not need to be the main reviewers for them (e.g. they might just sign off on architecture / API). To prevent inactive maintainers from blocking the project, if a maintainer isn't responding in a reasonable time period (say 2 weeks), other committers can merge the patch, and the PMC will want to discuss adding another maintainer.
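The mechanics above amount to a small decision rule, which can be sketched as follows. This is a hypothetical illustration only, not an actual Spark tool; the component names, the `approvers` set, and the two-week stall period are taken from the bullets above, while everything else (function names, data shapes) is invented:

```python
from datetime import datetime, timedelta

# Hypothetical registry mirroring the proposal: each maintained
# component lists at least two maintainers.
MAINTAINERS = {
    "scheduler": {"matei", "kay", "patrick"},
    "shuffle": {"reynold", "aaron", "matei"},
}

# "a reasonable time period (say 2 weeks)" from the bullets above.
STALL_PERIOD = timedelta(weeks=2)

def can_merge(component, approvers, opened, now):
    """Return True if a patch to `component` may be merged.

    A patch needs sign-off from at least one maintainer of its
    component; if no maintainer has responded within the stall
    period, approval from any committer suffices.
    """
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        # Unmaintained components follow the normal committer process.
        return bool(approvers)
    if maintainers & set(approvers):
        return True
    # Inactive-maintainer escape hatch: other committers may merge.
    return bool(approvers) and (now - opened) > STALL_PERIOD

opened = datetime(2014, 11, 6)
print(can_merge("scheduler", {"kay"}, opened, opened))   # maintainer sign-off
print(can_merge("scheduler", {"sean"}, opened,
                opened + timedelta(days=20)))            # stalled, committer merges
```

Note the escape hatch is what keeps the model lightweight: a missing maintainer delays a patch by at most the stall period rather than blocking it outright.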

If you'd like to see examples for this model, check out the following projects:
- CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
- Subversion: https://subversion.apache.org/docs/community-guide/roles.html

Finally, I wanted to list our current proposal for initial components and maintainers. It would be good to get feedback on other components we might add, but please note that personnel discussions (e.g. "I don't think Matei should maintain *that* component") should only happen on the private list. The initial components were chosen to include all public APIs and the main core components, and the maintainers were chosen from the most active contributors to those modules.

- Spark core public API: Matei, Patrick, Reynold
- Job scheduler: Matei, Kay, Patrick
- Shuffle and network: Reynold, Aaron, Matei
- Block manager: Reynold, Aaron
- YARN: Tom, Andrew Or
- Python: Josh, Matei
- MLlib: Xiangrui, Matei
- SQL: Michael, Reynold
- Streaming: TD, Matei
- GraphX: Ankur, Joey, Reynold

I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.

Matei

Re: [VOTE] Designating maintainers for some Spark components

Posted by Xuefeng Wu <be...@gmail.com>.
+1. It brings more focus and more consistency.
 

Yours respectfully, Xuefeng Wu (吴雪峰)

> On Nov 6, 2014, at 9:31 AM, Matei Zaharia <ma...@gmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


RE: [VOTE] Designating maintainers for some Spark components

Posted by Ravindra pesala <ra...@huawei.com>.
+1

Thanks & Regards,
Ravindra Pesala

________________________________________
From: Manoj Babu [manoj444@gmail.com]
Sent: Thursday, November 06, 2014 1:02 PM
To: Matei Zaharia
Cc: Sean Owen; dev
Subject: Re: [VOTE] Designating maintainers for some Spark components

+1

Cheers!
Manoj.

On Thu, Nov 6, 2014 at 12:51 PM, Matei Zaharia <ma...@gmail.com>
wrote:



Re: [VOTE] Designating maintainers for some Spark components

Posted by Liquan Pei <li...@gmail.com>.
+1

Liquan

On Wed, Nov 5, 2014 at 11:32 PM, Manoj Babu <ma...@gmail.com> wrote:




-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst

Re: [VOTE] Designating maintainers for some Spark components

Posted by Manoj Babu <ma...@gmail.com>.
+1

Cheers!
Manoj.

On Thu, Nov 6, 2014 at 12:51 PM, Matei Zaharia <ma...@gmail.com>
wrote:


Re: [VOTE] Designating maintainers for some Spark components

Posted by witgo <wi...@qq.com>.
+1




------------------ Original Message ------------------
From: "Matei Zaharia"<ma...@gmail.com>;
Sent: Thursday, November 6, 2014, 3:21 PM
To: "Sean Owen"<so...@cloudera.com>;
Cc: "dev"<de...@spark.apache.org>;
Subject: Re: [VOTE] Designating maintainers for some Spark components




Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
Several people asked about having maintainers review the PR queue for their modules regularly, and I like that idea. We have a new tool now to help with that in https://spark-prs.appspot.com.

In terms of the set of open PRs itself, it is large but note that there are also 2800 *closed* PRs, which means we close the majority of PRs (and I don't know the exact stats but I'd guess that 90% of those are accepted and merged). I think one problem is that with GitHub, people often develop something as a PR and have a lot of discussion on there (including whether we even want the feature). I recently updated our "how to contribute" page to encourage opening a JIRA and having discussions on the dev list first, but I do think we need to be faster with closing ones that we don't have a plan to merge. Note that Hadoop, Hive, HBase, etc also have about 300 issues each in the "patch available" state, so this is some kind of universal constant :P.
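Per-component review queues like the one described above depend on being able to tell which component a pull request touches; Spark PR titles conventionally carry a leading bracketed tag (e.g. "[SQL]", "[MLLIB]"). A rough sketch of grouping a PR list by that convention follows; the sample titles are invented for illustration, and this is not how the actual tool is implemented:

```python
import re
from collections import defaultdict

# Matches a leading bracketed tag such as "[SQL]" or "[MLLIB]".
TAG = re.compile(r"^\[([A-Za-z ]+)\]")

def by_component(titles):
    """Group PR titles into per-component queues by their leading tag."""
    queues = defaultdict(list)
    for title in titles:
        m = TAG.match(title)
        queues[m.group(1).upper() if m else "UNTAGGED"].append(title)
    return dict(queues)

# Invented sample titles for illustration.
sample = [
    "[SQL] Push down predicates in the optimizer",
    "[MLLIB] Add elastic-net to linear models",
    "[SQL] Fix decimal precision handling",
    "Fix typo in README",
]
for component, prs in sorted(by_component(sample).items()):
    print(component, len(prs))
```

The "UNTAGGED" bucket is the interesting one for triage: those are exactly the patches most likely to slip between the cracks, since no maintainer's queue picks them up.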

Matei


> On Nov 5, 2014, at 10:46 PM, Sean Owen <so...@cloudera.com> wrote:


Re: [VOTE] Designating maintainers for some Spark components

Posted by Sean Owen <so...@cloudera.com>.
Naturally, this sounds great. FWIW my only but significant worry about
Spark is scaling up to meet unprecedented demand in the form of
questions and contribution. Clarifying responsibility and ownership
helps more than it hurts by adding process.

This is a related but different topic, but I wonder out loud what this
can do to help clear the backlog -- ~*1200* open JIRAs and ~300 open
PRs, most of which have de facto already fallen between some cracks.
This harms the usefulness of these tools and processes.

I'd love to see this translate into triage / closing of most of it by
maintainers, and new actions and strategies for increasing
'throughput' in review and/or helping people make better contributions
in the first place.

On Thu, Nov 6, 2014 at 1:31 AM, Matei Zaharia <ma...@gmail.com> wrote:


Re: [VOTE] Designating maintainers for some Spark components

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
+1.
Tom 

On Wednesday, November 5, 2014 9:21 PM, Matei Zaharia <ma...@gmail.com> wrote:

 BTW, my own vote is obviously +1 (binding).

Matei

> On Nov 5, 2014, at 5:31 PM, Matei Zaharia <ma...@gmail.com> wrote:

Re: [VOTE] Designating maintainers for some Spark components

Posted by Imran Rashid <im...@therashids.com>.
+1 overall

also +1 to Sandy's suggestion of getting build maintainers as well.

On Wed, Nov 5, 2014 at 7:57 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> This seems like a good idea.
>
> An area that wasn't listed, but that I think could strongly benefit from
> maintainers, is the build.  Having consistent oversight over Maven, SBT,
> and dependencies would allow us to avoid subtle breakages.
>
> Component maintainers have come up several times within the Hadoop project,
> and I think one of the main reasons the proposals have been rejected is
> that, structurally, its effect is to slow down development.  As you
> mention, this is somewhat mitigated if being a maintainer leads committers
> to take on more responsibility, but it might be worthwhile to draw up more
> specific ideas on how to combat this?  E.g. do obvious changes, doc fixes,
> test fixes, etc. always require a maintainer?
>
> -Sandy
>
> On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust <mi...@databricks.com>
> wrote:
>
> > +1 (binding)
> >
> > On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia <ma...@gmail.com>
> > wrote:
> >
> > > BTW, my own vote is obviously +1 (binding).
> > >
> > > Matei
> > >
> > > > On Nov 5, 2014, at 5:31 PM, Matei Zaharia <ma...@gmail.com>
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I wanted to share a discussion we've been having on the PMC list, as
> > > well as call for an official vote on it on a public list. Basically, as
> > the
> > > Spark project scales up, we need to define a model to make sure there
> is
> > > still great oversight of key components (in particular internal
> > > architecture and public APIs), and to this end I've proposed
> > implementing a
> > > maintainer model for some of these components, similar to other large
> > > projects.
> > > >
> > > > As background on this, Spark has grown a lot since joining Apache.
> > We've
> > > had over 80 contributors/month for the past 3 months, which I believe
> > makes
> > > us the most active project in contributors/month at Apache, as well as
> > over
> > > 500 patches/month. The codebase has also grown significantly, with new
> > > libraries for SQL, ML, graphs and more.
> > > >
> > > > In this kind of large project, one common way to scale development is
> > to
> > > assign "maintainers" to oversee key components, where each patch to
> that
> > > component needs to get sign-off from at least one of its maintainers.
> > Most
> > > existing large projects do this -- at Apache, some large ones with this
> > > model are CloudStack (the second-most active project overall),
> > Subversion,
> > > and Kafka, and other examples include Linux and Python. This is also
> > > by-and-large how Spark operates today -- most components have a
> de-facto
> > > maintainer.
> > > >
> > > > IMO, adopting this model would have two benefits:
> > > >
> > > > 1) Consistent oversight of design for that component, especially
> > > regarding architecture and API. This process would ensure that the
> > > component's maintainers see all proposed changes and consider them to
> fit
> > > together in a good way.
> > > >
> > > > 2) More structure for new contributors and committers -- in
> particular,
> > > it would be easy to look up who’s responsible for each module and ask
> > them
> > > for reviews, etc, rather than having patches slip between the cracks.
> > > >
> > > > We'd like to start with this in a light-weight manner, where the model
> only
> > > applies to certain key components (e.g. scheduler, shuffle) and
> > user-facing
> > > APIs (MLlib, GraphX, etc). Over time, as the project grows, we can
> expand
> > > it if we deem it useful. The specific mechanics would be as follows:
> > > >
> > > > - Some components in Spark will have maintainers assigned to them,
> > where
> > > one of the maintainers needs to sign off on each patch to the
> component.
> > > > - Each component with maintainers will have at least 2 maintainers.
> > > > - Maintainers will be assigned from the most active and knowledgeable
> > > committers on that component by the PMC. The PMC can vote to add /
> remove
> > > maintainers, and maintained components, through consensus.
> > > > - Maintainers are expected to be active in responding to patches for
> > > their components, though they do not need to be the main reviewers for
> > them
> > > (e.g. they might just sign off on architecture / API). To prevent
> > inactive
> > > maintainers from blocking the project, if a maintainer isn't responding
> > in
> > > a reasonable time period (say 2 weeks), other committers can merge the
> > > patch, and the PMC will want to discuss adding another maintainer.
> > > >
> > > > If you'd like to see examples for this model, check out the following
> > > projects:
> > > > - CloudStack:
> > >
> >
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > > >
> > > > - Subversion:
> > > https://subversion.apache.org/docs/community-guide/roles.html <
> > > https://subversion.apache.org/docs/community-guide/roles.html>
> > > >
> > > > Finally, I wanted to list our current proposal for initial components
> > > and maintainers. It would be good to get feedback on other components
> we
> > > might add, but please note that personnel discussions (e.g. "I don't
> > think
> > > Matei should maintain *that* component") should only happen on the
> private
> > > list. The initial components were chosen to include all public APIs and
> > the
> > > main core components, and the maintainers were chosen from the most
> > active
> > > contributors to those modules.
> > > >
> > > > - Spark core public API: Matei, Patrick, Reynold
> > > > - Job scheduler: Matei, Kay, Patrick
> > > > - Shuffle and network: Reynold, Aaron, Matei
> > > > - Block manager: Reynold, Aaron
> > > > - YARN: Tom, Andrew Or
> > > > - Python: Josh, Matei
> > > > - MLlib: Xiangrui, Matei
> > > > - SQL: Michael, Reynold
> > > > - Streaming: TD, Matei
> > > > - GraphX: Ankur, Joey, Reynold
> > > >
> > > > I'd like to formally call a [VOTE] on this model, to last 72 hours.
> The
> > > [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > > >
> > > > Matei
> > >
> > >
> >
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Andrew Or <an...@databricks.com>.
+1

2014-11-05 18:08 GMT-08:00 Patrick Wendell <pw...@gmail.com>:

> I'm a +1 on this as well; I think it will be a useful model as we
> scale the project in the future, and it recognizes some informal process
> we have now.
>
> To respond to Sandy's comment: for changes that fall in between the
> component boundaries or are straightforward, my understanding of this
> model is you wouldn't need an explicit sign off. I think this is why
> unlike some other projects, we wouldn't e.g. lock down permissions to
> portions of the source tree. If some obvious fix needs to go in,
> people should just merge it.
>
> - Patrick
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Patrick Wendell <pw...@gmail.com>.
I'm a +1 on this as well; I think it will be a useful model as we
scale the project in the future, and it recognizes some informal process
we have now.

To respond to Sandy's comment: for changes that fall in between the
component boundaries or are straightforward, my understanding of this
model is you wouldn't need an explicit sign off. I think this is why
unlike some other projects, we wouldn't e.g. lock down permissions to
portions of the source tree. If some obvious fix needs to go in,
people should just merge it.

- Patrick


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Designating maintainers for some Spark components

Posted by Sandy Ryza <sa...@cloudera.com>.
This seems like a good idea.

An area that wasn't listed, but that I think could strongly benefit from
maintainers, is the build.  Having consistent oversight over Maven, SBT,
and dependencies would allow us to avoid subtle breakages.

Component maintainers have come up several times within the Hadoop project,
and I think one of the main reasons the proposals have been rejected is
that, structurally, their effect is to slow down development.  As you
mention, this is somewhat mitigated if being a maintainer leads committers
to take on more responsibility, but it might be worthwhile to draw up more
specific ideas on how to combat this?  E.g. do obvious changes, doc fixes,
test fixes, etc. always require a maintainer?

-Sandy


Re: [VOTE] Designating maintainers for some Spark components

Posted by Michael Armbrust <mi...@databricks.com>.
+1 (binding)

>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Yu Ishikawa <yu...@gmail.com>.
+1 (binding) 

On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia <[hidden email]> 
wrote: 

> BTW, my own vote is obviously +1 (binding). 
> 
> Matei 
> 
> > On Nov 5, 2014, at 5:31 PM, Matei Zaharia <[hidden email]> 
> wrote: 
> > 
> > Hi all, 
> > 
> > I wanted to share a discussion we've been having on the PMC list, as 
> well as call for an official vote on it on a public list. Basically, as
> the 
> Spark project scales up, we need to define a model to make sure there is 
> still great oversight of key components (in particular internal 
> architecture and public APIs), and to this end I've proposed implementing
> a 
> maintainer model for some of these components, similar to other large 
> projects. 
> > 
> > As background on this, Spark has grown a lot since joining Apache. We've 
> had over 80 contributors/month for the past 3 months, which I believe
> makes 
> us the most active project in contributors/month at Apache, as well as
> over 
> 500 patches/month. The codebase has also grown significantly, with new 
> libraries for SQL, ML, graphs and more. 
> > 
> > In this kind of large project, one common way to scale development is to 
> assign "maintainers" to oversee key components, where each patch to that 
> component needs to get sign-off from at least one of its maintainers. Most 
> existing large projects do this -- at Apache, some large ones with this 
> model are CloudStack (the second-most active project overall), Subversion, 
> and Kafka, and other examples include Linux and Python. This is also 
> by-and-large how Spark operates today -- most components have a de-facto 
> maintainer. 
> > 
> > IMO, adopting this model would have two benefits: 
> > 
> > 1) Consistent oversight of design for that component, especially 
> regarding architecture and API. This process would ensure that the 
> component's maintainers see all proposed changes and consider them to fit 
> together in a good way. 
> > 
> > 2) More structure for new contributors and committers -- in particular, 
> it would be easy to look up who’s responsible for each module and ask them 
> for reviews, etc, rather than having patches slip between the cracks. 
> > 
> > We'd like to start with in a light-weight manner, where the model only 
> applies to certain key components (e.g. scheduler, shuffle) and
> user-facing 
> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand 
> it if we deem it useful. The specific mechanics would be as follows: 
> > 
> > - Some components in Spark will have maintainers assigned to them, where one of the maintainers needs to sign off on each patch to the component.
> > - Each component with maintainers will have at least 2 maintainers.
> > - Maintainers will be assigned from the most active and knowledgeable committers on that component by the PMC. The PMC can vote to add / remove maintainers, and maintained components, through consensus.
> > - Maintainers are expected to be active in responding to patches for their components, though they do not need to be the main reviewers for them (e.g. they might just sign off on architecture / API). To prevent inactive maintainers from blocking the project, if a maintainer isn't responding in a reasonable time period (say 2 weeks), other committers can merge the patch, and the PMC will want to discuss adding another maintainer.
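[Editor's note: the mechanics above amount to a simple merge gate, which can be sketched as follows. This is purely illustrative -- the thread specifies no tooling, and the component names, maintainer sets, and the exact two-week window below are assumptions drawn from the proposal's text.]

```python
from datetime import datetime, timedelta

# Hypothetical maintainer registry; names taken from the proposal's draft list.
MAINTAINERS = {
    "scheduler": {"Matei", "Kay", "Patrick"},
    "shuffle": {"Reynold", "Aaron", "Matei"},
}

# The proposal's "reasonable time period (say 2 weeks)" escape hatch.
REVIEW_WINDOW = timedelta(weeks=2)

def can_merge(component, sign_offs, opened_at, now=None):
    """A patch merges if a maintainer signed off, or if the review window
    elapsed with no maintainer response (then any committer may merge)."""
    now = now or datetime.utcnow()
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        return True  # unmaintained components follow the normal process
    if maintainers & set(sign_offs):
        return True  # at least one maintainer signed off
    return now - opened_at >= REVIEW_WINDOW  # inactivity escape hatch
```

For example, a scheduler patch signed off by Kay merges immediately, while one with no maintainer response merges only after the window lapses (at which point the PMC would discuss adding another maintainer).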
> > 
> > If you'd like to see examples for this model, check out the following projects:
> > - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
> > 
> > Finally, I wanted to list our current proposal for initial components and maintainers. It would be good to get feedback on other components we might add, but please note that personnel discussions (e.g. "I don't think Matei should maintain *that* component") should only happen on the private list. The initial components were chosen to include all public APIs and the main core components, and the maintainers were chosen from the most active contributors to those modules.
> > 
> > - Spark core public API: Matei, Patrick, Reynold 
> > - Job scheduler: Matei, Kay, Patrick 
> > - Shuffle and network: Reynold, Aaron, Matei 
> > - Block manager: Reynold, Aaron 
> > - YARN: Tom, Andrew Or 
> > - Python: Josh, Matei 
> > - MLlib: Xiangrui, Matei 
> > - SQL: Michael, Reynold 
> > - Streaming: TD, Matei 
> > - GraphX: Ankur, Joey, Reynold 
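[Editor's note: benefit (2) above -- being able to look up who is responsible for each module -- is essentially a CODEOWNERS-style mapping. A minimal sketch, assuming illustrative path prefixes (these are not Spark's actual source layout):]

```python
# Hypothetical path-prefix -> maintainers mapping, in the spirit of a
# CODEOWNERS file; prefixes are illustrative, names from the draft list above.
COMPONENT_OWNERS = [
    ("core/src/main/scala/org/apache/spark/scheduler/", ["Matei", "Kay", "Patrick"]),
    ("python/", ["Josh", "Matei"]),
    ("mllib/", ["Xiangrui", "Matei"]),
    ("sql/", ["Michael", "Reynold"]),
    ("streaming/", ["TD", "Matei"]),
    ("graphx/", ["Ankur", "Joey", "Reynold"]),
]

def reviewers_for(path):
    """Return the maintainers to ping for a changed file; longest prefix wins."""
    best, best_len = [], -1
    for prefix, owners in COMPONENT_OWNERS:
        if path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = owners, len(prefix)
    return best
```

A contributor touching `sql/` would then know to ask Michael or Reynold for review, rather than letting the patch slip between the cracks.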
> > 
> > I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > 
> > Matei 
> 
> 



-----
-- Yu Ishikawa
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Designating-maintainers-for-some-Spark-components-tp9115p9281.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


RE: [VOTE] Designating maintainers for some Spark components

Posted by "Cheng, Hao" <ha...@intel.com>.
+1, that will definitely speed up PR reviewing / merging.

-----Original Message-----
From: Cheng Lian [mailto:lian.cs.zju@gmail.com] 
Sent: Thursday, November 6, 2014 12:46 PM
To: dev
Subject: Re: [VOTE] Designating maintainers for some Spark components

+1 since this is already the de facto model we are using.


Re: [VOTE] Designating maintainers for some Spark components

Posted by Jeremy Freeman <fr...@gmail.com>.
Great idea! +1

— Jeremy

-------------------------
jeremyfreeman.net
@thefreemanlab



Re: [VOTE] Designating maintainers for some Spark components

Posted by Timothy Chen <tn...@gmail.com>.
Matei that makes sense, +1 (non-binding)

Tim



Re: [VOTE] Designating maintainers for some Spark components

Posted by Cheng Lian <li...@gmail.com>.
+1 since this is already the de facto model we are using.


Re: [VOTE] Designating maintainers for some Spark components

Posted by "Wangfei (X)" <wa...@huawei.com>.
+1

Sent from my iPhone



Re: [VOTE] Designating maintainers for some Spark components

Posted by Denny Lee <de...@gmail.com>.
+1 great idea.

Re: [VOTE] Designating maintainers for some Spark components

Posted by Xiangrui Meng <me...@gmail.com>.
+1 (binding)



Re: [VOTE] Designating maintainers for some Spark components

Posted by Mark Hamstra <ma...@clearstorydata.com>.
+1 (binding)


Re: [VOTE] Designating maintainers for some Spark components

Posted by Nicholas Chammas <ni...@gmail.com>.
+1 on this proposal.

On Wed, Nov 5, 2014 at 8:55 PM, Nan Zhu <zh...@gmail.com> wrote:

> Will these maintainers do a cleanup of the pending PRs once we start
> to apply this model?


I second Nan's question. I would like to see this initiative drive a
reduction in the number of stale PRs we have out there. We're approaching
300 open PRs again.

Nick

Re: [VOTE] Designating maintainers for some Spark components

Posted by Nan Zhu <zh...@gmail.com>.
+1, with a question.

Will these maintainers do a cleanup of the pending PRs once we start to apply this model? Some patches have been sitting there for a long time without being merged; some of them are periodically maintained (rebased, pinged, etc.), while others have simply been phased out.

Best,  

--  
Nan Zhu




Re: [VOTE] Designating maintainers for some Spark components

Posted by Reynold Xin <rx...@databricks.com>.
+1 (binding)

We are already doing this implicitly. In my experience, this can create
longer term personal commitment, which usually leads to better design
decisions if somebody knows they would need to look after something for a
while.


Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
BTW, my own vote is obviously +1 (binding).

Matei



Re: [VOTE] Designating maintainers for some Spark components

Posted by Reza Zadeh <re...@databricks.com>.
+1, sounds good.

On Wed, Nov 5, 2014 at 9:19 PM, Kousuke Saruta <sa...@oss.nttdata.co.jp>
wrote:

> +1, It makes sense!
>
> - Kousuke
>
>
> (2014/11/05 17:31), Matei Zaharia wrote:
>
>> Hi all,
>>
>> I wanted to share a discussion we've been having on the PMC list, as well
>> as call for an official vote on it on a public list. Basically, as the
>> Spark project scales up, we need to define a model to make sure there is
>> still great oversight of key components (in particular internal
>> architecture and public APIs), and to this end I've proposed implementing a
>> maintainer model for some of these components, similar to other large
>> projects.
>>
>> As background on this, Spark has grown a lot since joining Apache. We've
>> had over 80 contributors/month for the past 3 months, which I believe makes
>> us the most active project in contributors/month at Apache, as well as over
>> 500 patches/month. The codebase has also grown significantly, with new
>> libraries for SQL, ML, graphs and more.
>>
>> In this kind of large project, one common way to scale development is to
>> assign "maintainers" to oversee key components, where each patch to that
>> component needs to get sign-off from at least one of its maintainers. Most
>> existing large projects do this -- at Apache, some large ones with this
>> model are CloudStack (the second-most active project overall), Subversion,
>> and Kafka, and other examples include Linux and Python. This is also
>> by-and-large how Spark operates today -- most components have a de-facto
>> maintainer.
>>
>> IMO, adopting this model would have two benefits:
>>
>> 1) Consistent oversight of design for that component, especially
>> regarding architecture and API. This process would ensure that the
>> component's maintainers see all proposed changes and consider them to fit
>> together in a good way.
>>
>> 2) More structure for new contributors and committers -- in particular,
>> it would be easy to look up who’s responsible for each module and ask them
>> for reviews, etc, rather than having patches slip between the cracks.
>>
>> We'd like to start with in a light-weight manner, where the model only
>> applies to certain key components (e.g. scheduler, shuffle) and user-facing
>> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
>> it if we deem it useful. The specific mechanics would be as follows:
>>
>> - Some components in Spark will have maintainers assigned to them, where
>> one of the maintainers needs to sign off on each patch to the component.
>> - Each component with maintainers will have at least 2 maintainers.
>> - Maintainers will be assigned from the most active and knowledgeable
>> committers on that component by the PMC. The PMC can vote to add / remove
>> maintainers, and maintained components, through consensus.
>> - Maintainers are expected to be active in responding to patches for
>> their components, though they do not need to be the main reviewers for them
>> (e.g. they might just sign off on architecture / API). To prevent inactive
>> maintainers from blocking the project, if a maintainer isn't responding in
>> a reasonable time period (say 2 weeks), other committers can merge the
>> patch, and the PMC will want to discuss adding another maintainer.
>>
>> If you'd like to see examples for this model, check out the following
>> projects:
>> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/
>> CloudStack+Maintainers+Guide <https://cwiki.apache.org/
>> confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide>
>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.
>> html <https://subversion.apache.org/docs/community-guide/roles.html>
>>
>> Finally, I wanted to list our current proposal for initial components and
>> maintainers. It would be good to get feedback on other components we might
>> add, but please note that personnel discussions (e.g. "I don't think Matei
>> should maintain *that* component") should only happen on the private list.
>> The initial components were chosen to include all public APIs and the main
>> core components, and the maintainers were chosen from the most active
>> contributors to those modules.
>>
>> - Spark core public API: Matei, Patrick, Reynold
>> - Job scheduler: Matei, Kay, Patrick
>> - Shuffle and network: Reynold, Aaron, Matei
>> - Block manager: Reynold, Aaron
>> - YARN: Tom, Andrew Or
>> - Python: Josh, Matei
>> - MLlib: Xiangrui, Matei
>> - SQL: Michael, Reynold
>> - Streaming: TD, Matei
>> - GraphX: Ankur, Joey, Reynold
>>
>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
>> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>>
>> Matei
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Kousuke Saruta <sa...@oss.nttdata.co.jp>.
+1, It makes sense!

- Kousuke





Re: [VOTE] Designating maintainers for some Spark components

Posted by Corey Nolet <cj...@gmail.com>.
I'm actually going to change my non-binding to +0 for the proposal as-is.

I overlooked some parts of the original proposal that, when reading over
them again, do not sit well with me. "one of the maintainers needs to sign
off on each patch to the component", as Greg has pointed out, does seem to
imply that there are committers with more power than others with regards to
specific components- which does imply ownership.

My thinking would be to re-work it in some way so as to take the accent off
ownership. I would maybe focus on things such as:

1) Other committers and contributors being forced to consult with
maintainers of modules before patches can get rolled in.
2) Maintainers being assigned specifically from PMC.
3) Oversight having more accent on keeping the community happy in a
specific area of interest rather than being a consultant for the design of a
specific piece.

On Thu, Nov 6, 2014 at 8:46 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> With my ASF Member hat on, I fully agree with Greg.
>
> As he points out, this is an anti-pattern in the ASF and is severely
> frowned upon.
>
> We, in Hadoop, had a similar trajectory where we were politely told to
> move away from having sub-project committers (HDFS, MapReduce etc.) to a
> common list of committers. There were some concerns initially, but we have
> successfully managed to work together and build a more healthy community as
> a result of following the advice on the ASF Way.
>
> I do have sympathy for good oversight etc. as the project grows and
> attracts many contributors - it's essentially the need to have smaller,
> well-knit developer communities. One way to achieve that would be to have
> separate TLPs  (e.g. Spark, MLLIB, GraphX) with separate committer lists
> for each representing the appropriate community. Hadoop went a similar
> route where we had Pig, Hive, HBase etc. as sub-projects initially and then
> split them into TLPs with more focussed communities to the benefit of
> everyone. Maybe you guys want to try this too?
>
> ----
>
> Few more observations:
> # In general, *discussions* on project directions (such as new concept of
> *maintainers*) should happen first on the public lists *before* voting, not
> in the private PMC list.
> # If you choose to go this route in spite of this advice, it seems to me
> Spark would be better off having more maintainers per component (at least 4-5),
> probably with a lot more diversity in terms of affiliations. Not sure if
> that is a concern - do you have good diversity in the proposed list? This
> will ensure that there are no concerns about a dominant employer
> controlling a project.
>
> ----
>
> Hope this helps - we've gone through a similar journey, got through similar
> issues and fully embraced the Apache Way (™) as Greg points out to our
> benefit.
>
> thanks,
> Arun
>
>
> On Nov 6, 2014, at 4:18 PM, Greg Stein <gs...@gmail.com> wrote:
>
> > -1 (non-binding)
> >
> > This is an idea that runs COMPLETELY counter to the Apache Way, and is
> > to be severely frowned upon. This creates *unequal* ownership of the
> > codebase.
> >
> > Each Member of the PMC should have *equal* rights to all areas of the
> > codebase under their purview. It should not be subjected to others'
> > "ownership" except through the standard mechanisms of reviews and,
> > if/when absolutely necessary, vetoes.
> >
> > Apache does not want "leads", "benevolent dictators" or "assigned
> > maintainers", no matter how you may dress it up with multiple
> > maintainers per component. The fact is that this creates an unequal
> > level of ownership and responsibility. The Board has shut down
> > projects that attempted or allowed for "Leads". Just a few months ago,
> > there was a problem with somebody calling themselves a "Lead".
> >
> > I don't know why you suggest that Apache Subversion does this. We
> > absolutely do not. Never have. Never will. The Subversion codebase is
> > owned by all of us, and we all care for every line of it. Some people
> > know more than others, of course. But any one of us can change any
> > part, without being subjected to a "maintainer". Of course, we ask
> > people with more knowledge of the component when we feel
> > uncomfortable, but we also know when it is safe or not to make a
> > specific change. And *always*, our fellow committers can review our
> > work and let us know when we've done something wrong.
> >
> > Equal ownership reduces fiefdoms, enhances a feeling of community and
> > project ownership, and creates a more open and inviting project.
> >
> > So again: -1 on this entire concept. Not good, to be polite.
> >
> > Regards,
> > Greg Stein
> > Director, Vice Chairman
> > Apache Software Foundation
> >
> >
>
>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>
>
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Cody Koeninger <co...@koeninger.org>.
My 2 cents:

Spark since pre-Apache days has been the most friendly and welcoming open
source project I've seen, and that's reflected in its success.

It seems pretty obvious to me that, for example, Michael should be looking
at major changes to the SQL codebase.  I trust him to do that in a way
that's technically and socially appropriate.

What Matei is saying makes sense, regardless of whether it gets codified in
a process.



On Thu, Nov 6, 2014 at 7:46 PM, Arun C Murthy <ac...@hortonworks.com> wrote:


Re: [VOTE] Designating maintainers for some Spark components

Posted by Arun C Murthy <ac...@hortonworks.com>.
With my ASF Member hat on, I fully agree with Greg.

As he points out, this is an anti-pattern in the ASF and is severely frowned upon.

We, in Hadoop, had a similar trajectory where we were politely told to move away from having sub-project committers (HDFS, MapReduce etc.) to a common list of committers. There were some concerns initially, but we have successfully managed to work together and build a more healthy community as a result of following the advice on the ASF Way.

I do have sympathy for good oversight etc. as the project grows and attracts many contributors - it's essentially the need to have smaller, well-knit developer communities. One way to achieve that would be to have separate TLPs  (e.g. Spark, MLLIB, GraphX) with separate committer lists for each representing the appropriate community. Hadoop went a similar route where we had Pig, Hive, HBase etc. as sub-projects initially and then split them into TLPs with more focussed communities to the benefit of everyone. Maybe you guys want to try this too?

----

Few more observations:
# In general, *discussions* on project directions (such as new concept of *maintainers*) should happen first on the public lists *before* voting, not in the private PMC list.
# If you choose to go this route in spite of this advice, it seems to me Spark would be better off having more maintainers per component (at least 4-5), probably with a lot more diversity in terms of affiliations. Not sure if that is a concern - do you have good diversity in the proposed list? This will ensure that there are no concerns about a dominant employer controlling a project.

----

Hope this helps - we've gone through a similar journey, got through similar issues, and fully embraced the Apache Way (™), as Greg points out, to our benefit.

thanks,
Arun


On Nov 6, 2014, at 4:18 PM, Greg Stein <gs...@gmail.com> wrote:

> -1 (non-binding)
> 
> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> to be severely frowned up. This creates *unequal* ownership of the
> codebase.
> 
> Each Member of the PMC should have *equal* rights to all areas of the
> codebase until their purview. It should not be subjected to others'
> "ownership" except throught the standard mechanisms of reviews and
> if/when absolutely necessary, to vetos.
> 
> Apache does not want "leads", "benevolent dictators" or "assigned
> maintainers", no matter how you may dress it up with multiple
> maintainers per component. The fact is that this creates an unequal
> level of ownership and responsibility. The Board has shut down
> projects that attempted or allowed for "Leads". Just a few months ago,
> there was a problem with somebody calling themself a "Lead".
> 
> I don't know why you suggest that Apache Subversion does this. We
> absolutely do not. Never have. Never will. The Subversion codebase is
> owned by all of us, and we all care for every line of it. Some people
> know more than others, of course. But any one of us, can change any
> part, without being subjected to a "maintainer". Of course, we ask
> people with more knowledge of the component when we feel
> uncomfortable, but we also know when it is safe or not to make a
> specific change. And *always*, our fellow committers can review our
> work and let us know when we've done something wrong.
> 
> Equal ownership reduces fiefdoms, enhances a feeling of community and
> project ownership, and creates a more open and inviting project.
> 
> So again: -1 on this entire concept. Not good, to be polite.
> 
> Regards,
> Greg Stein
> Director, Vice Chairman
> Apache Software Foundation
> 
> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
>> Hi all,
>> 
>> I wanted to share a discussion we've been having on the PMC list, as well as call for an official vote on it on a public list. Basically, as the Spark project scales up, we need to define a model to make sure there is still great oversight of key components (in particular internal architecture and public APIs), and to this end I've proposed implementing a maintainer model for some of these components, similar to other large projects.
>> 
>> As background on this, Spark has grown a lot since joining Apache. We've had over 80 contributors/month for the past 3 months, which I believe makes us the most active project in contributors/month at Apache, as well as over 500 patches/month. The codebase has also grown significantly, with new libraries for SQL, ML, graphs and more.
>> 
>> In this kind of large project, one common way to scale development is to assign "maintainers" to oversee key components, where each patch to that component needs to get sign-off from at least one of its maintainers. Most existing large projects do this -- at Apache, some large ones with this model are CloudStack (the second-most active project overall), Subversion, and Kafka, and other examples include Linux and Python. This is also by-and-large how Spark operates today -- most components have a de-facto maintainer.
>> 
>> IMO, adopting this model would have two benefits:
>> 
>> 1) Consistent oversight of design for that component, especially regarding architecture and API. This process would ensure that the component's maintainers see all proposed changes and consider them to fit together in a good way.
>> 
>> 2) More structure for new contributors and committers -- in particular, it would be easy to look up who’s responsible for each module and ask them for reviews, etc, rather than having patches slip between the cracks.
>> 
>> We'd like to start in a light-weight manner, where the model only applies to certain key components (e.g. scheduler, shuffle) and user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it if we deem it useful. The specific mechanics would be as follows:
>> 
>> - Some components in Spark will have maintainers assigned to them, where one of the maintainers needs to sign off on each patch to the component.
>> - Each component with maintainers will have at least 2 maintainers.
>> - Maintainers will be assigned from the most active and knowledgeable committers on that component by the PMC. The PMC can vote to add / remove maintainers, and maintained components, through consensus.
>> - Maintainers are expected to be active in responding to patches for their components, though they do not need to be the main reviewers for them (e.g. they might just sign off on architecture / API). To prevent inactive maintainers from blocking the project, if a maintainer isn't responding in a reasonable time period (say 2 weeks), other committers can merge the patch, and the PMC will want to discuss adding another maintainer.
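The mechanics above (at least one maintainer sign-off per patch, with a timeout so inactive maintainers cannot block a merge) could be sketched as a small merge-gate check. This is only an illustration of the proposed policy, not actual Spark tooling; the component names, maintainer handles, and function names are all hypothetical.

```python
# Hypothetical sketch of the proposed merge gate: a patch to a maintained
# component merges once a maintainer signs off, or after a timeout
# (the proposal suggests ~2 weeks) if no maintainer has responded.

MAINTAINERS = {
    "scheduler": {"matei", "kay", "patrick"},   # illustrative subset
    "shuffle": {"reynold", "aaron", "matei"},
}

TIMEOUT_DAYS = 14

def can_merge(component, approvers, days_waiting):
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        return True                     # unmaintained component: normal review
    if maintainers & set(approvers):
        return True                     # at least one maintainer signed off
    return days_waiting > TIMEOUT_DAYS  # inactive maintainers don't block

print(can_merge("scheduler", ["kay"], 1))    # True: maintainer signed off
print(can_merge("shuffle", ["josh"], 3))     # False: still waiting
print(can_merge("shuffle", ["josh"], 20))    # True: timeout fallback
```

Note that any committer can still -1 the patch itself; the gate only adds the forwarding step.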
>> 
>> If you'd like to see examples for this model, check out the following projects:
>> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide <https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide> 
>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html <https://subversion.apache.org/docs/community-guide/roles.html>
>> 
>> Finally, I wanted to list our current proposal for initial components and maintainers. It would be good to get feedback on other components we might add, but please note that personnel discussions (e.g. "I don't think Matei should maintain *that* component") should only happen on the private list. The initial components were chosen to include all public APIs and the main core components, and the maintainers were chosen from the most active contributors to those modules.
>> 
>> - Spark core public API: Matei, Patrick, Reynold
>> - Job scheduler: Matei, Kay, Patrick
>> - Shuffle and network: Reynold, Aaron, Matei
>> - Block manager: Reynold, Aaron
>> - YARN: Tom, Andrew Or
>> - Python: Josh, Matei
>> - MLlib: Xiangrui, Matei
>> - SQL: Michael, Reynold
>> - Streaming: TD, Matei
>> - GraphX: Ankur, Joey, Reynold
>> 
>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>> 
>> Matei
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
> 






Re: [VOTE] Designating maintainers for some Spark components

Posted by Reynold Xin <rx...@databricks.com>.
Greg,

Thanks a lot for commenting on this, but I feel we are splitting hairs
here. Matei did mention -1, followed by "or give feedback". The original
process outlined by Matei was exactly about review, rather than fighting.
Nobody wants to spend their energy fighting.  Everybody is doing it to
improve the project.


In particular, quoting from your email:

"Be careful here. "Responsibility" is pretty much a taboo word. All of
Apache is a group of volunteers. People can disappear at any point, which
is why you need multiple (as my fellow Director warned, on your private
list). And multiple people can disappear."

Take a look at this page: http://www.apache.org/dev/pmc.html

This Project Management Committee Guide outlines the general
***responsibilities*** of PMC members in managing their projects.

Are you suggesting the wording used by the PMC guideline itself is taboo?
On Thu, Nov 6, 2014 at 11:27 PM, Greg Stein <gs...@gmail.com> wrote:

> [last reply for tonite; let others read; and after the next drink or three,
> I shouldn't be replying...]
>
> On Thu, Nov 6, 2014 at 11:38 PM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
> > Alright, Greg, I think I understand how Subversion's model is different,
> > which is that the PMC members are all full committers. However, I still
> > think that the model proposed here is purely organizational (how the PMC
> > and committers organize themselves), and in no way changes people's
> > ownership or rights.
>
>
> That was not my impression, when your proposal said that maintainers need
> to provide "sign-off".
>
> Okay. Now my next item of feedback starts here:
>
>
> > Certainly the reason I proposed it was organizational, to make sure
> > patches get seen by the right people. I believe that every PMC member
> still
> > has the same responsibility for two reasons:
> >
> > 1) The PMC is actually what selects the maintainers, so basically this
> > mechanism is a way for the PMC to make sure certain people review each
> > patch.
> >
> > 2) Code changes are all still made by consensus, where any individual has
> > veto power over the code. The maintainer model mentioned here is only
> meant
> > to make sure that the "experts" in an area get to see each patch *before*
> > it is merged, and choose whether to exercise their veto power.
> >
> > Let me give a simple example, which is a patch to the Spark core public
> > API. Say I'm a maintainer in this API. Without the maintainer model, the
> > decision on the patch would be made as follows:
> >
> > - Any committer could review the patch and merge it
> > - At any point during this process, I (as the main expert on this) could
> > come in and -1 it, or give feedback
> > - In addition, any other committer beyond me is allowed to -1 this patch
> >
> > With the maintainer model, the process is as follows:
> >
> > - Any committer could review the patch and merge it, but they would need
> > to forward it to me (or another core API maintainer) to make sure we also
> > approve
> > - At any point during this process, I could come in and -1 it, or give
> > feedback
> > - In addition, any other committer beyond me is still allowed to -1 this
> > patch
> >
> > The only change in this model is that committers are responsible to
> > forward patches in these areas to certain other committers. If every
> > committer had perfect oversight of the project, they could have also seen
> > every patch to their component on their own, but this list ensures that
> > they see it even if they somehow overlooked it.
> >
> > It's true that technically this model might "gate" development in the
> > sense of adding some latency, but it doesn't "gate" it any more than
> > consensus as a whole does, where any committer (not even PMC member) can
> -1
> > any code change. In fact I believe this will speed development by
> > motivating the maintainers to be active in reviewing their areas and by
> > reducing the chance that mistakes happen that require a revert.
> >
> > I apologize if this wasn't clear in any way, but I do think it's pretty
> > clear in the original wording of the proposal. The sign-off by a
> maintainer
> > is simply an extra step in the merge process, it does *not* mean that
> other
> > committers can't -1 a patch, or that the maintainers get to review all
> > patches, or that they somehow have more "ownership" of the component
> (since
> > they already had the ability to -1). I also wanted to clarify another
> thing
> > -- it seems there is a misunderstanding that only PMC members can be
> > maintainers, but this was not the point; the PMC *assigns* maintainers
> but
> > they can do it out of the whole committer pool (and if we move to
> > separating the PMC from the committers, I fully expect some non-PMC
> > committers to be made maintainers).
> >
>
> ... and ends here.
>
> All of that text is about a process for applying Vetoes. ... That is just
> the wrong focus (IMO).
>
> Back around 2000, in httpd, we ran into vetoes. It was horrible. The
> community suffered. We actually had a face-to-face at one point, flying in
> people from around the US, gathering a bunch of the httpd committers to
> work through some basic problems. The vetoes were flying fast and furious,
> and it was just the wrong dynamic. Discussion and consensus had been thrown
> aside. Trust was absent. Peer relationships were ruined. (tho thankfully,
> our personal relationships never suffered, and that basis helped us pull it
> back together)
>
> Contrast that with Subversion. We've had some vetoes, yes. But invariably,
> MOST of them would really be considered "woah. -1 on that. let's talk".
> Only a few were about somebody laying down the veto hammer. Outside those
> few, a -1 was always about opening a discussion to fix a particular commit.
>
> It looks like you are creating a process to apply vetoes. That seems
> backwards.
>
> It seems like you want a process to ensure that reviews are performed. IMO,
> all committers/PMC members should begin as *trusted*. Why not? You've
> already voted them in as committers/PMCers. So trust them. Trust.
>
> And that leads to "trust, but verify". The review process. So how about
> creating a workflow that is focused on "what needs to be reviewed" rather
> than "nobody can make changes, unless John says so". ??
>
>
> > I hope this clarifies where we're coming from, and why we believe that
> > this still conforms fully with the spirit of Apache (collaborative, open
> > development that anyone can participate in, and meritocracy for project
> > governance). There were some comments made about the maintainers being
> only
> > some kind of list of people without a requirement to review stuff, but as
> > you can see it's the requirement to review that is the main reason I'm
> > proposing this, to ensure we have an automated process for patches to
> > certain components to be seen. If it helps we may be able to change the
> > wording to something like "it is every committer's responsibility to
> > forward patches for a maintained component to that component's
> maintainer",
> > or something like that, instead of using "sign off".
>
>
> As I've mentioned, "sign off" is a term for unequal rights. So yes...
> finding a modification would be great. But honestly, I think it would be
> nice to find a workflow that establishes the *reviews* that you're seeking.
>
> [ Subversion revision props could be used to tag if/when somebody has
> reviewed a particular revision; I dunno if you guys are using svn or git ]
>
>
> > If we don't do this, I'd actually be against any measure that lists some
> > component "maintainers" without them having a specific responsibility.
>
>
> Be careful here. "Responsibility" is pretty much a taboo word. All of
> Apache is a group of volunteers. People can disappear at any point, which
> is why you need multiple (as my fellow Director warned, on your private
> list). And multiple people can disappear.
>
> This is the primary reason why Apache prefers lazy consensus. Participants
> may disappear, so everybody should be aware of that, and be aware that lazy
> consensus is in operation. Placing a *volunteer* in the path of forward
> progress is fraught with disaster, in the long run :-)
>
> >...
>
> Cheers,
> -g
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Greg Stein <gs...@gmail.com>.
[last reply for tonite; let others read; and after the next drink or three,
I shouldn't be replying...]

On Thu, Nov 6, 2014 at 11:38 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> Alright, Greg, I think I understand how Subversion's model is different,
> which is that the PMC members are all full committers. However, I still
> think that the model proposed here is purely organizational (how the PMC
> and committers organize themselves), and in no way changes people's
> ownership or rights.


That was not my impression, when your proposal said that maintainers need
to provide "sign-off".

Okay. Now my next item of feedback starts here:


> Certainly the reason I proposed it was organizational, to make sure
> patches get seen by the right people. I believe that every PMC member still
> has the same responsibility for two reasons:
>
> 1) The PMC is actually what selects the maintainers, so basically this
> mechanism is a way for the PMC to make sure certain people review each
> patch.
>
> 2) Code changes are all still made by consensus, where any individual has
> veto power over the code. The maintainer model mentioned here is only meant
> to make sure that the "experts" in an area get to see each patch *before*
> it is merged, and choose whether to exercise their veto power.
>
> Let me give a simple example, which is a patch to the Spark core public
> API. Say I'm a maintainer in this API. Without the maintainer model, the
> decision on the patch would be made as follows:
>
> - Any committer could review the patch and merge it
> - At any point during this process, I (as the main expert on this) could
> come in and -1 it, or give feedback
> - In addition, any other committer beyond me is allowed to -1 this patch
>
> With the maintainer model, the process is as follows:
>
> - Any committer could review the patch and merge it, but they would need
> to forward it to me (or another core API maintainer) to make sure we also
> approve
> - At any point during this process, I could come in and -1 it, or give
> feedback
> - In addition, any other committer beyond me is still allowed to -1 this
> patch
>
> The only change in this model is that committers are responsible to
> forward patches in these areas to certain other committers. If every
> committer had perfect oversight of the project, they could have also seen
> every patch to their component on their own, but this list ensures that
> they see it even if they somehow overlooked it.
>
> It's true that technically this model might "gate" development in the
> sense of adding some latency, but it doesn't "gate" it any more than
> consensus as a whole does, where any committer (not even PMC member) can -1
> any code change. In fact I believe this will speed development by
> motivating the maintainers to be active in reviewing their areas and by
> reducing the chance that mistakes happen that require a revert.
>
> I apologize if this wasn't clear in any way, but I do think it's pretty
> clear in the original wording of the proposal. The sign-off by a maintainer
> is simply an extra step in the merge process, it does *not* mean that other
> committers can't -1 a patch, or that the maintainers get to review all
> patches, or that they somehow have more "ownership" of the component (since
> they already had the ability to -1). I also wanted to clarify another thing
> -- it seems there is a misunderstanding that only PMC members can be
> maintainers, but this was not the point; the PMC *assigns* maintainers but
> they can do it out of the whole committer pool (and if we move to
> separating the PMC from the committers, I fully expect some non-PMC
> committers to be made maintainers).
>

... and ends here.

All of that text is about a process for applying Vetoes. ... That is just
the wrong focus (IMO).

Back around 2000, in httpd, we ran into vetoes. It was horrible. The
community suffered. We actually had a face-to-face at one point, flying in
people from around the US, gathering a bunch of the httpd committers to
work through some basic problems. The vetoes were flying fast and furious,
and it was just the wrong dynamic. Discussion and consensus had been thrown
aside. Trust was absent. Peer relationships were ruined. (tho thankfully,
our personal relationships never suffered, and that basis helped us pull it
back together)

Contrast that with Subversion. We've had some vetoes, yes. But invariably,
MOST of them would really be considered "woah. -1 on that. let's talk".
Only a few were about somebody laying down the veto hammer. Outside those
few, a -1 was always about opening a discussion to fix a particular commit.

It looks like you are creating a process to apply vetoes. That seems
backwards.

It seems like you want a process to ensure that reviews are performed. IMO,
all committers/PMC members should begin as *trusted*. Why not? You've
already voted them in as committers/PMCers. So trust them. Trust.

And that leads to "trust, but verify". The review process. So how about
creating a workflow that is focused on "what needs to be reviewed" rather
than "nobody can make changes, unless John says so". ??


> I hope this clarifies where we're coming from, and why we believe that
> this still conforms fully with the spirit of Apache (collaborative, open
> development that anyone can participate in, and meritocracy for project
> governance). There were some comments made about the maintainers being only
> some kind of list of people without a requirement to review stuff, but as
> you can see it's the requirement to review that is the main reason I'm
> proposing this, to ensure we have an automated process for patches to
> certain components to be seen. If it helps we may be able to change the
> wording to something like "it is every committer's responsibility to
> forward patches for a maintained component to that component's maintainer",
> or something like that, instead of using "sign off".


As I've mentioned, "sign off" is a term for unequal rights. So yes...
finding a modification would be great. But honestly, I think it would be
nice to find a workflow that establishes the *reviews* that you're seeking.

[ Subversion revision props could be used to tag if/when somebody has
reviewed a particular revision; I dunno if you guys are using svn or git ]
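Greg's aside about revision properties suggests a simple "review ledger" workflow: record who reviewed each revision, and query for revisions nobody has looked at yet. Here is a minimal sketch in Python; the data model and names are hypothetical, and in Subversion itself this record would live in unversioned revision properties (set via `svn propset --revprop`).

```python
# "Trust, but verify": record reviews after the fact instead of gating
# merges up front. All revision numbers and usernames are illustrative.

reviews = {}  # revision number -> set of reviewer usernames

def record_review(revision, reviewer):
    """Tag a revision as reviewed by the given committer."""
    reviews.setdefault(revision, set()).add(reviewer)

def unreviewed(revisions):
    """Return the revisions that no one has reviewed yet."""
    return [r for r in revisions if not reviews.get(r)]

record_review(61234, "alice")
print(unreviewed([61234, 61235]))  # prints [61235]
```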


> If we don't do this, I'd actually be against any measure that lists some
> component "maintainers" without them having a specific responsibility.


Be careful here. "Responsibility" is pretty much a taboo word. All of
Apache is a group of volunteers. People can disappear at any point, which
is why you need multiple (as my fellow Director warned, on your private
list). And multiple people can disappear.

This is the primary reason why Apache prefers lazy consensus. Participants
may disappear, so everybody should be aware of that, and be aware that lazy
consensus is in operation. Placing a *volunteer* in the path of forward
progress is fraught with disaster, in the long run :-)

>...

Cheers,
-g

Re: [VOTE] Designating maintainers for some Spark components

Posted by vaquar khan <va...@gmail.com>.
+1 (binding)
On 8 Nov 2014 07:26, "Davies Liu" <da...@databricks.com> wrote:

> Sorry for my last email, I misunderstood the proposal here; all the
> committers still have an equal -1 on all code changes.
>
> Also, as mentioned in the proposal, the sign-off only applies to
> public APIs and architecture; discussions about things like code
> style stay the same.
>
> So, I'd revert my vote to +1. Sorry for this.
>
> Davies
>
>
> On Fri, Nov 7, 2014 at 3:18 PM, Davies Liu <da...@databricks.com> wrote:
> > -1 (not binding, +1 for maintainer, -1 for sign off)
> >
> > Agree with Greg and Vinod. In the beginning, everything is better
> > (more efficient, more focused), but after some time, fighting begins.
> >
> > Code style is the hottest topic to fight over (we already saw it in
> > some PRs). If two committers (one of them a maintainer) have not
> > reached agreement on code style, before this process they would ask
> > for comments from other committers, but after this process the
> > maintainer has higher priority to -1, so the maintainer will keep
> > his/her personal preference and it's hard to reach agreement.
> > Eventually, different components will have different code styles
> > (or other differences).
> >
> > Right now, maintainers serve as first contacts, the best people to
> > review PRs in their components. We could announce this, so new
> > contributors can easily find the right person to review.
> >
> > My 2 cents.
> >
> > Davies
> >
> >
> > On Thu, Nov 6, 2014 at 11:43 PM, Vinod Kumar Vavilapalli
> > <vi...@apache.org> wrote:
> >>> With the maintainer model, the process is as follows:
> >>>
> >>> - Any committer could review the patch and merge it, but they would
> need to forward it to me (or another core API maintainer) to make sure we
> also approve
> >>> - At any point during this process, I could come in and -1 it, or give
> feedback
> >>> - In addition, any other committer beyond me is still allowed to -1
> this patch
> >>>
> >>> The only change in this model is that committers are responsible to
> forward patches in these areas to certain other committers. If every
> committer had perfect oversight of the project, they could have also seen
> every patch to their component on their own, but this list ensures that
> they see it even if they somehow overlooked it.
> >>
> >>
> >> Having done the job of playing an informal 'maintainer' of a project
> myself, this is what I think you really need:
> >>
> >> The so called 'maintainers' do one of the below
> >>  - Actively poll the lists and watch over contributions. And follow
> what is repeated often around here: Trust but verify.
> >>  - Setup automated mechanisms to send all bug-tracker updates of a
> specific component to a list that people can subscribe to
> >>
> >> And/or
> >>  - Individual contributors send review requests to unofficial
> 'maintainers' over dev-lists or through tools. Like many projects do with
> review boards and other tools.
> >>
> >> Note that none of the above is a required step. It must not be, that's
> the point. But once set as a convention, they will all help you address
> your concerns with project scalability.
> >>
> >> Anything else that you add is bestowing privileges to a select few and
> forming dictatorships. And contrary to what the proposal claims, this is
> neither scalable nor conforming to Apache governance rules.
> >>
> >> +Vinod
>
>
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Davies Liu <da...@databricks.com>.
Sorry for my last email, I misunderstood the proposal here; all the
committers still have an equal -1 on all code changes.

Also, as mentioned in the proposal, the sign-off only applies to
public APIs and architecture; discussions about things like code
style stay the same.

So, I'd revert my vote to +1. Sorry for this.

Davies


On Fri, Nov 7, 2014 at 3:18 PM, Davies Liu <da...@databricks.com> wrote:
> -1 (not binding, +1 for maintainer, -1 for sign off)
>
> Agree with Greg and Vinod. In the beginning, everything is better
> (more efficient, more focused), but after some time, fighting begins.
>
> Code style is the hottest topic to fight over (we already saw it in
> some PRs). If two committers (one of them a maintainer) have not
> reached agreement on code style, before this process they would ask
> for comments from other committers, but after this process the
> maintainer has higher priority to -1, so the maintainer will keep
> his/her personal preference and it's hard to reach agreement.
> Eventually, different components will have different code styles
> (or other differences).
>
> Right now, maintainers serve as first contacts, the best people to
> review PRs in their components. We could announce this, so new
> contributors can easily find the right person to review.
>
> My 2 cents.
>
> Davies
>
>
> On Thu, Nov 6, 2014 at 11:43 PM, Vinod Kumar Vavilapalli
> <vi...@apache.org> wrote:
>>> With the maintainer model, the process is as follows:
>>>
>>> - Any committer could review the patch and merge it, but they would need to forward it to me (or another core API maintainer) to make sure we also approve
>>> - At any point during this process, I could come in and -1 it, or give feedback
>>> - In addition, any other committer beyond me is still allowed to -1 this patch
>>>
>>> The only change in this model is that committers are responsible to forward patches in these areas to certain other committers. If every committer had perfect oversight of the project, they could have also seen every patch to their component on their own, but this list ensures that they see it even if they somehow overlooked it.
>>
>>
>> Having done the job of playing an informal 'maintainer' of a project myself, this is what I think you really need:
>>
>> The so called 'maintainers' do one of the below
>>  - Actively poll the lists and watch over contributions. And follow what is repeated often around here: Trust but verify.
>>  - Setup automated mechanisms to send all bug-tracker updates of a specific component to a list that people can subscribe to
>>
>> And/or
>>  - Individual contributors send review requests to unofficial 'maintainers' over dev-lists or through tools. Like many projects do with review boards and other tools.
>>
>> Note that none of the above is a required step. It must not be, that's the point. But once set as a convention, they will all help you address your concerns with project scalability.
>>
>> Anything else that you add is bestowing privileges to a select few and forming dictatorships. And contrary to what the proposal claims, this is neither scalable nor conforming to Apache governance rules.
>>
>> +Vinod
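Vinod's second suggestion (automatically sending all bug-tracker updates for a component to a list people can subscribe to) could be sketched as a simple routing table. The component names, addresses, and issue format below are hypothetical illustrations, not real Spark infrastructure.

```python
# Hypothetical sketch: route bug-tracker updates by component to
# opt-in notification lists, so "maintainers" become subscribers
# rather than gatekeepers. All names and addresses are illustrative.

SUBSCRIPTIONS = {
    "MLlib": ["mllib-watchers@example.org"],
    "SQL": ["sql-watchers@example.org"],
}

def route_update(issue):
    """Return the addresses to notify about this issue update."""
    return SUBSCRIPTIONS.get(issue["component"], [])

print(route_update({"id": "SPARK-1234", "component": "SQL"}))
# prints ['sql-watchers@example.org']
```

Because subscription is opt-in and notification is automatic, no volunteer sits in the critical path of a merge.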



Re: [VOTE] Designating maintainers for some Spark components

Posted by Tathagata Das <ta...@gmail.com>.
+1 (binding)

I agree with the proposal that it just formalizes what we have been
doing till now, and will increase the efficiency and focus of the
review process.

To address Davies' concern, I agree coding style is often a hot topic
of contention. But that is just an indication that our processes are
not perfect and we have much room to improve (which is what this
proposal is all about). Regarding the specific case of coding style,
we should all get together, discuss, and make our coding style guide
more comprehensive so that such concerns can be dealt with once rather
than being a recurring concern. And that guide will override anyone's
personal preference, be it the maintainer's or a new committer's.

TD


On Fri, Nov 7, 2014 at 3:18 PM, Davies Liu <da...@databricks.com> wrote:
> -1 (not binding, +1 for maintainer, -1 for sign off)
>
> Agree with Greg and Vinod. In the beginning, everything is better
> (more efficient, more focused), but after some time, fighting begins.
>
> Code style is the hottest topic to fight over (we already saw it in
> some PRs). If two committers (one of them a maintainer) have not
> reached agreement on code style, before this process they would ask
> for comments from other committers, but after this process the
> maintainer has higher priority to -1, so the maintainer will keep
> his/her personal preference and it's hard to reach agreement.
> Eventually, different components will have different code styles
> (or other differences).
>
> Right now, maintainers serve as first contacts, the best people to
> review PRs in their components. We could announce this, so new
> contributors can easily find the right person to review.
>
> My 2 cents.
>
> Davies
>
>
> On Thu, Nov 6, 2014 at 11:43 PM, Vinod Kumar Vavilapalli
> <vi...@apache.org> wrote:
>>> With the maintainer model, the process is as follows:
>>>
>>> - Any committer could review the patch and merge it, but they would need to forward it to me (or another core API maintainer) to make sure we also approve
>>> - At any point during this process, I could come in and -1 it, or give feedback
>>> - In addition, any other committer beyond me is still allowed to -1 this patch
>>>
>>> The only change in this model is that committers are responsible to forward patches in these areas to certain other committers. If every committer had perfect oversight of the project, they could have also seen every patch to their component on their own, but this list ensures that they see it even if they somehow overlooked it.
>>
>>
>> Having done the job of playing an informal 'maintainer' of a project myself, this is what I think you really need:
>>
>> The so called 'maintainers' do one of the below
>>  - Actively poll the lists and watch over contributions. And follow what is repeated often around here: Trust but verify.
>>  - Setup automated mechanisms to send all bug-tracker updates of a specific component to a list that people can subscribe to
>>
>> And/or
>>  - Individual contributors send review requests to unofficial 'maintainers' over dev-lists or through tools. Like many projects do with review boards and other tools.
>>
>> Note that none of the above is a required step. It must not be, that's the point. But once set as a convention, they will all help you address your concerns with project scalability.
>>
>> Anything else that you add is bestowing privileges to a select few and forming dictatorships. And contrary to what the proposal claims, this is neither scalable nor conforming to Apache governance rules.
>>
>> +Vinod
>
>



Re: [VOTE] Designating maintainers for some Spark components

Posted by Davies Liu <da...@databricks.com>.
-1 (not binding, +1 for maintainer, -1 for sign off)

Agree with Greg and Vinod. In the beginning, everything is better
(more efficient, more focused), but after some time, fighting begins.

Code style is the hottest topic to fight over (we already saw it in
some PRs). If two committers (one of them a maintainer) have not
reached agreement on code style, before this process they would ask
for comments from other committers, but after this process the
maintainer has higher priority to -1, so the maintainer will keep
his/her personal preference and it's hard to reach agreement.
Eventually, different components will have different code styles
(or other differences).

Right now, maintainers serve as first contacts, the best people to
review PRs in their components. We could announce this, so new
contributors can easily find the right person to review.

My 2 cents.

Davies


On Thu, Nov 6, 2014 at 11:43 PM, Vinod Kumar Vavilapalli
<vi...@apache.org> wrote:
>> With the maintainer model, the process is as follows:
>>
>> - Any committer could review the patch and merge it, but they would need to forward it to me (or another core API maintainer) to make sure we also approve
>> - At any point during this process, I could come in and -1 it, or give feedback
>> - In addition, any other committer beyond me is still allowed to -1 this patch
>>
>> The only change in this model is that committers are responsible for forwarding patches in these areas to certain other committers. If every committer had perfect oversight of the project, they could have also seen every patch to their component on their own, but this list ensures that they see it even if they somehow overlooked it.
>
>
> Having done the job of playing an informal 'maintainer' of a project myself, this is what I think you really need:
>
> The so called 'maintainers' do one of the below
>  - Actively poll the lists and watch over contributions. And follow what is repeated often around here: Trust but verify.
>  - Setup automated mechanisms to send all bug-tracker updates of a specific component to a list that people can subscribe to
>
> And/or
>  - Individual contributors send review requests to unofficial 'maintainers' over dev-lists or through tools. Like many projects do with review boards and other tools.
>
> Note that none of the above is a required step. It must not be; that's the point. But once set as a convention, they will all help you address your concerns with project scalability.
>
> Anything else that you add is bestowing privileges on a select few and forming dictatorships. And contrary to what the proposal claims, this is neither scalable nor conforming to Apache governance rules.
>
> +Vinod



Re: [VOTE] Designating maintainers for some Spark components

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
> With the maintainer model, the process is as follows:
> 
> - Any committer could review the patch and merge it, but they would need to forward it to me (or another core API maintainer) to make sure we also approve
> - At any point during this process, I could come in and -1 it, or give feedback
> - In addition, any other committer beyond me is still allowed to -1 this patch
> 
> The only change in this model is that committers are responsible for forwarding patches in these areas to certain other committers. If every committer had perfect oversight of the project, they could have also seen every patch to their component on their own, but this list ensures that they see it even if they somehow overlooked it.


Having done the job of playing an informal 'maintainer' of a project myself, this is what I think you really need:

The so called 'maintainers' do one of the below
 - Actively poll the lists and watch over contributions. And follow what is repeated often around here: Trust but verify.
 - Setup automated mechanisms to send all bug-tracker updates of a specific component to a list that people can subscribe to

And/or
 - Individual contributors send review requests to unofficial 'maintainers' over dev-lists or through tools. Like many projects do with review boards and other tools.

Note that none of the above is a required step. It must not be; that's the point. But once set as a convention, they will all help you address your concerns with project scalability.

Anything else that you add is bestowing privileges on a select few and forming dictatorships. And contrary to what the proposal claims, this is neither scalable nor conforming to Apache governance rules.

+Vinod

Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
Alright, Greg, I think I understand how Subversion's model is different, which is that the PMC members are all full committers. However, I still think that the model proposed here is purely organizational (how the PMC and committers organize themselves), and in no way changes people's ownership or rights. Certainly the reason I proposed it was organizational, to make sure patches get seen by the right people. I believe that every PMC member still has the same responsibility for two reasons:

1) The PMC is actually what selects the maintainers, so basically this mechanism is a way for the PMC to make sure certain people review each patch.

2) Code changes are all still made by consensus, where any individual has veto power over the code. The maintainer model mentioned here is only meant to make sure that the "experts" in an area get to see each patch *before* it is merged, and choose whether to exercise their veto power.

Let me give a simple example, which is a patch to the Spark core public API. Say I'm a maintainer in this API. Without the maintainer model, the decision on the patch would be made as follows:

- Any committer could review the patch and merge it
- At any point during this process, I (as the main expert on this) could come in and -1 it, or give feedback
- In addition, any other committer beyond me is allowed to -1 this patch

With the maintainer model, the process is as follows:

- Any committer could review the patch and merge it, but they would need to forward it to me (or another core API maintainer) to make sure we also approve
- At any point during this process, I could come in and -1 it, or give feedback
- In addition, any other committer beyond me is still allowed to -1 this patch

The only change in this model is that committers are responsible for forwarding patches in these areas to certain other committers. If every committer had perfect oversight of the project, they could have also seen every patch to their component on their own, but this list ensures that they see it even if they somehow overlooked it.

It's true that technically this model might "gate" development in the sense of adding some latency, but it doesn't "gate" it any more than consensus as a whole does, where any committer (not even PMC member) can -1 any code change. In fact I believe this will speed development by motivating the maintainers to be active in reviewing their areas and by reducing the chance that mistakes happen that require a revert.

I apologize if this wasn't clear in any way, but I do think it's pretty clear in the original wording of the proposal. The sign-off by a maintainer is simply an extra step in the merge process; it does *not* mean that other committers can't -1 a patch, or that the maintainers get to review all patches, or that they somehow have more "ownership" of the component (since they already had the ability to -1). I also wanted to clarify another thing -- it seems there is a misunderstanding that only PMC members can be maintainers, but this was not the point; the PMC *assigns* maintainers but they can do it out of the whole committer pool (and if we move to separating the PMC from the committers, I fully expect some non-PMC committers to be made maintainers).

I hope this clarifies where we're coming from, and why we believe that this still conforms fully with the spirit of Apache (collaborative, open development that anyone can participate in, and meritocracy for project governance). There were some comments made about the maintainers being only some kind of list of people without a requirement to review stuff, but as you can see it's the requirement to review that is the main reason I'm proposing this, to ensure we have an automated process for patches to certain components to be seen. If it helps, we may be able to change the wording to something like "it is every committer's responsibility to forward patches for a maintained component to that component's maintainer", or something like that, instead of using "sign off". If we don't do this, I'd actually be against any measure that lists some component "maintainers" without them having a specific responsibility. Apache is not a place for people to gain kudos by having fancier titles given on a website; it's a place for building great communities and software.

Matei



> On Nov 6, 2014, at 9:27 PM, Greg Stein <gs...@gmail.com> wrote:
> 
> On Thu, Nov 6, 2014 at 7:28 PM, Sandy Ryza <sa...@cloudera.com> wrote:
> 
>> It looks like the difference between the proposed Spark model and the
>> CloudStack / SVN model is:
>> * In the former, maintainers / partial committers are a way of
>> centralizing oversight over particular components among committers
>> * In the latter, maintainers / partial committers are a way of giving
>> non-committers some power to make changes
>> 
> 
> I can't speak for CloudStack, but for Subversion: yes, you're exactly
> right, Sandy.
> 
> We use the "partial committer" role as a way to bring in new committers.
> "Great idea, go work >there<, and have fun". Any PMC member can give a
> single +1, and that new (partial) committer gets an account/access, and is
> off and running. We don't even ask for a PMC vote (though, we almost always
> have a brief discussion).
> 
> The "svnrdump" tool was written by a *Git* Google Summer of Code student.
> He wanted a quick way to get a Subversion dumpfile from a remote
> repository, in order to drop that into Git. We gave him commit access
> directly into trunk/svnrdump, and he wrote the tool. Technically, he could
> commit anywhere in our tree, but we just asked him not to, without a +1
> from a PMC member.
> 
> Partial committers are a way to *include* people into the [coding]
> community. And hopefully, over time, they grow into something more.
> 
> "Maintainers" are a way (IMO) to *exclude* people from certain commit
> activity. (or more precisely: limit/restrict, rather than exclude)
> 
> You can see why it concerns me :-)
> 
> Cheers,
> -g




Re: [VOTE] Designating maintainers for some Spark components

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Nov 6, 2014 at 7:28 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> It looks like the difference between the proposed Spark model and the
> CloudStack / SVN model is:
> * In the former, maintainers / partial committers are a way of
> centralizing oversight over particular components among committers
> * In the latter, maintainers / partial committers are a way of giving
> non-committers some power to make changes
>

I can't speak for CloudStack, but for Subversion: yes, you're exactly
right, Sandy.

We use the "partial committer" role as a way to bring in new committers.
"Great idea, go work >there<, and have fun". Any PMC member can give a
single +1, and that new (partial) committer gets an account/access, and is
off and running. We don't even ask for a PMC vote (though, we almost always
have a brief discussion).

The "svnrdump" tool was written by a *Git* Google Summer of Code student.
He wanted a quick way to get a Subversion dumpfile from a remote
repository, in order to drop that into Git. We gave him commit access
directly into trunk/svnrdump, and he wrote the tool. Technically, he could
commit anywhere in our tree, but we just asked him not to, without a +1
from a PMC member.

Partial committers are a way to *include* people into the [coding]
community. And hopefully, over time, they grow into something more.

"Maintainers" are a way (IMO) to *exclude* people from certain commit
activity. (or more precisely: limit/restrict, rather than exclude)

You can see why it concerns me :-)

Cheers,
-g

Re: [VOTE] Designating maintainers for some Spark components

Posted by Sandy Ryza <sa...@cloudera.com>.
It looks like the difference between the proposed Spark model and the
CloudStack / SVN model is:
* In the former, maintainers / partial committers are a way of centralizing
oversight over particular components among committers
* In the latter, maintainers / partial committers are a way of giving
non-committers some power to make changes

-Sandy

On Thu, Nov 6, 2014 at 5:17 PM, Corey Nolet <cj...@gmail.com> wrote:

> PMC [1] is responsible for oversight and does not designate partial or full
> committers. There are projects where all committers become PMC and others
> where PMC is reserved for committers with the most merit (and willingness
> to take on the responsibility of project oversight, releases, etc...).
> Community maintains the codebase through committers. Committers mentor,
> roll in patches, and spread the project throughout other communities.
>
> Adding someone's name to a list as a "maintainer" is not a barrier. With a
> community as large as Spark's, and myself not being a committer on this
> project, I see it as a welcome opportunity to find a mentor in the areas in
> which I'm interested in contributing. We'd expect the list of names to grow
> as more volunteers gain more interest, correct? To me, that seems quite
> contrary to a "barrier".
>
> [1] http://www.apache.org/dev/pmc.html
>
>
> On Thu, Nov 6, 2014 at 7:49 PM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
> > So I don't understand, Greg, are the partial committers committers, or
> are
> > they not? Spark also has a PMC, but our PMC currently consists of all
> > committers (we decided not to have a differentiation when we left the
> > incubator). I see the Subversion partial committers listed as
> "committers"
> > on https://people.apache.org/committers-by-project.html#subversion, so I
> > assume they are committers. As far as I can see, CloudStack is similar.
> >
> > Matei
> >
> > > On Nov 6, 2014, at 4:43 PM, Greg Stein <gs...@gmail.com> wrote:
> > >
> > > Partial committers are people invited to work on a particular area, and
> > they do not require sign-off to work on that area. They can get a
> sign-off
> > and commit outside that area. That approach doesn't compare to this
> > proposal.
> > >
> > > Full committers are PMC members. As each PMC member is responsible for
> > *every* line of code, then every PMC member should have complete rights
> to
> > every line of code. Creating disparity flies in the face of a PMC
> member's
> > responsibility. If I am a Spark PMC member, then I have responsibility
> for
> > GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
> > interposing a barrier inhibits my responsibility to ensure GraphX is
> > designed, maintained, and delivered to the Public.
> > >
> > > Cheers,
> > > -g
> > >
> > > (and yes, I'm aware of COMMITTERS; I've been changing that file for the
> > past 12 years :-) )
> > >
> > > On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pwendell@gmail.com
> > <ma...@gmail.com>> wrote:
> > > In fact, if you look at the subversion committer list, the majority of
> > > people here have commit access only for particular areas of the
> > > project:
> > >
> > > http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
> > >
> > > On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pwendell@gmail.com
> > <ma...@gmail.com>> wrote:
> > > > Hey Greg,
> > > >
> > > > Regarding subversion - I think the reference is to partial vs full
> > > > committers here:
> > > > https://subversion.apache.org/docs/community-guide/roles.html
> > > >
> > > > - Patrick
> > > >
> > > > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gstein@gmail.com
> <mailto:
> > gstein@gmail.com>> wrote:
> > > >> -1 (non-binding)
> > > >>
> > > >> This is an idea that runs COMPLETELY counter to the Apache Way, and
> is
> > > >> to be severely frowned upon. This creates *unequal* ownership of the
> > > >> codebase.
> > > >>
> > > >> Each Member of the PMC should have *equal* rights to all areas of
> the
> > > >> codebase under their purview. It should not be subjected to others'
> > > >> "ownership" except through the standard mechanisms of reviews and
> > > >> if/when absolutely necessary, to vetoes.
> > > >>
> > > >> Apache does not want "leads", "benevolent dictators" or "assigned
> > > >> maintainers", no matter how you may dress it up with multiple
> > > >> maintainers per component. The fact is that this creates an unequal
> > > >> level of ownership and responsibility. The Board has shut down
> > > >> projects that attempted or allowed for "Leads". Just a few months
> ago,
> > > >> there was a problem with somebody calling themselves a "Lead".
> > > >>
> > > >> I don't know why you suggest that Apache Subversion does this. We
> > > >> absolutely do not. Never have. Never will. The Subversion codebase
> is
> > > >> owned by all of us, and we all care for every line of it. Some
> people
> > > >> know more than others, of course. But any one of us can change any
> > > >> part, without being subjected to a "maintainer". Of course, we ask
> > > >> people with more knowledge of the component when we feel
> > > >> uncomfortable, but we also know when it is safe or not to make a
> > > >> specific change. And *always*, our fellow committers can review our
> > > >> work and let us know when we've done something wrong.
> > > >>
> > > >> Equal ownership reduces fiefdoms, enhances a feeling of community
> and
> > > >> project ownership, and creates a more open and inviting project.
> > > >>
> > > >> So again: -1 on this entire concept. Not good, to be polite.
> > > >>
> > > >> Regards,
> > > >> Greg Stein
> > > >> Director, Vice Chairman
> > > >> Apache Software Foundation
> > > >>
> > > >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> I wanted to share a discussion we've been having on the PMC list,
> as
> > well as call for an official vote on it on a public list. Basically, as
> the
> > Spark project scales up, we need to define a model to make sure there is
> > still great oversight of key components (in particular internal
> > architecture and public APIs), and to this end I've proposed
> implementing a
> > maintainer model for some of these components, similar to other large
> > projects.
> > > >>>
> > > >>> As background on this, Spark has grown a lot since joining Apache.
> > We've had over 80 contributors/month for the past 3 months, which I
> believe
> > makes us the most active project in contributors/month at Apache, as well
> > as over 500 patches/month. The codebase has also grown significantly,
> with
> > new libraries for SQL, ML, graphs and more.
> > > >>>
> > > >>> In this kind of large project, one common way to scale development
> > is to assign "maintainers" to oversee key components, where each patch to
> > that component needs to get sign-off from at least one of its
> maintainers.
> > Most existing large projects do this -- at Apache, some large ones with
> > this model are CloudStack (the second-most active project overall),
> > Subversion, and Kafka, and other examples include Linux and Python. This
> is
> > also by-and-large how Spark operates today -- most components have a
> > de-facto maintainer.
> > > >>>
> > > >>> IMO, adopting this model would have two benefits:
> > > >>>
> > > >>> 1) Consistent oversight of design for that component, especially
> > regarding architecture and API. This process would ensure that the
> > component's maintainers see all proposed changes and consider them to fit
> > together in a good way.
> > > >>>
> > > >>> 2) More structure for new contributors and committers -- in
> > particular, it would be easy to look up who's responsible for each module
> > and ask them for reviews, etc, rather than having patches slip between
> the
> > cracks.
> > > >>>
> > > >>> We'd like to start with this in a light-weight manner, where the model
> > only applies to certain key components (e.g. scheduler, shuffle) and
> > user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows,
> we
> > can expand it if we deem it useful. The specific mechanics would be as
> > follows:
> > > >>>
> > > >>> - Some components in Spark will have maintainers assigned to them,
> > where one of the maintainers needs to sign off on each patch to the
> > component.
> > > >>> - Each component with maintainers will have at least 2 maintainers.
> > > >>> - Maintainers will be assigned from the most active and
> > knowledgeable committers on that component by the PMC. The PMC can vote
> to
> > add / remove maintainers, and maintained components, through consensus.
> > > >>> - Maintainers are expected to be active in responding to patches
> for
> > their components, though they do not need to be the main reviewers for
> them
> > (e.g. they might just sign off on architecture / API). To prevent
> inactive
> > maintainers from blocking the project, if a maintainer isn't responding
> in
> > a reasonable time period (say 2 weeks), other committers can merge the
> > patch, and the PMC will want to discuss adding another maintainer.
> > > >>>
> > > >>> If you'd like to see examples for this model, check out the
> > following projects:
> > > >>> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > > >>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
> > > >>>
> > > >>> Finally, I wanted to list our current proposal for initial
> > components and maintainers. It would be good to get feedback on other
> > components we might add, but please note that personnel discussions (e.g.
> > "I don't think Matei should maintain *that* component") should only happen
> > on the private list. The initial components were chosen to include all
> > public APIs and the main core components, and the maintainers were chosen
> > from the most active contributors to those modules.
> > > >>>
> > > >>> - Spark core public API: Matei, Patrick, Reynold
> > > >>> - Job scheduler: Matei, Kay, Patrick
> > > >>> - Shuffle and network: Reynold, Aaron, Matei
> > > >>> - Block manager: Reynold, Aaron
> > > >>> - YARN: Tom, Andrew Or
> > > >>> - Python: Josh, Matei
> > > >>> - MLlib: Xiangrui, Matei
> > > >>> - SQL: Michael, Reynold
> > > >>> - Streaming: TD, Matei
> > > >>> - GraphX: Ankur, Joey, Reynold
> > > >>>
> > > >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > > >>>
> > > >>> Matei
> > > >>
> > > >>
> > >
> >
> >
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Corey Nolet <cj...@gmail.com>.
PMC [1] is responsible for oversight and does not designate partial or full
committers. There are projects where all committers become PMC and others
where PMC is reserved for committers with the most merit (and willingness
to take on the responsibility of project oversight, releases, etc...).
Community maintains the codebase through committers. Committers mentor,
roll in patches, and spread the project throughout other communities.

Adding someone's name to a list as a "maintainer" is not a barrier. With a
community as large as Spark's, and myself not being a committer on this
project, I see it as a welcome opportunity to find a mentor in the areas in
which I'm interested in contributing. We'd expect the list of names to grow
as more volunteers gain more interest, correct? To me, that seems quite
contrary to a "barrier".

[1] http://www.apache.org/dev/pmc.html


On Thu, Nov 6, 2014 at 7:49 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> So I don't understand, Greg, are the partial committers committers, or are
> they not? Spark also has a PMC, but our PMC currently consists of all
> committers (we decided not to have a differentiation when we left the
> incubator). I see the Subversion partial committers listed as "committers"
> on https://people.apache.org/committers-by-project.html#subversion, so I
> assume they are committers. As far as I can see, CloudStack is similar.
>
> Matei
>
> > On Nov 6, 2014, at 4:43 PM, Greg Stein <gs...@gmail.com> wrote:
> >
> > Partial committers are people invited to work on a particular area, and
> they do not require sign-off to work on that area. They can get a sign-off
> and commit outside that area. That approach doesn't compare to this
> proposal.
> >
> > Full committers are PMC members. As each PMC member is responsible for
> *every* line of code, then every PMC member should have complete rights to
> every line of code. Creating disparity flies in the face of a PMC member's
> responsibility. If I am a Spark PMC member, then I have responsibility for
> GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
> interposing a barrier inhibits my responsibility to ensure GraphX is
> designed, maintained, and delivered to the Public.
> >
> > Cheers,
> > -g
> >
> > (and yes, I'm aware of COMMITTERS; I've been changing that file for the
> past 12 years :-) )
> >
> > On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pwendell@gmail.com
> <ma...@gmail.com>> wrote:
> > In fact, if you look at the subversion committer list, the majority of
> > people here have commit access only for particular areas of the
> > project:
> >
> > http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
> >
> > On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pwendell@gmail.com
> <ma...@gmail.com>> wrote:
> > > Hey Greg,
> > >
> > > Regarding subversion - I think the reference is to partial vs full
> > > committers here:
> > > https://subversion.apache.org/docs/community-guide/roles.html
> > >
> > > - Patrick
> > >
> > > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gstein@gmail.com <mailto:
> gstein@gmail.com>> wrote:
> > >> -1 (non-binding)
> > >>
> > >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> > >> to be severely frowned upon. This creates *unequal* ownership of the
> > >> codebase.
> > >>
> > >> Each Member of the PMC should have *equal* rights to all areas of the
> > >> codebase under their purview. It should not be subjected to others'
> > >> "ownership" except through the standard mechanisms of reviews and
> > >> if/when absolutely necessary, to vetoes.
> > >>
> > >> Apache does not want "leads", "benevolent dictators" or "assigned
> > >> maintainers", no matter how you may dress it up with multiple
> > >> maintainers per component. The fact is that this creates an unequal
> > >> level of ownership and responsibility. The Board has shut down
> > >> projects that attempted or allowed for "Leads". Just a few months ago,
> > >> there was a problem with somebody calling themselves a "Lead".
> > >>
> > >> I don't know why you suggest that Apache Subversion does this. We
> > >> absolutely do not. Never have. Never will. The Subversion codebase is
> > >> owned by all of us, and we all care for every line of it. Some people
> > >> know more than others, of course. But any one of us can change any
> > >> part, without being subjected to a "maintainer". Of course, we ask
> > >> people with more knowledge of the component when we feel
> > >> uncomfortable, but we also know when it is safe or not to make a
> > >> specific change. And *always*, our fellow committers can review our
> > >> work and let us know when we've done something wrong.
> > >>
> > >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> > >> project ownership, and creates a more open and inviting project.
> > >>
> > >> So again: -1 on this entire concept. Not good, to be polite.
> > >>
> > >> Regards,
> > >> Greg Stein
> > >> Director, Vice Chairman
> > >> Apache Software Foundation
> > >>
> > >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> > >>> Hi all,
> > >>>
> > >>> I wanted to share a discussion we've been having on the PMC list, as
> well as call for an official vote on it on a public list. Basically, as the
> Spark project scales up, we need to define a model to make sure there is
> still great oversight of key components (in particular internal
> architecture and public APIs), and to this end I've proposed implementing a
> maintainer model for some of these components, similar to other large
> projects.
> > >>>
> > >>> As background on this, Spark has grown a lot since joining Apache.
> We've had over 80 contributors/month for the past 3 months, which I believe
> makes us the most active project in contributors/month at Apache, as well
> as over 500 patches/month. The codebase has also grown significantly, with
> new libraries for SQL, ML, graphs and more.
> > >>>
> > >>> In this kind of large project, one common way to scale development
> is to assign "maintainers" to oversee key components, where each patch to
> that component needs to get sign-off from at least one of its maintainers.
> Most existing large projects do this -- at Apache, some large ones with
> this model are CloudStack (the second-most active project overall),
> Subversion, and Kafka, and other examples include Linux and Python. This is
> also by-and-large how Spark operates today -- most components have a
> de-facto maintainer.
> > >>>
> > >>> IMO, adopting this model would have two benefits:
> > >>>
> > >>> 1) Consistent oversight of design for that component, especially
> regarding architecture and API. This process would ensure that the
> component's maintainers see all proposed changes and consider them to fit
> together in a good way.
> > >>>
> > >>> 2) More structure for new contributors and committers -- in
> particular, it would be easy to look up who's responsible for each module
> and ask them for reviews, etc, rather than having patches slip between the
> cracks.
> > >>>
> > >>> We'd like to start with this in a light-weight manner, where the model
> only applies to certain key components (e.g. scheduler, shuffle) and
> user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows, we
> can expand it if we deem it useful. The specific mechanics would be as
> follows:
> > >>>
> > >>> - Some components in Spark will have maintainers assigned to them,
> where one of the maintainers needs to sign off on each patch to the
> component.
> > >>> - Each component with maintainers will have at least 2 maintainers.
> > >>> - Maintainers will be assigned from the most active and
> knowledgeable committers on that component by the PMC. The PMC can vote to
> add / remove maintainers, and maintained components, through consensus.
> > >>> - Maintainers are expected to be active in responding to patches for
> their components, though they do not need to be the main reviewers for them
> (e.g. they might just sign off on architecture / API). To prevent inactive
> maintainers from blocking the project, if a maintainer isn't responding in
> a reasonable time period (say 2 weeks), other committers can merge the
> patch, and the PMC will want to discuss adding another maintainer.
> > >>>
> > >>> If you'd like to see examples for this model, check out the
> following projects:
> > >>> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> > >>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
> > >>>
> > >>> Finally, I wanted to list our current proposal for initial
> components and maintainers. It would be good to get feedback on other
> components we might add, but please note that personnel discussions (e.g.
> "I don't think Matei should maintain *that* component") should only happen
> on the private list. The initial components were chosen to include all
> public APIs and the main core components, and the maintainers were chosen
> from the most active contributors to those modules.
> > >>>
> > >>> - Spark core public API: Matei, Patrick, Reynold
> > >>> - Job scheduler: Matei, Kay, Patrick
> > >>> - Shuffle and network: Reynold, Aaron, Matei
> > >>> - Block manager: Reynold, Aaron
> > >>> - YARN: Tom, Andrew Or
> > >>> - Python: Josh, Matei
> > >>> - MLlib: Xiangrui, Matei
> > >>> - SQL: Michael, Reynold
> > >>> - Streaming: TD, Matei
> > >>> - GraphX: Ankur, Joey, Reynold
> > >>>
> > >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > >>>
> > >>> Matei
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > >> For additional commands, e-mail: dev-help@spark.apache.org
> > >>
> >
>
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
So I don't understand, Greg, are the partial committers committers, or are they not? Spark also has a PMC, but our PMC currently consists of all committers (we decided not to have a differentiation when we left the incubator). I see the Subversion partial committers listed as "committers" on https://people.apache.org/committers-by-project.html#subversion, so I assume they are committers. As far as I can see, CloudStack is similar.

Matei

> On Nov 6, 2014, at 4:43 PM, Greg Stein <gs...@gmail.com> wrote:
> 
> Partial committers are people invited to work on a particular area, and they do not require sign-off to work on that area. They can get a sign-off and commit outside that area. That approach doesn't compare to this proposal.
> 
> Full committers are PMC members. As each PMC member is responsible for *every* line of code, then every PMC member should have complete rights to every line of code. Creating disparity flies in the face of a PMC member's responsibility. If I am a Spark PMC member, then I have responsibility for GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And interposing a barrier inhibits my responsibility to ensure GraphX is designed, maintained, and delivered to the Public.
> 
> Cheers,
> -g
> 
> (and yes, I'm aware of COMMITTERS; I've been changing that file for the past 12 years :-) )
> 
> On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> In fact, if you look at the subversion committer list, the majority of
> people here have commit access only for particular areas of the
> project:
> 
> http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
> 
> On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> > Hey Greg,
> >
> > Regarding subversion - I think the reference is to partial vs full
> > committers here:
> > https://subversion.apache.org/docs/community-guide/roles.html
> >
> > - Patrick
> >
> >> On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gstein@gmail.com> wrote:
> >> -1 (non-binding)
> >>
> >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> >> to be severely frowned upon. This creates *unequal* ownership of the
> >> codebase.
> >>
> >> Each Member of the PMC should have *equal* rights to all areas of the
> >> codebase under their purview. It should not be subjected to others'
> >> "ownership" except through the standard mechanisms of reviews and,
> >> if/when absolutely necessary, vetoes.
> >>
> >> Apache does not want "leads", "benevolent dictators" or "assigned
> >> maintainers", no matter how you may dress it up with multiple
> >> maintainers per component. The fact is that this creates an unequal
> >> level of ownership and responsibility. The Board has shut down
> >> projects that attempted or allowed for "Leads". Just a few months ago,
> >> there was a problem with somebody calling themselves a "Lead".
> >>
> >> I don't know why you suggest that Apache Subversion does this. We
> >> absolutely do not. Never have. Never will. The Subversion codebase is
> >> owned by all of us, and we all care for every line of it. Some people
> >> know more than others, of course. But any one of us, can change any
> >> part, without being subjected to a "maintainer". Of course, we ask
> >> people with more knowledge of the component when we feel
> >> uncomfortable, but we also know when it is safe or not to make a
> >> specific change. And *always*, our fellow committers can review our
> >> work and let us know when we've done something wrong.
> >>
> >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> >> project ownership, and creates a more open and inviting project.
> >>
> >> So again: -1 on this entire concept. Not good, to be polite.
> >>
> >> Regards,
> >> Greg Stein
> >> Director, Vice Chairman
> >> Apache Software Foundation
> >>
> >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> >>> Hi all,
> >>>
> >>> I wanted to share a discussion we've been having on the PMC list, as well as call for an official vote on it on a public list. Basically, as the Spark project scales up, we need to define a model to make sure there is still great oversight of key components (in particular internal architecture and public APIs), and to this end I've proposed implementing a maintainer model for some of these components, similar to other large projects.
> >>>
> >>> As background on this, Spark has grown a lot since joining Apache. We've had over 80 contributors/month for the past 3 months, which I believe makes us the most active project in contributors/month at Apache, as well as over 500 patches/month. The codebase has also grown significantly, with new libraries for SQL, ML, graphs and more.
> >>>
> >>> In this kind of large project, one common way to scale development is to assign "maintainers" to oversee key components, where each patch to that component needs to get sign-off from at least one of its maintainers. Most existing large projects do this -- at Apache, some large ones with this model are CloudStack (the second-most active project overall), Subversion, and Kafka, and other examples include Linux and Python. This is also by-and-large how Spark operates today -- most components have a de-facto maintainer.
> >>>
> >>> IMO, adopting this model would have two benefits:
> >>>
> >>> 1) Consistent oversight of design for that component, especially regarding architecture and API. This process would ensure that the component's maintainers see all proposed changes and consider them to fit together in a good way.
> >>>
> >>> 2) More structure for new contributors and committers -- in particular, it would be easy to look up who's responsible for each module and ask them for reviews, etc, rather than having patches slip between the cracks.
> >>>
> >>> We'd like to start in a light-weight manner, where the model only applies to certain key components (e.g. scheduler, shuffle) and user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it if we deem it useful. The specific mechanics would be as follows:
> >>>
> >>> - Some components in Spark will have maintainers assigned to them, where one of the maintainers needs to sign off on each patch to the component.
> >>> - Each component with maintainers will have at least 2 maintainers.
> >>> - Maintainers will be assigned from the most active and knowledgeable committers on that component by the PMC. The PMC can vote to add / remove maintainers, and maintained components, through consensus.
> >>> - Maintainers are expected to be active in responding to patches for their components, though they do not need to be the main reviewers for them (e.g. they might just sign off on architecture / API). To prevent inactive maintainers from blocking the project, if a maintainer isn't responding in a reasonable time period (say 2 weeks), other committers can merge the patch, and the PMC will want to discuss adding another maintainer.
> >>>
> >>> If you'd like to see examples for this model, check out the following projects:
> >>> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
> >>>
> >>> Finally, I wanted to list our current proposal for initial components and maintainers. It would be good to get feedback on other components we might add, but please note that personnel discussions (e.g. "I don't think Matei should maintain *that* component") should only happen on the private list. The initial components were chosen to include all public APIs and the main core components, and the maintainers were chosen from the most active contributors to those modules.
> >>>
> >>> - Spark core public API: Matei, Patrick, Reynold
> >>> - Job scheduler: Matei, Kay, Patrick
> >>> - Shuffle and network: Reynold, Aaron, Matei
> >>> - Block manager: Reynold, Aaron
> >>> - YARN: Tom, Andrew Or
> >>> - Python: Josh, Matei
> >>> - MLlib: Xiangrui, Matei
> >>> - SQL: Michael, Reynold
> >>> - Streaming: TD, Matei
> >>> - GraphX: Ankur, Joey, Reynold
> >>>
> >>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >>>
> >>> Matei
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: dev-help@spark.apache.org
> >>
> 


Re: [VOTE] Designating maintainers for some Spark components

Posted by Greg Stein <gs...@gmail.com>.
Partial committers are people invited to work on a particular area, and
they do not require sign-off to work on that area. They can get a sign-off
and commit outside that area. That approach doesn't compare to this
proposal.

Full committers are PMC members. As each PMC member is responsible for
*every* line of code, then every PMC member should have complete rights to
every line of code. Creating disparity flies in the face of a PMC member's
responsibility. If I am a Spark PMC member, then I have responsibility for
GraphX code, whether my name is Ankur, Joey, Reynold, or Greg. And
interposing a barrier inhibits my responsibility to ensure GraphX is
designed, maintained, and delivered to the Public.

Cheers,
-g

(and yes, I'm aware of COMMITTERS; I've been changing that file for the
past 12 years :-) )

On Thu, Nov 6, 2014 at 6:28 PM, Patrick Wendell <pw...@gmail.com> wrote:

> In fact, if you look at the subversion committer list, the majority of
> people here have commit access only for particular areas of the
> project:
>
> http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
>
> On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> > Hey Greg,
> >
> > Regarding subversion - I think the reference is to partial vs full
> > committers here:
> > https://subversion.apache.org/docs/community-guide/roles.html
> >
> > - Patrick
> >
> > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gs...@gmail.com> wrote:
> >> -1 (non-binding)
> >>
> >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> >> to be severely frowned upon. This creates *unequal* ownership of the
> >> codebase.
> >>
> >> Each Member of the PMC should have *equal* rights to all areas of the
> >> codebase under their purview. It should not be subjected to others'
> >> "ownership" except through the standard mechanisms of reviews and,
> >> if/when absolutely necessary, vetoes.
> >>
> >> Apache does not want "leads", "benevolent dictators" or "assigned
> >> maintainers", no matter how you may dress it up with multiple
> >> maintainers per component. The fact is that this creates an unequal
> >> level of ownership and responsibility. The Board has shut down
> >> projects that attempted or allowed for "Leads". Just a few months ago,
> >> there was a problem with somebody calling themselves a "Lead".
> >>
> >> I don't know why you suggest that Apache Subversion does this. We
> >> absolutely do not. Never have. Never will. The Subversion codebase is
> >> owned by all of us, and we all care for every line of it. Some people
> >> know more than others, of course. But any one of us, can change any
> >> part, without being subjected to a "maintainer". Of course, we ask
> >> people with more knowledge of the component when we feel
> >> uncomfortable, but we also know when it is safe or not to make a
> >> specific change. And *always*, our fellow committers can review our
> >> work and let us know when we've done something wrong.
> >>
> >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> >> project ownership, and creates a more open and inviting project.
> >>
> >> So again: -1 on this entire concept. Not good, to be polite.
> >>
> >> Regards,
> >> Greg Stein
> >> Director, Vice Chairman
> >> Apache Software Foundation
> >>
> >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> >>> Hi all,
> >>>
> >>> I wanted to share a discussion we've been having on the PMC list, as
> well as call for an official vote on it on a public list. Basically, as the
> Spark project scales up, we need to define a model to make sure there is
> still great oversight of key components (in particular internal
> architecture and public APIs), and to this end I've proposed implementing a
> maintainer model for some of these components, similar to other large
> projects.
> >>>
> >>> As background on this, Spark has grown a lot since joining Apache.
> We've had over 80 contributors/month for the past 3 months, which I believe
> makes us the most active project in contributors/month at Apache, as well
> as over 500 patches/month. The codebase has also grown significantly, with
> new libraries for SQL, ML, graphs and more.
> >>>
> >>> In this kind of large project, one common way to scale development is
> to assign "maintainers" to oversee key components, where each patch to that
> component needs to get sign-off from at least one of its maintainers. Most
> existing large projects do this -- at Apache, some large ones with this
> model are CloudStack (the second-most active project overall), Subversion,
> and Kafka, and other examples include Linux and Python. This is also
> by-and-large how Spark operates today -- most components have a de-facto
> maintainer.
> >>>
> >>> IMO, adopting this model would have two benefits:
> >>>
> >>> 1) Consistent oversight of design for that component, especially
> regarding architecture and API. This process would ensure that the
> component's maintainers see all proposed changes and consider them to fit
> together in a good way.
> >>>
> >>> 2) More structure for new contributors and committers -- in
> particular, it would be easy to look up who's responsible for each module
> and ask them for reviews, etc, rather than having patches slip between the
> cracks.
> >>>
> >>> We'd like to start in a light-weight manner, where the model only
> applies to certain key components (e.g. scheduler, shuffle) and user-facing
> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
> it if we deem it useful. The specific mechanics would be as follows:
> >>>
> >>> - Some components in Spark will have maintainers assigned to them,
> where one of the maintainers needs to sign off on each patch to the
> component.
> >>> - Each component with maintainers will have at least 2 maintainers.
> >>> - Maintainers will be assigned from the most active and knowledgeable
> committers on that component by the PMC. The PMC can vote to add / remove
> maintainers, and maintained components, through consensus.
> >>> - Maintainers are expected to be active in responding to patches for
> their components, though they do not need to be the main reviewers for them
> (e.g. they might just sign off on architecture / API). To prevent inactive
> maintainers from blocking the project, if a maintainer isn't responding in
> a reasonable time period (say 2 weeks), other committers can merge the
> patch, and the PMC will want to discuss adding another maintainer.
> >>>
> >>> If you'd like to see examples for this model, check out the following
> projects:
> >>> - CloudStack:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >>> - Subversion:
> https://subversion.apache.org/docs/community-guide/roles.html
> >>>
> >>> Finally, I wanted to list our current proposal for initial components
> and maintainers. It would be good to get feedback on other components we
> might add, but please note that personnel discussions (e.g. "I don't think
> Matei should maintain *that* component") should only happen on the private
> list. The initial components were chosen to include all public APIs and the
> main core components, and the maintainers were chosen from the most active
> contributors to those modules.
> >>>
> >>> - Spark core public API: Matei, Patrick, Reynold
> >>> - Job scheduler: Matei, Kay, Patrick
> >>> - Shuffle and network: Reynold, Aaron, Matei
> >>> - Block manager: Reynold, Aaron
> >>> - YARN: Tom, Andrew Or
> >>> - Python: Josh, Matei
> >>> - MLlib: Xiangrui, Matei
> >>> - SQL: Michael, Reynold
> >>> - Streaming: TD, Matei
> >>> - GraphX: Ankur, Joey, Reynold
> >>>
> >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >>>
> >>> Matei
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: dev-help@spark.apache.org
> >>
>
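[Editorial note] The sign-off mechanics in the quoted proposal (each patch to a maintained component needs sign-off from at least one of its maintainers, with a roughly two-week fallback so an unresponsive maintainer cannot block a merge) amount to a small lookup rule. The sketch below is purely illustrative: the function and table names are invented here, and only three of the proposed components are shown.

```python
from datetime import date, timedelta

# Illustrative only: component names and maintainer lists come from the
# proposal; the data structure and can_merge() are invented for this sketch.
MAINTAINERS = {
    "core-public-api": ["Matei", "Patrick", "Reynold"],
    "job-scheduler": ["Matei", "Kay", "Patrick"],
    "graphx": ["Ankur", "Joey", "Reynold"],
}

GRACE_PERIOD = timedelta(weeks=2)  # "a reasonable time period (say 2 weeks)"

def can_merge(component, approvers, opened, today):
    """A patch merges if a maintainer signed off, or if the grace period
    elapsed with no maintainer response (any committer may then merge)."""
    maintainers = MAINTAINERS.get(component)
    if maintainers is None:
        return True  # unmaintained components need no special sign-off
    if any(name in maintainers for name in approvers):
        return True
    return today - opened >= GRACE_PERIOD
```

Under these assumptions, a GraphX patch approved only by a non-maintainer would wait out the two-week window before another committer could merge it, which is the softening the proposal uses to avoid inactive maintainers blocking the project.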

Re: [VOTE] Designating maintainers for some Spark components

Posted by Greg Stein <gs...@gmail.com>.
[ I'm going to try and pull a couple of thread directions into this one, to
avoid explosion :-) ]

On Thu, Nov 6, 2014 at 6:44 PM, Corey Nolet <cj...@gmail.com> wrote:

Note: I'm going to use "you" generically; I understand you [Corey] are not
a PMC member, at this time.

+1 (non-binding) [for original process proposal]
>
> Greg, the first time I've seen the word "ownership" on this thread is in
> your message. The first time the word "lead" has appeared in this thread is
> in your message as well. I don't think that was the intent. The PMC and
> Committers have a
>

The concept of "ownership" is there, just under a different term. If you are a PMC
member, and *cannot* alter a line of code without another's consent, then
you don't "own" that code. Your ownership is subservient to another. You
are not a *peer*, but a second-class citizen at this point.

The term "maintainer" in this context is being used as a word for "lead".
The maintainers are a *gate* for any change. That is not consensus. The
proposal attempts to soften that, and turn it into an oligarchy of several
maintainers. But the simple fact is that you have "some" with the ability
to set direction, and those who do not. They are called "leaders" in most
contexts, but however you want to slice it... the dynamic creates people
with unequal commit ability.

But as the PMC member you *are* responsible for it. That is the very basic
definition of being a PMC member. You are responsible for "all things
Spark".

responsibility to the community to make sure that their patches are being
> reviewed and committed. I don't see in Apache's recommended bylaws anywhere
> that says establishing responsibility on paper for specific areas cannot be
> taken on by different members of the PMC. What's been proposed looks, to
> me, to be an empirical process and it looks like it has pretty much a
> consensus from the side able to give binding votes. I don't at all think this
> model establishes any form of ownership over anything. I also don't see in
> the process proposal where it mentions that nobody other than the persons
> responsible for a module can review or commit code.
>

"where each patch to that component needs to get sign-off from at least one
of its maintainers"

That establishes two types of PMC members: those who require sign-off, and
those who don't. Apache is intended to be a group of peers, none "more
equal" than others.

That said, we *do* recognize various levels of merit. This is where you see
differences between committers, their range of access, and PMC members. But
when you hit the *PMC member* role, then you are talking about a legal
construct established by the Foundation. You move outside of community
norms, and into how the umbrella of the Foundation operates. PMC members
are individually responsible for all of the code under their purview, which
is then at the direction of the Foundation itself. I'll skip that
conversation, and leave it with the simple phrase: as a PMC member, you're
responsible for the whole codebase.

So following from that, anything that *restricts* your ability to work on
that code, is a problem.

In fact, I'll go as far as to say that since Apache is a meritocracy, the
> people who have been aligned to the responsibilities probably were aligned
> based on some sort of merit, correct? Perhaps we could dig in and find out
> for sure... I'm still getting familiar with the Spark community myself.
>

Once you are a PMC member, then there is no difference in your merit. Merit
ends. You're a PMC member, and that is all there is to it. Just because
Jane commits 1000 times per month, makes her no better than John who
commits 10/month. They are peers on the PMC and have equal rights and
responsibility to the codebase.

Historically, some PMCs have attempted to create variant levels within the
PMC, or create different groups and rights, or different partitions over
the code, and ... again, historically: it has failed. This is why Apache
stresses consensus. The failure modes are crazy and numerous when moving
away from that, into silos.

>...
On Thu, Nov 6, 2014 at 6:49 PM, Matei Zaharia <ma...@gmail.com>
 wrote:

> So I don't understand, Greg, are the partial committers committers, or are
> they not? Spark also has a PMC, but our PMC currently consists of all
> committers (we decided not to have a differentiation when we left the
> incubator). I see the Subversion partial committers listed as "committers"
> on https://people.apache.org/committers-by-project.html#subversion, so I
> assume they are committers. As far as I can see, CloudStack is similar.
>

PMC members are responsible for the code. They provide the oversight,
direction, and management. (they're also responsible for the community, but
that distinction isn't relevant in this contrasting example)

Committers can make changes to the code, with the
acknowledgement/agreement/direction of the PMC.

When these groups are equal, like Spark, then things are pretty simple.

But many communities in Apache define them as disparate. Committers may
work on the code (a single area, or all of it), but don't have any direct
input into its direction (ie. they're not on the PMC).

Within Subversion, we give people commit rights to areas, and let them go
wild. But they aren't part of the *whole* project's direction. Maybe just
the SWIG bindings, or a migration tool, or a supplemental administrative
tool. These are "partial committers", in Subversion's parlance. PMC
members, on the other hand, are known historically as "full committers" and
have rights over the whole codebase. There are no partitions. There are no
component maintainers.

Many projects at Apache provide whole-project commit access to people, but
don't give them PMC rights. I find that strange, but it is quite common.
The trust level is "code" rather than "direction". Subversion trusts people
to limited areas, or to the whole project, and (thus) to the project's
direction.

...

Within this context of those who are responsible and involved with the
project, I find it very disconcerting to partition things and tell people
"you cannot make any change [even though you're on the PMC, and responsible
for it] unless John says it is okay."

Historically, Apache allowed fine-grained access control lists over
who-could-change-what. Most of those have been removed, as we learned how
dangerous they were to a community. How they set up cliques, and leads, and
killed the peer relationships. As we've reviewed the relationships and
oversight needs, to create a proper legal umbrella for all of our
committers, the relationship of PMC members to its codebase has become much
more clear.

Unfortunately, much of this is historical knowledge, rather than written
down. But writing it down makes it sound like "rules", and proscriptions
just never seem to work out right. It is a very hard problem, to share what
works (or not) across the communities here at the Foundation.

Cheers,
-g

Re: [VOTE] Designating maintainers for some Spark components

Posted by Corey Nolet <cj...@gmail.com>.
+1 (non-binding) [for original process proposal]

Greg, the first time I've seen the word "ownership" on this thread is in
your message. The first time the word "lead" has appeared in this thread is
in your message as well. I don't think that was the intent. The PMC and
Committers have a responsibility to the community to make sure that their
patches are being reviewed and committed. I don't see in Apache's
recommended bylaws anywhere that says establishing responsibility on paper
for specific areas cannot be taken on by different members of the PMC.
What's been proposed looks, to me, to be an empirical process and it looks
like it has pretty much a consensus from the side able to give binding
votes. I don't at all think this model establishes any form of ownership over
anything. I also don't see in the process proposal where it mentions that
nobody other than the persons responsible for a module can review or commit
code.

In fact, I'll go as far as to say that since Apache is a meritocracy, the
people who have been aligned to the responsibilities probably were aligned
based on some sort of merit, correct? Perhaps we could dig in and find out
for sure... I'm still getting familiar with the Spark community myself.



On Thu, Nov 6, 2014 at 7:28 PM, Patrick Wendell <pw...@gmail.com> wrote:

> In fact, if you look at the subversion committer list, the majority of
> people here have commit access only for particular areas of the
> project:
>
> http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS
>
> On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> > Hey Greg,
> >
> > Regarding subversion - I think the reference is to partial vs full
> > committers here:
> > https://subversion.apache.org/docs/community-guide/roles.html
> >
> > - Patrick
> >
> > On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gs...@gmail.com> wrote:
> >> -1 (non-binding)
> >>
> >> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> >> to be severely frowned upon. This creates *unequal* ownership of the
> >> codebase.
> >>
> >> Each Member of the PMC should have *equal* rights to all areas of the
> >> codebase under their purview. It should not be subjected to others'
> >> "ownership" except through the standard mechanisms of reviews and,
> >> if/when absolutely necessary, vetoes.
> >>
> >> Apache does not want "leads", "benevolent dictators" or "assigned
> >> maintainers", no matter how you may dress it up with multiple
> >> maintainers per component. The fact is that this creates an unequal
> >> level of ownership and responsibility. The Board has shut down
> >> projects that attempted or allowed for "Leads". Just a few months ago,
> >> there was a problem with somebody calling themselves a "Lead".
> >>
> >> I don't know why you suggest that Apache Subversion does this. We
> >> absolutely do not. Never have. Never will. The Subversion codebase is
> >> owned by all of us, and we all care for every line of it. Some people
> >> know more than others, of course. But any one of us, can change any
> >> part, without being subjected to a "maintainer". Of course, we ask
> >> people with more knowledge of the component when we feel
> >> uncomfortable, but we also know when it is safe or not to make a
> >> specific change. And *always*, our fellow committers can review our
> >> work and let us know when we've done something wrong.
> >>
> >> Equal ownership reduces fiefdoms, enhances a feeling of community and
> >> project ownership, and creates a more open and inviting project.
> >>
> >> So again: -1 on this entire concept. Not good, to be polite.
> >>
> >> Regards,
> >> Greg Stein
> >> Director, Vice Chairman
> >> Apache Software Foundation
> >>
> >> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> >>> Hi all,
> >>>
> >>> I wanted to share a discussion we've been having on the PMC list, as
> well as call for an official vote on it on a public list. Basically, as the
> Spark project scales up, we need to define a model to make sure there is
> still great oversight of key components (in particular internal
> architecture and public APIs), and to this end I've proposed implementing a
> maintainer model for some of these components, similar to other large
> projects.
> >>>
> >>> As background on this, Spark has grown a lot since joining Apache.
> We've had over 80 contributors/month for the past 3 months, which I believe
> makes us the most active project in contributors/month at Apache, as well
> as over 500 patches/month. The codebase has also grown significantly, with
> new libraries for SQL, ML, graphs and more.
> >>>
> >>> In this kind of large project, one common way to scale development is
> to assign "maintainers" to oversee key components, where each patch to that
> component needs to get sign-off from at least one of its maintainers. Most
> existing large projects do this -- at Apache, some large ones with this
> model are CloudStack (the second-most active project overall), Subversion,
> and Kafka, and other examples include Linux and Python. This is also
> by-and-large how Spark operates today -- most components have a de-facto
> maintainer.
> >>>
> >>> IMO, adopting this model would have two benefits:
> >>>
> >>> 1) Consistent oversight of design for that component, especially
> regarding architecture and API. This process would ensure that the
> component's maintainers see all proposed changes and consider them to fit
> together in a good way.
> >>>
> >>> 2) More structure for new contributors and committers -- in
> particular, it would be easy to look up who's responsible for each module
> and ask them for reviews, etc, rather than having patches slip between the
> cracks.
> >>>
> >>> We'd like to start in a light-weight manner, where the model only
> applies to certain key components (e.g. scheduler, shuffle) and user-facing
> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
> it if we deem it useful. The specific mechanics would be as follows:
> >>>
> >>> - Some components in Spark will have maintainers assigned to them,
> where one of the maintainers needs to sign off on each patch to the
> component.
> >>> - Each component with maintainers will have at least 2 maintainers.
> >>> - Maintainers will be assigned from the most active and knowledgeable
> committers on that component by the PMC. The PMC can vote to add / remove
> maintainers, and maintained components, through consensus.
> >>> - Maintainers are expected to be active in responding to patches for
> their components, though they do not need to be the main reviewers for them
> (e.g. they might just sign off on architecture / API). To prevent inactive
> maintainers from blocking the project, if a maintainer isn't responding in
> a reasonable time period (say 2 weeks), other committers can merge the
> patch, and the PMC will want to discuss adding another maintainer.
> >>>
> >>> If you'd like to see examples for this model, check out the following
> projects:
> >>> - CloudStack:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> <
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >
> >>> - Subversion:
> https://subversion.apache.org/docs/community-guide/roles.html <
> https://subversion.apache.org/docs/community-guide/roles.html>
> >>>
> >>> Finally, I wanted to list our current proposal for initial components
> and maintainers. It would be good to get feedback on other components we
> might add, but please note that personnel discussions (e.g. "I don't think
Matei should maintain *that* component") should only happen on the private
> list. The initial components were chosen to include all public APIs and the
> main core components, and the maintainers were chosen from the most active
> contributors to those modules.
> >>>
> >>> - Spark core public API: Matei, Patrick, Reynold
> >>> - Job scheduler: Matei, Kay, Patrick
> >>> - Shuffle and network: Reynold, Aaron, Matei
> >>> - Block manager: Reynold, Aaron
> >>> - YARN: Tom, Andrew Or
> >>> - Python: Josh, Matei
> >>> - MLlib: Xiangrui, Matei
> >>> - SQL: Michael, Reynold
> >>> - Streaming: TD, Matei
> >>> - GraphX: Ankur, Joey, Reynold
> >>>
> >>> I'd like to formally call a [VOTE] on this model, to last 72 hours.
> The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >>>
> >>> Matei
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: dev-help@spark.apache.org
> >>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Patrick Wendell <pw...@gmail.com>.
In fact, if you look at the Subversion committer list, the majority of
people there have commit access only for particular areas of the
project:

http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS

On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell <pw...@gmail.com> wrote:
> Hey Greg,
>
> Regarding subversion - I think the reference is to partial vs full
> committers here:
> https://subversion.apache.org/docs/community-guide/roles.html
>
> - Patrick
>
> On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gs...@gmail.com> wrote:
>> [quoted text trimmed]



Re: [VOTE] Designating maintainers for some Spark components

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Greg,

Regarding subversion - I think the reference is to partial vs full
committers here:
https://subversion.apache.org/docs/community-guide/roles.html

- Patrick

On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein <gs...@gmail.com> wrote:
> [quoted text trimmed]



Re: [VOTE] Designating maintainers for some Spark components

Posted by Greg Stein <gs...@gmail.com>.
-1 (non-binding)

This is an idea that runs COMPLETELY counter to the Apache Way, and is
to be severely frowned upon. This creates *unequal* ownership of the
codebase.

Each Member of the PMC should have *equal* rights to all areas of the
codebase under their purview. It should not be subjected to others'
"ownership" except through the standard mechanisms of reviews and,
if/when absolutely necessary, vetoes.

Apache does not want "leads", "benevolent dictators" or "assigned
maintainers", no matter how you may dress it up with multiple
maintainers per component. The fact is that this creates an unequal
level of ownership and responsibility. The Board has shut down
projects that attempted or allowed for "Leads". Just a few months ago,
there was a problem with somebody calling themselves a "Lead".

I don't know why you suggest that Apache Subversion does this. We
absolutely do not. Never have. Never will. The Subversion codebase is
owned by all of us, and we all care for every line of it. Some people
know more than others, of course. But any one of us can change any
part, without being subjected to a "maintainer". Of course, we ask
people with more knowledge of the component when we feel
uncomfortable, but we also know when it is safe or not to make a
specific change. And *always*, our fellow committers can review our
work and let us know when we've done something wrong.

Equal ownership reduces fiefdoms, enhances a feeling of community and
project ownership, and creates a more open and inviting project.

So again: -1 on this entire concept. Not good, to be polite.

Regards,
Greg Stein
Director, Vice Chairman
Apache Software Foundation

On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
> [quoted text trimmed]



Re: [VOTE] Designating maintainers for some Spark components

Posted by jackylk <ja...@huawei.com>.
+1 Great idea!



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Designating-maintainers-for-some-Spark-components-tp9115p9142.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.



Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Tim,

We can definitely add one for that if the component grows larger or becomes harder to maintain. The main reason I didn't propose one is that the Mesos integration is actually a lot simpler than YARN at the moment, partly because we support several YARN versions that have incompatible APIs. But so far our modus operandi has been to ask Mesos contributors to review patches that touch it.

We didn't want to add a lot of components at the beginning partly to minimize overhead, but we can revisit it later. It would definitely be bad if we break Mesos support.

Matei

> On Nov 5, 2014, at 5:35 PM, Timothy Chen <tn...@gmail.com> wrote:
> 
> Hi Matei,
> 
> Definitely in favor of moving into this model for exactly the reasons
> you mentioned.
> 
> From the module list though, the module that I'm mostly involved with
> and is not listed is the Mesos integration piece.
> 
> I believe we also need a maintainer for Mesos, and I wonder if there
> is someone that can be added to that?
> 
> Tim
> 
> On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com> wrote:
>> [quoted text trimmed]




Re: [VOTE] Designating maintainers for some Spark components

Posted by Timothy Chen <tn...@gmail.com>.
Hi Matei,

Definitely in favor of moving into this model for exactly the reasons
you mentioned.

From the module list though, the module that I'm mostly involved with
and is not listed is the Mesos integration piece.

I believe we also need a maintainer for Mesos, and I wonder if there
is someone that can be added to that?

Tim

On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com> wrote:
> Hi all,
>
> I wanted to share a discussion we've been having on the PMC list, as well as call for an official vote on it on a public list. Basically, as the Spark project scales up, we need to define a model to make sure there is still great oversight of key components (in particular internal architecture and public APIs), and to this end I've proposed implementing a maintainer model for some of these components, similar to other large projects.
>
> As background on this, Spark has grown a lot since joining Apache. We've had over 80 contributors/month for the past 3 months, which I believe makes us the most active project in contributors/month at Apache, as well as over 500 patches/month. The codebase has also grown significantly, with new libraries for SQL, ML, graphs and more.
>
> In this kind of large project, one common way to scale development is to assign "maintainers" to oversee key components, where each patch to that component needs to get sign-off from at least one of its maintainers. Most existing large projects do this -- at Apache, some large ones with this model are CloudStack (the second-most active project overall), Subversion, and Kafka, and other examples include Linux and Python. This is also by-and-large how Spark operates today -- most components have a de-facto maintainer.
>
> IMO, adopting this model would have two benefits:
>
> 1) Consistent oversight of design for that component, especially regarding architecture and API. This process would ensure that the component's maintainers see all proposed changes and consider them to fit together in a good way.
>
> 2) More structure for new contributors and committers -- in particular, it would be easy to look up who’s responsible for each module and ask them for reviews, etc, rather than having patches slip between the cracks.
>
> We'd like to start with this in a light-weight manner, where the model only applies to certain key components (e.g. scheduler, shuffle) and user-facing APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it if we deem it useful. The specific mechanics would be as follows:
>
> - Some components in Spark will have maintainers assigned to them, where one of the maintainers needs to sign off on each patch to the component.
> - Each component with maintainers will have at least 2 maintainers.
> - Maintainers will be assigned from the most active and knowledgeable committers on that component by the PMC. The PMC can vote to add / remove maintainers, and maintained components, through consensus.
> - Maintainers are expected to be active in responding to patches for their components, though they do not need to be the main reviewers for them (e.g. they might just sign off on architecture / API). To prevent inactive maintainers from blocking the project, if a maintainer isn't responding in a reasonable time period (say 2 weeks), other committers can merge the patch, and the PMC will want to discuss adding another maintainer.
>
> If you'd like to see examples for this model, check out the following projects:
> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
>
> Finally, I wanted to list our current proposal for initial components and maintainers. It would be good to get feedback on other components we might add, but please note that personnel discussions (e.g. "I don't think Matei should maintain *that* component") should only happen on the private list. The initial components were chosen to include all public APIs and the main core components, and the maintainers were chosen from the most active contributors to those modules.
>
> - Spark core public API: Matei, Patrick, Reynold
> - Job scheduler: Matei, Kay, Patrick
> - Shuffle and network: Reynold, Aaron, Matei
> - Block manager: Reynold, Aaron
> - YARN: Tom, Andrew Or
> - Python: Josh, Matei
> - MLlib: Xiangrui, Matei
> - SQL: Michael, Reynold
> - Streaming: TD, Matei
> - GraphX: Ankur, Joey, Reynold
>
> I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>
> Matei

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Designating maintainers for some Spark components

Posted by Prashant Sharma <sc...@gmail.com>.
+1, Sounds good.

Now I know whom to ping for what, even if I did not follow the whole
history of the project very carefully.

Prashant Sharma



On Thu, Nov 6, 2014 at 7:01 AM, Matei Zaharia <ma...@gmail.com>
wrote:

> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>
> Matei

Re: [VOTE] Designating maintainers for some Spark components

Posted by RJ Nowling <rn...@gmail.com>.
Matei,

I saw that you're listed as a maintainer for ~6 different subcomponents,
and on over half of those, you're only the 2nd person.  My concern is that
you would be stretched thin and might not be able to act as a "backup"
on all of those subcomponents.  Are you planning on adding more
maintainers for each subcomponent?  I think it would be good to have 2
regulars + backups for each.

RJ

On Thu, Nov 6, 2014 at 8:48 AM, Jason Dai <ja...@gmail.com> wrote:

> +1 (binding)
>
> On Thu, Nov 6, 2014 at 4:02 PM, Ankur Dave <an...@gmail.com> wrote:
>
> > +1 (binding)
> >
> > Ankur <http://www.ankurdave.com/>
> >
> > On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com>
> > wrote:
> >
> > > I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> > > [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> > >
> >
>



-- 
em rnowling@gmail.com
c 954.496.2314

Re: [VOTE] Designating maintainers for some Spark components

Posted by Jason Dai <ja...@gmail.com>.
+1 (binding)

On Thu, Nov 6, 2014 at 4:02 PM, Ankur Dave <an...@gmail.com> wrote:

> +1 (binding)
>
> Ankur <http://www.ankurdave.com/>
>
> On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
> > I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> > [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Kushal Datta <ku...@gmail.com>.
+1 (binding)

For tickets that span multiple components, will the patch need to be
approved by maintainers of all of them? For example, I'm working on the
Python bindings for GraphX, where code is added to both the Python and
GraphX modules.

Thanks,
-Kushal.

On Thu, Nov 6, 2014 at 12:02 AM, Ankur Dave <an...@gmail.com> wrote:

> +1 (binding)
>
> Ankur <http://www.ankurdave.com/>
>
> On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
> > I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> > [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Ankur Dave <an...@gmail.com>.
+1 (binding)

Ankur <http://www.ankurdave.com/>

On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Hari Shreedharan <hs...@cloudera.com>.
In CloudStack, I believe one first becomes a maintainer for a subset of modules, before becoming a proven maintainer who has commit rights on the entire source tree.

So would it make sense to go that route, and have committers voted in as maintainers for certain parts of the codebase who then eventually become proven maintainers? (Though this might have to be honor-code based, since I don't think git allows per-module commit rights.)
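Since git itself has no per-module permissions, any sign-off requirement would have to be enforced by convention or by tooling around the merge process. As a purely illustrative sketch (not an actual Spark tool -- the paths and maintainer names below are hypothetical), such a check could map maintained components to their maintainers and verify that a patch touching them has at least one maintainer +1 per component:

```python
# Hypothetical sketch: map maintained component path prefixes to their
# maintainers, then check that a patch touching those paths has at least
# one maintainer sign-off per component. Paths and names are illustrative.

MAINTAINERS = {
    "core/src/main/scala/org/apache/spark/scheduler/": {"matei", "kay", "patrick"},
    "python/": {"josh", "matei"},
    "graphx/": {"ankur", "joey", "reynold"},
}

def required_signoffs(changed_files):
    """Return the maintainer groups whose sign-off the patch needs."""
    groups = []
    for path_prefix, maintainers in MAINTAINERS.items():
        if any(f.startswith(path_prefix) for f in changed_files):
            groups.append(maintainers)
    return groups

def patch_approved(changed_files, approvers):
    """True iff every touched maintained component has >= 1 maintainer +1."""
    return all(group & set(approvers) for group in required_signoffs(changed_files))

# A patch touching both Python and GraphX needs a +1 from each group.
files = ["python/pyspark/graphx.py", "graphx/src/main/scala/Graph.scala"]
print(patch_approved(files, {"josh"}))             # False: only Python covered
print(patch_approved(files, {"josh", "reynold"}))  # True: both covered
```

This also illustrates the cross-component case raised elsewhere in the thread: a patch spanning two maintained components would need a sign-off from each component's maintainer group, not a single combined approval.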


Thanks,
Hari

On Thu, Nov 6, 2014 at 3:45 PM, Patrick Wendell <pw...@gmail.com>
wrote:

> I think new committers might or might not be maintainers (it would
> depend on the PMC vote). I don't think it would affect what you could
> merge, you can merge in any part of the source tree, you just need to
> get sign off if you want to touch a public API or make major
> architectural changes. Most projects already require code review from
> other committers before you commit something, so it's just a version
> of that where you have specific people appointed to specific
> components for review.
> If you look, most large software projects have a maintainer model,
> both in Apache and outside of it. Cloudstack is probably the best
> example in Apache since they are the second most active project
> (roughly) after Spark. They have two levels of maintainers and much
> stronger language -- to quote their guide: "In general, maintainers
> only have commit rights on the module for which they are responsible."
> I'd like us to start with something simpler and lightweight as
> proposed here. Really the proposal on the table is just to codify the
> current de-facto process to make sure we stick by it as we scale. If
> we want to add more formality to it or strictness, we can do it later.
> - Patrick
> On Thu, Nov 6, 2014 at 3:29 PM, Hari Shreedharan
> <hs...@cloudera.com> wrote:
>> How would this model work with a new committer who gets voted in? Does it mean that a new committer would be a maintainer for at least one area -- else we could end up having committers who really can't merge anything significant until he becomes a maintainer.
>>
>>
>> Thanks,
>> Hari
>>
>> On Thu, Nov 6, 2014 at 3:00 PM, Matei Zaharia <ma...@gmail.com>
>> wrote:
>>
>>> I think you're misunderstanding the idea of "process" here. The point of process is to make sure something happens automatically, which is useful to ensure a certain level of quality. For example, all our patches go through Jenkins, and nobody will make the mistake of merging them if they fail tests, or RAT checks, or API compatibility checks. The idea is to get the same kind of automation for design on these components. This is a very common process for large software projects, and it's essentially what we had already, but formalizing it will make clear that this is the process we want. It's important to do it early in order to be able to refine the process as the project grows.
>>> In terms of scope, again, the maintainers are *not* going to be the only reviewers for that component, they are just a second level of sign-off required for architecture and API. Being a maintainer is also not a "promotion", it's a responsibility. Since we don't have much experience yet with this model, I didn't propose automatic rules beyond that the PMC can add / remove maintainers -- presumably the PMC is in the best position to know what the project needs. I think automatic rules are exactly the kind of "process" you're arguing against. The "process" here is about ensuring certain checks are made for every code change, not about automating personnel and development decisions.
>>> In any case, I appreciate your input on this, and we're going to evaluate the model to see how it goes. It might be that we decide we don't want it at all. However, from what I've seen of other projects (not Hadoop but projects with an order of magnitude more contributors, like Python or Linux), this is one of the best ways to have consistently great releases with a large contributor base and little room for error. With all due respect to what Hadoop's accomplished, I wouldn't use Hadoop as the best example to strive for; in my experience there I've seen patches reverted because of architectural disagreements, new APIs released and abandoned, and generally an experience that's been painful for users. A lot of the decisions we've made in Spark (e.g. time-based release cycle, built-in libraries, API stability rules, etc) were based on lessons learned there, in an attempt to define a better model.
>>> Matei
>>>> On Nov 6, 2014, at 2:18 PM, bc Wong <bc...@cloudera.com> wrote:
>>>>
>>>> On Thu, Nov 6, 2014 at 11:25 AM, Matei Zaharia <matei.zaharia@gmail.com <ma...@gmail.com>> wrote:
>>>> <snip>
>>>> Ultimately, the core motivation is that the project has grown to the point where it's hard to expect every committer to have full understanding of every component. Some committers know a ton about systems but little about machine learning, some are algorithmic whizzes but may not realize the implications of changing something on the Python API, etc. This is just a way to make sure that a domain expert has looked at the areas where it is most likely for something to go wrong.
>>>>
>>>> Hi Matei,
>>>>
>>>> I understand where you're coming from. My suggestion is to solve this without adding a new process. In the example above, those "algo whizzes" committers should realize that they're touching the Python API, and loop in some Python maintainers. Those Python maintainers would then respond and help move the PR along. This is good hygiene and should already be happening. For example, HDFS committers have commit rights to all of Hadoop. But none of them would check in YARN code without getting agreement from the YARN folks.
>>>>
>>>> I think the majority of the effort here will be education and building the convention. We have to ask committers to watch out for API changes, know their own limits, and involve the component domain experts. We need that anyways, which btw also seems to solve the problem. It's not clear what the new process would add.
>>>>
>>>> It'd be good to know the details, too. What are the exact criteria for a committer to get promoted to be a maintainer? How often does the PMC re-evaluate the list of maintainers? Is there an upper bound on the number of maintainers for a component? Can we have an automatic rule for a maintainer promotion after X patches or Y lines of code in that area?
>>>>
>>>> Cheers,
>>>> bc
>>>>
>>>>> On Nov 6, 2014, at 10:53 AM, bc Wong <bcwalrus@cloudera.com <ma...@cloudera.com>> wrote:
>>>>>
>>>>> Hi Matei,
>>>>>
>>>>> Good call on scaling the project itself. Identifying domain experts in different areas is a good thing. But I have some questions about the implementation. Here's my understanding of the proposal:
>>>>>
>>>>> (1) The PMC votes on a list of components and their maintainers. Changes to that list requires PMC approval.
>>>>> (2) No committer shall commit changes to a component without a +1 from a maintainer of that component.
>>>>>
>>>>> I see good reasons for #1, to help people navigate the project and identify expertise. For #2, I'd like to understand what problem it's trying to solve. Do we have rogue committers committing to areas that they don't know much about? If that's the case, we should address it directly, instead of adding new processes.
>>>>>
>>>>> To point out the obvious, it completely changes what "committers" means in Spark. Do we have clear promotion criteria from "committer" to "maintainer"? Is there a max number of maintainers per area? Currently, as committers gain expertise in new areas, they could start reviewing code in those areas and give +1. This encourages more contributions and cross-component knowledge sharing. Under the new proposal, they now have to be promoted to "maintainers" first. That reduces our review bandwidth.
>>>>>
>>>>> Again, if there is a quality issue with code reviews, let's talk to those committers and help them do better. There are non-process ways to solve the problem.
>>>>>
>>>>> So I think we shouldn't require "maintainer +1". I do like the idea of having explicit maintainers on a volunteer basis. These maintainers should watch their jira and PR traffic, and be very active in design & API discussions. That leads to better consistency and long-term design choices.
>>>>>
>>>>> Cheers,
>>>>> bc
>>>>>
>>>>> On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <matei.zaharia@gmail.com <ma...@gmail.com>> wrote:
>>>>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>>>>>
>>>>> Matei
>>>>
>>>>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Patrick Wendell <pw...@gmail.com>.
I think new committers might or might not be maintainers (it would
depend on the PMC vote). I don't think it would affect what you could
merge, you can merge in any part of the source tree, you just need to
get sign off if you want to touch a public API or make major
architectural changes. Most projects already require code review from
other committers before you commit something, so it's just a version
of that where you have specific people appointed to specific
components for review.

If you look, most large software projects have a maintainer model,
both in Apache and outside of it. Cloudstack is probably the best
example in Apache since they are the second most active project
(roughly) after Spark. They have two levels of maintainers and much
stronger language -- to quote their guide: "In general, maintainers
only have commit rights on the module for which they are responsible."

I'd like us to start with something simpler and lightweight as
proposed here. Really the proposal on the table is just to codify the
current de-facto process to make sure we stick by it as we scale. If
we want to add more formality to it or strictness, we can do it later.

- Patrick

On Thu, Nov 6, 2014 at 3:29 PM, Hari Shreedharan
<hs...@cloudera.com> wrote:
> How would this model work with a new committer who gets voted in? Does it mean that a new committer would be a maintainer for at least one area -- else we could end up having committers who really can't merge anything significant until he becomes a maintainer.
>
>
> Thanks,
> Hari
>>>>
>>>> If you'd like to see examples for this model, check out the following projects:
>>>> - CloudStack: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide <https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide><https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide <https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide>>
>>>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html <https://subversion.apache.org/docs/community-guide/roles.html><https://subversion.apache.org/docs/community-guide/roles.html <https://subversion.apache.org/docs/community-guide/roles.html>>
>>>>
>>>> Finally, I wanted to list our current proposal for initial components and maintainers. It would be good to get feedback on other components we might add, but please note that personnel discussions (e.g. "I don't think Matei should maintain *that* component) should only happen on the private list. The initial components were chosen to include all public APIs and the main core components, and the maintainers were chosen from the most active contributors to those modules.
>>>>
>>>> - Spark core public API: Matei, Patrick, Reynold
>>>> - Job scheduler: Matei, Kay, Patrick
>>>> - Shuffle and network: Reynold, Aaron, Matei
>>>> - Block manager: Reynold, Aaron
>>>> - YARN: Tom, Andrew Or
>>>> - Python: Josh, Matei
>>>> - MLlib: Xiangrui, Matei
>>>> - SQL: Michael, Reynold
>>>> - Streaming: TD, Matei
>>>> - GraphX: Ankur, Joey, Reynold
>>>>
>>>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>>>>
>>>> Matei
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Designating maintainers for some Spark components

Posted by Hari Shreedharan <hs...@cloudera.com>.
How would this model work with a new committer who gets voted in? Does it mean that a new committer would be a maintainer for at least one area? Otherwise we could end up with committers who really can't merge anything significant until they become a maintainer.


Thanks,
Hari

On Thu, Nov 6, 2014 at 3:00 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> I think you're misunderstanding the idea of "process" here. The point of process is to make sure something happens automatically, which is useful to ensure a certain level of quality. For example, all our patches go through Jenkins, and nobody will make the mistake of merging them if they fail tests, or RAT checks, or API compatibility checks. The idea is to get the same kind of automation for design on these components. This is a very common process for large software projects, and it's essentially what we had already, but formalizing it will make clear that this is the process we want. It's important to do it early in order to be able to refine the process as the project grows.
> In terms of scope, again, the maintainers are *not* going to be the only reviewers for that component, they are just a second level of sign-off required for architecture and API. Being a maintainer is also not a "promotion", it's a responsibility. Since we don't have much experience yet with this model, I didn't propose automatic rules beyond that the PMC can add / remove maintainers -- presumably the PMC is in the best position to know what the project needs. I think automatic rules are exactly the kind of "process" you're arguing against. The "process" here is about ensuring certain checks are made for every code change, not about automating personnel and development decisions.
> In any case, I appreciate your input on this, and we're going to evaluate the model to see how it goes. It might be that we decide we don't want it at all. However, from what I've seen of other projects (not Hadoop but projects with an order of magnitude more contributors, like Python or Linux), this is one of the best ways to have consistently great releases with a large contributor base and little room for error. With all due respect to what Hadoop's accomplished, I wouldn't use Hadoop as the best example to strive for; in my experience there I've seen patches reverted because of architectural disagreements, new APIs released and abandoned, and generally an experience that's been painful for users. A lot of the decisions we've made in Spark (e.g. time-based release cycle, built-in libraries, API stability rules, etc) were based on lessons learned there, in an attempt to define a better model.
> Matei

Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
I think you're misunderstanding the idea of "process" here. The point of process is to make sure something happens automatically, which is useful to ensure a certain level of quality. For example, all our patches go through Jenkins, and nobody will make the mistake of merging them if they fail tests, or RAT checks, or API compatibility checks. The idea is to get the same kind of automation for design on these components. This is a very common process for large software projects, and it's essentially what we had already, but formalizing it will make clear that this is the process we want. It's important to do it early in order to be able to refine the process as the project grows.

In terms of scope, again, the maintainers are *not* going to be the only reviewers for that component, they are just a second level of sign-off required for architecture and API. Being a maintainer is also not a "promotion", it's a responsibility. Since we don't have much experience yet with this model, I didn't propose automatic rules beyond that the PMC can add / remove maintainers -- presumably the PMC is in the best position to know what the project needs. I think automatic rules are exactly the kind of "process" you're arguing against. The "process" here is about ensuring certain checks are made for every code change, not about automating personnel and development decisions.

In any case, I appreciate your input on this, and we're going to evaluate the model to see how it goes. It might be that we decide we don't want it at all. However, from what I've seen of other projects (not Hadoop but projects with an order of magnitude more contributors, like Python or Linux), this is one of the best ways to have consistently great releases with a large contributor base and little room for error. With all due respect to what Hadoop's accomplished, I wouldn't use Hadoop as the best example to strive for; in my experience there I've seen patches reverted because of architectural disagreements, new APIs released and abandoned, and generally an experience that's been painful for users. A lot of the decisions we've made in Spark (e.g. time-based release cycle, built-in libraries, API stability rules, etc) were based on lessons learned there, in an attempt to define a better model.

Matei


> On Nov 6, 2014, at 2:18 PM, bc Wong <bc...@cloudera.com> wrote:
> 
> On Thu, Nov 6, 2014 at 11:25 AM, Matei Zaharia <matei.zaharia@gmail.com <ma...@gmail.com>> wrote:
> ​<snip> 
> Ultimately, the core motivation is that the project has grown to the point where it's hard to expect every committer to have full understanding of every component. Some committers know a ton about systems but little about machine learning, some are algorithmic whizzes but may not realize the implications of changing something on the Python API, etc. This is just a way to make sure that a domain expert has looked at the areas where it is most likely for something to go wrong.
> 
> ​Hi Matei,
> 
> I understand where you're coming from. My suggestion is to solve this without adding a new process. In the example above, those "algo whizzes" committers should realize that they're touching the Python API, and loop in some Python maintainers​. Those Python maintainers would then respond and help move the PR along. This is good hygiene and should already be happening. For example, HDFS committers have commit rights to all of Hadoop. But none of them would check in YARN code without getting agreement from the YARN folks.
> 
> I think the majority of the effort here will be education and building the convention. We have to ask committers to watch out for API changes, know their own limits, and involve the component domain experts. We need that anyways, which btw also seems to solve the problem. It's not clear what the new process would add.
> 
> It'd be good to know the details, too. What are the exact criteria for a committer to get promoted to be a maintainer? How often does the PMC re-evaluate the list of maintainers? Is there an upper bound on the number of maintainers for a component? Can we have an automatic rule for a maintainer promotion after X patches or Y lines of code in that area?
> 
> Cheers,
> bc
> 


Re: [VOTE] Designating maintainers for some Spark components

Posted by bc Wong <bc...@cloudera.com>.
On Thu, Nov 6, 2014 at 11:25 AM, Matei Zaharia <ma...@gmail.com>
wrote:
​<snip>

> Ultimately, the core motivation is that the project has grown to the point
> where it's hard to expect every committer to have full understanding of
> every component. Some committers know a ton about systems but little about
> machine learning, some are algorithmic whizzes but may not realize the
> implications of changing something on the Python API, etc. This is just a
> way to make sure that a domain expert has looked at the areas where it is
> most likely for something to go wrong.
>

​Hi Matei,

I understand where you're coming from. My suggestion is to solve this
without adding a new process. In the example above, those "algo whizzes"
committers should realize that they're touching the Python API, and loop in
some Python maintainers​. Those Python maintainers would then respond and
help move the PR along. This is good hygiene and should already be
happening. For example, HDFS committers have commit rights to all of
Hadoop. But none of them would check in YARN code without getting agreement
from the YARN folks.

I think the majority of the effort here will be education and building the
convention. We have to ask committers to watch out for API changes, know
their own limits, and involve the component domain experts. We need that
anyway, which btw also seems to solve the problem. It's not clear what the
new process would add.

It'd be good to know the details, too. What are the exact criteria for a
committer to get promoted to be a maintainer? How often does the PMC
re-evaluate the list of maintainers? Is there an upper bound on the number
of maintainers for a component? Can we have an automatic rule for a
maintainer promotion after X patches or Y lines of code in that area?

Cheers,
bc

On Nov 6, 2014, at 10:53 AM, bc Wong <bc...@cloudera.com> wrote:
>
> Hi Matei,
>
> Good call on scaling the project itself. Identifying domain experts in
> different areas is a good thing. But I have some questions about the
> implementation. Here's my understanding of the proposal:
>
> (1) The PMC votes on a list of components and their maintainers. Changes
> to that list require PMC approval.
> (2) No committer shall commit changes to a component without a +1 from a
> maintainer of that component.
>
> I see good reasons for #1, to help people navigate the project and
> identify expertise. For #2, I'd like to understand what problem it's trying
> to solve. Do we have rogue committers committing to areas that they don't
> know much about? If that's the case, we should address it directly, instead
> of adding new processes.
>
> To point out the obvious, it completely changes what "committers" means in
> Spark. Do we have clear promotion criteria from "committer" to
> "maintainer"? Is there a max number of maintainers per area Currently, as
> committers gains expertise in new areas, they could start reviewing code in
> those areas and give +1. This encourages more contributions and
> cross-component knowledge sharing. Under the new proposal, they now have to
> be promoted to "maintainers" first. That reduces our review bandwidth.
>
> Again, if there is a quality issue with code reviews, let's talk to those
> committers and help them do better. There are non-process ways to solve the
> problem.
>
> So I think we shouldn't require "maintainer +1". I do like the idea of
> having explicit maintainers on a volunteer basis. These maintainers should
> watch their jira and PR traffic, and be very active in design & API
> discussions. That leads to better consistency and long-term design choices.
>
> Cheers,
> bc
>
> On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I wanted to share a discussion we've been having on the PMC list, as well
>> as call for an official vote on it on a public list. Basically, as the
>> Spark project scales up, we need to define a model to make sure there is
>> still great oversight of key components (in particular internal
>> architecture and public APIs), and to this end I've proposed implementing a
>> maintainer model for some of these components, similar to other large
>> projects.
>>
>> As background on this, Spark has grown a lot since joining Apache. We've
>> had over 80 contributors/month for the past 3 months, which I believe makes
>> us the most active project in contributors/month at Apache, as well as over
>> 500 patches/month. The codebase has also grown significantly, with new
>> libraries for SQL, ML, graphs and more.
>>
>> In this kind of large project, one common way to scale development is to
>> assign "maintainers" to oversee key components, where each patch to that
>> component needs to get sign-off from at least one of its maintainers. Most
>> existing large projects do this -- at Apache, some large ones with this
>> model are CloudStack (the second-most active project overall), Subversion,
>> and Kafka, and other examples include Linux and Python. This is also
>> by-and-large how Spark operates today -- most components have a de-facto
>> maintainer.
>>
>> IMO, adopting this model would have two benefits:
>>
>> 1) Consistent oversight of design for that component, especially
>> regarding architecture and API. This process would ensure that the
>> component's maintainers see all proposed changes and consider them to fit
>> together in a good way.
>>
>> 2) More structure for new contributors and committers -- in particular,
>> it would be easy to look up who’s responsible for each module and ask them
>> for reviews, etc, rather than having patches slip between the cracks.
>>
>> We'd like to start in a light-weight manner, where the model only
>> applies to certain key components (e.g. scheduler, shuffle) and user-facing
>> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
>> it if we deem it useful. The specific mechanics would be as follows:
>>
>> - Some components in Spark will have maintainers assigned to them, where
>> one of the maintainers needs to sign off on each patch to the component.
>> - Each component with maintainers will have at least 2 maintainers.
>> - Maintainers will be assigned by the PMC from the most active and
>> knowledgeable committers on that component. The PMC can vote to add or
>> remove maintainers, and maintained components, through consensus.
>> - Maintainers are expected to be active in responding to patches for
>> their components, though they do not need to be the main reviewers for them
>> (e.g. they might just sign off on architecture / API). To prevent inactive
>> maintainers from blocking the project, if a maintainer isn't responding in
>> a reasonable time period (say 2 weeks), other committers can merge the
>> patch, and the PMC will want to discuss adding another maintainer.
>>
>> If you'd like to see examples for this model, check out the following
>> projects:
>> - CloudStack:
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
>> - Subversion:
>> https://subversion.apache.org/docs/community-guide/roles.html
>>
>> Finally, I wanted to list our current proposal for initial components and
>> maintainers. It would be good to get feedback on other components we might
>> add, but please note that personnel discussions (e.g. "I don't think Matei
>> should maintain *that* component") should only happen on the private list.
>> The initial components were chosen to include all public APIs and the main
>> core components, and the maintainers were chosen from the most active
>> contributors to those modules.
>>
>> - Spark core public API: Matei, Patrick, Reynold
>> - Job scheduler: Matei, Kay, Patrick
>> - Shuffle and network: Reynold, Aaron, Matei
>> - Block manager: Reynold, Aaron
>> - YARN: Tom, Andrew Or
>> - Python: Josh, Matei
>> - MLlib: Xiangrui, Matei
>> - SQL: Michael, Reynold
>> - Streaming: TD, Matei
>> - GraphX: Ankur, Joey, Reynold
>>
>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
>> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>>
>> Matei
>
>
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Matei Zaharia <ma...@gmail.com>.
Hi BC,

The point is exactly to ensure that the maintainers have looked at each patch to that component and consider it to fit consistently into its architecture. The issue is not about "rogue" committers; it's about making sure that changes we would want to roll back don't accidentally sneak in, particularly because we have frequent releases and we guarantee API stability. This process is meant to ensure that whichever committer reviews a patch also forwards it to its maintainers.

Note that any committer is able to review patches in any component. The maintainer sign-off is just a second requirement for some core components (central parts of the system and public APIs). But I expect that most maintainers will let others do the bulk of the reviewing and focus only on changes to the architecture or API.

Ultimately, the core motivation is that the project has grown to the point where it's hard to expect every committer to have full understanding of every component. Some committers know a ton about systems but little about machine learning, some are algorithmic whizzes but may not realize the implications of changing something on the Python API, etc. This is just a way to make sure that a domain expert has looked at the areas where it is most likely for something to go wrong.

Matei



Re: [VOTE] Designating maintainers for some Spark components

Posted by bc Wong <bc...@cloudera.com>.
Hi Matei,

Good call on scaling the project itself. Identifying domain experts in
different areas is a good thing. But I have some questions about the
implementation. Here's my understanding of the proposal:

(1) The PMC votes on a list of components and their maintainers. Changes to
that list requires PMC approval.
(2) No committer shall commit changes to a component without a +1 from a
maintainer of that component.

I see good reasons for #1, to help people navigate the project and identify
expertise. For #2, I'd like to understand what problem it's trying to
solve. Do we have rogue committers committing to areas that they don't know
much about? If that's the case, we should address it directly, instead of
adding new processes.

To point out the obvious, it completely changes what "committers" means in
Spark. Do we have clear promotion criteria from "committer" to
"maintainer"? Is there a max number of maintainers per area? Currently, as
committers gain expertise in new areas, they could start reviewing code in
those areas and give +1. This encourages more contributions and
cross-component knowledge sharing. Under the new proposal, they now have to
be promoted to "maintainers" first. That reduces our review bandwidth.

Again, if there is a quality issue with code reviews, let's talk to those
committers and help them do better. There are non-process ways to solve the
problem.

So I think we shouldn't require "maintainer +1". I do like the idea of
having explicit maintainers on a volunteer basis. These maintainers should
watch their jira and PR traffic, and be very active in design & API
discussions. That leads to better consistency and long-term design choices.

Cheers,
bc


Re: [VOTE] Designating maintainers for some Spark components

Posted by Kay Ousterhout <ke...@eecs.berkeley.edu>.
+1 (binding)

I see this as a way to increase transparency and efficiency around a
process that already informally exists, with benefits to both new
contributors and committers.  For new contributors, it makes clear who they
should ping about a pending patch.  For committers, it's a good reference
for who to rope in if they're reviewing a change that touches code they're
unfamiliar with.  I've often found myself in that situation when doing a
review; for me, having this list would be quite helpful.

-Kay

On Thu, Nov 6, 2014 at 10:00 AM, Josh Rosen <ro...@gmail.com> wrote:

> +1 (binding).
>
> (our pull request browsing tool is open-source, by the way; contributions
> welcome: https://github.com/databricks/spark-pr-dashboard)
>
> On Thu, Nov 6, 2014 at 9:28 AM, Nick Pentreath <ni...@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > —
> > Sent from Mailbox
> >
> > On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das <de...@gmail.com>
> > wrote:
> >
> > > +1
> > > The app to track PRs based on component is a great idea...
> > > On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <
> > Sean.McNamara@webtrends.com>
> > > wrote:
> > >> +1
> > >>
> > >> Sean
> > >>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Josh Rosen <ro...@gmail.com>.
+1 (binding).

(our pull request browsing tool is open-source, by the way; contributions
welcome: https://github.com/databricks/spark-pr-dashboard)

On Thu, Nov 6, 2014 at 9:28 AM, Nick Pentreath <ni...@gmail.com>
wrote:

> +1 (binding)
>
> —
> Sent from Mailbox
>
> On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das <de...@gmail.com>
> wrote:
>
> > +1
> > The app to track PRs based on component is a great idea...
> > On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <
> Sean.McNamara@webtrends.com>
> > wrote:
> >> +1
> >>
> >> Sean
> >>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Nick Pentreath <ni...@gmail.com>.
+1 (binding)

—
Sent from Mailbox

On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das <de...@gmail.com>
wrote:

> +1
> The app to track PRs based on component is a great idea...
> On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <Se...@webtrends.com>
> wrote:
>> +1
>>
>> Sean
>>
>> On Nov 5, 2014, at 6:32 PM, Matei Zaharia <ma...@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I wanted to share a discussion we've been having on the PMC list, as
>> well as call for an official vote on it on a public list. Basically, as the
>> Spark project scales up, we need to define a model to make sure there is
>> still great oversight of key components (in particular internal
>> architecture and public APIs), and to this end I've proposed implementing a
>> maintainer model for some of these components, similar to other large
>> projects.
>> >
>> > As background on this, Spark has grown a lot since joining Apache. We've
>> had over 80 contributors/month for the past 3 months, which I believe makes
>> us the most active project in contributors/month at Apache, as well as over
>> 500 patches/month. The codebase has also grown significantly, with new
>> libraries for SQL, ML, graphs and more.
>> >
>> > In this kind of large project, one common way to scale development is to
>> assign "maintainers" to oversee key components, where each patch to that
>> component needs to get sign-off from at least one of its maintainers. Most
>> existing large projects do this -- at Apache, some large ones with this
>> model are CloudStack (the second-most active project overall), Subversion,
>> and Kafka, and other examples include Linux and Python. This is also
>> by-and-large how Spark operates today -- most components have a de-facto
>> maintainer.
>> >
>> > IMO, adopting this model would have two benefits:
>> >
>> > 1) Consistent oversight of design for that component, especially
>> regarding architecture and API. This process would ensure that the
>> component's maintainers see all proposed changes and consider them to fit
>> together in a good way.
>> >
>> > 2) More structure for new contributors and committers -- in particular,
>> it would be easy to look up who’s responsible for each module and ask them
>> for reviews, etc, rather than having patches slip between the cracks.
>> >
>> > We'd like to start with in a light-weight manner, where the model only
>> applies to certain key components (e.g. scheduler, shuffle) and user-facing
>> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand
>> it if we deem it useful. The specific mechanics would be as follows:
>> >
>> > - Some components in Spark will have maintainers assigned to them, where
>> one of the maintainers needs to sign off on each patch to the component.
>> > - Each component with maintainers will have at least 2 maintainers.
>> > - Maintainers will be assigned from the most active and knowledgeable
>> committers on that component by the PMC. The PMC can vote to add / remove
>> maintainers, and maintained components, through consensus.
>> > - Maintainers are expected to be active in responding to patches for
>> their components, though they do not need to be the main reviewers for them
>> (e.g. they might just sign off on architecture / API). To prevent inactive
>> maintainers from blocking the project, if a maintainer isn't responding in
>> a reasonable time period (say 2 weeks), other committers can merge the
>> patch, and the PMC will want to discuss adding another maintainer.
>> >
>> > If you'd like to see examples for this model, check out the following
>> projects:
>> > - CloudStack:
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
>> <
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
>> >
>> > - Subversion:
>> https://subversion.apache.org/docs/community-guide/roles.html <
>> https://subversion.apache.org/docs/community-guide/roles.html>
>> >
>> > Finally, I wanted to list our current proposal for initial components
>> and maintainers. It would be good to get feedback on other components we
>> might add, but please note that personnel discussions (e.g. "I don't think
>> Matei should maintain *that* component") should only happen on the private
>> list. The initial components were chosen to include all public APIs and the
>> main core components, and the maintainers were chosen from the most active
>> contributors to those modules.
>> >
>> > - Spark core public API: Matei, Patrick, Reynold
>> > - Job scheduler: Matei, Kay, Patrick
>> > - Shuffle and network: Reynold, Aaron, Matei
>> > - Block manager: Reynold, Aaron
>> > - YARN: Tom, Andrew Or
>> > - Python: Josh, Matei
>> > - MLlib: Xiangrui, Matei
>> > - SQL: Michael, Reynold
>> > - Streaming: TD, Matei
>> > - GraphX: Ankur, Joey, Reynold
>> >
>> > I'd like to formally call a [VOTE] on this model, to last 72 hours. The
>> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>> >
>> > Matei
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Debasish Das <de...@gmail.com>.
+1

The app to track PRs based on component is a great idea...
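As a rough illustration of what such a PR-tracking app might do, here is a minimal sketch that maps changed file paths to the proposed components and checks whether at least one maintainer of each touched component has signed off. The maintainer lists come from the proposal above; the path prefixes and function names are illustrative assumptions, not the project's actual layout or tooling.

```python
# Hypothetical sketch: map a patch's changed files to maintained components
# and verify the sign-off rule from the proposal (at least one maintainer
# per touched component). Path prefixes below are assumptions for illustration.

MAINTAINERS = {
    "scheduler": ["Matei", "Kay", "Patrick"],
    "shuffle": ["Reynold", "Aaron", "Matei"],
    "block-manager": ["Reynold", "Aaron"],
    "yarn": ["Tom", "Andrew Or"],
    "python": ["Josh", "Matei"],
    "mllib": ["Xiangrui", "Matei"],
    "sql": ["Michael", "Reynold"],
    "streaming": ["TD", "Matei"],
    "graphx": ["Ankur", "Joey", "Reynold"],
}

# Illustrative mapping from repository path prefixes to components.
PATH_TO_COMPONENT = {
    "core/src/main/scala/org/apache/spark/scheduler/": "scheduler",
    "core/src/main/scala/org/apache/spark/shuffle/": "shuffle",
    "core/src/main/scala/org/apache/spark/storage/": "block-manager",
    "yarn/": "yarn",
    "python/": "python",
    "mllib/": "mllib",
    "sql/": "sql",
    "streaming/": "streaming",
    "graphx/": "graphx",
}

def components_for(changed_files):
    """Return the set of maintained components that a patch touches."""
    comps = set()
    for path in changed_files:
        for prefix, comp in PATH_TO_COMPONENT.items():
            if path.startswith(prefix):
                comps.add(comp)
    return comps

def signoff_satisfied(changed_files, approvers):
    """True if every touched component has sign-off from >= 1 maintainer.

    Patches touching no maintained component need no maintainer sign-off.
    """
    for comp in components_for(changed_files):
        if not set(MAINTAINERS[comp]) & set(approvers):
            return False
    return True
```

For example, a patch touching only `sql/` would pass with an approval from Michael or Reynold, while a patch touching nothing in the table would need no maintainer sign-off at all. A real tool would also have to handle the two-week inactivity fallback described in the proposal.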

On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <Se...@webtrends.com>
wrote:

> +1
>
> Sean
>
> On Nov 5, 2014, at 6:32 PM, Matei Zaharia <ma...@gmail.com> wrote:
>
> > [original proposal quoted in full above; trimmed]
>

Re: [VOTE] Designating maintainers for some Spark components

Posted by Sean McNamara <Se...@Webtrends.com>.
+1

Sean

On Nov 5, 2014, at 6:32 PM, Matei Zaharia <ma...@gmail.com> wrote:

> [original proposal quoted in full above; trimmed]