Posted to dev@samza.apache.org by Jay Kreps <ja...@gmail.com> on 2015/07/13 07:37:55 UTC

Question about sub-projects and project merging

Hey board members,

There is a longish thread on the Apache Samza mailing list on the
relationship between Kafka and Samza and whether they wouldn't make a lot
more sense as a single project. This raised some questions I was hoping to
get advice on.

Discussion thread (warning: super long, I attempt to summarize relevant
bits below):
http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABYbY7d_-JCXj7FizSjuEbJEDgbeP33FLyx3NRoZt0yeox9JsQ@mail.gmail.com%3E

Anyhow, some people thought "Apache has lots of sub-projects, that would
be a graceful way to step in the right direction". At that point others
popped up and said, "sub-projects are discouraged by the board".

I'm not sure if we understand technically what a subproject is, but I think
it means a second repo/committership under the same PMC.

A few questions:
- Is that what a sub-project is?
- Are they discouraged? If so, why?
- Assuming it makes sense in this case, what is the process for making one?
- Putting aside sub-projects as a mechanism, what are examples where
communities merged successfully? We were pointed towards Lucene/SOLR. Are
there others?

Relevant background info:
- Samza depends on Kafka, but not vice versa
- There is some overlap in committers but not extensive (3/11 Samza
committers are also Kafka committers)

Thanks for the advice!

-Jay

Re: Question about sub-projects and project merging

Posted by Hervé Boutemy <hb...@apache.org>.
Some remarks on "what is a sub-project?", based on my experience working on
this exact topic for https://projects.apache.org/

First, see the facts at https://projects.apache.org/projects.html?pmc for a
complete list of projects grouped by PMC (as documented by the PMCs; there is
also a lot of software that is not described there).

I came to the conclusion that this is a question of semantics around the term
"project", with two competing visions at the ASF:
- either you talk about TLPs + sub-projects
- or you talk about committees + projects

After trying both visions for https://projects.apache.org/ (it started with
the "TLP + sub-projects" vision, because the term TLP is used by pretty much
all of us), I finally preferred "committees + projects": it avoids having to
classify projects as Top Level Projects or sub-projects, with the bad
impression that puts on the "sub" ones, and it reflects the fact that in some
committees no project is more "top" or "sub" than another: see Commons or
Logging.
But in some committees there really is a main project, and the other projects
are more like extensions or plugins: see Ant or Velocity.

IMHO, talking about committees and projects is the best way to avoid the
heated debate that comes with the "TLPs + sub-projects" vision.

With those terms, your question of "merging two TLPs" becomes "merging two
committees, i.e. their communities, and putting two projects under the
management of the merged committee". IMHO, the description is more verbose,
but the debate is less heated and stays focused on the main question: is this
really the same community, which should then be managed by a single committee?


I don't have any opinion on the Kafka and Samza case; I just hope these
explanations will help the discussion.

Regards,

Hervé

On Sunday, July 12, 2015 22:37:55, Jay Kreps wrote:
> Hey board members,
> 
> There is a longish thread on the Apache Samza mailing list on the
> relationship between Kafka and Samza and whether they wouldn't make a lot
> more sense as a single project. This raised some questions I was hoping to
> get advice on.
> 
> Discussion thread (warning: super long, I attempt to summarize relevant
> bits below):
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABYbY7d_-JCXj7FizSjuEbJEDgbeP33FLyx3NRoZt0yeox9JsQ@mail.gmail.com%3E
> 
> Anyhow, some people thought "Apache has lots of sub-projects, that would
> be a graceful way to step in the right direction". At that point others
> popped up and said, "sub-projects are discouraged by the board".
> 
> I'm not sure if we understand technically what a subproject is, but I think
> it means a second repo/committership under the same PMC.
> 
> A few questions:
> - Is that what a sub-project is?
> - Are they discouraged? If so, why?
> - Assuming it makes sense in this case, what is the process for making one?
> - Putting aside sub-projects as a mechanism, what are examples where
> communities merged successfully? We were pointed towards Lucene/SOLR. Are
> there others?
> 
> Relevant background info:
> - Samza depends on Kafka, but not vice versa
> - There is some overlap in committers but not extensive (3/11 Samza
> committers are also Kafka committers)
> 
> Thanks for the advice!
> 
> -Jay


Re: Question about sub-projects and project merging

Posted by Greg Stein <gs...@gmail.com>.
Hi Jay,

Looking at your question, I see the Apache Samza and Apache Kafka
*communities* have little overlap(*). The Board looks at communities, and
their overlap or lack thereof. Smushing two communities under one TLP is
what we have historically called an "umbrella" TLP, and we discourage it.
Communities should be allowed to operate independently.

If you have *one* community, then one TLP makes sense.

If you have *two* communities, then increase the overlap. When they look
like one community, and that one community votes to merge TLPs ... then ask
for that.

Cheers,
-g

(*) 2 common PMC members, 3 common committers.


On Mon, Jul 13, 2015 at 12:37 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hey board members,
>
> There is a longish thread on the Apache Samza mailing list on the
> relationship between Kafka and Samza and whether they wouldn't make a lot
> more sense as a single project. This raised some questions I was hoping to
> get advice on.
>
> Discussion thread (warning: super long, I attempt to summarize relevant
> bits below):
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABYbY7d_-JCXj7FizSjuEbJEDgbeP33FLyx3NRoZt0yeox9JsQ@mail.gmail.com%3E
>
> Anyhow, some people thought "Apache has lots of sub-projects, that would
> be a graceful way to step in the right direction". At that point others
> popped up and said, "sub-projects are discouraged by the board".
>
> I'm not sure if we understand technically what a subproject is, but I
> think it means a second repo/committership under the same PMC.
>
> A few questions:
> - Is that what a sub-project is?
> - Are they discouraged? If so, why?
> - Assuming it makes sense in this case, what is the process for making one?
> - Putting aside sub-projects as a mechanism, what are examples where
> communities merged successfully? We were pointed towards Lucene/SOLR. Are
> there others?
>
> Relevant background info:
> - Samza depends on Kafka, but not vice versa
> - There is some overlap in committers but not extensive (3/11 Samza
> committers are also Kafka committers)
>
> Thanks for the advice!
>
> -Jay
>
>
>
>

Re: Question about sub-projects and project merging

Posted by Niclas Hedhman <ni...@hedhman.org>.
From the peanut gallery:

  a. It looks to me like there is no overwhelming reason to merge the
communities. In fact, IF it already were a single community, it might be
time to split Samza out. Ask yourself this question: if the active Samza devs
laid down their tools, how many Kafka devs would care about (and further the
development of) Samza?

  b. Having a "hard dependency" on another upstream project is commonplace
in the ASF. Take a look at the Hadoop ecosystem for many examples.

  c. To me, it sounds more like a technical issue of design, where Samza is
more flexible than needed, perhaps because the original intent was to allow
integration with more messaging systems than Kafka. The redesign seems to be
the driver, and that doesn't need to lead to merging the communities.

  d. Is there actually some other underlying community issue? I haven't seen
any worrying signs in the Board reports, but I am asking anyway... These
kinds of questions often surface when the most active members of the
community feel somewhat burned out and are looking for other active devs to
help out.


Cheers
Niclas

On Mon, Jul 13, 2015 at 8:37 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hey board members,
>
> There is a longish thread on the Apache Samza mailing list on the
> relationship between Kafka and Samza and whether they wouldn't make a lot
> more sense as a single project. This raised some questions I was hoping to
> get advice on.
>
> Discussion thread (warning: super long, I attempt to summarize relevant
> bits below):
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABYbY7d_-JCXj7FizSjuEbJEDgbeP33FLyx3NRoZt0yeox9JsQ@mail.gmail.com%3E
>
> Anyhow, some people thought "Apache has lots of sub-projects, that would
> be a graceful way to step in the right direction". At that point others
> popped up and said, "sub-projects are discouraged by the board".
>
> I'm not sure if we understand technically what a subproject is, but I
> think it means a second repo/committership under the same PMC.
>
> A few questions:
> - Is that what a sub-project is?
> - Are they discouraged? If so, why?
> - Assuming it makes sense in this case, what is the process for making one?
> - Putting aside sub-projects as a mechanism, what are examples where
> communities merged successfully? We were pointed towards Lucene/SOLR. Are
> there others?
>
> Relevant background info:
> - Samza depends on Kafka, but not vice versa
> - There is some overlap in committers but not extensive (3/11 Samza
> committers are also Kafka committers)
>
> Thanks for the advice!
>
> -Jay
>
>
>
>


-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Re: Question about sub-projects and project merging

Posted by Jay Kreps <ja...@confluent.io>.
Hey Mike,

Thanks for sharing, it is helpful to hear the experience that leads to
these recommendations.

-Jay

On Mon, Jul 13, 2015 at 11:01 AM, Mike Kienenberger <mk...@gmail.com> wrote:

> A subproject is one of many projects that fall under the same umbrella
> project management committee (PMC).   It doesn't have to be a separate
> repo, but it generally has a separate community or a subset of the
> full community.
>
> Speaking as a long-time PMC member for MyFaces, our problem with
> subprojects (we have 11!) is that it's hard to keep accountability and
> monitor community health.
>
> A subproject starts off being active with some subset of the community,
> but then reduces activity at some future point.   Those who aren't
> directly involved with the subproject tend not to notice that the
> particular subproject has fallen to unhealthy levels.   Generally, you
> don't realize something is wrong until after all of the developers
> have left, when you suddenly realize that there's no one answering
> questions, applying patches, or familiar with the code base.
>
> Non-umbrella projects reporting to the board are expected to evaluate
> community health each quarter.   Umbrella projects are also supposed
> to do this, but often fail to realize that community health has to be
> individually evaluated for each subproject each quarter.   The PMC
> chair is likely not directly involved with each subproject, and may
> not be in a good position to evaluate the sub-project's health.  As
> Hervé mentions, this is particularly true for TLPs which have a main
> project and "optional" modules where everyone cares about the main
> project and only a few care about each module subproject.   This is
> what happened with MyFaces.
>
> What tends to happen with umbrella projects is that you end up
> creating two-tier project management.  Those responsible to the board
> are "upper management" but may not be directly involved and fail to
> understand the subproject community health.  Those who are supposed to
> actively manage the project are "lower management" and are not
> directly responsible to the board for quarterly reports.
>
> Best practice is to have a one-tier PMC.  As soon as a subproject is
> healthy enough to stand on its own, it probably should go TLP.
> MyFaces successfully spun off DeltaSpike, and DeltaSpike remains
> healthy.  The other alternative is to be certain to address the status
> of each subproject in the board report, much like the Incubator board
> report does each time.
>
> My advice is the same as others -- keep the two projects separate, but
> encourage individual Samza committers join as Kafka committers if they
> feel the need to do so.
>
> On Mon, Jul 13, 2015 at 1:37 AM, Jay Kreps <ja...@gmail.com> wrote:
> > Hey board members,
> >
> > There is a longish thread on the Apache Samza mailing list on the
> > relationship between Kafka and Samza and whether they wouldn't make a lot
> > more sense as a single project. This raised some questions I was hoping to
> > get advice on.
> >
> > Discussion thread (warning: super long, I attempt to summarize relevant
> > bits below):
> > http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABYbY7d_-JCXj7FizSjuEbJEDgbeP33FLyx3NRoZt0yeox9JsQ@mail.gmail.com%3E
> >
> > Anyhow, some people thought "Apache has lots of sub-projects, that would
> > be a graceful way to step in the right direction". At that point others
> > popped up and said, "sub-projects are discouraged by the board".
> >
> > I'm not sure if we understand technically what a subproject is, but I think
> > it means a second repo/committership under the same PMC.
> >
> > A few questions:
> > - Is that what a sub-project is?
> > - Are they discouraged? If so, why?
> > - Assuming it makes sense in this case, what is the process for making one?
> > - Putting aside sub-projects as a mechanism, what are examples where
> > communities merged successfully? We were pointed towards Lucene/SOLR. Are
> > there others?
> >
> > Relevant background info:
> > - Samza depends on Kafka, but not vice versa
> > - There is some overlap in committers but not extensive (3/11 Samza
> > committers are also Kafka committers)
> >
> > Thanks for the advice!
> >
> > -Jay
> >
> >
> >
>

Re: Question about sub-projects and project merging

Posted by Mike Kienenberger <mk...@gmail.com>.
A subproject is one of many projects that fall under the same umbrella
project management committee (PMC).   It doesn't have to be a separate
repo, but it generally has a separate community or a subset of the
full community.

Speaking as a long-time PMC member for MyFaces, our problem with
subprojects (we have 11!) is that it's hard to keep accountability and
monitor community health.

A subproject starts off being active with some subset of the community,
but then reduces activity at some future point.   Those who aren't
directly involved with the subproject tend not to notice that the
particular subproject has fallen to unhealthy levels.   Generally, you
don't realize something is wrong until after all of the developers
have left, when you suddenly realize that there's no one answering
questions, applying patches, or familiar with the code base.

Non-umbrella projects reporting to the board are expected to evaluate
community health each quarter.   Umbrella projects are also supposed
to do this, but often fail to realize that community health has to be
individually evaluated for each subproject each quarter.   The PMC
chair is likely not directly involved with each subproject, and may
not be in a good position to evaluate the sub-project's health.  As
Hervé mentions, this is particularly true for TLPs which have a main
project and "optional" modules where everyone cares about the main
project and only a few care about each module subproject.   This is
what happened with MyFaces.

What tends to happen with umbrella projects is that you end up
creating two-tier project management.  Those responsible to the board
are "upper management" but may not be directly involved and fail to
understand the subproject community health.  Those who are supposed to
actively manage the project are "lower management" and are not
directly responsible to the board for quarterly reports.

Best practice is to have a one-tier PMC.  As soon as a subproject is
healthy enough to stand on its own, it probably should go TLP.
MyFaces successfully spun off DeltaSpike, and DeltaSpike remains
healthy.  The other alternative is to be certain to address the status
of each subproject in the board report, much like the Incubator board
report does each time.

My advice is the same as others -- keep the two projects separate, but
encourage individual Samza committers to join as Kafka committers if they
feel the need to do so.

On Mon, Jul 13, 2015 at 1:37 AM, Jay Kreps <ja...@gmail.com> wrote:
> Hey board members,
>
> There is a longish thread on the Apache Samza mailing list on the
> relationship between Kafka and Samza and whether they wouldn't make a lot
> more sense as a single project. This raised some questions I was hoping to
> get advice on.
>
> Discussion thread (warning: super long, I attempt to summarize relevant bits
> below):
> http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABYbY7d_-JCXj7FizSjuEbJEDgbeP33FLyx3NRoZt0yeox9JsQ@mail.gmail.com%3E
>
> Anyhow, some people thought "Apache has lots of sub-projects, that would be
> a graceful way to step in the right direction". At that point others popped
> up and said, "sub-projects are discouraged by the board".
>
> I'm not sure if we understand technically what a subproject is, but I think
> it means a second repo/committership under the same PMC.
>
> A few questions:
> - Is that what a sub-project is?
> - Are they discouraged? If so, why?
> - Assuming it makes sense in this case, what is the process for making one?
> - Putting aside sub-projects as a mechanism, what are examples where
> communities merged successfully? We were pointed towards Lucene/SOLR. Are
> there others?
>
> Relevant background info:
> - Samza depends on Kafka, but not vice versa
> - There is some overlap in committers but not extensive (3/11 Samza
> committers are also Kafka committers)
>
> Thanks for the advice!
>
> -Jay
>
>
>