Posted to mapreduce-dev@hadoop.apache.org by Andrew Wang <an...@cloudera.com> on 2016/06/10 06:12:50 UTC

[DISCUSS] Increased use of feature branches

Hi all,

On a separate thread, a question was raised about 3.x branching and use of
feature branches going forward.

We discussed this previously on the "Looking to a Hadoop 3 release" thread
that has spanned the years, with Vinod making this proposal (building on
ideas from others who also commented in the email thread):

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser

Pasting here for ease:

On an unrelated note, offline I was pitching to a bunch of contributors
another idea to deal with rotting trunk post 3.x: *Make 3.x releases off
of trunk directly*.

What this gains us is that
 - Trunk is always nearly stable or nearly ready for releases
 - We no longer have some code lying around in some branch (today’s
   trunk) that is not releasable because it gets mixed with other
   undesirable and incompatible changes.
 - This needs to be coupled with more discipline on individual features -
   medium to large features are always worked upon in branches and get
   merged into trunk (and a nearing release!) when they are ready
 - All incompatible changes go into some sort of a trunk-incompat branch
   and stay there till we accumulate enough of those to warrant another
   major release.

Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
there's no need for this branch yet. This aspect of Vinod's proposal was
still under a bit of discussion; Chris Douglas thought we should cut a
branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
This point doesn't necessarily need to be resolved now though, since again
we're still doing alphas.

What we should get consensus on is the goal of keeping trunk stable, and
achieving that by doing more development on feature branches and being
judicious about merges. My sense from the Hadoop 3 email thread (and the
more recent one on the async API) is that people are generally in favor of
this.

We're just about ready to do the first 3.0.0 alpha, so I would greatly
appreciate everyone's timely response on this matter.

Thanks,
Andrew

Re: [DISCUSS] Increased use of feature branches

Posted by Andrew Wang <an...@cloudera.com>.
>
>> I agree with the concerns you raise around feature rot. For a feature like
>> EC, it'd be untenable to leave it in trunk-incompat since the rebases would
>> be impossible. I imagine we'd also need a very motivated maintainer (or
>> maintainers) to handle the periodic integration of new trunk commits, since
>> you'd potentially be doing it for multiple large features. If some brave
>> and experienced committer is willing to own maintenance of the
>> trunk-incompat branch, I think it could work. However, this is a big shift
>> from how we've historically done development.
>>
>
> If an incompatible feature is ready (like EC here), should we consider
> working towards the next major release? In other words, is it okay to defer
> cutting branch-3 until we have a large incompatible feature that would be a
> pain to keep up with?
>

So the idea is that we do trunk-incompat, then when the first large
incompat feature hits, we switch to branch-3? I guess this might work,
though it still requires someone to maintain trunk-incompat.

I think it'd also be hard to make this decision, since EC for instance at
one point was targeted for 2.x.

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
Thanks for clarifying, Andrew. Inline.

On Mon, Jun 13, 2016 at 3:59 PM, Andrew Wang <an...@cloudera.com>
wrote:

>
> On Fri, Jun 10, 2016 at 9:39 PM, Karthik Kambatla <ka...@cloudera.com>
> wrote:
>
>> I would like to understand the trunk-incompat part of the proposal a
>> little better.
>>
>> Is trunk-incompat always going to be a superset of trunk? If yes, is it
>> just a change in naming convention with a hope that our approach to trunk
>> stability changes as Sangjin mentioned?
>>
>> Or, is it okay for trunk-incompat to be based off of an older commit in
>> trunk with (in)frequent rebases? This has the risk of incompatible changes
>> truly rotting. Periodic rebases will ensure these changes don't rot while
>> also easing the burden of hosting two branches; if we choose this route,
>> some guidance on the period and who rebases would be nice.
>>
>
> Based on my understanding from Vinod on the previous "Looking to..."
> thread, it would be the latter. The goal of trunk-incompat was to avoid
> adding yet-another-branch we need to commit to every time, compared to the
> branch-3 proposal.
>
> I agree with the concerns you raise around feature rot. For a feature like
> EC, it'd be untenable to leave it in trunk-incompat since the rebases would
> be impossible. I imagine we'd also need a very motivated maintainer (or
> maintainers) to handle the periodic integration of new trunk commits, since
> you'd potentially be doing it for multiple large features. If some brave
> and experienced committer is willing to own maintenance of the
> trunk-incompat branch, I think it could work. However, this is a big shift
> from how we've historically done development.
>

If an incompatible feature is ready (like EC here), should we consider
working towards the next major release? In other words, is it okay to defer
cutting branch-3 until we have a large incompatible feature that would be a
pain to keep up with?


>
> This is why I leaned toward Chris D's proposal, which is that we cut
> branch-3 for 3.0.0-beta1, at which point trunk moves on to 4.0. In my mind,
> this is the "default" proposal, since it's how we've previously done
> things, with the slight adjustment that we defer cutting branch-3 until we
> start enforcing compatibility. This is my current plan for the Hadoop 3
> series, and we already had a lot of +1's about releasing from trunk on the
> previous thread.
>

I guess this makes sense.


>
> If there's a strong advocate for trunk-incompat over branch-3, let's have
> that discussion. However, given that beta is still months (and multiple
> releases) away, I don't think this decision affects my near-term goal of
> getting 3.0.0-alpha1 released.
>
> Thanks,
> Andrew
>

Re: [DISCUSS] Increased use of feature branches

Posted by Andrew Wang <an...@cloudera.com>.
On Fri, Jun 10, 2016 at 9:39 PM, Karthik Kambatla <ka...@cloudera.com>
wrote:

> I would like to understand the trunk-incompat part of the proposal a
> little better.
>
> Is trunk-incompat always going to be a superset of trunk? If yes, is it
> just a change in naming convention with a hope that our approach to trunk
> stability changes as Sangjin mentioned?
>
> Or, is it okay for trunk-incompat to be based off of an older commit in
> trunk with (in)frequent rebases? This has the risk of incompatible changes
> truly rotting. Periodic rebases will ensure these changes don't rot while
> also easing the burden of hosting two branches; if we choose this route,
> some guidance on the period and who rebases would be nice.
>

Based on my understanding from Vinod on the previous "Looking to..."
thread, it would be the latter. The goal of trunk-incompat was to avoid
adding yet-another-branch we need to commit to every time, compared to the
branch-3 proposal.

I agree with the concerns you raise around feature rot. For a feature like
EC, it'd be untenable to leave it in trunk-incompat since the rebases would
be impossible. I imagine we'd also need a very motivated maintainer (or
maintainers) to handle the periodic integration of new trunk commits, since
you'd potentially be doing it for multiple large features. If some brave
and experienced committer is willing to own maintenance of the
trunk-incompat branch, I think it could work. However, this is a big shift
from how we've historically done development.

This is why I leaned toward Chris D's proposal, which is that we cut
branch-3 for 3.0.0-beta1, at which point trunk moves on to 4.0. In my mind,
this is the "default" proposal, since it's how we've previously done
things, with the slight adjustment that we defer cutting branch-3 until we
start enforcing compatibility. This is my current plan for the Hadoop 3
series, and we already had a lot of +1's about releasing from trunk on the
previous thread.

If there's a strong advocate for trunk-incompat over branch-3, let's have
that discussion. However, given that beta is still months (and multiple
releases) away, I don't think this decision affects my near-term goal of
getting 3.0.0-alpha1 released.

Thanks,
Andrew

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
I would like to understand the trunk-incompat part of the proposal a little
better.

Is trunk-incompat always going to be a superset of trunk? If yes, is it
just a change in naming convention with a hope that our approach to trunk
stability changes as Sangjin mentioned?

Or, is it okay for trunk-incompat to be based off of an older commit in
trunk with (in)frequent rebases? This has the risk of incompatible changes
truly rotting. Periodic rebases will ensure these changes don't rot while
also easing the burden of hosting two branches; if we choose this route,
some guidance on the period and who rebases would be nice.

On Fri, Jun 10, 2016 at 5:11 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Let me try to clarify a few points, since not everyone might have been
> present for the previous emails.
>
> On the "Looking to a Hadoop 3 release" thread, we already reached
> consensus on doing releases from trunk. People didn't want to have to
> commit to another branch, and wanted to try releasing from trunk. The
> question, then, was how to ensure that trunk remains stable and releasable.
>
> Part of Vinod's proposal was that we, as a community, be more judicious
> about what we commit to trunk, and try to make use of more feature branches
> for larger efforts. There was no requirement that 1-2 patch changes go
> through a feature branch. There weren't any requirements around # of
> patches or length of development at all, just asking that committers be
> more judicious. I personally think Sangjin's rule of thumb of ~12 patches
> or ~1 month is about right, but it's up to the developers who are
> involved, and I doubt any one standard will fit all situations.
>
> So, this is about as low-overhead a policy as there is: devs, please be
> careful when committing to trunk, and consider using a feature branch for
> bigger efforts.
>
> If you have further ideas about how to improve stability of trunk, I'd
> love to hear them. I'd hope though that the above would be a
> non-controversial statement.
>
> Best,
> Andrew
>
> On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:
>
>> Thanks for your thoughts Anu.
>>
>> Regarding your question
>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check-in a change, if that would break something? So this will lead us
>>> back to trunk being the unstable branch – 3.0 being the new “branch-2”.
>>
>>
>> Andrew mentioned in the original email
>>
>>> Regarding "trunk-incompat", since we're still in the alpha stage for
>>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>>> was still under a bit of discussion; Chris Douglas thought we should cut a
>>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>>> This point doesn't necessarily need to be resolved now though, since again
>>> we're still doing alphas.
>>
>>
>> and I agree with that sentiment. I think even if we have a
>> "trunk-incompat" branch to hold future incompatible changes, the situation
>> will change little from today. Instead of dealing with "trunk" (where
>> incompatible changes may appear) and "branch-3", we would be dealing with
>> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>>
>>
>> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <aengineer@hortonworks.com> wrote:
>>
>>> I actively work on two branches (Diskbalancer and ozone) and I agree
>>> with most of what Sangjin said.
>>> There is an overhead in working with branches: there are both technical
>>> costs and administrative issues
>>> which discourage developers from using branches.
>>>
>>> I think the biggest issue with branch based development is the fact
>>> that other developers do not use a branch.
>>> If a small feature appears as a series of commits to “datanode.java”,
>>> the branch based developer ends up rebasing
>>> and paying this price of rebasing many times. If everyone followed a
>>> model of branch + Pull request, other branches
>>> would not have to deal with continuous rebasing to trunk commits. If we
>>> are moving to a branch based
>>> development, we should probably move to that model for most development
>>> to avoid this tax on people who
>>> actually end up working in the branches.
>>>
>>> I do have a question in my mind though: What is being proposed is that
>>> we move active development to branches
>>> if the feature is small or incomplete, however keep the trunk open for
>>> check-ins. One of the biggest reason why we
>>> check-in into trunk and not to branch-2 is because it is a change that
>>> will break backward compatibility. So do we
>>> have an expectation of backward compatibility thru the 3.0-alpha series
>>> (I personally vote No, since 3.0 is experimental
>>> at this stage), but if we decide to support some sort of
>>> backward-compact then willy-nilly committing to trunk
>>> and still maintaining the expectation we can release Alphas from 3.0
>>> does not look possible.
>>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check-in a change,  if that would break something?
>>> so this will lead us back to trunk being the unstable – 3.0 being the
>>> new “branch-2”.
>>>
>>> One more point: If we are moving to use a branch always – then we are
>>> looking at a model similar to using a git + pull
>>> request model. If that is so would it make sense to modify the rules to
>>> make these branches easier to merge?
>>> Say for example, if all commits in a branch has followed review and
>>> checking policy – just like trunk and commits
>>> have been made only after a sign off from a committer, would it be
>>> possible to merge with a 3-day voting period
>>> instead of 7, or treat it just like today’s commit to trunk – but with 2
>>> people signing-off?
>>>
>>> What I am suggesting is reducing the administrative overheads of using a
>>> branch to encourage use of branching.
>>> Right now it feels like Apache’s process encourages committing directly
>>> to trunk than a branch
>>>
>>> Thanks
>>> Anu
>>>
>>>
>>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>>
>>> >Having worked on a major feature in a feature branch, I have some
>>> >thoughts and observations on feature branch development.
>>> >
>>> >IMO feature branch development v. direct commits to trunk in piecemeal
>>> >is really a choice of *granularity*. Do we want a series of
>>> >fine-grained state changes on trunk or fewer coarse-grained chunks of
>>> >commits on trunk?
>>> >
>>> >This makes me favor a branch-based development model for any
>>> >"decent-sized" features (we'll need to define "decent-sized" of
>>> >course). Once you have coarse-grained changes, it's easier to reason
>>> >about what made what release and in what state. As importantly, it
>>> >makes it easier to back out a complete feature if that becomes
>>> >necessary. My totally unscientific suggestion may be if a feature takes
>>> >more than a dozen commits and longer than a month, we should probably
>>> >have a bias towards a feature branch.
>>> >
>>> >Branch-based development also makes you go faster if your feature is
>>> >larger. I wouldn't do it the other way for timeline service v.2 for
>>> >example.
>>> >
>>> >That said, feature branches don't come for free. Now the onus is on the
>>> >feature developer to constantly rebase with the trunk to keep it
>>> >reasonably integrated with the trunk. More logistics is involved for
>>> >the feature developer. Another big question is, when a feature branch
>>> >gets big and it's time to merge, would it get as scrutinized as a
>>> >series of individual commits? Since the size of merge can be big, you
>>> >kind of have to rely on those feature committers and those who help
>>> >them.
>>> >
>>> >In terms of integrating/stabilizing, I don't think branch development
>>> >necessarily makes it harder. It is again granularity. In case of direct
>>> >commits on trunk, you do a lot more fine-grained integrations. In case
>>> >of branch development, you do far fewer coarse-grained integrations via
>>> >rebasing. If more people are doing branch-based development, it makes
>>> >rebasing easier to manage too.
>>> >
>>> >Going back to the related topic of where to release (trunk v.
>>> >branch-X), I think that is more of a proxy of the real question of "how
>>> >do we maintain quality and stability of the trunk?". Even if we release
>>> >from the trunk, if our bar for merging to trunk is low, the quality
>>> >will not improve automatically. So I think we ought to tackle the
>>> >quality question first.
>>> >
>>> >My 2 cents.
>>> >
>>> >
>>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>>> >
>>> >> Thanks for the notes Andrew, Junping, Karthik.
>>> >>
>>> >> Here are some of my understandings:
>>> >>
>>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>>> >> use.
>>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>>> >> about durability, Hadoop might be gone in 50 years :). As a
>>> >> committer, I should be able to look at a patch and determine whether
>>> >> it's a self-contained improvement of trunk, without looking at other
>>> >> uncommitted patches.
>>> >> - Some comments inline:
>>> >>
>>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>>> >> wrote:
>>> >>
>>> >> > Compared with the advantages, I believe the disadvantages of
>>> >> > shipping any releases directly from trunk are more obvious and
>>> >> > significant:
>>> >> > - A lot of commits (incompatible, risky, incomplete features, etc.)
>>> >> > have to wait to be committed to trunk or be put into a separate
>>> >> > branch, which could delay feature development because an additional
>>> >> > vote process gets involved even if the feature is simple and
>>> >> > harmless.
>>> >> >
>>> >> Thanks Junping, those are valid concerns. I think we should clearly
>>> >> separate incompatible from uncompleted / half-done work in this
>>> >> discussion. Whether people should commit incompatible changes to
>>> >> trunk is a much more tricky question (related to trunk-incompat
>>> >> etc.). But per my comment above, IMHO, *not committing uncompleted
>>> >> work to trunk* should be a much easier principle to agree upon.
>>> >>
>>> >>
>>> >> > - For a small feature with only 1 or 2 commits, needing three +1s
>>> >> > from PMC members will raise the bar considerably for contributors
>>> >> > who have just started to contribute Hadoop features but do not have
>>> >> > such support.
>>> >> >
>>> >> Development overhead is another valid concern. I think our
>>> >> rule-of-thumb should be that small-medium new features should be
>>> >> proposed as a single JIRA/patch (as we recently did for
>>> >> HADOOP-12666). If the complexity goes beyond a single JIRA/patch, use
>>> >> a feature branch.
>>> >>
>>> >>
>>> >> >
>>> >> > Given these concerns, I am open to other options, like those
>>> >> > proposed by Vinod or Chris, rather than releasing anything directly
>>> >> > from trunk.
>>> >> >
>>> >> > - This point doesn't necessarily need to be resolved now though,
>>> >> > since again we're still doing alphas.
>>> >> > No. I think we have to settle this first. Without a commonly agreed
>>> >> > and transparent release process and branches in the community, any
>>> >> > release bits (alpha, beta) can only be called a private release, not
>>> >> > an official Apache Hadoop release (even an alpha).
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Junping
>>> >> > ________________________________________
>>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>>> >> > Sent: Friday, June 10, 2016 7:49 AM
>>> >> > To: Andrew Wang
>>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>>> >> >
>>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>>> >> > this across to a VOTE so it is clear.
>>> >> >
>>> >> > I see a few advantages shipping from trunk:
>>> >> >
>>> >> >    - The lack of need for one additional backport each time.
>>> >> >    - Feature rot in trunk
>>> >> >
>>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>>> >> > can continue doing 3.x releases off branch-3 even after we move
>>> >> > trunk to 4.x (I said it :))
>>> >> >
>>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>>> andrew.wang@cloudera.com>
>>> >> > wrote:
>>> >> >
>>> >> > > Hi all,
>>> >> > >
>>> >> > > On a separate thread, a question was raised about 3.x branching
>>> >> > > and use of feature branches going forward.
>>> >> > >
>>> >> > > We discussed this previously on the "Looking to a Hadoop 3
>>> >> > > release" thread that has spanned the years, with Vinod making
>>> >> > > this proposal (building on ideas from others who also commented
>>> >> > > in the email thread):
>>> >> > >
>>> >> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>>> >> > >
>>> >> > > Pasting here for ease:
>>> >> > >
>>> >> > > On an unrelated note, offline I was pitching to a bunch of
>>> >> > > contributors another idea to deal with rotting trunk post 3.x:
>>> >> > > *Make 3.x releases off of trunk directly*.
>>> >> > >
>>> >> > > What this gains us is that
>>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>>> >> > >  - We no longer have some code lying around in some branch
>>> >> > > (today’s trunk) that is not releasable because it gets mixed with
>>> >> > > other undesirable and incompatible changes.
>>> >> > >  - This needs to be coupled with more discipline on individual
>>> >> > > features - medium to large features are always worked upon in
>>> >> > > branches and get merged into trunk (and a nearing release!) when
>>> >> > > they are ready
>>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>>> >> > > branch and stay there till we accumulate enough of those to
>>> >> > > warrant another major release.
>>> >> > >
>>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>>> >> > > for 3.0.0, there's no need for this branch yet. This aspect of
>>> >> > > Vinod's proposal was still under a bit of discussion; Chris
>>> >> > > Douglas thought we should cut a branch-3 for the first 3.0.0
>>> >> > > beta, which aligns with my original thinking. This point doesn't
>>> >> > > necessarily need to be resolved now though, since again we're
>>> >> > > still doing alphas.
>>> >> > >
>>> >> > > What we should get consensus on is the goal of keeping trunk
>>> >> > > stable, and achieving that by doing more development on feature
>>> >> > > branches and being judicious about merges. My sense from the
>>> >> > > Hadoop 3 email thread (and the more recent one on the async API)
>>> >> > > is that people are generally in favor of this.
>>> >> > >
>>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>>> >> > > greatly appreciate everyone's timely response in this matter.
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Andrew
>>> >> > >
>>> >> >
>>> >> >
>>> ---------------------------------------------------------------------
>>> >> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>>> >> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
>>> >> >
>>> >> >
>>> >>
>>>
>>>
>>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
I would like to understand the trunk-incompat part of the proposal a little
better.

Is trunk-incompat always going to be a superset of trunk? If yes, is it
just a change in naming convention with a hope that our approach to trunk
stability changes as Sangjin mentioned?

Or, is it okay for trunk-incompat to be based off of an older commit in
trunk with (in)frequent rebases? This has the risk of incompatible changes
truly rotting. Periodic rebases will ensure these changes don't rot while
also easing the burden of hosting two branches; if we choose this route,
some guidance on the rebase period and on who rebases would be nice.

On Fri, Jun 10, 2016 at 5:11 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Let me try to clarify a few points, since not everyone might have been
> present for the previous emails.
>
> On the "Looking to a Hadoop 3 release" thread, we already reached
> consensus on doing releases from trunk. People didn't want to have to
> commit to another branch, and wanted to try releasing from trunk. The
> question, then, was how to ensure that trunk remains stable and releasable.
>
> Part of Vinod's proposal was that we, as a community, be more judicious
> about what we commit to trunk, and try to make use of more feature branches
> for larger efforts. There was no requirement that 1-2 patch changes go
> through a feature branch. There weren't any requirements around # of
> patches or length of development at all, just asking that committers be
> more judicious. I personally think Sangjin's rule of thumb of ~12 patches
> or ~1 month is about right, but it's up to the developers who are
> involved, and I doubt any one standard will fit all situations.
>
> So, this is about as low-overhead a policy as there is: devs, please be
> careful when committing to trunk, and consider using a feature branch for
> bigger efforts.
>
> If you have further ideas about how to improve stability of trunk, I'd
> love to hear them. I'd hope though that the above would be a
> non-controversial statement.
>
> Best,
> Andrew
>
> On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:
>
>> Thanks for your thoughts Anu.
>>
>> Regarding your question
>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check-in a change,  if that would break something? so this will lead us
>>> back to trunk being the unstable – 3.0 being the new “branch-2”.
>>
>>
>> Andrew mentioned in the original email
>>
>>> Regarding "trunk-incompat", since we're still in the alpha stage for
>>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>>> was still under a bit of discussion; Chris Douglas thought we should cut a
>>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>>> This point doesn't necessarily need to be resolved now though, since again
>>> we're still doing alphas.
>>
>>
>> and I agree with that sentiment. I think even if we have a
>> "trunk-incompat" branch to hold future incompatible changes, the situation
>> will change little from today. Instead of dealing with "trunk" (where
>> incompatible changes may appear) and "branch-3", we would be dealing with
>> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>>
>>
>> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <aengineer@hortonworks.com
>> > wrote:
>>
>>> I actively work on two branches (Diskbalancer and ozone) and I agree
>>> with most of what Sangjin said. There is an overhead in working with
>>> branches: there are both technical costs and administrative issues
>>> which discourage developers from using branches.
>>>
>>> I think the biggest issue with branch-based development is the fact
>>> that other developers do not use a branch. If a small feature appears
>>> as a series of commits to “datanode.java”, the branch-based developer
>>> ends up rebasing and paying this price of rebasing many times. If
>>> everyone followed a model of branch + pull request, other branches
>>> would not have to deal with continuous rebasing onto trunk commits. If
>>> we are moving to branch-based development, we should probably move to
>>> that model for most development to avoid this tax on people who
>>> actually end up working in the branches.
>>>
>>> I do have a question in my mind though: What is being proposed is that
>>> we move active development to branches if the feature is small or
>>> incomplete, however keep the trunk open for check-ins. One of the
>>> biggest reasons why we check in to trunk and not to branch-2 is that a
>>> change will break backward compatibility. So do we have an expectation
>>> of backward compatibility through the 3.0-alpha series? (I personally
>>> vote No, since 3.0 is experimental at this stage.) But if we decide to
>>> support some sort of backward compatibility, then willy-nilly
>>> committing to trunk while still maintaining the expectation that we can
>>> release Alphas from 3.0 does not look possible.
>>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check in a change if that would break something? So this will lead us
>>> back to trunk being the unstable branch, with 3.0 being the new
>>> “branch-2”.
>>>
>>> One more point: If we are moving to use a branch always, then we are
>>> looking at a model similar to a git + pull request model. If that is
>>> so, would it make sense to modify the rules to make these branches
>>> easier to merge? Say, for example, if all commits in a branch have
>>> followed review and checking policy, just like trunk, and commits have
>>> been made only after a sign-off from a committer, would it be possible
>>> to merge with a 3-day voting period instead of 7, or treat it just like
>>> today’s commit to trunk, but with 2 people signing off?
>>>
>>> What I am suggesting is reducing the administrative overhead of using
>>> a branch to encourage use of branching. Right now it feels like
>>> Apache’s process encourages committing directly to trunk rather than
>>> to a branch.
>>>
>>> Thanks
>>> Anu
>>>
>>>
>>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>>
>>> >Having worked on a major feature in a feature branch, I have some
>>> >thoughts and observations on feature branch development.
>>> >
>>> >IMO feature branch development v. direct commits to trunk in piecemeal
>>> >is really a choice of *granularity*. Do we want a series of
>>> >fine-grained state changes on trunk or fewer coarse-grained chunks of
>>> >commits on trunk?
>>> >
>>> >This makes me favor a branch-based development model for any
>>> >"decent-sized" features (we'll need to define "decent-sized" of
>>> >course). Once you have coarse-grained changes, it's easier to reason
>>> >about what made what release and in what state. As importantly, it
>>> >makes it easier to back out a complete feature if that becomes
>>> >necessary. My totally unscientific suggestion may be if a feature takes
>>> >more than a dozen commits and longer than a month, we should probably
>>> >have a bias towards a feature branch.
>>> >
>>> >Branch-based development also makes you go faster if your feature is
>>> >larger. I wouldn't do it the other way for timeline service v.2 for
>>> >example.
>>> >
>>> >That said, feature branches don't come for free. Now the onus is on the
>>> >feature developer to constantly rebase with the trunk to keep it
>>> >reasonably integrated with the trunk. More logistics is involved for
>>> >the feature developer. Another big question is, when a feature branch
>>> >gets big and it's time to merge, would it get as scrutinized as a
>>> >series of individual commits? Since the size of merge can be big, you
>>> >kind of have to rely on those feature committers and those who help
>>> >them.
>>> >
>>> >In terms of integrating/stabilizing, I don't think branch development
>>> >necessarily makes it harder. It is again granularity. In case of direct
>>> >commits on trunk, you do a lot more fine-grained integrations. In case
>>> >of branch development, you do far fewer coarse-grained integrations via
>>> >rebasing. If more people are doing branch-based development, it makes
>>> >rebasing easier to manage too.
>>> >
>>> >Going back to the related topic of where to release (trunk v.
>>> >branch-X), I think that is more of a proxy of the real question of "how
>>> >do we maintain quality and stability of the trunk?". Even if we release
>>> >from the trunk, if our bar for merging to trunk is low, the quality
>>> >will not improve automatically. So I think we ought to tackle the
>>> >quality question first.
>>> >
>>> >My 2 cents.
>>> >
>>> >
>>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>>> >
>>> >> Thanks for the notes Andrew, Junping, Karthik.
>>> >>
>>> >> Here are some of my understandings:
>>> >>
>>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>>> >> use.
>>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>>> >> about durability, Hadoop might be gone in 50 years :). As a
>>> >> committer, I should be able to look at a patch and determine whether
>>> >> it's a self-contained improvement of trunk, without looking at other
>>> >> uncommitted patches.
>>> >> - Some comments inline:
>>> >>
>>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>>> >> wrote:
>>> >>
>>> >> > Compared with the advantages, I believe the disadvantages of
>>> >> > shipping any releases directly from trunk are more obvious and
>>> >> > significant:
>>> >> > - A lot of commits (incompatible, risky, incomplete features, etc.)
>>> >> > have to wait to be committed to trunk or be put into a separate
>>> >> > branch, which could delay feature development because an additional
>>> >> > vote process gets involved even if the feature is simple and
>>> >> > harmless.
>>> >> >
>>> >> Thanks Junping, those are valid concerns. I think we should clearly
>>> >> separate incompatible from uncompleted / half-done work in this
>>> >> discussion. Whether people should commit incompatible changes to
>>> >> trunk is a much more tricky question (related to trunk-incompat
>>> >> etc.). But per my comment above, IMHO, *not committing uncompleted
>>> >> work to trunk* should be a much easier principle to agree upon.
>>> >>
>>> >>
>>> >> > - For a small feature with only 1 or 2 commits, needing three +1s
>>> >> > from PMC members will raise the bar considerably for contributors
>>> >> > who have just started to contribute Hadoop features but do not have
>>> >> > such support.
>>> >> >
>>> >> Development overhead is another valid concern. I think our
>>> >> rule-of-thumb should be that small-medium new features should be
>>> >> proposed as a single JIRA/patch (as we recently did for
>>> >> HADOOP-12666). If the complexity goes beyond a single JIRA/patch, use
>>> >> a feature branch.
>>> >>
>>> >>
>>> >> >
>>> >> > Given these concerns, I am open to other options, like those
>>> >> > proposed by Vinod or Chris, rather than releasing anything directly
>>> >> > from trunk.
>>> >> >
>>> >> > - This point doesn't necessarily need to be resolved now though,
>>> >> > since again we're still doing alphas.
>>> >> > No. I think we have to settle this first. Without a commonly agreed
>>> >> > and transparent release process and branches in the community, any
>>> >> > release bits (alpha, beta) can only be called a private release, not
>>> >> > an official Apache Hadoop release (even an alpha).
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Junping
>>> >> > ________________________________________
>>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>>> >> > Sent: Friday, June 10, 2016 7:49 AM
>>> >> > To: Andrew Wang
>>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>>> >> >
>>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>>> >> > this across to a VOTE so it is clear.
>>> >> >
>>> >> > I see a few advantages shipping from trunk:
>>> >> >
>>> >> >    - The lack of need for one additional backport each time.
>>> >> >    - Feature rot in trunk
>>> >> >
>>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>>> >> > can continue doing 3.x releases off branch-3 even after we move
>>> >> > trunk to 4.x (I said it :))
>>> >> >
>>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>>> andrew.wang@cloudera.com>
>>> >> > wrote:
>>> >> >
>>> >> > > Hi all,
>>> >> > >
>>> >> > > On a separate thread, a question was raised about 3.x branching
>>> >> > > and use of feature branches going forward.
>>> >> > >
>>> >> > > We discussed this previously on the "Looking to a Hadoop 3
>>> >> > > release" thread that has spanned the years, with Vinod making
>>> >> > > this proposal (building on ideas from others who also commented
>>> >> > > in the email thread):
>>> >> > >
>>> >> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>>> >> > >
>>> >> > > Pasting here for ease:
>>> >> > >
>>> >> > > On an unrelated note, offline I was pitching to a bunch of
>>> >> > > contributors another idea to deal with rotting trunk post 3.x:
>>> >> > > *Make 3.x releases off of trunk directly*.
>>> >> > >
>>> >> > > What this gains us is that
>>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>>> >> > >  - We no longer have some code lying around in some branch
>>> >> > > (today’s trunk) that is not releasable because it gets mixed with
>>> >> > > other undesirable and incompatible changes.
>>> >> > >  - This needs to be coupled with more discipline on individual
>>> >> > > features - medium to large features are always worked upon in
>>> >> > > branches and get merged into trunk (and a nearing release!) when
>>> >> > > they are ready
>>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>>> >> > > branch and stay there till we accumulate enough of those to
>>> >> > > warrant another major release.
>>> >> > >
>>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>>> >> > > for 3.0.0, there's no need for this branch yet. This aspect of
>>> >> > > Vinod's proposal was still under a bit of discussion; Chris
>>> >> > > Douglas thought we should cut a branch-3 for the first 3.0.0
>>> >> > > beta, which aligns with my original thinking. This point doesn't
>>> >> > > necessarily need to be resolved now though, since again we're
>>> >> > > still doing alphas.
>>> >> > >
>>> >> > > What we should get consensus on is the goal of keeping trunk
>>> >> > > stable, and achieving that by doing more development on feature
>>> >> > > branches and being judicious about merges. My sense from the
>>> >> > > Hadoop 3 email thread (and the more recent one on the async API)
>>> >> > > is that people are generally in favor of this.
>>> >> > >
>>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>>> >> > > greatly appreciate everyone's timely response in this matter.
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Andrew
>>> >> > >

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
I would like to understand the trunk-incompat part of the proposal a little
better.

Is trunk-incompat always going to be a superset of trunk? If yes, is it
just a change in naming convention with a hope that our approach to trunk
stability changes as Sangjin mentioned?

Or, is it okay for trunk-incompat to be based off of an older commit in
trunk with (in)frequent rebases? This has the risk of incompatible changes
truly rotting. Periodic rebases will ensure these changes don't rot while
also easing the burden of hosting two branches; if we choose this route,
some guidance of the period and who rebases will be nice.

On Fri, Jun 10, 2016 at 5:11 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Let me try to clarify a few points, since not everyone might have been
> present for the previous emails.
>
> On the "Looking to a Hadoop 3 release" thread, we already reached
> consensus on doing releases from trunk. People didn't want to have to
> commit to another branch, and wanted to try releasing from trunk. The
> question, then, was how to ensure that trunk remains stable and releasable.
>
> Part of Vinod's proposal was that we, as a community, be more judicious
> about what we commit to trunk, and try to make use of more feature branches
> for larger efforts. There was no requirement that 1-2 patch changes go
> through a feature branch. There weren't any requirements around # of
> patches or length of development at all, just asking that committers be
> more judicious. I personally think Sangjin's rule of thumb of ~12 patches
> or ~1 month are about right, but it's up to the developers who are
> involved, and I doubt any one standard will fit all situations.
>
> So, this is about as low-overhead a policy there is: devs, please be
> careful when committing to trunk, and consider using a feature branch for
> bigger efforts.
>
> If you have further ideas about how to improve stability of trunk, I'd
> love to hear it. I'd hope though that the above would be a
> non-controversial statement.
>
> Best,
> Andrew
>
> On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:
>
>> Thanks for your thoughts Anu.
>>
>> Regarding your question
>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check-in a change,  if that would break something? so this will lead us
>>> back to trunk being the unstable – 3.0 being the new “branch-2”.
>>
>>
>> Andrew mentioned in the original email
>>
>>> Regarding "trunk-incompat", since we're still in the alpha stage for
>>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>>> was still under a bit of discussion; Chris Douglas though we should cut a
>>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>>> This point doesn't necessarily need to be resolved now though, since again
>>> we're still doing alphas.
>>
>>
>> and I agree with that sentiment. I think even if we have a
>> "trunk-incompat" branch to hold future incompatible changes, the situation
>> will change little from today. Instead of dealing with "trunk" (where
>> incompatible changes may appear) and "branch-3", we would be dealing with
>> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>>
>>
>> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <aengineer@hortonworks.com
>> > wrote:
>>
>>> I actively work on two branches (Diskbalancer and ozone) and I agree
>>> with most of what Sangjin said.
>>> There is an overhead in working with branches, there are both technical
>>> costs and administrative issues
>>> which discourages developers from using branches.
>>>
>>> I think the biggest issue with branch based development is that fact
>>> that other developers do not use a branch.
>>> If a small feature appears as a series of commits to “”datanode.java””,
>>> the branch based developer ends up rebasing
>>> and paying this price of rebasing many times. If everyone followed a
>>> model of branch + Pull request, other branches
>>> would not have to deal with continues rebasing to trunk commits. If we
>>> are moving to a branch based
>>> development, we should probably move to that model for most development
>>> to avoid this tax on people who
>>>  actually end up working in the branches.
>>>
>>> I do have a question in my mind though: What is being proposed is that
>>> we move active development to branches
>>> if the feature is small or incomplete, but keep the trunk open for
>>> check-ins. One of the biggest reasons why we
>>> check in to trunk and not to branch-2 is because it is a change that
>>> will break backward compatibility. So do we
>>> have an expectation of backward compatibility through the 3.0-alpha series?
>>> (I personally vote No, since 3.0 is experimental
>>> at this stage.) If we decide to support some sort of
>>> backward compatibility, then willy-nilly committing to trunk
>>> while still maintaining the expectation that we can release alphas from 3.0
>>> does not look possible.
>>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check in a change if that would break something?
>>> So this will lead us back to trunk being the unstable branch – 3.0 being the
>>> new “branch-2”.
>>>
>>> One more point: If we are moving to use a branch always – then we are
>>> looking at a model similar to using a git + pull
>>> request model. If that is so, would it make sense to modify the rules to
>>> make these branches easier to merge?
>>> Say, for example, if all commits in a branch have followed the review and
>>> check-in policy – just like trunk – and commits
>>> have been made only after a sign-off from a committer, would it be
>>> possible to merge with a 3-day voting period
>>> instead of 7, or treat it just like today’s commit to trunk – but with 2
>>> people signing off?
>>>
>>> What I am suggesting is reducing the administrative overhead of using a
>>> branch to encourage use of branching.
>>> Right now it feels like Apache’s process encourages committing directly
>>> to trunk rather than to a branch.
>>>
>>> Thanks
>>> Anu
>>>
>>>
>>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>>
>>> >Having worked on a major feature in a feature branch, I have some
>>> thoughts
>>> >and observations on feature branch development.
>>> >
>>> >IMO feature branch development v. direct commits to trunk in piecemeal
>>> is
>>> >really a choice of *granularity*. Do we want a series of fine-grained
>>> state
>>> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
>>> >
>>> >This makes me favor a branch-based development model for any
>>> "decent-sized"
>>> >features (we'll need to define "decent-sized" of course). Once you have
>>> >coarse-grained changes, it's easier to reason about what made what
>>> release
>>> >and in what state. As importantly, it makes it easier to back out a
>>> >complete feature fairly easily if that becomes necessary. My totally
>>> >unscientific suggestion may be if a feature takes more than a dozen
>>> commits
>>> >and longer than a month, we should probably have a bias towards a
>>> feature
>>> >branch.
>>> >
>>> >Branch-based development also makes you go faster if your feature is
>>> >larger. I wouldn't do it the other way for timeline service v.2 for
>>> example.
>>> >
>>> >That said, feature branches don't come for free. Now the onus is on the
>>> >feature developer to constantly rebase with the trunk to keep it
>>> reasonably
>>> >integrated with the trunk. More logistics is involved for the feature
>>> >developer. Another big question is, when a feature branch gets big and
>>> it's
>>> >time to merge, would it get as scrutinized as a series of individual
>>> >commits? Since the size of merge can be big, you kind of have to rely on
>>> >those feature committers and those who help them.
>>> >
>>> >In terms of integrating/stabilizing, I don't think branch development
>>> >necessarily makes it harder. It is again granularity. In case of direct
>>> >commits on trunk, you do a lot more fine-grained integrations. In case
>>> of
>>> >branch development, you do far fewer coarse-grained integrations via
>>> >rebasing. If more people are doing branch-based development, it makes
>>> >rebasing easier to manage too.
>>> >
>>> >Going back to the related topic of where to release (trunk v.
>>> branch-X), I
>>> >think that is more of a proxy of the real question of "how do we
>>> maintain
>>> >quality and stability of the trunk?". Even if we release from the
>>> trunk, if
>>> >our bar for merging to trunk is low, the quality will not improve
>>> >automatically. So I think we ought to tackle the quality question first.
>>> >
>>> >My 2 cents.
>>> >
>>> >
>>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>>> >
>>> >> Thanks for the notes Andrew, Junping, Karthik.
>>> >>
>>> >> Here are some of my understandings:
>>> >>
>>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>>> use.
>>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>>> >> about durability, Hadoop might be gone in 50 years :). As a
>>> committer, I
>>> >> should be able to look at a patch and determine whether it's a
>>> >> self-contained improvement of trunk, without looking at other
>>> uncommitted
>>> >> patches.
>>> >> - Some comments inline:
>>> >>
>>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> > Compared with the advantages, I believe the disadvantages of shipping
>>> any
>>> >> > releases directly from trunk are more obvious and significant:
>>> >> > - A lot of commits (incompatible, risky, uncompleted features, etc.)
>>> have
>>> >> > to wait to be committed to trunk or be put into a separate branch, which could
>>> >> delay
>>> >> > feature development progress, as an additional vote process gets
>>> involved even if
>>> >> > the feature is simple and harmless.
>>> >> >
>>> >> Thanks Junping, those are valid concerns. I think we should clearly
>>> >> separate incompatible from uncompleted / half-done work in this
>>> >> discussion. Whether people should commit incompatible changes to
>>> trunk is a
>>> >> much more tricky question (related to trunk-incompat etc.). But per my
>>> >> comment above, IMHO, *not committing uncompleted work to trunk*
>>> should be a
>>> >> much easier principle to agree upon.
>>> >>
>>> >>
>>> >> > - For a small feature with only 1 or 2 commits, needing three +1s
>>> from
>>> >> PMC members
>>> >> > will raise the bar considerably for contributors who are just starting to
>>> >> contribute
>>> >> > Hadoop features but do not have sufficient support.
>>> >> >
>>> >> Development overhead is another valid concern. I think our
>>> rule-of-thumb
>>> >> should be that, small-medium new features should be proposed as a
>>> single
>>> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity
>>> goes
>>> >> beyond a single JIRA/patch, use a feature branch.
>>> >>
>>> >>
>>> >> >
>>> >> > Given these concerns, I am open to other options, like those proposed by
>>> Vinod
>>> >> > or Chris, rather than releasing anything directly from trunk.
>>> >> >
>>> >> > - This point doesn't necessarily need to be resolved now though,
>>> since
>>> >> > again we're still doing alphas.
>>> >> > No. I think we have to settle this first. Without a commonly
>>> agreed
>>> >> and
>>> >> > transparent release process and branches in the community, any release
>>> >> (alpha,
>>> >> > beta) bits can only be called a private release, not an official
>>> Apache
>>> >> > Hadoop release (even alpha).
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Junping
>>> >> > ________________________________________
>>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>>> >> > Sent: Friday, June 10, 2016 7:49 AM
>>> >> > To: Andrew Wang
>>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>>> >> >
>>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>>> this
>>> >> > across to a VOTE so it is clear.
>>> >> >
>>> >> > I see a few advantages shipping from trunk:
>>> >> >
>>> >> >    - The lack of need for one additional backport each time.
>>> >> >    - Avoiding feature rot in trunk
>>> >> >
>>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>>> can
>>> >> > continue doing 3.x releases off branch-3 even after we move trunk
>>> to 4.x
>>> >> (I
>>> >> > said it :))
>>> >> >
>>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>>> andrew.wang@cloudera.com>
>>> >> > wrote:
>>> >> >
>>> >> > > Hi all,
>>> >> > >
>>> >> > > On a separate thread, a question was raised about 3.x branching
>>> and use
>>> >> > of
>>> >> > > feature branches going forward.
>>> >> > >
>>> >> > > We discussed this previously on the "Looking to a Hadoop 3
>>> release"
>>> >> > thread
>>> >> > > that has spanned the years, with Vinod making this proposal
>>> (building
>>> >> on
>>> >> > > ideas from others who also commented in the email thread):
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> >
>>> >>
>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>>> >> > >
>>> >> > > Pasting here for ease:
>>> >> > >
>>> >> > > On an unrelated note, offline I was pitching to a bunch of
>>> >> > > contributors another idea to deal
>>> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
>>> directly*.
>>> >> > >
>>> >> > > What this gains us is that
>>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>>> >> > >  - We no longer have some code lying around in some branch
>>> (today’s
>>> >> > > trunk) that is not releasable
>>> >> > > because it gets mixed with other undesirable and incompatible
>>> changes.
>>> >> > >  - This needs to be coupled with more discipline on individual
>>> >> > > features - medium to large
>>> >> > > features are always worked upon in branches and get merged into
>>> trunk
>>> >> > > (and a nearing release!)
>>> >> > > when they are ready
>>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>>> >> > > branch and stay there till
>>> >> > > we accumulate enough of those to warrant another major release.
>>> >> > >
>>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>>> for
>>> >> > 3.0.0,
>>> >> > > there's no need for this branch yet. This aspect of Vinod's
>>> proposal
>>> >> was
>>> >> > > still under a bit of discussion; Chris Douglas thought we should
>>> cut a
>>> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
>>> >> > thinking.
>>> >> > > This point doesn't necessarily need to be resolved now though,
>>> since
>>> >> > again
>>> >> > > we're still doing alphas.
>>> >> > >
>>> >> > > What we should get consensus on is the goal of keeping trunk
>>> stable,
>>> >> and
>>> >> > > achieving that by doing more development on feature branches and
>>> being
>>> >> > > judicious about merges. My sense from the Hadoop 3 email thread
>>> (and
>>> >> the
>>> >> > > more recent one on the async API) is that people are generally in
>>> favor
>>> >> > of
>>> >> > > this.
>>> >> > >
>>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>>> greatly
>>> >> > > appreciate everyone's timely response in this matter.
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Andrew
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>>
>>>
>>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
I would like to understand the trunk-incompat part of the proposal a little
better.

Is trunk-incompat always going to be a superset of trunk? If yes, is it
just a change in naming convention with a hope that our approach to trunk
stability changes as Sangjin mentioned?

Or, is it okay for trunk-incompat to be based off of an older commit in
trunk with (in)frequent rebases? This has the risk of incompatible changes
truly rotting. Periodic rebases will ensure these changes don't rot while
also easing the burden of hosting two branches; if we choose this route,
some guidance on the period and on who rebases would be nice.

On Fri, Jun 10, 2016 at 5:11 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Let me try to clarify a few points, since not everyone might have been
> present for the previous emails.
>
> On the "Looking to a Hadoop 3 release" thread, we already reached
> consensus on doing releases from trunk. People didn't want to have to
> commit to another branch, and wanted to try releasing from trunk. The
> question, then, was how to ensure that trunk remains stable and releasable.
>
> Part of Vinod's proposal was that we, as a community, be more judicious
> about what we commit to trunk, and try to make use of more feature branches
> for larger efforts. There was no requirement that 1-2 patch changes go
> through a feature branch. There weren't any requirements around # of
> patches or length of development at all, just asking that committers be
> more judicious. I personally think Sangjin's rule of thumb of ~12 patches
> or ~1 month is about right, but it's up to the developers who are
> involved, and I doubt any one standard will fit all situations.
>
> So, this is about as low-overhead a policy as there is: devs, please be
> careful when committing to trunk, and consider using a feature branch for
> bigger efforts.
>
> If you have further ideas about how to improve stability of trunk, I'd
> love to hear them. I'd hope though that the above would be a
> non-controversial statement.
>
> Best,
> Andrew
>
> On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:
>
>> Thanks for your thoughts Anu.
>>
>> Regarding your question
>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check-in a change,  if that would break something? so this will lead us
>>> back to trunk being the unstable – 3.0 being the new “branch-2”.
>>
>>
>> Andrew mentioned in the original email
>>
>>> Regarding "trunk-incompat", since we're still in the alpha stage for
>>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>>> was still under a bit of discussion; Chris Douglas thought we should cut a
>>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>>> This point doesn't necessarily need to be resolved now though, since again
>>> we're still doing alphas.
>>
>>
>> and I agree with that sentiment. I think even if we have a
>> "trunk-incompat" branch to hold future incompatible changes, the situation
>> will change little from today. Instead of dealing with "trunk" (where
>> incompatible changes may appear) and "branch-3", we would be dealing with
>> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>>
>>
>> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <aengineer@hortonworks.com
>> > wrote:
>>
>>> I actively work on two branches (Diskbalancer and ozone) and I agree
>>> with most of what Sangjin said.
>>> There is an overhead in working with branches; there are both technical
>>> costs and administrative issues
>>> which discourage developers from using branches.
>>>
>>> I think the biggest issue with branch-based development is the fact
>>> that other developers do not use a branch.
>>> If a small feature appears as a series of commits to “datanode.java”,
>>> the branch-based developer ends up rebasing
>>> and paying this price of rebasing many times. If everyone followed a
>>> model of branch + pull request, other branches
>>> would not have to deal with continuous rebasing onto trunk commits. If we
>>> are moving to branch-based
>>> development, we should probably move to that model for most development
>>> to avoid this tax on people who
>>>  actually end up working in the branches.
>>>
>>> I do have a question in my mind though: What is being proposed is that
>>> we move active development to branches
>>> if the feature is small or incomplete, but keep the trunk open for
>>> check-ins. One of the biggest reasons why we
>>> check in to trunk and not to branch-2 is because it is a change that
>>> will break backward compatibility. So do we
>>> have an expectation of backward compatibility through the 3.0-alpha series?
>>> (I personally vote No, since 3.0 is experimental
>>> at this stage.) If we decide to support some sort of
>>> backward compatibility, then willy-nilly committing to trunk
>>> while still maintaining the expectation that we can release alphas from 3.0
>>> does not look possible.
>>>
>>> And then comes the question, once 3.0 becomes official, where do we
>>> check in a change if that would break something?
>>> So this will lead us back to trunk being the unstable branch – 3.0 being the
>>> new “branch-2”.
>>>
>>> One more point: If we are moving to use a branch always – then we are
>>> looking at a model similar to using a git + pull
>>> request model. If that is so, would it make sense to modify the rules to
>>> make these branches easier to merge?
>>> Say, for example, if all commits in a branch have followed the review and
>>> check-in policy – just like trunk – and commits
>>> have been made only after a sign-off from a committer, would it be
>>> possible to merge with a 3-day voting period
>>> instead of 7, or treat it just like today’s commit to trunk – but with 2
>>> people signing off?
>>>
>>> What I am suggesting is reducing the administrative overhead of using a
>>> branch to encourage use of branching.
>>> Right now it feels like Apache’s process encourages committing directly
>>> to trunk rather than to a branch.
>>>
>>> Thanks
>>> Anu
>>>
>>>
>>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>>
>>> >Having worked on a major feature in a feature branch, I have some
>>> thoughts
>>> >and observations on feature branch development.
>>> >
>>> >IMO feature branch development v. direct commits to trunk in piecemeal
>>> is
>>> >really a choice of *granularity*. Do we want a series of fine-grained
>>> state
>>> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
>>> >
>>> >This makes me favor a branch-based development model for any
>>> "decent-sized"
>>> >features (we'll need to define "decent-sized" of course). Once you have
>>> >coarse-grained changes, it's easier to reason about what made what
>>> release
>>> >and in what state. As importantly, it makes it easier to back out a
>>> >complete feature fairly easily if that becomes necessary. My totally
>>> >unscientific suggestion may be if a feature takes more than a dozen
>>> commits
>>> >and longer than a month, we should probably have a bias towards a
>>> feature
>>> >branch.
>>> >
>>> >Branch-based development also makes you go faster if your feature is
>>> >larger. I wouldn't do it the other way for timeline service v.2 for
>>> example.
>>> >
>>> >That said, feature branches don't come for free. Now the onus is on the
>>> >feature developer to constantly rebase with the trunk to keep it
>>> reasonably
>>> >integrated with the trunk. More logistics is involved for the feature
>>> >developer. Another big question is, when a feature branch gets big and
>>> it's
>>> >time to merge, would it get as scrutinized as a series of individual
>>> >commits? Since the size of merge can be big, you kind of have to rely on
>>> >those feature committers and those who help them.
>>> >
>>> >In terms of integrating/stabilizing, I don't think branch development
>>> >necessarily makes it harder. It is again granularity. In case of direct
>>> >commits on trunk, you do a lot more fine-grained integrations. In case
>>> of
>>> >branch development, you do far fewer coarse-grained integrations via
>>> >rebasing. If more people are doing branch-based development, it makes
>>> >rebasing easier to manage too.
>>> >
>>> >Going back to the related topic of where to release (trunk v.
>>> branch-X), I
>>> >think that is more of a proxy of the real question of "how do we
>>> maintain
>>> >quality and stability of the trunk?". Even if we release from the
>>> trunk, if
>>> >our bar for merging to trunk is low, the quality will not improve
>>> >automatically. So I think we ought to tackle the quality question first.
>>> >
>>> >My 2 cents.
>>> >
>>> >
>>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>>> >
>>> >> Thanks for the notes Andrew, Junping, Karthik.
>>> >>
>>> >> Here are some of my understandings:
>>> >>
>>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>>> use.
>>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>>> >> about durability, Hadoop might be gone in 50 years :). As a
>>> committer, I
>>> >> should be able to look at a patch and determine whether it's a
>>> >> self-contained improvement of trunk, without looking at other
>>> uncommitted
>>> >> patches.
>>> >> - Some comments inline:
>>> >>
>>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> > Compared with the advantages, I believe the disadvantages of shipping
>>> any
>>> >> > releases directly from trunk are more obvious and significant:
>>> >> > - A lot of commits (incompatible, risky, uncompleted features, etc.)
>>> have
>>> >> > to wait to be committed to trunk or be put into a separate branch, which could
>>> >> delay
>>> >> > feature development progress, as an additional vote process gets
>>> involved even if
>>> >> > the feature is simple and harmless.
>>> >> >
>>> >> Thanks Junping, those are valid concerns. I think we should clearly
>>> >> separate incompatible from uncompleted / half-done work in this
>>> >> discussion. Whether people should commit incompatible changes to
>>> trunk is a
>>> >> much more tricky question (related to trunk-incompat etc.). But per my
>>> >> comment above, IMHO, *not committing uncompleted work to trunk*
>>> should be a
>>> >> much easier principle to agree upon.
>>> >>
>>> >>
>>> >> > - For a small feature with only 1 or 2 commits, needing three +1s
>>> from
>>> >> PMC members
>>> >> > will raise the bar considerably for contributors who are just starting to
>>> >> contribute
>>> >> > Hadoop features but do not have sufficient support.
>>> >> >
>>> >> Development overhead is another valid concern. I think our
>>> rule-of-thumb
>>> >> should be that, small-medium new features should be proposed as a
>>> single
>>> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity
>>> goes
>>> >> beyond a single JIRA/patch, use a feature branch.
>>> >>
>>> >>
>>> >> >
>>> >> > Given these concerns, I am open to other options, like those proposed by
>>> Vinod
>>> >> > or Chris, rather than releasing anything directly from trunk.
>>> >> >
>>> >> > - This point doesn't necessarily need to be resolved now though,
>>> since
>>> >> > again we're still doing alphas.
>>> >> > No. I think we have to settle this first. Without a commonly
>>> agreed
>>> >> and
>>> >> > transparent release process and branches in the community, any release
>>> >> (alpha,
>>> >> > beta) bits can only be called a private release, not an official
>>> Apache
>>> >> > Hadoop release (even alpha).
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Junping
>>> >> > ________________________________________
>>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>>> >> > Sent: Friday, June 10, 2016 7:49 AM
>>> >> > To: Andrew Wang
>>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>>> >> >
>>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>>> this
>>> >> > across to a VOTE so it is clear.
>>> >> >
>>> >> > I see a few advantages shipping from trunk:
>>> >> >
>>> >> >    - The lack of need for one additional backport each time.
>>> >> >    - Avoiding feature rot in trunk
>>> >> >
>>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>>> can
>>> >> > continue doing 3.x releases off branch-3 even after we move trunk
>>> to 4.x
>>> >> (I
>>> >> > said it :))
>>> >> >
>>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>>> andrew.wang@cloudera.com>
>>> >> > wrote:
>>> >> >
>>> >> > > Hi all,
>>> >> > >
>>> >> > > On a separate thread, a question was raised about 3.x branching
>>> and use
>>> >> > of
>>> >> > > feature branches going forward.
>>> >> > >
>>> >> > > We discussed this previously on the "Looking to a Hadoop 3
>>> release"
>>> >> > thread
>>> >> > > that has spanned the years, with Vinod making this proposal
>>> (building
>>> >> on
>>> >> > > ideas from others who also commented in the email thread):
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> >
>>> >>
>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>>> >> > >
>>> >> > > Pasting here for ease:
>>> >> > >
>>> >> > > On an unrelated note, offline I was pitching to a bunch of
>>> >> > > contributors another idea to deal
>>> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
>>> directly*.
>>> >> > >
>>> >> > > What this gains us is that
>>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>>> >> > >  - We no longer have some code lying around in some branch
>>> (today’s
>>> >> > > trunk) that is not releasable
>>> >> > > because it gets mixed with other undesirable and incompatible
>>> changes.
>>> >> > >  - This needs to be coupled with more discipline on individual
>>> >> > > features - medium to large
>>> >> > > features are always worked upon in branches and get merged into
>>> trunk
>>> >> > > (and a nearing release!)
>>> >> > > when they are ready
>>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>>> >> > > branch and stay there till
>>> >> > > we accumulate enough of those to warrant another major release.
>>> >> > >
>>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>>> for
>>> >> > 3.0.0,
>>> >> > > there's no need for this branch yet. This aspect of Vinod's
>>> proposal
>>> >> was
>>> >> > > still under a bit of discussion; Chris Douglas thought we should
>>> cut a
>>> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
>>> >> > thinking.
>>> >> > > This point doesn't necessarily need to be resolved now though,
>>> since
>>> >> > again
>>> >> > > we're still doing alphas.
>>> >> > >
>>> >> > > What we should get consensus on is the goal of keeping trunk
>>> stable,
>>> >> and
>>> >> > > achieving that by doing more development on feature branches and
>>> being
>>> >> > > judicious about merges. My sense from the Hadoop 3 email thread
>>> (and
>>> >> the
>>> >> > > more recent one on the async API) is that people are generally in
>>> favor
>>> >> > of
>>> >> > > this.
>>> >> > >
>>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>>> greatly
>>> >> > > appreciate everyone's timely response in this matter.
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Andrew
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>>
>>>
>>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Andrew Wang <an...@cloudera.com>.
Let me try to clarify a few points, since not everyone might have been
present for the previous emails.

On the "Looking to a Hadoop 3 release" thread, we already reached consensus
on doing releases from trunk. People didn't want to have to commit to
another branch, and wanted to try releasing from trunk. The question, then,
was how to ensure that trunk remains stable and releasable.

Part of Vinod's proposal was that we, as a community, be more judicious
about what we commit to trunk, and try to make use of more feature branches
for larger efforts. There was no requirement that 1-2 patch changes go
through a feature branch. There weren't any requirements around # of
patches or length of development at all, just asking that committers be
more judicious. I personally think Sangjin's rule of thumb of ~12 patches
or ~1 month is about right, but it's up to the developers who are
involved, and I doubt any one standard will fit all situations.

So, this is about as low-overhead a policy as there is: devs, please be
careful when committing to trunk, and consider using a feature branch for
bigger efforts.

If you have further ideas about how to improve stability of trunk, I'd love
to hear them. I'd hope though that the above would be a non-controversial
statement.

Best,
Andrew

On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:

> Thanks for your thoughts Anu.
>
> Regarding your question
>
>> And then comes the question, once 3.0 becomes official, where do we
>> check-in a change,  if that would break something? so this will lead us
>> back to trunk being the unstable – 3.0 being the new “branch-2”.
>
>
> Andrew mentioned in the original email
>
>> Regarding "trunk-incompat", since we're still in the alpha stage for
>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>> was still under a bit of discussion; Chris Douglas thought we should cut a
>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>> This point doesn't necessarily need to be resolved now though, since again
>> we're still doing alphas.
>
>
> and I agree with that sentiment. I think even if we have a
> "trunk-incompat" branch to hold future incompatible changes, the situation
> will change little from today. Instead of dealing with "trunk" (where
> incompatible changes may appear) and "branch-3", we would be dealing with
> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>
>
> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
> wrote:
>
>> I actively work on two branches (Diskbalancer and ozone) and I agree with
>> most of what Sangjin said.
>> There is an overhead in working with branches; there are both technical
>> costs and administrative issues
>> which discourage developers from using branches.
>>
>> I think the biggest issue with branch-based development is the fact that
>> other developers do not use a branch.
>> If a small feature appears as a series of commits to “datanode.java”,
>> the branch-based developer ends up rebasing
>> and paying this price of rebasing many times. If everyone followed a
>> model of branch + pull request, other branches
>> would not have to deal with continuous rebasing onto trunk commits. If we
>> are moving to branch-based
>> development, we should probably move to that model for most development
>> to avoid this tax on people who
>>  actually end up working in the branches.
>>
>> I do have a question in my mind though: What is being proposed is that we
>> move active development to branches
>> if the feature is small or incomplete, but keep the trunk open for
>> check-ins. One of the biggest reasons why we
>> check in to trunk and not to branch-2 is because it is a change that
>> will break backward compatibility. So do we
>> have an expectation of backward compatibility through the 3.0-alpha series?
>> (I personally vote No, since 3.0 is experimental
>> at this stage.) If we decide to support some sort of backward compatibility,
>> then willy-nilly committing to trunk
>> while still maintaining the expectation that we can release alphas from 3.0 does
>> not look possible.
>>
>> And then comes the question, once 3.0 becomes official, where do we
>> check in a change if that would break something?
>> So this will lead us back to trunk being the unstable branch – 3.0 being the new
>> “branch-2”.
>>
>> One more point: If we are moving to use a branch always – then we are
>> looking at a model similar to using a git + pull
>> request model. If that is so would it make sense to modify the rules to
>> make these branches easier to merge?
>> Say for example, if all commits in a branch have followed review and
>> checking policy – just like trunk and commits
>> have been made only after a sign off from a committer, would it be
>> possible to merge with a 3-day voting period
>> instead of 7, or treat it just like today’s commit to trunk – but with 2
>> people signing-off?
>>
>> What I am suggesting is reducing the administrative overheads of using a
>> branch to encourage use of branching.
>> Right now it feels like Apache’s process encourages committing directly
>> to trunk rather than to a branch.
>>
>> Thanks
>> Anu
>>
>>
>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>
>> >Having worked on a major feature in a feature branch, I have some
>> thoughts
>> >and observations on feature branch development.
>> >
>> >IMO feature branch development v. direct commits to trunk in piecemeal is
>> >really a choice of *granularity*. Do we want a series of fine-grained
>> state
>> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
>> >
>> >This makes me favor a branch-based development model for any
>> "decent-sized"
>> >features (we'll need to define "decent-sized" of course). Once you have
>> >coarse-grained changes, it's easier to reason about what made what
>> release
>> >and in what state. As importantly, it makes it easier to back out a
>> >complete feature fairly easily if that becomes necessary. My totally
>> >unscientific suggestion may be if a feature takes more than a dozen commits
>> >and longer than a month, we should probably have a bias towards a feature
>> >branch.
>> >
>> >Branch-based development also makes you go faster if your feature is
>> >larger. I wouldn't do it the other way for timeline service v.2 for
>> example.
>> >
>> >That said, feature branches don't come for free. Now the onus is on the
>> >feature developer to constantly rebase with the trunk to keep it
>> reasonably
>> >integrated with the trunk. More logistics is involved for the feature
>> >developer. Another big question is, when a feature branch gets big and
>> it's
>> >time to merge, would it get as scrutinized as a series of individual
>> >commits? Since the size of merge can be big, you kind of have to rely on
>> >those feature committers and those who help them.
>> >
>> >In terms of integrating/stabilizing, I don't think branch development
>> >necessarily makes it harder. It is again granularity. In case of direct
>> >commits on trunk, you do a lot more fine-grained integrations. In case of
>> >branch development, you do far fewer coarse-grained integrations via
>> >rebasing. If more people are doing branch-based development, it makes
>> >rebasing easier to manage too.
>> >
>> >Going back to the related topic of where to release (trunk v. branch-X),
>> I
>> >think that is more of a proxy of the real question of "how do we maintain
>> >quality and stability of the trunk?". Even if we release from the trunk,
>> if
>> >our bar for merging to trunk is low, the quality will not improve
>> >automatically. So I think we ought to tackle the quality question first.
>> >
>> >My 2 cents.
>> >
>> >
>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>> >
>> >> Thanks for the notes Andrew, Junping, Karthik.
>> >>
>> >> Here are some of my understandings:
>> >>
>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>> use.
>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>> >> about durability, Hadoop might be gone in 50 years :). As a committer,
>> I
>> >> should be able to look at a patch and determine whether it's a
>> >> self-contained improvement of trunk, without looking at other
>> uncommitted
>> >> patches.
>> >> - Some comments inline:
>> >>
>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>> wrote:
>> >>
>> >> > Comparing with advantages, I believe the disadvantages of shipping
>> any
>> >> > releases directly from trunk are more obvious and significant:
>> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
>> have
>> >> > to wait to commit to trunk or put into a separated branch that could
>> >> delay
>> >> > feature development progress as additional vote process get involved
>> even
>> >> > the feature is simple and harmless.
>> >> >
>> >> Thanks Junping, those are valid concerns. I think we should clearly
>> >> separate incompatible from uncompleted / half-done work in this
>> >> discussion. Whether people should commit incompatible changes to trunk
>> is a
>> >> much more tricky question (related to trunk-incompat etc.). But per my
>> >> comment above, IMHO, *not committing uncompleted work to trunk* should
>> be a
>> >> much easier principle to agree upon.
>> >>
>> >>
>> >> > - For small feature with only 1 or 2 commits, that need three +1 from
>> >> PMCs
>> >> > will increase the bar largely for contributors who just start to
>> >> contribute
>> >> > on Hadoop features but no such sufficient support.
>> >> >
>> >> Development overhead is another valid concern. I think our
>> rule-of-thumb
>> >> should be that, small-medium new features should be proposed as a
>> single
>> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity
>> goes
>> >> beyond a single JIRA/patch, use a feature branch.
>> >>
>> >>
>> >> >
>> >> > Given these concerns, I am open to other options, like: proposed by
>> Vinod
>> >> > or Chris, but rather than to release anything directly from trunk.
>> >> >
>> >> > - This point doesn't necessarily need to be resolved now though,
>> since
>> >> > again we're still doing alphas.
>> >> > No. I think we have to settle down this first. Without a common
>> agreed
>> >> and
>> >> > transparent release process and branches in community, any release
>> >> (alpha,
>> >> > beta) bits is only called a private release but not a official apache
>> >> > hadoop release (even alpha).
>> >> >
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Junping
>> >> > ________________________________________
>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>> >> > Sent: Friday, June 10, 2016 7:49 AM
>> >> > To: Andrew Wang
>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>> >> >
>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>> this
>> >> > across to a VOTE so it is clear.
>> >> >
>> >> > I see a few advantages shipping from trunk:
>> >> >
>> >> >    - The lack of need for one additional backport each time.
>> >> >    - Feature rot in trunk
>> >> >
>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>> can
>> >> > continue doing 3.x releases off branch-3 even after we move trunk to
>> 4.x
>> >> (I
>> >> > said it :))
>> >> >
>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>> andrew.wang@cloudera.com>
>> >> > wrote:
>> >> >
>> >> > > Hi all,
>> >> > >
>> >> > > On a separate thread, a question was raised about 3.x branching
>> and use
>> >> > of
>> >> > > feature branches going forward.
>> >> > >
>> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
>> >> > thread
>> >> > > that has spanned the years, with Vinod making this proposal
>> (building
>> >> on
>> >> > > ideas from others who also commented in the email thread):
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>> >> > >
>> >> > > Pasting here for ease:
>> >> > >
>> >> > > On an unrelated note, offline I was pitching to a bunch of
>> >> > > contributors another idea to deal
>> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
>> directly*.
>> >> > >
>> >> > > What this gains us is that
>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>> >> > >  - We no longer have some code lying around in some branch (today’s
>> >> > > trunk) that is not releasable
>> >> > > because it gets mixed with other undesirable and incompatible
>> changes.
>> >> > >  - This needs to be coupled with more discipline on individual
>> >> > > features - medium to large
>> >> > > features are always worked upon in branches and get merged into
>> trunk
>> >> > > (and a nearing release!)
>> >> > > when they are ready
>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>> >> > > branch and stay there till
>> >> > > we accumulate enough of those to warrant another major release.
>> >> > >
>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>> for
>> >> > 3.0.0,
>> >> > > there's no need for this branch yet. This aspect of Vinod's
>> proposal
>> >> was
>> >> > > still under a bit of discussion; Chris Douglas thought we should
>> cut a
>> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
>> >> > thinking.
>> >> > > This point doesn't necessarily need to be resolved now though,
>> since
>> >> > again
>> >> > > we're still doing alphas.
>> >> > >
>> >> > > What we should get consensus on is the goal of keeping trunk
>> stable,
>> >> and
>> >> > > achieving that by doing more development on feature branches and
>> being
>> >> > > judicious about merges. My sense from the Hadoop 3 email thread
>> (and
>> >> the
>> >> > > more recent one on the async API) is that people are generally in
>> favor
>> >> > of
>> >> > > this.
>> >> > >
>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>> greatly
>> >> > > appreciate everyone's timely response in this matter.
>> >> > >
>> >> > > Thanks,
>> >> > > Andrew
>> >> > >
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>> >> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
>> >> >
>> >> >
>> >>
>>
>>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Andrew Wang <an...@cloudera.com>.
Let me try to clarify a few points, since not everyone might have been
present for the previous emails.

On the "Looking to a Hadoop 3 release" thread, we already reached consensus
on doing releases from trunk. People didn't want to have to commit to
another branch, and wanted to try releasing from trunk. The question, then,
was how to ensure that trunk remains stable and releasable.

Part of Vinod's proposal was that we, as a community, be more judicious
about what we commit to trunk, and try to make use of more feature branches
for larger efforts. There was no requirement that 1-2 patch changes go
through a feature branch. There weren't any requirements around # of
patches or length of development at all, just asking that committers be
more judicious. I personally think Sangjin's rule of thumb of ~12 patches
or ~1 month is about right, but it's up to the developers who are
involved, and I doubt any one standard will fit all situations.

So, this is about as low-overhead a policy as there is: devs, please be
careful when committing to trunk, and consider using a feature branch for
bigger efforts.

If you have further ideas about how to improve stability of trunk, I'd love
to hear them. I'd hope, though, that the above would be a non-controversial
statement.

Best,
Andrew

On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:

> Thanks for your thoughts Anu.
>
> Regarding your question
>
>> And then comes the question, once 3.0 becomes official, where do we
>> check-in a change,  if that would break something? so this will lead us
>> back to trunk being the unstable – 3.0 being the new “branch-2”.
>
>
> Andrew mentioned in the original email
>
>> Regarding "trunk-incompat", since we're still in the alpha stage for
>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>> was still under a bit of discussion; Chris Douglas thought we should cut a
>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>> This point doesn't necessarily need to be resolved now though, since again
>> we're still doing alphas.
>
>
> and I agree with that sentiment. I think even if we have a
> "trunk-incompat" branch to hold future incompatible changes, the situation
> will change little from today. Instead of dealing with "trunk" (where
> incompatible changes may appear) and "branch-3", we would be dealing with
> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>
>
> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
> wrote:
>
>> I actively work on two branches (Diskbalancer and ozone) and I agree with
>> most of what Sangjin said.
>> There is an overhead in working with branches: there are both technical
>> costs and administrative issues
>> which discourage developers from using branches.
>>
>> I think the biggest issue with branch based development is the fact that
>> other developers do not use a branch.
>> If a small feature appears as a series of commits to “datanode.java”,
>> the branch based developer ends up rebasing
>> and paying this price of rebasing many times. If everyone followed a
>> model of branch + Pull request, other branches
>> would not have to deal with continuous rebasing to trunk commits. If we
>> are moving to a branch based
>> development, we should probably move to that model for most development
>> to avoid this tax on people who
>>  actually end up working in the branches.
>>
>> I do have a question in my mind though: What is being proposed is that we
>> move active development to branches
>> if the feature is small or incomplete, however keep the trunk open for
>> check-ins. One of the biggest reasons why we
>> check-in into trunk and not to branch-2 is because it is a change that
>> will break backward compatibility. So do we
>> have an expectation of backward compatibility thru the 3.0-alpha series
>> (I personally vote No, since 3.0 is experimental
>> at this stage), but if we decide to support some sort of backward-compat
>> then willy-nilly committing to trunk
>> and still maintaining the expectation we can release Alphas from 3.0 does
>> not look possible.
>>
>> And then comes the question, once 3.0 becomes official, where do we
>> check-in a change,  if that would break something?
>> so this will lead us back to trunk being the unstable – 3.0 being the new
>> “branch-2”.
>>
>> One more point: If we are moving to use a branch always – then we are
>> looking at a model similar to using a git + pull
>> request model. If that is so would it make sense to modify the rules to
>> make these branches easier to merge?
>> Say for example, if all commits in a branch have followed review and
>> checking policy – just like trunk and commits
>> have been made only after a sign off from a committer, would it be
>> possible to merge with a 3-day voting period
>> instead of 7, or treat it just like today’s commit to trunk – but with 2
>> people signing-off?
>>
>> What I am suggesting is reducing the administrative overheads of using a
>> branch to encourage use of branching.
>> Right now it feels like Apache’s process encourages committing directly
>> to trunk rather than to a branch.
>>
>> Thanks
>> Anu
>>
>>
>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>
>> >Having worked on a major feature in a feature branch, I have some
>> thoughts
>> >and observations on feature branch development.
>> >
>> >IMO feature branch development v. direct commits to trunk in piecemeal is
>> >really a choice of *granularity*. Do we want a series of fine-grained
>> state
>> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
>> >
>> >This makes me favor a branch-based development model for any
>> "decent-sized"
>> >features (we'll need to define "decent-sized" of course). Once you have
>> >coarse-grained changes, it's easier to reason about what made what
>> release
>> >and in what state. As importantly, it makes it easier to back out a
>> >complete feature fairly easily if that becomes necessary. My totally
>> >unscientific suggestion may be if a feature takes more than a dozen commits
>> >and longer than a month, we should probably have a bias towards a feature
>> >branch.
>> >
>> >Branch-based development also makes you go faster if your feature is
>> >larger. I wouldn't do it the other way for timeline service v.2 for
>> example.
>> >
>> >That said, feature branches don't come for free. Now the onus is on the
>> >feature developer to constantly rebase with the trunk to keep it
>> reasonably
>> >integrated with the trunk. More logistics is involved for the feature
>> >developer. Another big question is, when a feature branch gets big and
>> it's
>> >time to merge, would it get as scrutinized as a series of individual
>> >commits? Since the size of merge can be big, you kind of have to rely on
>> >those feature committers and those who help them.
>> >
>> >In terms of integrating/stabilizing, I don't think branch development
>> >necessarily makes it harder. It is again granularity. In case of direct
>> >commits on trunk, you do a lot more fine-grained integrations. In case of
>> >branch development, you do far fewer coarse-grained integrations via
>> >rebasing. If more people are doing branch-based development, it makes
>> >rebasing easier to manage too.
>> >
>> >Going back to the related topic of where to release (trunk v. branch-X),
>> I
>> >think that is more of a proxy of the real question of "how do we maintain
>> >quality and stability of the trunk?". Even if we release from the trunk,
>> if
>> >our bar for merging to trunk is low, the quality will not improve
>> >automatically. So I think we ought to tackle the quality question first.
>> >
>> >My 2 cents.
>> >
>> >
>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>> >
>> >> Thanks for the notes Andrew, Junping, Karthik.
>> >>
>> >> Here are some of my understandings:
>> >>
>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>> use.
>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>> >> about durability, Hadoop might be gone in 50 years :). As a committer,
>> I
>> >> should be able to look at a patch and determine whether it's a
>> >> self-contained improvement of trunk, without looking at other
>> uncommitted
>> >> patches.
>> >> - Some comments inline:
>> >>
>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>> wrote:
>> >>
>> >> > Comparing with advantages, I believe the disadvantages of shipping
>> any
>> >> > releases directly from trunk are more obvious and significant:
>> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
>> have
>> >> > to wait to commit to trunk or put into a separated branch that could
>> >> delay
>> >> > feature development progress as additional vote process get involved
>> even
>> >> > the feature is simple and harmless.
>> >> >
>> >> Thanks Junping, those are valid concerns. I think we should clearly
>> >> separate incompatible with  uncompleted / half-done work in this
>> >> discussion. Whether people should commit incompatible changes to trunk
>> is a
>> >> much more tricky question (related to trunk-incompat etc.). But per my
>> >> comment above, IMHO, *not committing uncompleted work to trunk* should
>> be a
>> >> much easier principle to agree upon.
>> >>
>> >>
>> >> > - For small feature with only 1 or 2 commits, that need three +1 from
>> >> PMCs
>> >> > will increase the bar largely for contributors who just start to
>> >> contribute
>> >> > on Hadoop features but no such sufficient support.
>> >> >
>> >> Development overhead is another valid concern. I think our
>> rule-of-thumb
>> >> should be that, small-medium new features should be proposed as a
>> single
>> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity
>> goes
>> >> beyond a single JIRA/patch, use a feature branch.
>> >>
>> >>
>> >> >
>> >> > Given these concerns, I am open to other options, like: proposed by
>> Vinod
>> >> > or Chris, but rather than to release anything directly from trunk.
>> >> >
>> >> > - This point doesn't necessarily need to be resolved now though,
>> since
>> >> > again we're still doing alphas.
>> >> > No. I think we have to settle down this first. Without a common
>> agreed
>> >> and
>> >> > transparent release process and branches in community, any release
>> >> (alpha,
>> >> > beta) bits is only called a private release but not a official apache
>> >> > hadoop release (even alpha).
>> >> >
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Junping
>> >> > ________________________________________
>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>> >> > Sent: Friday, June 10, 2016 7:49 AM
>> >> > To: Andrew Wang
>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>> >> >
>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>> this
>> >> > across to a VOTE so it is clear.
>> >> >
>> >> > I see a few advantages shipping from trunk:
>> >> >
>> >> >    - The lack of need for one additional backport each time.
>> >> >    - Feature rot in trunk
>> >> >
>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>> can
>> >> > continue doing 3.x releases off branch-3 even after we move trunk to
>> 4.x
>> >> (I
>> >> > said it :))
>> >> >
>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>> andrew.wang@cloudera.com>
>> >> > wrote:
>> >> >
>> >> > > Hi all,
>> >> > >
>> >> > > On a separate thread, a question was raised about 3.x branching
>> and use
>> >> > of
>> >> > > feature branches going forward.
>> >> > >
>> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
>> >> > thread
>> >> > > that has spanned the years, with Vinod making this proposal
>> (building
>> >> on
>> >> > > ideas from others who also commented in the email thread):
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>> >> > >
>> >> > > Pasting here for ease:
>> >> > >
>> >> > > On an unrelated note, offline I was pitching to a bunch of
>> >> > > contributors another idea to deal
>> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
>> directly*.
>> >> > >
>> >> > > What this gains us is that
>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>> >> > >  - We no longer have some code lying around in some branch (today’s
>> >> > > trunk) that is not releasable
>> >> > > because it gets mixed with other undesirable and incompatible
>> changes.
>> >> > >  - This needs to be coupled with more discipline on individual
>> >> > > features - medium to large
>> >> > > features are always worked upon in branches and get merged into
>> trunk
>> >> > > (and a nearing release!)
>> >> > > when they are ready
>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>> >> > > branch and stay there till
>> >> > > we accumulate enough of those to warrant another major release.
>> >> > >
>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>> for
>> >> > 3.0.0,
>> >> > > there's no need for this branch yet. This aspect of Vinod's
>> proposal
>> >> was
>> >> > > still under a bit of discussion; Chris Douglas thought we should
>> cut a
>> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
>> >> > thinking.
>> >> > > This point doesn't necessarily need to be resolved now though,
>> since
>> >> > again
>> >> > > we're still doing alphas.
>> >> > >
>> >> > > What we should get consensus on is the goal of keeping trunk
>> stable,
>> >> and
>> >> > > achieving that by doing more development on feature branches and
>> being
>> >> > > judicious about merges. My sense from the Hadoop 3 email thread
>> (and
>> >> the
>> >> > > more recent one on the async API) is that people are generally in
>> favor
>> >> > of
>> >> > > this.
>> >> > >
>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>> greatly
>> >> > > appreciate everyone's timely response in this matter.
>> >> > >
>> >> > > Thanks,
>> >> > > Andrew
>> >> > >
>> >> >
>> >> >
>> >> >
>> >>
>>
>>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Andrew Wang <an...@cloudera.com>.
Let me try to clarify a few points, since not everyone might have been
present for the previous emails.

On the "Looking to a Hadoop 3 release" thread, we already reached consensus
on doing releases from trunk. People didn't want to have to commit to
another branch, and wanted to try releasing from trunk. The question, then,
was how to ensure that trunk remains stable and releasable.

Part of Vinod's proposal was that we, as a community, be more judicious
about what we commit to trunk, and try to make use of more feature branches
for larger efforts. There was no requirement that 1-2 patch changes go
through a feature branch. There weren't any requirements around # of
patches or length of development at all, just asking that committers be
more judicious. I personally think Sangjin's rule of thumb of ~12 patches
or ~1 month is about right, but it's up to the developers who are
involved, and I doubt any one standard will fit all situations.

So, this is about as low-overhead a policy as there is: devs, please be
careful when committing to trunk, and consider using a feature branch for
bigger efforts.

If you have further ideas about how to improve stability of trunk, I'd love
to hear them. I'd hope, though, that the above would be a non-controversial
statement.

Best,
Andrew

On Fri, Jun 10, 2016 at 2:10 PM, Sangjin Lee <sj...@apache.org> wrote:

> Thanks for your thoughts Anu.
>
> Regarding your question
>
>> And then comes the question: once 3.0 becomes official, where do we check
>> in a change if it would break something? This will lead us back to trunk
>> being the unstable branch, with 3.0 being the new “branch-2”.
>
>
> Andrew mentioned in the original email
>
>> Regarding "trunk-incompat", since we're still in the alpha stage for
>> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
>> was still under a bit of discussion; Chris Douglas thought we should cut a
>> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>> This point doesn't necessarily need to be resolved now though, since again
>> we're still doing alphas.
>
>
> and I agree with that sentiment. I think even if we have a
> "trunk-incompat" branch to hold future incompatible changes, the situation
> will change little from today. Instead of dealing with "trunk" (where
> incompatible changes may appear) and "branch-3", we would be dealing with
> "trunk-incompat" and "trunk". Names are largely mnemonics then.
>
>
> On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
> wrote:
>
>> I actively work on two branches (Diskbalancer and ozone) and I agree with
>> most of what Sangjin said. There is an overhead in working with branches:
>> there are both technical costs and administrative issues, which discourage
>> developers from using branches.
>>
>> I think the biggest issue with branch-based development is the fact that
>> other developers do not use a branch. If a small feature appears as a
>> series of commits to “datanode.java”, the branch-based developer ends up
>> paying the price of rebasing many times. If everyone followed a model of
>> branch + pull request, other branches would not have to deal with
>> continuous rebasing onto trunk commits. If we are moving to branch-based
>> development, we should probably move to that model for most development
>> to avoid this tax on people who actually end up working in the branches.
>>
>> I do have a question in my mind, though. What is being proposed is that
>> we move active development to branches if the feature is small or
>> incomplete, but keep the trunk open for check-ins. One of the biggest
>> reasons why we check in to trunk and not to branch-2 is that the change
>> will break backward compatibility. So do we have an expectation of
>> backward compatibility through the 3.0-alpha series? I personally vote
>> no, since 3.0 is experimental at this stage. But if we decide to support
>> some sort of backward compatibility, then willy-nilly committing to trunk
>> while still maintaining the expectation that we can release alphas from
>> 3.0 does not look possible.
>>
>> And then comes the question: once 3.0 becomes official, where do we check
>> in a change if it would break something? This will lead us back to trunk
>> being the unstable branch, with 3.0 being the new “branch-2”.
>>
>> One more point: if we are moving to always using a branch, then we are
>> looking at a model similar to git + pull requests. If that is so, would
>> it make sense to modify the rules to make these branches easier to merge?
>> Say, for example, all commits in a branch have followed the review and
>> checking policy, just like trunk, and commits have been made only after a
>> sign-off from a committer. Would it be possible to merge with a 3-day
>> voting period instead of 7, or to treat the merge just like today’s
>> commit to trunk, but with 2 people signing off?
>>
>> What I am suggesting is reducing the administrative overhead of using a
>> branch to encourage use of branching. Right now it feels like Apache’s
>> process encourages committing directly to trunk rather than to a branch.
>>
>> Thanks
>> Anu
>>
>>
>> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
>> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>>
>> >Having worked on a major feature in a feature branch, I have some
>> thoughts
>> >and observations on feature branch development.
>> >
>> >IMO feature branch development v. direct commits to trunk in piecemeal is
>> >really a choice of *granularity*. Do we want a series of fine-grained
>> state
>> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
>> >
>> >This makes me favor a branch-based development model for any
>> "decent-sized"
>> >features (we'll need to define "decent-sized" of course). Once you have
>> >coarse-grained changes, it's easier to reason about what made what
>> release
>> >and in what state. As importantly, it makes it easier to back out a
>> >complete feature fairly easily if that becomes necessary. My totally
>> >unscientific suggestion may be if a feature takes more than dozen commits
>> >and longer than a month, we should probably have a bias towards a feature
>> >branch.
>> >
>> >Branch-based development also makes you go faster if your feature is
>> >larger. I wouldn't do it the other way for timeline service v.2 for
>> example.
>> >
>> >That said, feature branches don't come for free. Now the onus is on the
>> >feature developer to constantly rebase with the trunk to keep it
>> reasonably
>> >integrated with the trunk. More logistics is involved for the feature
>> >developer. Another big question is, when a feature branch gets big and
>> it's
>> >time to merge, would it get as scrutinized as a series of individual
>> >commits? Since the size of merge can be big, you kind of have to rely on
>> >those feature committers and those who help them.
>> >
>> >In terms of integrating/stabilizing, I don't think branch development
>> >necessarily makes it harder. It is again granularity. In case of direct
>> >commits on trunk, you do a lot more fine-grained integrations. In case of
>> >branch development, you do far fewer coarse-grained integrations via
>> >rebasing. If more people are doing branch-based development, it makes
>> >rebasing easier to manage too.
>> >
>> >Going back to the related topic of where to release (trunk v. branch-X),
>> I
>> >think that is more of a proxy of the real question of "how do we maintain
>> >quality and stability of the trunk?". Even if we release from the trunk,
>> if
>> >our bar for merging to trunk is low, the quality will not improve
>> >automatically. So I think we ought to tackle the quality question first.
>> >
>> >My 2 cents.
>> >
>> >
>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>> >
>> >> Thanks for the notes Andrew, Junping, Karthik.
>> >>
>> >> Here are some of my understandings:
>> >>
>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>> use.
>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>> >> about durability, Hadoop might be gone in 50 years :). As a committer,
>> I
>> >> should be able to look at a patch and determine whether it's a
>> >> self-contained improvement of trunk, without looking at other
>> uncommitted
>> >> patches.
>> >> - Some comments inline:
>> >>
>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com>
>> wrote:
>> >>
>> >> > Comparing with advantages, I believe the disadvantages of shipping
>> any
>> >> > releases directly from trunk are more obvious and significant:
>> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
>> have
>> >> > to wait to commit to trunk or put into a separated branch that could
>> >> delay
>> >> > feature development progress as additional vote process get involved
>> even
>> >> > the feature is simple and harmless.
>> >> >
>> >> Thanks Junping, those are valid concerns. I think we should clearly
>> >> separate incompatible with  uncompleted / half-done work in this
>> >> discussion. Whether people should commit incompatible changes to trunk
>> is a
>> >> much more tricky question (related to trunk-incompat etc.). But per my
>> >> comment above, IMHO, *not committing uncompleted work to trunk* should
>> be a
>> >> much easier principle to agree upon.
>> >>
>> >>
>> >> > - For small feature with only 1 or 2 commits, that need three +1 from
>> >> PMCs
>> >> > will increase the bar largely for contributors who just start to
>> >> contribute
>> >> > on Hadoop features but no such sufficient support.
>> >> >
>> >> Development overhead is another valid concern. I think our
>> rule-of-thumb
>> >> should be that, small-medium new features should be proposed as a
>> single
>> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity
>> goes
>> >> beyond a single JIRA/patch, use a feature branch.
>> >>
>> >>
>> >> >
>> >> > Given these concerns, I am open to other options, like: proposed by
>> Vinod
>> >> > or Chris, but rather than to release anything directly from trunk.
>> >> >
>> >> > - This point doesn't necessarily need to be resolved now though,
>> since
>> >> > again we're still doing alphas.
>> >> > No. I think we have to settle down this first. Without a common
>> agreed
>> >> and
>> >> > transparent release process and branches in community, any release
>> >> (alpha,
>> >> > beta) bits is only called a private release but not a official apache
>> >> > hadoop release (even alpha).
>> >> >
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Junping
>> >> > ________________________________________
>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>> >> > Sent: Friday, June 10, 2016 7:49 AM
>> >> > To: Andrew Wang
>> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> >> > Subject: Re: [DISCUSS] Increased use of feature branches
>> >> >
>> >> > Thanks for restarting this thread Andrew. I really hope we can get
>> this
>> >> > across to a VOTE so it is clear.
>> >> >
>> >> > I see a few advantages shipping from trunk:
>> >> >
>> >> >    - The lack of need for one additional backport each time.
>> >> >    - Feature rot in trunk
>> >> >
>> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
>> can
>> >> > continue doing 3.x releases off branch-3 even after we move trunk to
>> 4.x
>> >> (I
>> >> > said it :))
>> >> >
>> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
>> andrew.wang@cloudera.com>
>> >> > wrote:
>> >> >
>> >> > > Hi all,
>> >> > >
>> >> > > On a separate thread, a question was raised about 3.x branching
>> and use
>> >> > of
>> >> > > feature branches going forward.
>> >> > >
>> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
>> >> > thread
>> >> > > that has spanned the years, with Vinod making this proposal
>> (building
>> >> on
>> >> > > ideas from others who also commented in the email thread):
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>> >> > >
>> >> > > Pasting here for ease:
>> >> > >
>> >> > > On an unrelated note, offline I was pitching to a bunch of
>> >> > > contributors another idea to deal
>> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
>> directly*.
>> >> > >
>> >> > > What this gains us is that
>> >> > >  - Trunk is always nearly stable or nearly ready for releases
>> >> > >  - We no longer have some code lying around in some branch (today’s
>> >> > > trunk) that is not releasable
>> >> > > because it gets mixed with other undesirable and incompatible
>> changes.
>> >> > >  - This needs to be coupled with more discipline on individual
>> >> > > features - medium to large
>> >> > > features are always worked upon in branches and get merged into
>> trunk
>> >> > > (and a nearing release!)
>> >> > > when they are ready
>> >> > >  - All incompatible changes go into some sort of a trunk-incompat
>> >> > > branch and stay there till
>> >> > > we accumulate enough of those to warrant another major release.
>> >> > >
>> >> > > Regarding "trunk-incompat", since we're still in the alpha stage
>> for
>> >> > 3.0.0,
>> >> > > there's no need for this branch yet. This aspect of Vinod's
>> proposal
>> >> was
>> >> > > still under a bit of discussion; Chris Douglas thought we should
>> cut a
>> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
>> >> > thinking.
>> >> > > This point doesn't necessarily need to be resolved now though,
>> since
>> >> > again
>> >> > > we're still doing alphas.
>> >> > >
>> >> > > What we should get consensus on is the goal of keeping trunk
>> stable,
>> >> and
>> >> > > achieving that by doing more development on feature branches and
>> being
>> >> > > judicious about merges. My sense from the Hadoop 3 email thread
>> (and
>> >> the
>> >> > > more recent one on the async API) is that people are generally in
>> favor
>> >> > of
>> >> > > this.
>> >> > >
>> >> > > We're just about ready to do the first 3.0.0 alpha, so would
>> greatly
>> >> > > appreciate everyone's timely response in this matter.
>> >> > >
>> >> > > Thanks,
>> >> > > Andrew
>> >> > >
>> >> >
>> >> >
>> >> >
>> >>
>>
>>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Sangjin Lee <sj...@apache.org>.
Thanks for your thoughts Anu.

Regarding your question

> And then comes the question: once 3.0 becomes official, where do we check
> in a change if it would break something? This will lead us back to trunk
> being the unstable branch, with 3.0 being the new “branch-2”.


Andrew mentioned in the original email

> Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
> was still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.


and I agree with that sentiment. I think even if we have a "trunk-incompat"
branch to hold future incompatible changes, the situation will change
little from today. Instead of dealing with "trunk" (where incompatible
changes may appear) and "branch-3", we would be dealing with
"trunk-incompat" and "trunk". Names are largely mnemonics then.


On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
wrote:

> I actively work on two branches (Diskbalancer and ozone) and I agree with
> most of what Sangjin said. There is an overhead in working with branches:
> there are both technical costs and administrative issues, which discourage
> developers from using branches.
>
> I think the biggest issue with branch-based development is the fact that
> other developers do not use a branch. If a small feature appears as a
> series of commits to “datanode.java”, the branch-based developer ends up
> paying the price of rebasing many times. If everyone followed a model of
> branch + pull request, other branches would not have to deal with
> continuous rebasing onto trunk commits. If we are moving to branch-based
> development, we should probably move to that model for most development to
> avoid this tax on people who actually end up working in the branches.
>
> I do have a question in my mind, though. What is being proposed is that we
> move active development to branches if the feature is small or incomplete,
> but keep the trunk open for check-ins. One of the biggest reasons why we
> check in to trunk and not to branch-2 is that the change will break
> backward compatibility. So do we have an expectation of backward
> compatibility through the 3.0-alpha series? I personally vote no, since 3.0
> is experimental at this stage. But if we decide to support some sort of
> backward compatibility, then willy-nilly committing to trunk while still
> maintaining the expectation that we can release alphas from 3.0 does not
> look possible.
>
> And then comes the question: once 3.0 becomes official, where do we check
> in a change if it would break something? This will lead us back to trunk
> being the unstable branch, with 3.0 being the new “branch-2”.
>
> One more point: if we are moving to always using a branch, then we are
> looking at a model similar to git + pull requests. If that is so, would it
> make sense to modify the rules to make these branches easier to merge? Say,
> for example, all commits in a branch have followed the review and checking
> policy, just like trunk, and commits have been made only after a sign-off
> from a committer. Would it be possible to merge with a 3-day voting period
> instead of 7, or to treat the merge just like today’s commit to trunk, but
> with 2 people signing off?
>
> What I am suggesting is reducing the administrative overhead of using a
> branch to encourage use of branching. Right now it feels like Apache’s
> process encourages committing directly to trunk rather than to a branch.
>
> Thanks
> Anu
>
>
> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>
> >Having worked on a major feature in a feature branch, I have some thoughts
> >and observations on feature branch development.
> >
> >IMO feature branch development v. piecemeal direct commits to trunk is
> >really a choice of *granularity*. Do we want a series of fine-grained
> >state changes on trunk or fewer coarse-grained chunks of commits on trunk?
> >
> >This makes me favor a branch-based development model for any
> >"decent-sized" features (we'll need to define "decent-sized" of course).
> >Once you have coarse-grained changes, it's easier to reason about what
> >made what release and in what state. Just as importantly, it makes it
> >easier to back out a complete feature if that becomes necessary. My
> >totally unscientific suggestion is that if a feature takes more than a
> >dozen commits and longer than a month, we should probably have a bias
> >towards a feature branch.
> >
> >Branch-based development also makes you go faster if your feature is
> >larger. I wouldn't do it the other way for timeline service v.2, for
> >example.
> >
> >That said, feature branches don't come for free. Now the onus is on the
> >feature developer to constantly rebase on trunk to keep the branch
> >reasonably integrated with it. More logistics are involved for the feature
> >developer. Another big question is, when a feature branch gets big and
> >it's time to merge, would it get as much scrutiny as a series of
> >individual commits? Since the merge can be big, you kind of have to rely
> >on the feature committers and those who help them.
> >
> >In terms of integrating/stabilizing, I don't think branch development
> >necessarily makes it harder. It is again granularity. In case of direct
> >commits on trunk, you do a lot more fine-grained integrations. In case of
> >branch development, you do far fewer coarse-grained integrations via
> >rebasing. If more people are doing branch-based development, it makes
> >rebasing easier to manage too.
> >
> >Going back to the related topic of where to release from (trunk v.
> >branch-X), I think that is more of a proxy for the real question of "how
> >do we maintain quality and stability of the trunk?". Even if we release
> >from the trunk, if our bar for merging to trunk is low, the quality will
> >not improve automatically. So I think we ought to tackle the quality
> >question first.
> >
> >My 2 cents.
> >
> >
> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
> >
> >> Thanks for the notes Andrew, Junping, Karthik.
> >>
> >> Here are some of my understandings:
> >>
> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> >> Hadoop today, without legacy workloads, trunk is what he/she should use.
> >> - Therefore, each commit to trunk should be transactional -- atomic,
> >> consistent, isolated (from other uncommitted patches); I'm not so sure
> >> about durability, Hadoop might be gone in 50 years :). As a committer, I
> >> should be able to look at a patch and determine whether it's a
> >> self-contained improvement of trunk, without looking at other
> >> uncommitted patches.
> >> - Some comments inline:
> >>
> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
> >>
> >> > Comparing with advantages, I believe the disadvantages of shipping any
> >> > releases directly from trunk are more obvious and significant:
> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
> have
> >> > to wait to commit to trunk or put into a separated branch that could
> >> delay
> >> > feature development progress as additional vote process get involved
> even
> >> > the feature is simple and harmless.
> >> >
> >> Thanks Junping, those are valid concerns. I think we should clearly
> >> separate incompatible work from uncompleted / half-done work in this
> >> discussion. Whether people should commit incompatible changes to trunk
> >> is a much trickier question (related to trunk-incompat etc.). But per my
> >> comment above, IMHO, *not committing uncompleted work to trunk* should
> >> be a much easier principle to agree upon.
> >>
> >>
> >> > - For small feature with only 1 or 2 commits, that need three +1 from
> >> PMCs
> >> > will increase the bar largely for contributors who just start to
> >> contribute
> >> > on Hadoop features but no such sufficient support.
> >> >
> >> Development overhead is another valid concern. I think our rule-of-thumb
> >> should be that, small-medium new features should be proposed as a single
> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> >> beyond a single JIRA/patch, use a feature branch.
> >>
> >>
> >> >
> >> > Given these concerns, I am open to other options, like: proposed by
> Vinod
> >> > or Chris, but rather than to release anything directly from trunk.
> >> >
> >> > - This point doesn't necessarily need to be resolved now though, since
> >> > again we're still doing alphas.
> >> > No. I think we have to settle down this first. Without a common agreed
> >> and
> >> > transparent release process and branches in community, any release
> >> (alpha,
> >> > beta) bits is only called a private release but not a official apache
> >> > hadoop release (even alpha).
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> > ________________________________________
> >> > From: Karthik Kambatla <ka...@cloudera.com>
> >> > Sent: Friday, June 10, 2016 7:49 AM
> >> > To: Andrew Wang
> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> > Subject: Re: [DISCUSS] Increased use of feature branches
> >> >
> >> > Thanks for restarting this thread Andrew. I really hope we can get
> this
> >> > across to a VOTE so it is clear.
> >> >
> >> > I see a few advantages shipping from trunk:
> >> >
> >> >    - The lack of need for one additional backport each time.
> >> >    - Feature rot in trunk
> >> >
> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
> can
> >> > continue doing 3.x releases off branch-3 even after we move trunk to
> 4.x
> >> (I
> >> > said it :))
> >> >
> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
> andrew.wang@cloudera.com>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > On a separate thread, a question was raised about 3.x branching and
> use
> >> > of
> >> > > feature branches going forward.
> >> > >
> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
> >> > thread
> >> > > that has spanned the years, with Vinod making this proposal
> (building
> >> on
> >> > > ideas from others who also commented in the email thread):
> >> > >
> >> > >
> >> > >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >> > >
> >> > > Pasting here for ease:
> >> > >
> >> > > On an unrelated note, offline I was pitching to a bunch of
> >> > > contributors another idea to deal
> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
> directly*.
> >> > >
> >> > > What this gains us is that
> >> > >  - Trunk is always nearly stable or nearly ready for releases
> >> > >  - We no longer have some code lying around in some branch (today’s
> >> > > trunk) that is not releasable
> >> > > because it gets mixed with other undesirable and incompatible
> changes.
> >> > >  - This needs to be coupled with more discipline on individual
> >> > > features - medium to to large
> >> > > features are always worked upon in branches and get merged into
> trunk
> >> > > (and a nearing release!)
> >> > > when they are ready
> >> > >  - All incompatible changes go into some sort of a trunk-incompat
> >> > > branch and stay there till
> >> > > we accumulate enough of those to warrant another major release.
> >> > >
> >> > > Regarding "trunk-incompat", since we're still in the alpha stage for
> >> > 3.0.0,
> >> > > there's no need for this branch yet. This aspect of Vinod's proposal
> >> was
> >> > > still under a bit of discussion; Chris Douglas though we should cut
> a
> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
> >> > thinking.
> >> > > This point doesn't necessarily need to be resolved now though, since
> >> > again
> >> > > we're still doing alphas.
> >> > >
> >> > > What we should get consensus on is the goal of keeping trunk stable,
> >> and
> >> > > achieving that by doing more development on feature branches and
> being
> >> > > judicious about merges. My sense from the Hadoop 3 email thread (and
> >> the
> >> > > more recent one on the async API) is that people are generally in
> favor
> >> > of
> >> > > this.
> >> > >
> >> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> >> > > appreciate everyone's timely response in this matter.
> >> > >
> >> > > Thanks,
> >> > > Andrew
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >> >
> >> >
> >>
>
>

Re: [DISCUSS] Increased use of feature branches

Posted by "Gangumalla, Uma" <um...@intel.com>.

On 6/13/16, 12:41 PM, "Anu Engineer" <ae...@hortonworks.com> wrote:

>Hi Colin,
>
>>Even if everyone used branches for all development, person X might merge
>>their branch before person Y, forcing person Y to do a rebase or merge.
>>It is not the presence of absence of branches that causes the need to
>>merge or rebase, but the presence of absence of "churn."
>
>You are perfectly right on this technically. The issue is when a
>branch developer gets caught in Commit, Revert, let-us-commit-again,
>oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle.
>
>I was hoping that branches will be exposed to less of this if everyone
>had private branches and got some time to test and bake the feature
>instead of just directly committing to trunk and then test.
>
>Once again, I agree with your point that in a perfect world, merges should
>be about the churn, but trunk is often treated as development branch,
>So my point is that it gets unnecessary churn. I really appreciate the
>thought in the thread - that is - let us be more responsible about how we
>treat trunk.
>
>> I thought the feature branch merge voting period had been shortened to 5
>>days rather than 7?  We should probably spell this out on
>>https://hadoop.apache.org/bylaws.html
>
>Thanks for the link, right now it says 7 days. That is why I assumed it
>is 7. 
>Would you be kind enough to point me to a thread that says it is 5 days
>for a merge Vote? 
>I did a google search, but was not able to find a thread like that.
>Thanks in advance.
I remember the 5-day voting period was related to releases. I am not sure
whether we discussed the branch merge voting period at that time.
Here is the link:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201406.mbox/%3C64A2C234-DD6A-4E4C-B52D-E91D5D472456@hortonworks.com%3E
>
>Thanks
>Anu
>
>
>On 6/13/16, 11:51 AM, "Colin McCabe" <cm...@apache.org> wrote:
>
>>On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
>>> > On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com>
>>>wrote:
>>> > 
>>> > I actively work on two branches (Diskbalancer and ozone) and I agree
>>>with most of what Sangjin said.
>>> > There is an overhead in working with branches, there are both
>>>technical costs and administrative issues
>>> > which discourages developers from using branches.
>>> > 
>>> > I think the biggest issue with branch based development is that fact
>>>that other developers do not use a branch.
>>> > If a small feature appears as a series of commits to
>>>“”datanode.java””, the branch based developer ends up rebasing
>>> > and paying this price of rebasing many times. If everyone followed a
>>>model of branch + Pull request, other branches
>>> > would not have to deal with continues rebasing to trunk commits. If
>>>we are moving to a branch based
>>
>>Even if everyone used branches for all development, person X might merge
>>their branch before person Y, forcing person Y to do a rebase or merge.
>>It is not the presence of absence of branches that causes the need to
>>merge or rebase, but the presence of absence of "churn."
>>
>>We try to minimize "churn" in many ways.  For example, we discourage
>>people from making trivial whitespace changes to parts of the code
>>they're not modifying in their patch.  Or doing things like letting
>>their editor change the line ending of files from LF to CR/LF.  However,
>>in the final analysis, churn will always exist because development
>>exists.
>>
>>> > development, we should probably move to that model for most
>>>development to avoid this tax on people who
>>> > actually end up working in the branches.
>>> > 
>>> > I do have a question in my mind though: What is being proposed is
>>>that we move active development to branches
>>> > if the feature is small or incomplete, however keep the trunk open
>>>for check-ins. One of the biggest reason why we
>>> > check-in into trunk and not to branch-2 is because it is a change
>>>that will break backward compatibility. So do we
>>> > have an expectation of backward compatibility thru the 3.0-alpha
>>>series (I personally vote No, since 3.0 is experimental
>>> > at this stage), but if we decide to support some sort of
>>>backward-compact then willy-nilly committing to trunk
>>> > and still maintaining the expectation we can release Alphas from 3.0
>>>does not look possible.
>>> > 
>>> > And then comes the question, once 3.0 becomes official, where do we
>>>check-in a change,  if that would break something?
>>> > so this will lead us back to trunk being the unstable – 3.0 being
>>>the new “branch-2”.
>>
>>I'm not sure I really understand the goal of the "trunk-incompat"
>>proposal.  Like Karthik asked earlier in this thread, isn't it really
>>just a rename of the existing trunk branch?
>>It sounds like the policy is going to be exactly the same as now:
>>incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
>>changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
>>etc.
>>
>>I think we should just create branch-3 and follow the same policy we
>>followed with branch-2 and branch-1.  Switching around the names doesn't
>>really change the policy, and it creates confusion since it's
>>inconsistent with what we did earlier.
>>
>>I think one of the big frustrations with trunk is that features sat
>>there a while without being released because they weren't compatible
>>with branch-2-- the shell script rewrite, for example.  However, this
>>reflects a fundamental tradeoff-- either incompatible features can't be
>>developed at all in the lifetime of Hadoop 3.x, or we will need
>>somewhere to put them.  The trunk-incompat proposal is like saying that
>>you've solved the prison overcrowding problem by renaming all prisons to
>>"correctional facilities."
>>
>>> > 
>>> > One more point: If we are moving to use a branch always – then we
>>>are looking at a model similar to using a git + pull
>>> > request model. If that is so would it make sense to modify the rules
>>>to make these branches easier to merge?
>>> > Say for example, if all commits in a branch has followed review and
>>>checking policy – just like trunk and commits
>>> > have been made only after a sign off from a committer, would it be
>>>possible to merge with a 3-day voting period
>>> > instead of 7, or treat it just like today’s commit to trunk –
>>>but with 2 people signing-off?
>>
>>I thought the feature branch merge voting period had been shortened to 5
>>days rather than 7?  We should probably spell this out on
>>https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
>>believe that *all* development should be on feature branches, just
>>biggish stuff that is likely to be controversial and/or disruptive.  The
>>suggestion I made earlier is that if 3 people ask you for a branch, you
>>should definitely strongly consider a branch.
>>
>>I do think we should shorten the voting period for adding new branch
>>committers... making it 3 or 4 days would be fine.  After all, the work
>>of branch committers is reviewed during the merge in any case.
>>
>>best,
>>Colin
>>
>>
>>> > 
>>> > What I am suggesting is reducing the administrative overheads of
>>>using a branch to encourage use of branching.
>>> > Right now it feels like Apache’s process encourages committing
>>>directly to trunk than a branch
>>> > 
>>> > Thanks
>>> > Anu
>>> 
>>> 
>>> It's a per project process. In slider, we've used a git flow: all work
>>> goes in a feature branch, then merge in with a merge point. This gives
>>>a
>>> better history of workflow, as an individual body of work is an ordered
>>> sequence of operations, independent of everything else. This makes
>>>cherry
>>> picking a sequence easier, it even makes unrolling a series of changes
>>> easier: until the entire set of changes is committed, there is nothing
>>>to
>>> back out.
>>> 
>>> 1. there's the rebase/merge problem: coping with conflicting change.
>>> Rebasing helps, but makes team dev complex. And, if there are big
>>> conflict changes, its often easier to take the current diff with trunk
>>> branch and reapply it than try to rebase a sequence of operations. You
>>> don't always need to rebase though; an FB can repeatedly merge in
>>>trunk,
>>> for a history which may not be self contained, but does isolate the
>>> feature dev from everyone else's work.
>>> 
>>> 2. Changes don't get exposed more broadly until the feature is in. That
>>> may reduce review, but for those of us who work on downstream code it
>>> means: nothing breaks until the complete feature is in. You may not
>>> realise it, but those of us who do compile downstream things (slider,
>>> spark) against even branch-2 always fear discovering what's just broken
>>> at the API level alone. And that's "the stable branch". I haven't dared
>>> build against trunk for a while.
>>> 
>>> 3. It's a real PITA trying to do development which spans >1 feature
>>> branch. Even today it's tricky with code spanning >1 patch
>>>(HADOOP-13207
>>> and HADOP-13208 this weekend). There I'm working in one branch and
>>> generating two separate patches. That's hard to do in a single feature
>>> branch.,
>>> 
>>> 4. The rules for feature branch merge. If I get a patch into trunk,
>>>it's
>>> in the codebase. If I get it into a feature branch, there's the risk
>>>the
>>> entire feature branch doesn't get in. Fix: for short lived feature
>>> branches, we have an RTC policy strict enough we can say "if a feature
>>> branch commit is in. it's considered good enough, even if a few more
>>> successor commits are required before the whole sequence of commits are
>>> considered stable.
>>> 
>>> 5. If you do lots of incremental patches (as feature branches
>>>encourage),
>>> the patch history gets very noisy. Maybe here the patches can be rolled
>>> up for the final commit. This is how Spark works.
>>> 
>>> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
>>> give a name of any branch? If so, for a feature branch of > 1w we could
>>> just fork the trunk jenkins builds too, but have it only email the
>>> committers.
>>> 
>>> 7. That final merge process needs to be rigorous from the regression
>>> testing perspective. the last commit on a feature branch should be the
>>> one to
>>> 
>>> Feature branches need to be short lived to cope with change well. And
>>>if
>>> you are doing fundamental changes (e.g core APIs), there is some
>>> incentive to get that common feature in, while you still get the full
>>> implementation stable in a feature branch. But: you'd be better be
>>> confident that the stuff in trunk isn't going to break. Nobody gets to
>>> break the main build —or at least not for longer than it takes for
>>>the
>>> merge to be reverted.
>>> 
>>> I think maybe we should try doing very-short-lived feature branches,
>>>with
>>> a simple policy:
>>> 
>>> -self contained patch which delivers a complete feature/fix: single
>>> patch. These are things where it means
>>> 
>>> -something which is an intermediate step to delivering something: part
>>>of
>>> a feature branch. A branch where the process for committing patches is
>>>as
>>> rigorous as for trunk —so there's no ambiguity about *whether* a
>>> feature is merged in, only *when*
>>> 
>>> 
>>> 
>>> 
>>> 
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>>For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>>
>>
>




Re: [DISCUSS] Increased use of feature branches

Posted by Steve Loughran <st...@hortonworks.com>.
> On 13 Jun 2016, at 22:49, Colin McCabe <cm...@apache.org> wrote:
> 
> Feature branch code will receive fewer test runs
> since it's not tested in every precommit build like trunk code is.  I do
> agree that good and well-thought out tests should be a precondition of
> merging any big feature branch.  

That's fixable in Jenkins and Yetus.

> But we have to expect that merges will
> be destabilizing in Hadoop, just like in every other software project
> out there.

True. The cost of a merge is generally a function of the number of patches and the duration of the branch (which is really O(# of other patches merged into trunk)).




Re: [DISCUSS] Increased use of feature branches

Posted by Colin McCabe <cm...@apache.org>.
On Mon, Jun 13, 2016, at 12:41, Anu Engineer wrote:
> Hi Colin,
> 
> >Even if everyone used branches for all development, person X might merge
> >their branch before person Y, forcing person Y to do a rebase or merge. 
> >It is not the presence of absence of branches that causes the need to
> >merge or rebase, but the presence of absence of "churn."
> 
> You are perfectly right on this technically. The issue is when a 
> branch developer gets caught in Commit, Revert, let-us-commit-again, 
> oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle. 
> 
> I was hoping that branches will be exposed to less of this if everyone 
> had private branches and got some time to test and bake the feature 
> instead of just directly committing to trunk and then test.

To be fair to developers, when something becomes problematic after it
gets committed, it's usually because of something that didn't show up in
testing on a private branch.  For example, maybe unit tests fail
occasionally with JDK7 instead of JDK8 (but the developer wasn't using
JDK7, so how would he know?)  Maybe there's a flaky test that shows up
when the test machine is overloaded (but the developer's machine wasn't
overloaded, so how would he see this?)  Maybe there's some interaction
with a new feature that just got added in trunk.  And so on.

> Once again, I agree with your point that in a perfect world, merges
> should
> be about the churn, but trunk is often treated as development branch, 
> So my point is that it gets unnecessary churn. I really appreciate the 
> thought in the thread - that is - let us be more responsible about how we
> treat trunk.

I think assuming that we will catch all bugs before branch merge is the
"perfect world" view, and accepting that some of them will get through
is the realistic view.  Feature branch code will receive fewer test runs
since it's not tested in every precommit build like trunk code is.  I do
agree that good, well-thought-out tests should be a precondition of
merging any big feature branch.   But we have to expect that merges will
be destabilizing in Hadoop, just like in every other software project
out there.

Trunk *is* a development branch, and should be treated as such.  Not
everything that hits trunk needs to immediately hit the stable branches.
It's OK for there to be some experimentation, as long as developers
make a strong effort to test things thoroughly and avoid flaky or
time-dependent tests.

> 
> > I thought the feature branch merge voting period had been shortened to 5
> >days rather than 7?  We should probably spell this out on
> >https://hadoop.apache.org/bylaws.html 
> 
> Thanks for the link, right now it says 7 days. That is why I assumed it
> is 7. 
> Would you be kind enough to point me to a thread that says it is 5 days
> for a merge Vote? 
> I did a google search, but was not able to find a thread like that.
> Thanks in advance.

Hmm, perhaps I was thinking of the release vote process.  Can anyone
confirm?  It would be nice if this information could appear on the
bylaws page...

best,
Colin


> 
> Thanks
> Anu
> 
> 
> On 6/13/16, 11:51 AM, "Colin McCabe" <cm...@apache.org> wrote:
> 
> >On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
> >> > On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com> wrote:
> >> > 
> >> > I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. 
> >> > There is an overhead in working with branches, there are both technical costs and administrative issues 
> >> > which discourages developers from using branches.
> >> > 
> >> > I think the biggest issue with branch based development is that fact that other developers do not use a branch.
> >> > If a small feature appears as a series of commits to “”datanode.java””, the branch based developer ends up rebasing 
> >> > and paying this price of rebasing many times. If everyone followed a model of branch + Pull request, other branches
> >> > would not have to deal with continues rebasing to trunk commits. If we are moving to a branch based 
> >
> >Even if everyone used branches for all development, person X might merge
> >their branch before person Y, forcing person Y to do a rebase or merge. 
> >It is not the presence of absence of branches that causes the need to
> >merge or rebase, but the presence of absence of "churn."
> >
> >We try to minimize "churn" in many ways.  For example, we discourage
> >people from making trivial whitespace changes to parts of the code
> >they're not modifying in their patch.  Or doing things like letting
> >their editor change the line ending of files from LF to CR/LF.  However,
> >in the final analysis, churn will always exist because development
> >exists.
> >
> >> > development, we should probably move to that model for most development to avoid this tax on people who
> >> > actually end up working in the branches.
> >> > 
> >> > I do have a question in my mind though: What is being proposed is that we move active development to branches 
> >> > if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reason why we 
> >> > check-in into trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we 
> >> > have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental 
> >> > at this stage), but if we decide to support some sort of backward-compact then willy-nilly committing to trunk 
> >> > and still maintaining the expectation we can release Alphas from 3.0 does not look possible.
> >> > 
> >> > And then comes the question, once 3.0 becomes official, where do we check-in a change,  if that would break something? 
> >> > so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.
> >
> >I'm not sure I really understand the goal of the "trunk-incompat"
> >proposal.  Like Karthik asked earlier in this thread, isn't it really
> >just a rename of the existing trunk branch?
> >It sounds like the policy is going to be exactly the same as now:
> >incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
> >changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
> >etc.
> >
> >I think we should just create branch-3 and follow the same policy we
> >followed with branch-2 and branch-1.  Switching around the names doesn't
> >really change the policy, and it creates confusion since it's
> >inconsistent with what we did earlier.
> >
> >I think one of the big frustrations with trunk is that features sat
> >there a while without being released because they weren't compatible
> >with branch-2-- the shell script rewrite, for example.  However, this
> >reflects a fundamental tradeoff-- either incompatible features can't be
> >developed at all in the lifetime of Hadoop 3.x, or we will need
> >somewhere to put them.  The trunk-incompat proposal is like saying that
> >you've solved the prison overcrowding problem by renaming all prisons to
> >"correctional facilities."
> >
> >> > 
> >> > One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull 
> >> > request model. If that is so would it make sense to modify the rules to make these branches easier to merge?
> >> > Say for example, if all commits in a branch has followed review and checking policy – just like trunk and commits 
> >> > have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period 
> >> > instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off? 
> >
> >I thought the feature branch merge voting period had been shortened to 5
> >days rather than 7?  We should probably spell this out on
> >https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
> >believe that *all* development should be on feature branches, just
> >biggish stuff that is likely to be controversial and/or disruptive.  The
> >suggestion I made earlier is that if 3 people ask you for a branch, you
> >should definitely strongly consider a branch.
> >
> >I do think we should shorten the voting period for adding new branch
> >committers... making it 3 or 4 days would be fine.  After all, the work
> >of branch committers is reviewed during the merge in any case.
> >
> >best,
> >Colin
> >
> >
> >> > 
> >> > What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.  
> >> > Right now it feels like Apache’s process encourages committing directly to trunk than a branch
> >> > 
> >> > Thanks
> >> > Anu
> >> 
> >> 
> >> It's a per project process. In slider, we've used a git flow: all work
> >> goes in a feature branch, then merge in with a merge point. This gives a
> >> better history of workflow, as an individual body of work is an ordered
> >> sequence of operations, independent of everything else. This makes cherry
> >> picking a sequence easier, it even makes unrolling a series of changes
> >> easier: until the entire set of changes is committed, there is nothing to
> >> back out.
> >> 
> >> 1. there's the rebase/merge problem: coping with conflicting change.
> >> Rebasing helps, but makes team dev complex. And, if there are big
> >> conflict changes, its often easier to take the current diff with trunk
> >> branch and reapply it than try to rebase a sequence of operations. You
> >> don't always need to rebase though; an FB can repeatedly merge in trunk,
> >> for a history which may not be self contained, but does isolate the
> >> feature dev from everyone else's work.
> >> 
> >> 2. Changes don't get exposed more broadly until the feature is in. That
> >> may reduce review, but for those of us who work on downstream code it
> >> means: nothing breaks until the complete feature is in. You may not
> >> realise it, but those of us who do compile downstream things (slider,
> >> spark) against even branch-2 always fear discovering what's just broken
> >> at the API level alone. And that's "the stable branch". I haven't dared
> >> build against trunk for a while.
> >> 
> >> 3. It's a real PITA trying to do development which spans >1 feature
> >> branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207
> >> and HADOP-13208 this weekend). There I'm working in one branch and
> >> generating two separate patches. That's hard to do in a single feature
> >> branch.,
> >> 
> >> 4. The rules for feature branch merge. If I get a patch into trunk, it's
> >> in the codebase. If I get it into a feature branch, there's the risk the
> >> entire feature branch doesn't get in. Fix: for short lived feature
> >> branches, we have an RTC policy strict enough we can say "if a feature
> >> branch commit is in. it's considered good enough, even if a few more
> >> successor commits are required before the whole sequence of commits are
> >> considered stable.
> >> 
> >> 5. If you do lots of incremental patches (as feature branches encourage),
> >> the patch history gets very noisy. Maybe here the patches can be rolled
> >> up for the final commit. This is how Spark works.
> >> 
> >> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
> >> give a name of any branch? If so, for a feature branch of > 1w we could
> >> just fork the trunk jenkins builds too, but have it only email the
> >> committers.
> >> 
> >> 7. That final merge process needs to be rigorous from the regression
> >> testing perspective. the last commit on a feature branch should be the
> >> one to
> >> 
> >> Feature branches need to be short lived to cope with change well. And if
> >> you are doing fundamental changes (e.g core APIs), there is some
> >> incentive to get that common feature in, while you still get the full
> >> implementation stable in a feature branch. But: you'd be better be
> >> confident that the stuff in trunk isn't going to break. Nobody gets to
> >> break the main build —or at least not for longer than it takes for the
> >> merge to be reverted.
> >> 
> >> I think maybe we should try doing very-short-lived feature branches, with
> >> a simple policy:
> >> 
> >> -self contained patch which delivers a complete feature/fix: single
> >> patch. These are things where it means
> >> 
> >> -something which is an intermediate step to delivering something: part of
> >> a feature branch. A branch where the process for committing patches is as
> >> rigorous as for trunk —so there's no ambiguity about *whether* a
> >> feature is merged in, only *when*
> >> 
> >> 
> >> 
> >> 
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
> >For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
> >
> >
> 



Re: [DISCUSS] Increased use of feature branches

Posted by Anu Engineer <ae...@hortonworks.com>.
Hi Colin,

>Even if everyone used branches for all development, person X might merge
>their branch before person Y, forcing person Y to do a rebase or merge. 
>It is not the presence of absence of branches that causes the need to
>merge or rebase, but the presence of absence of "churn."

You are perfectly right on this technically. The issue is when a
branch developer gets caught in a Commit, Revert, let-us-commit-again,
oh-it-is-not-fixed-completely, let-us-revert-the-revert cycle.

I was hoping that branches would be exposed to less of this if everyone
had private branches and got some time to test and bake the feature
instead of just committing directly to trunk and then testing.

Once again, I agree with your point that in a perfect world, merges should
be about the churn, but trunk is often treated as a development branch,
so my point is that it gets unnecessary churn. I really appreciate the
thought in the thread, that is, let us be more responsible about how we treat trunk.

> I thought the feature branch merge voting period had been shortened to 5
>days rather than 7?  We should probably spell this out on
>https://hadoop.apache.org/bylaws.html 

Thanks for the link; right now it says 7 days. That is why I assumed it is 7.
Would you be kind enough to point me to a thread that says it is 5 days for a merge vote?
I did a Google search, but was not able to find a thread like that. Thanks in advance.

Thanks
Anu


On 6/13/16, 11:51 AM, "Colin McCabe" <cm...@apache.org> wrote:

>On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
>> > On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com> wrote:
>> > 
>> > I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. 
>> > There is an overhead in working with branches, there are both technical costs and administrative issues 
>> > which discourages developers from using branches.
>> > 
>> > I think the biggest issue with branch based development is the fact that other developers do not use a branch.
>> > If a small feature appears as a series of commits to “datanode.java”, the branch based developer ends up rebasing
>> > and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
>> > would not have to deal with continuous rebasing to trunk commits. If we are moving to a branch based
>
>Even if everyone used branches for all development, person X might merge
>their branch before person Y, forcing person Y to do a rebase or merge. 
>It is not the presence or absence of branches that causes the need to
>merge or rebase, but the presence or absence of "churn."
>
>We try to minimize "churn" in many ways.  For example, we discourage
>people from making trivial whitespace changes to parts of the code
>they're not modifying in their patch.  Or doing things like letting
>their editor change the line ending of files from LF to CR/LF.  However,
>in the final analysis, churn will always exist because development
>exists.
>
>> > development, we should probably move to that model for most development to avoid this tax on people who
>> > actually end up working in the branches.
>> > 
>> > I do have a question in my mind though: What is being proposed is that we move active development to branches 
>> > if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reason why we 
>> > check-in into trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we 
>> > have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental 
>> > at this stage), but if we decide to support some sort of backward-compact then willy-nilly committing to trunk 
>> > and still maintaining the expectation we can release Alphas from 3.0 does not look possible.
>> > 
>> > And then comes the question, once 3.0 becomes official, where do we check-in a change,  if that would break something? 
>> > so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.
>
>I'm not sure I really understand the goal of the "trunk-incompat"
>proposal.  Like Karthik asked earlier in this thread, isn't it really
>just a rename of the existing trunk branch?
>It sounds like the policy is going to be exactly the same as now:
>incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
>changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
>etc.
>
>I think we should just create branch-3 and follow the same policy we
>followed with branch-2 and branch-1.  Switching around the names doesn't
>really change the policy, and it creates confusion since it's
>inconsistent with what we did earlier.
>
>I think one of the big frustrations with trunk is that features sat
>there a while without being released because they weren't compatible
>with branch-2-- the shell script rewrite, for example.  However, this
>reflects a fundamental tradeoff-- either incompatible features can't be
>developed at all in the lifetime of Hadoop 3.x, or we will need
>somewhere to put them.  The trunk-incompat proposal is like saying that
>you've solved the prison overcrowding problem by renaming all prisons to
>"correctional facilities."
>
>> > 
>> > One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull 
>> > request model. If that is so would it make sense to modify the rules to make these branches easier to merge?
>> > Say for example, if all commits in a branch has followed review and checking policy – just like trunk and commits 
>> > have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period 
>> > instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off? 
>
>I thought the feature branch merge voting period had been shortened to 5
>days rather than 7?  We should probably spell this out on
>https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
>believe that *all* development should be on feature branches, just
>biggish stuff that is likely to be controversial and/or disruptive.  The
>suggestion I made earlier is that if 3 people ask you for a branch, you
>should definitely strongly consider a branch.
>
>I do think we should shorten the voting period for adding new branch
>committers... making it 3 or 4 days would be fine.  After all, the work
>of branch committers is reviewed during the merge in any case.
>
>best,
>Colin
>
>
>> > 
>> > What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.  
>> > Right now it feels like Apache’s process encourages committing directly to trunk than a branch
>> > 
>> > Thanks
>> > Anu
>> 
>> 
>> It's a per project process. In slider, we've used a git flow: all work
>> goes in a feature branch, then merge in with a merge point. This gives a
>> better history of workflow, as an individual body of work is an ordered
>> sequence of operations, independent of everything else. This makes cherry
>> picking a sequence easier, it even makes unrolling a series of changes
>> easier: until the entire set of changes is committed, there is nothing to
>> back out.
>> 
>> 1. there's the rebase/merge problem: coping with conflicting change.
>> Rebasing helps, but makes team dev complex. And, if there are big
>> conflict changes, it's often easier to take the current diff with trunk
>> branch and reapply it than try to rebase a sequence of operations. You
>> don't always need to rebase though; an FB can repeatedly merge in trunk,
>> for a history which may not be self contained, but does isolate the
>> feature dev from everyone else's work.
>> 
>> 2. Changes don't get exposed more broadly until the feature is in. That
>> may reduce review, but for those of us who work on downstream code it
>> means: nothing breaks until the complete feature is in. You may not
>> realise it, but those of us who do compile downstream things (slider,
>> spark) against even branch-2 always fear discovering what's just broken
>> at the API level alone. And that's "the stable branch". I haven't dared
>> build against trunk for a while.
>> 
>> 3. It's a real PITA trying to do development which spans >1 feature
>> branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207
>> and HADOOP-13208 this weekend). There I'm working in one branch and
>> generating two separate patches. That's hard to do in a single feature
>> branch.
>> 
>> 4. The rules for feature branch merge. If I get a patch into trunk, it's
>> in the codebase. If I get it into a feature branch, there's the risk the
>> entire feature branch doesn't get in. Fix: for short lived feature
>> branches, we have an RTC policy strict enough we can say "if a feature
>> branch commit is in, it's considered good enough, even if a few more
>> successor commits are required before the whole sequence of commits is
>> considered stable."
>> 
>> 5. If you do lots of incremental patches (as feature branches encourage),
>> the patch history gets very noisy. Maybe here the patches can be rolled
>> up for the final commit. This is how Spark works.
>> 
>> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
>> give a name of any branch? If so, for a feature branch of > 1w we could
>> just fork the trunk jenkins builds too, but have it only email the
>> committers.
>> 
>> 7. That final merge process needs to be rigorous from the regression
>> testing perspective. the last commit on a feature branch should be the
>> one to
>> 
>> Feature branches need to be short lived to cope with change well. And if
>> you are doing fundamental changes (e.g core APIs), there is some
>> incentive to get that common feature in, while you still get the full
>> implementation stable in a feature branch. But: you'd better be
>> confident that the stuff in trunk isn't going to break. Nobody gets to
>> break the main build —or at least not for longer than it takes for the
>> merge to be reverted.
>> 
>> I think maybe we should try doing very-short-lived feature branches, with
>> a simple policy:
>> 
>> -self contained patch which delivers a complete feature/fix: single
>> patch. These are things where it means
>> 
>> -something which is an intermediate step to delivering something: part of
>> a feature branch. A branch where the process for committing patches is as
>> rigorous as for trunk —so there's no ambiguity about *whether* a
>> feature is merged in, only *when*
>> 
>> 
>> 
>> 


Re: [DISCUSS] Increased use of feature branches

Posted by Colin McCabe <cm...@apache.org>.
On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote:
> > On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com> wrote:
> > 
> > I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. 
> > There is an overhead in working with branches, there are both technical costs and administrative issues 
> > which discourages developers from using branches.
> > 
> > I think the biggest issue with branch based development is the fact that other developers do not use a branch.
> > If a small feature appears as a series of commits to “datanode.java”, the branch based developer ends up rebasing
> > and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
> > would not have to deal with continuous rebasing to trunk commits. If we are moving to a branch based

Even if everyone used branches for all development, person X might merge
their branch before person Y, forcing person Y to do a rebase or merge. 
It is not the presence or absence of branches that causes the need to
merge or rebase, but the presence or absence of "churn."

We try to minimize "churn" in many ways.  For example, we discourage
people from making trivial whitespace changes to parts of the code
they're not modifying in their patch.  Or doing things like letting
their editor change the line ending of files from LF to CR/LF.  However,
in the final analysis, churn will always exist because development
exists.
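
To make the churn point concrete, here is a toy sketch (an illustration only,
not something proposed in this thread). It assumes a conflict can be
approximated as "both sides touched the same file"; the function name
needs_rebase and the file names are hypothetical.

```python
# Toy model: a branch has rebase/merge work to do only when trunk gained
# commits touching the same files after the branch was cut.
# Assumption: "same file touched" stands in for a real textual conflict.

def needs_rebase(branch_files, trunk_commits_since_fork):
    """Return the files touched both by the branch and by newer trunk commits."""
    touched_on_trunk = set()
    for commit_files in trunk_commits_since_fork:
        touched_on_trunk |= set(commit_files)
    return set(branch_files) & touched_on_trunk


# Person X and person Y both work on feature branches (hypothetical files).
x_files = {"DataNode.java", "BlockScanner.java"}
y_files = {"DataNode.java", "DiskBalancer.java"}

# X merges first, so X's files now count as trunk churn from Y's point of view.
print(sorted(needs_rebase(y_files, [x_files])))          # ['DataNode.java']

# A trunk commit to an unrelated file creates no rebase work for Y.
print(sorted(needs_rebase(y_files, [{"README.txt"}])))   # []
```

Whether X's change arrived as a branch merge or as direct commits makes no
difference in this model; only the overlap in touched files does.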

> > development, we should probably move to that model for most development to avoid this tax on people who
> > actually end up working in the branches.
> > 
> > I do have a question in my mind though: What is being proposed is that we move active development to branches 
> > if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reasons why we
> > check in to trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we
> > have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental
> > at this stage), but if we decide to support some sort of backward compatibility then willy-nilly committing to trunk
> > and still maintaining the expectation we can release Alphas from 3.0 does not look possible.
> > 
> > And then comes the question, once 3.0 becomes official, where do we check-in a change,  if that would break something? 
> > so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.

I'm not sure I really understand the goal of the "trunk-incompat"
proposal.  Like Karthik asked earlier in this thread, isn't it really
just a rename of the existing trunk branch?
It sounds like the policy is going to be exactly the same as now:
incompatible stuff in trunk/trunk-incompat/whatever, 3.x compatible
changes in the 3.x line, 2.x compatible changes in the 2.x line, etc.
etc.

I think we should just create branch-3 and follow the same policy we
followed with branch-2 and branch-1.  Switching around the names doesn't
really change the policy, and it creates confusion since it's
inconsistent with what we did earlier.

I think one of the big frustrations with trunk is that features sat
there a while without being released because they weren't compatible
with branch-2-- the shell script rewrite, for example.  However, this
reflects a fundamental tradeoff-- either incompatible features can't be
developed at all in the lifetime of Hadoop 3.x, or we will need
somewhere to put them.  The trunk-incompat proposal is like saying that
you've solved the prison overcrowding problem by renaming all prisons to
"correctional facilities."

> > 
> > One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull 
> > request model. If that is so would it make sense to modify the rules to make these branches easier to merge?
> > Say for example, if all commits in a branch has followed review and checking policy – just like trunk and commits 
> > have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period 
> > instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off? 

I thought the feature branch merge voting period had been shortened to 5
days rather than 7?  We should probably spell this out on
https://hadoop.apache.org/bylaws.html .  Like I said above, I don't
believe that *all* development should be on feature branches, just
biggish stuff that is likely to be controversial and/or disruptive.  The
suggestion I made earlier is that if 3 people ask you for a branch, you
should definitely strongly consider a branch.

I do think we should shorten the voting period for adding new branch
committers... making it 3 or 4 days would be fine.  After all, the work
of branch committers is reviewed during the merge in any case.

best,
Colin


> > 
> > What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.  
> > Right now it feels like Apache’s process encourages committing directly to trunk than a branch
> > 
> > Thanks
> > Anu
> 
> 
> It's a per project process. In slider, we've used a git flow: all work
> goes in a feature branch, then merge in with a merge point. This gives a
> better history of workflow, as an individual body of work is an ordered
> sequence of operations, independent of everything else. This makes cherry
> picking a sequence easier, it even makes unrolling a series of changes
> easier: until the entire set of changes is committed, there is nothing to
> back out.
> 
> 1. there's the rebase/merge problem: coping with conflicting change.
> Rebasing helps, but makes team dev complex. And, if there are big
> conflict changes, it's often easier to take the current diff with trunk
> branch and reapply it than try to rebase a sequence of operations. You
> don't always need to rebase though; an FB can repeatedly merge in trunk,
> for a history which may not be self contained, but does isolate the
> feature dev from everyone else's work.
> 
> 2. Changes don't get exposed more broadly until the feature is in. That
> may reduce review, but for those of us who work on downstream code it
> means: nothing breaks until the complete feature is in. You may not
> realise it, but those of us who do compile downstream things (slider,
> spark) against even branch-2 always fear discovering what's just broken
> at the API level alone. And that's "the stable branch". I haven't dared
> build against trunk for a while.
> 
> 3. It's a real PITA trying to do development which spans >1 feature
> branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207
> and HADOOP-13208 this weekend). There I'm working in one branch and
> generating two separate patches. That's hard to do in a single feature
> branch.
> 
> 4. The rules for feature branch merge. If I get a patch into trunk, it's
> in the codebase. If I get it into a feature branch, there's the risk the
> entire feature branch doesn't get in. Fix: for short lived feature
> branches, we have an RTC policy strict enough we can say "if a feature
> branch commit is in, it's considered good enough, even if a few more
> successor commits are required before the whole sequence of commits is
> considered stable."
> 
> 5. If you do lots of incremental patches (as feature branches encourage),
> the patch history gets very noisy. Maybe here the patches can be rolled
> up for the final commit. This is how Spark works.
> 
> 6. Jenkins doesn't test feature branches today. Can yetus do this if I
> give a name of any branch? If so, for a feature branch of > 1w we could
> just fork the trunk jenkins builds too, but have it only email the
> committers.
> 
> 7. That final merge process needs to be rigorous from the regression
> testing perspective. the last commit on a feature branch should be the
> one to
> 
> Feature branches need to be short lived to cope with change well. And if
> you are doing fundamental changes (e.g core APIs), there is some
> incentive to get that common feature in, while you still get the full
> implementation stable in a feature branch. But: you'd better be
> confident that the stuff in trunk isn't going to break. Nobody gets to
> break the main build —or at least not for longer than it takes for the
> merge to be reverted.
> 
> I think maybe we should try doing very-short-lived feature branches, with
> a simple policy:
> 
> -self contained patch which delivers a complete feature/fix: single
> patch. These are things where it means
> 
> -something which is an intermediate step to delivering something: part of
> a feature branch. A branch where the process for committing patches is as
> rigorous as for trunk —so there's no ambiguity about *whether* a
> feature is merged in, only *when*
> 
> 
> 
> 


Re: [DISCUSS] Increased use of feature branches

Posted by Steve Loughran <st...@hortonworks.com>.
> On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com> wrote:
> 
> I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. 
> There is an overhead in working with branches, there are both technical costs and administrative issues 
> which discourages developers from using branches.
> 
> I think the biggest issue with branch based development is the fact that other developers do not use a branch.
> If a small feature appears as a series of commits to “datanode.java”, the branch based developer ends up rebasing
> and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
> would not have to deal with continuous rebasing to trunk commits. If we are moving to a branch based
> development, we should probably move to that model for most development to avoid this tax on people who
> actually end up working in the branches.
> 
> I do have a question in my mind though: What is being proposed is that we move active development to branches 
> if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reasons why we
> check in to trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we
> have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental
> at this stage), but if we decide to support some sort of backward compatibility then willy-nilly committing to trunk
> and still maintaining the expectation we can release Alphas from 3.0 does not look possible.
> 
> And then comes the question, once 3.0 becomes official, where do we check-in a change,  if that would break something? 
> so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.
> 
> One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull 
> request model. If that is so would it make sense to modify the rules to make these branches easier to merge?
> Say for example, if all commits in a branch has followed review and checking policy – just like trunk and commits 
> have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period 
> instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off? 
> 
> What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.  
> Right now it feels like Apache’s process encourages committing directly to trunk than a branch
> 
> Thanks
> Anu


It's a per-project process. In Slider, we've used a git flow: all work goes in a feature branch, then merges in with a merge point. This gives a better history of workflow, as an individual body of work is an ordered sequence of operations, independent of everything else. This makes cherry-picking a sequence easier; it even makes unrolling a series of changes easier: until the entire set of changes is committed, there is nothing to back out.

1. There's the rebase/merge problem: coping with conflicting change. Rebasing helps, but makes team dev complex. And, if there are big conflict changes, it's often easier to take the current diff with trunk branch and reapply it than try to rebase a sequence of operations. You don't always need to rebase though; an FB can repeatedly merge in trunk, for a history which may not be self-contained, but does isolate the feature dev from everyone else's work.

2. Changes don't get exposed more broadly until the feature is in. That may reduce review, but for those of us who work on downstream code it means: nothing breaks until the complete feature is in. You may not realise it, but those of us who do compile downstream things (slider, spark) against even branch-2 always fear discovering what's just broken at the API level alone. And that's "the stable branch". I haven't dared build against trunk for a while.

3. It's a real PITA trying to do development which spans >1 feature branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207 and HADOOP-13208 this weekend). There I'm working in one branch and generating two separate patches. That's hard to do in a single feature branch.

4. The rules for feature branch merge. If I get a patch into trunk, it's in the codebase. If I get it into a feature branch, there's the risk the entire feature branch doesn't get in. Fix: for short-lived feature branches, we have an RTC policy strict enough that we can say "if a feature branch commit is in, it's considered good enough, even if a few more successor commits are required before the whole sequence of commits is considered stable."

5. If you do lots of incremental patches (as feature branches encourage), the patch history gets very noisy. Maybe here the patches can be rolled up for the final commit. This is how Spark works.

6. Jenkins doesn't test feature branches today. Can yetus do this if I give a name of any branch? If so, for a feature branch of > 1w we could just fork the trunk jenkins builds too, but have it only email the committers.

7. That final merge process needs to be rigorous from the regression testing perspective. The last commit on a feature branch should be the one to

Feature branches need to be short-lived to cope with change well. And if you are doing fundamental changes (e.g. core APIs), there is some incentive to get that common feature in, while you still get the full implementation stable in a feature branch. But: you'd better be confident that the stuff in trunk isn't going to break. Nobody gets to break the main build, or at least not for longer than it takes for the merge to be reverted.

I think maybe we should try doing very-short-lived feature branches, with a simple policy:

-self contained patch which delivers a complete feature/fix: single patch. These are things where it means

-something which is an intermediate step to delivering something: part of a feature branch. A branch where the process for committing patches is as rigorous as for trunk —so there's no ambiguity about *whether* a feature is merged in, only *when*
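
As a rough sketch of how the rules of thumb floated in this thread might
combine (an illustration only, not an agreed policy): the Change class, its
field names, and suggest_workflow are hypothetical; the thresholds are the
ones mentioned by Colin (3 requests for a branch) and Sangjin (more than a
dozen commits and longer than a month).

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of a proposed piece of work."""
    self_contained: bool       # delivers a complete feature/fix on its own
    expected_commits: int = 1
    expected_days: int = 1
    branch_requests: int = 0   # people who asked for a branch


def suggest_workflow(change: Change) -> str:
    # Colin's suggestion: if 3 people ask for a branch, strongly consider one.
    if change.branch_requests >= 3:
        return "feature branch"
    # Sangjin's rule of thumb: more than a dozen commits and over a month.
    if change.expected_commits > 12 and change.expected_days > 30:
        return "feature branch"
    # The policy above: an intermediate step belongs on a feature branch.
    if not change.self_contained:
        return "feature branch"
    return "single patch"


print(suggest_workflow(Change(self_contained=True)))
print(suggest_workflow(Change(self_contained=False, expected_commits=4)))
print(suggest_workflow(Change(self_contained=True, expected_commits=20,
                              expected_days=45)))
```

The first call yields "single patch" and the other two "feature branch"; the
order of the checks is an arbitrary choice for this sketch.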





Re: [DISCUSS] Increased use of feature branches

Posted by Sangjin Lee <sj...@apache.org>.
Thanks for your thoughts, Anu.

Regarding your question

> And then comes the question, once 3.0 becomes official, where do we
> check-in a change,  if that would break something? so this will lead us
> back to trunk being the unstable – 3.0 being the new “branch-2”.


Andrew mentioned in the original email

> Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
> was still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.


and I agree with that sentiment. I think even if we have a "trunk-incompat"
branch to hold future incompatible changes, the situation will change
little from today. Instead of dealing with "trunk" (where incompatible
changes may appear) and "branch-3", we would be dealing with
"trunk-incompat" and "trunk". Names are largely mnemonics then.


On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
wrote:

> I actively work on two branches (Diskbalancer and ozone) and I agree with
> most of what Sangjin said.
> There is an overhead in working with branches, there are both technical
> costs and administrative issues
> which discourages developers from using branches.
>
> I think the biggest issue with branch based development is the fact that
> other developers do not use a branch.
> If a small feature appears as a series of commits to “datanode.java”,
> the branch based developer ends up rebasing
> and paying this price of rebasing many times. If everyone followed a model
> of branch + pull request, other branches
> would not have to deal with continuous rebasing to trunk commits. If we are
> moving to a branch based
> development, we should probably move to that model for most development to
> avoid this tax on people who
>  actually end up working in the branches.
>
> I do have a question in my mind though: What is being proposed is that we
> move active development to branches
> if the feature is small or incomplete, however keep the trunk open for
> check-ins. One of the biggest reason why we
> check-in into trunk and not to branch-2 is because it is a change that
> will break backward compatibility. So do we
> have an expectation of backward compatibility thru the 3.0-alpha series (I
> personally vote No, since 3.0 is experimental
> at this stage), but if we decide to support some sort of backward-compact
> then willy-nilly committing to trunk
> and still maintaining the expectation we can release Alphas from 3.0 does
> not look possible.
>
> And then comes the question, once 3.0 becomes official, where do we
> check-in a change,  if that would break something?
> so this will lead us back to trunk being the unstable – 3.0 being the new
> “branch-2”.
>
> One more point: If we are moving to use a branch always – then we are
> looking at a model similar to using a git + pull
> request model. If that is so would it make sense to modify the rules to
> make these branches easier to merge?
> Say for example, if all commits in a branch has followed review and
> checking policy – just like trunk and commits
> have been made only after a sign off from a committer, would it be
> possible to merge with a 3-day voting period
> instead of 7, or treat it just like today’s commit to trunk – but with 2
> people signing-off?
>
> What I am suggesting is reducing the administrative overheads of using a
> branch to encourage use of branching.
> Right now it feels like Apache’s process encourages committing directly to
> trunk than a branch
>
> Thanks
> Anu
>
>
> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>
> >Having worked on a major feature in a feature branch, I have some thoughts
> >and observations on feature branch development.
> >
> >IMO feature branch development v. direct commits to trunk in piecemeal is
> >really a choice of *granularity*. Do we want a series of fine-grained
> state
> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
> >
> >This makes me favor a branch-based development model for any
> "decent-sized"
> >features (we'll need to define "decent-sized" of course). Once you have
> >coarse-grained changes, it's easier to reason about what made what release
> >and in what state. As importantly, it makes it easier to back out a
> >complete feature fairly easily if that becomes necessary. My totally
> >unscientific suggestion may be if a feature takes more than a dozen commits
> >and longer than a month, we should probably have a bias towards a feature
> >branch.
> >
> >Branch-based development also makes you go faster if your feature is
> >larger. I wouldn't do it the other way for timeline service v.2 for
> example.
> >
> >That said, feature branches don't come for free. Now the onus is on the
> >feature developer to constantly rebase with the trunk to keep it
> reasonably
> >integrated with the trunk. More logistics is involved for the feature
> >developer. Another big question is, when a feature branch gets big and
> it's
> >time to merge, would it get as scrutinized as a series of individual
> >commits? Since the size of merge can be big, you kind of have to rely on
> >those feature committers and those who help them.
> >
> >In terms of integrating/stabilizing, I don't think branch development
> >necessarily makes it harder. It is again granularity. In case of direct
> >commits on trunk, you do a lot more fine-grained integrations. In case of
> >branch development, you do far fewer coarse-grained integrations via
> >rebasing. If more people are doing branch-based development, it makes
> >rebasing easier to manage too.
> >
> >Going back to the related topic of where to release (trunk v. branch-X), I
> >think that is more of a proxy of the real question of "how do we maintain
> >quality and stability of the trunk?". Even if we release from the trunk,
> if
> >our bar for merging to trunk is low, the quality will not improve
> >automatically. So I think we ought to tackle the quality question first.
> >
> >My 2 cents.
> >
> >
> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
> >
> >> Thanks for the notes Andrew, Junping, Karthik.
> >>
> >> Here are some of my understandings:
> >>
> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> >> Hadoop today, without legacy workloads, trunk is what he/she should use.
> >> - Therefore, each commit to trunk should be transactional -- atomic,
> >> consistent, isolated (from other uncommitted patches); I'm not so sure
> >> about durability, Hadoop might be gone in 50 years :). As a committer, I
> >> should be able to look at a patch and determine whether it's a
> >> self-contained improvement of trunk, without looking at other
> uncommitted
> >> patches.
> >> - Some comments inline:
> >>
> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
> >>
> >> > Comparing with advantages, I believe the disadvantages of shipping any
> >> > releases directly from trunk are more obvious and significant:
> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
> have
> >> > to wait to commit to trunk or put into a separated branch that could
> >> delay
> >> > feature development progress as additional vote process get involved
> even
> >> > the feature is simple and harmless.
> >> >
> >> Thanks Junping, those are valid concerns. I think we should clearly
> >> separate incompatible from uncompleted / half-done work in this
> >> discussion. Whether people should commit incompatible changes to trunk
> is a
> >> much more tricky question (related to trunk-incompat etc.). But per my
> >> comment above, IMHO, *not committing uncompleted work to trunk* should
> be a
> >> much easier principle to agree upon.
> >>
> >>
> >> > - For small feature with only 1 or 2 commits, that need three +1 from
> >> PMCs
> >> > will increase the bar largely for contributors who just start to
> >> contribute
> >> > on Hadoop features but no such sufficient support.
> >> >
> >> Development overhead is another valid concern. I think our rule-of-thumb
> >> should be that, small-medium new features should be proposed as a single
> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> >> beyond a single JIRA/patch, use a feature branch.
> >>
> >>
> >> >
> >> > Given these concerns, I am open to other options, like: proposed by
> Vinod
> >> > or Chris, but rather than to release anything directly from trunk.
> >> >
> >> > - This point doesn't necessarily need to be resolved now though, since
> >> > again we're still doing alphas.
> >> > No. I think we have to settle down this first. Without a common agreed
> >> and
> >> > transparent release process and branches in community, any release
> >> (alpha,
> >> > beta) bits is only called a private release but not a official apache
> >> > hadoop release (even alpha).
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> > ________________________________________
> >> > From: Karthik Kambatla <ka...@cloudera.com>
> >> > Sent: Friday, June 10, 2016 7:49 AM
> >> > To: Andrew Wang
> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> > Subject: Re: [DISCUSS] Increased use of feature branches
> >> >
> >> > Thanks for restarting this thread Andrew. I really hope we can get
> this
> >> > across to a VOTE so it is clear.
> >> >
> >> > I see a few advantages shipping from trunk:
> >> >
> >> >    - The lack of need for one additional backport each time.
> >> >    - Feature rot in trunk
> >> >
> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
> can
> >> > continue doing 3.x releases off branch-3 even after we move trunk to
> 4.x
> >> (I
> >> > said it :))
> >> >
> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
> andrew.wang@cloudera.com>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > On a separate thread, a question was raised about 3.x branching and
> use
> >> > of
> >> > > feature branches going forward.
> >> > >
> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
> >> > thread
> >> > > that has spanned the years, with Vinod making this proposal
> (building
> >> on
> >> > > ideas from others who also commented in the email thread):
> >> > >
> >> > >
> >> > >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >> > >
> >> > > Pasting here for ease:
> >> > >
> >> > > On an unrelated note, offline I was pitching to a bunch of
> >> > > contributors another idea to deal
> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
> directly*.
> >> > >
> >> > > What this gains us is that
> >> > >  - Trunk is always nearly stable or nearly ready for releases
> >> > >  - We no longer have some code lying around in some branch (today’s
> >> > > trunk) that is not releasable
> >> > > because it gets mixed with other undesirable and incompatible
> changes.
> >> > >  - This needs to be coupled with more discipline on individual
> >> > > features - medium to large
> >> > > features are always worked upon in branches and get merged into
> trunk
> >> > > (and a nearing release!)
> >> > > when they are ready
> >> > >  - All incompatible changes go into some sort of a trunk-incompat
> >> > > branch and stay there till
> >> > > we accumulate enough of those to warrant another major release.
> >> > >
> >> > > Regarding "trunk-incompat", since we're still in the alpha stage for
> >> > 3.0.0,
> >> > > there's no need for this branch yet. This aspect of Vinod's proposal
> >> was
> >> > > still under a bit of discussion; Chris Douglas thought we should cut
> a
> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
> >> > thinking.
> >> > > This point doesn't necessarily need to be resolved now though, since
> >> > again
> >> > > we're still doing alphas.
> >> > >
> >> > > What we should get consensus on is the goal of keeping trunk stable,
> >> and
> >> > > achieving that by doing more development on feature branches and
> being
> >> > > judicious about merges. My sense from the Hadoop 3 email thread (and
> >> the
> >> > > more recent one on the async API) is that people are generally in
> favor
> >> > of
> >> > > this.
> >> > >
> >> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> >> > > appreciate everyone's timely response in this matter.
> >> > >
> >> > > Thanks,
> >> > > Andrew
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >> >
> >> >
> >>
>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Steve Loughran <st...@hortonworks.com>.
> On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com> wrote:
> 
> I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. 
> There is an overhead in working with branches: there are both technical costs and administrative issues
> which discourage developers from using branches.
> 
> I think the biggest issue with branch-based development is the fact that other developers do not use a branch.
> If a small feature appears as a series of commits to “datanode.java”, the branch-based developer ends up rebasing
> and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
> would not have to deal with continuous rebasing onto trunk commits. If we are moving to a branch-based
> development, we should probably move to that model for most development to avoid this tax on people who
> actually end up working in the branches.
> 
> I do have a question in my mind though: What is being proposed is that we move active development to branches 
> if the feature is small or incomplete, but keep the trunk open for check-ins. One of the biggest reasons why we
> check in to trunk and not to branch-2 is that the change will break backward compatibility. So do we
> have an expectation of backward compatibility through the 3.0-alpha series? (I personally vote No, since 3.0 is experimental
> at this stage.) If we decide to support some sort of backward compatibility, then willy-nilly committing to trunk
> while still maintaining the expectation that we can release alphas from 3.0 does not look possible.
> 
> And then comes the question: once 3.0 becomes official, where do we check in a change if it would break something?
> This will lead us back to trunk being the unstable branch, with 3.0 being the new “branch-2”.
> 
> One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull 
> request model. If that is so would it make sense to modify the rules to make these branches easier to merge?
> Say for example, if all commits in a branch have followed review and checking policy – just like trunk and commits
> have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period 
> instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off? 
> 
> What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.  
> Right now it feels like Apache’s process encourages committing directly to trunk rather than to a branch.
> 
> Thanks
> Anu


It's a per-project process. In Slider, we've used a git flow: all work goes in a feature branch, which is then merged in with a merge point. This gives a better history of workflow, as an individual body of work is an ordered sequence of operations, independent of everything else. This makes cherry-picking a sequence easier; it even makes unrolling a series of changes easier: until the entire set of changes is committed, there is nothing to back out.

1. There's the rebase/merge problem: coping with conflicting changes. Rebasing helps, but makes team dev complex. And if there are big conflicting changes, it's often easier to take the current diff against trunk and reapply it than to try to rebase a sequence of operations. You don't always need to rebase, though; a feature branch (FB) can repeatedly merge in trunk, for a history which may not be self-contained, but does isolate the feature dev from everyone else's work.

2. Changes don't get exposed more broadly until the feature is in. That may reduce review, but for those of us who work on downstream code it means: nothing breaks until the complete feature is in. You may not realise it, but those of us who do compile downstream things (slider, spark) against even branch-2 always fear discovering what's just broken at the API level alone. And that's "the stable branch". I haven't dared build against trunk for a while.

3. It's a real PITA trying to do development which spans >1 feature branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207 and HADOOP-13208 this weekend). There I'm working in one branch and generating two separate patches. That's hard to do in a single feature branch.

4. The rules for feature branch merge. If I get a patch into trunk, it's in the codebase. If I get it into a feature branch, there's the risk the entire feature branch doesn't get in. Fix: for short-lived feature branches, we have an RTC (review-then-commit) policy strict enough that we can say "if a feature branch commit is in, it's considered good enough, even if a few more successor commits are required before the whole sequence of commits is considered stable."

5. If you do lots of incremental patches (as feature branches encourage), the patch history gets very noisy. Maybe here the patches can be rolled up for the final commit. This is how Spark works.

6. Jenkins doesn't test feature branches today. Can Yetus do this if I give it the name of any branch? If so, for a feature branch that lives longer than a week we could just fork the trunk Jenkins builds too, but have them email only the committers.

7. That final merge process needs to be rigorous from the regression testing perspective. The last commit on a feature branch should be the one to

Feature branches need to be short-lived to cope with change well. And if you are doing fundamental changes (e.g. core APIs), there is some incentive to get that common feature in, while you still get the full implementation stable in a feature branch. But: you'd better be confident that the stuff in trunk isn't going to break. Nobody gets to break the main build, or at least not for longer than it takes for the merge to be reverted.
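
To make point 1 above a little more concrete, here is a toy Python model (not git itself, just a sketch) of why rebasing complicates team development while merging trunk into the feature branch does not. It assumes only that a commit id is a hash over its parent ids and its change, which is how git derives ids; the function names and commit messages are hypothetical.

```python
import hashlib


def commit_id(parents, change):
    """Toy commit id: a hash over the parent ids and the change itself."""
    data = "|".join(parents) + "::" + change
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def build_chain(base, changes):
    """Apply changes one after another on top of base; return the new ids."""
    ids, parent = [], base
    for change in changes:
        parent = commit_id([parent], change)
        ids.append(parent)
    return ids


# Trunk and a feature branch fork from the same commit.
root = commit_id([], "fork point")
feature = build_chain(root, ["feature part 1", "feature part 2"])
trunk = build_chain(root, ["trunk fix A", "trunk fix B"])

# Option 1: rebase. The same changes are replayed on the new trunk tip,
# so every feature commit gets a new id and collaborators must resync.
rebased = build_chain(trunk[-1], ["feature part 1", "feature part 2"])

# Option 2: merge trunk into the feature branch. The existing feature
# commits keep their ids; one merge commit with two parents is added.
merge = commit_id([feature[-1], trunk[-1]], "merge trunk into feature")
merged = feature + [merge]

print("rebase keeps feature commit ids:", rebased == feature)
print("merge keeps feature commit ids: ", merged[:len(feature)] == feature)
```

The merged history is no longer a self-contained sequence of feature commits, which is the trade-off described in point 1.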

I think maybe we should try doing very-short-lived feature branches, with a simple policy:

-self contained patch which delivers a complete feature/fix: single patch. These are things where it means

-something which is an intermediate step to delivering something: part of a feature branch. A branch where the process for committing patches is as rigorous as for trunk, so there's no ambiguity about *whether* a feature is merged in, only *when*.
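
As a rough sketch only, the two rules above could be written down as a small Python helper. The size threshold is borrowed from Sangjin's rule of thumb quoted in this thread (more than a dozen commits and longer than a month); treating a month as 30 days, and the names `Change` and `suggest_workflow`, are assumptions made for illustration and not an agreed Hadoop policy.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of a planned piece of work."""
    self_contained: bool   # delivers a complete feature/fix on its own
    expected_commits: int  # rough number of commits planned
    expected_days: int     # rough duration of the work in days


def suggest_workflow(change: Change) -> str:
    """Return 'single patch' or 'feature branch' for a planned change."""
    # Rule of thumb from the thread: more than a dozen commits and longer
    # than a month (assumed here to be 30 days) biases towards a branch.
    if change.expected_commits > 12 and change.expected_days > 30:
        return "feature branch"
    # A self-contained patch that delivers a complete feature/fix
    # goes to trunk as a single patch.
    if change.self_contained:
        return "single patch"
    # Anything that is only an intermediate step lives on a feature branch.
    return "feature branch"


print(suggest_workflow(Change(self_contained=True, expected_commits=1, expected_days=2)))
print(suggest_workflow(Change(self_contained=False, expected_commits=4, expected_days=7)))
```

The size check comes first so that a large body of work is steered to a branch even if someone describes it as one self-contained change.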





Re: [DISCUSS] Increased use of feature branches

Posted by Sangjin Lee <sj...@apache.org>.
Thanks for your thoughts Anu.

Regarding your question

> And then comes the question: once 3.0 becomes official, where do we
> check in a change if it would break something? This will lead us
> back to trunk being the unstable branch, with 3.0 being the new “branch-2”.


Andrew mentioned in the original email

> Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
> was still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.


and I agree with that sentiment. I think even if we have a "trunk-incompat"
branch to hold future incompatible changes, the situation will change
little from today. Instead of dealing with "trunk" (where incompatible
changes may appear) and "branch-3", we would be dealing with
"trunk-incompat" and "trunk". Names are largely mnemonics then.


On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
wrote:

> I actively work on two branches (Diskbalancer and ozone) and I agree with
> most of what Sangjin said.
> There is an overhead in working with branches: there are both technical
> costs and administrative issues
> which discourage developers from using branches.
>
> I think the biggest issue with branch-based development is the fact that
> other developers do not use a branch.
> If a small feature appears as a series of commits to “datanode.java”,
> the branch-based developer ends up rebasing
> and paying this price of rebasing many times. If everyone followed a model
> of branch + pull request, other branches
> would not have to deal with continuous rebasing onto trunk commits. If we are
> moving to a branch based
> development, we should probably move to that model for most development to
> avoid this tax on people who
>  actually end up working in the branches.
>
> I do have a question in my mind though: What is being proposed is that we
> move active development to branches
> if the feature is small or incomplete, but keep the trunk open for
> check-ins. One of the biggest reasons why we
> check in to trunk and not to branch-2 is that the change
> will break backward compatibility. So do we
> have an expectation of backward compatibility through the 3.0-alpha series? (I
> personally vote No, since 3.0 is experimental
> at this stage.) If we decide to support some sort of backward compatibility,
> then willy-nilly committing to trunk
> while still maintaining the expectation that we can release alphas from 3.0 does
> not look possible.
>
> And then comes the question: once 3.0 becomes official, where do we
> check in a change if it would break something?
> This will lead us back to trunk being the unstable branch, with 3.0 being the new
> “branch-2”.
>
> One more point: If we are moving to use a branch always – then we are
> looking at a model similar to using a git + pull
> request model. If that is so would it make sense to modify the rules to
> make these branches easier to merge?
> Say for example, if all commits in a branch have followed review and
> checking policy – just like trunk and commits
> have been made only after a sign off from a committer, would it be
> possible to merge with a 3-day voting period
> instead of 7, or treat it just like today’s commit to trunk – but with 2
> people signing-off?
>
> What I am suggesting is reducing the administrative overheads of using a
> branch to encourage use of branching.
> Right now it feels like Apache’s process encourages committing directly to
> trunk rather than to a branch.
>
> Thanks
> Anu
>
>
> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>
> >Having worked on a major feature in a feature branch, I have some thoughts
> >and observations on feature branch development.
> >
> >IMO feature branch development v. direct commits to trunk in piecemeal is
> >really a choice of *granularity*. Do we want a series of fine-grained
> state
> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
> >
> >This makes me favor a branch-based development model for any
> "decent-sized"
> >features (we'll need to define "decent-sized" of course). Once you have
> >coarse-grained changes, it's easier to reason about what made what release
> >and in what state. As importantly, it makes it easier to back out a
> >complete feature fairly easily if that becomes necessary. My totally
> >unscientific suggestion may be if a feature takes more than dozen commits
> >and longer than a month, we should probably have a bias towards a feature
> >branch.
> >
> >Branch-based development also makes you go faster if your feature is
> >larger. I wouldn't do it the other way for timeline service v.2 for
> example.
> >
> >That said, feature branches don't come for free. Now the onus is on the
> >feature developer to constantly rebase with the trunk to keep it
> reasonably
> >integrated with the trunk. More logistics is involved for the feature
> >developer. Another big question is, when a feature branch gets big and
> it's
> >time to merge, would it get as scrutinized as a series of individual
> >commits? Since the size of merge can be big, you kind of have to rely on
> >those feature committers and those who help them.
> >
> >In terms of integrating/stabilizing, I don't think branch development
> >necessarily makes it harder. It is again granularity. In case of direct
> >commits on trunk, you do a lot more fine-grained integrations. In case of
> >branch development, you do far fewer coarse-grained integrations via
> >rebasing. If more people are doing branch-based development, it makes
> >rebasing easier to manage too.
> >
> >Going back to the related topic of where to release (trunk v. branch-X), I
> >think that is more of a proxy of the real question of "how do we maintain
> >quality and stability of the trunk?". Even if we release from the trunk,
> if
> >our bar for merging to trunk is low, the quality will not improve
> >automatically. So I think we ought to tackle the quality question first.
> >
> >My 2 cents.
> >
> >
> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
> >
> >> Thanks for the notes Andrew, Junping, Karthik.
> >>
> >> Here are some of my understandings:
> >>
> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> >> Hadoop today, without legacy workloads, trunk is what he/she should use.
> >> - Therefore, each commit to trunk should be transactional -- atomic,
> >> consistent, isolated (from other uncommitted patches); I'm not so sure
> >> about durability, Hadoop might be gone in 50 years :). As a committer, I
> >> should be able to look at a patch and determine whether it's a
> >> self-contained improvement of trunk, without looking at other
> uncommitted
> >> patches.
> >> - Some comments inline:
> >>
> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
> >>
> >> > Comparing with advantages, I believe the disadvantages of shipping any
> >> > releases directly from trunk are more obvious and significant:
> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
> have
> >> > to wait to commit to trunk or put into a separated branch that could
> >> delay
> >> > feature development progress as additional vote process get involved
> even
> >> > the feature is simple and harmless.
> >> >
> >> Thanks Junping, those are valid concerns. I think we should clearly
> >> separate incompatible from uncompleted / half-done work in this
> >> discussion. Whether people should commit incompatible changes to trunk
> is a
> >> much more tricky question (related to trunk-incompat etc.). But per my
> >> comment above, IMHO, *not committing uncompleted work to trunk* should
> be a
> >> much easier principle to agree upon.
> >>
> >>
> >> > - For small feature with only 1 or 2 commits, that need three +1 from
> >> PMCs
> >> > will increase the bar largely for contributors who just start to
> >> contribute
> >> > on Hadoop features but no such sufficient support.
> >> >
> >> Development overhead is another valid concern. I think our rule-of-thumb
> >> should be that, small-medium new features should be proposed as a single
> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> >> beyond a single JIRA/patch, use a feature branch.
> >>
> >>
> >> >
> >> > Given these concerns, I am open to other options, like: proposed by
> Vinod
> >> > or Chris, but rather than to release anything directly from trunk.
> >> >
> >> > - This point doesn't necessarily need to be resolved now though, since
> >> > again we're still doing alphas.
> >> > No. I think we have to settle down this first. Without a common agreed
> >> and
> >> > transparent release process and branches in community, any release
> >> (alpha,
> >> > beta) bits is only called a private release but not a official apache
> >> > hadoop release (even alpha).
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> > ________________________________________
> >> > From: Karthik Kambatla <ka...@cloudera.com>
> >> > Sent: Friday, June 10, 2016 7:49 AM
> >> > To: Andrew Wang
> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> > Subject: Re: [DISCUSS] Increased use of feature branches
> >> >
> >> > Thanks for restarting this thread Andrew. I really hope we can get
> this
> >> > across to a VOTE so it is clear.
> >> >
> >> > I see a few advantages shipping from trunk:
> >> >
> >> >    - The lack of need for one additional backport each time.
> >> >    - Feature rot in trunk
> >> >
> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
> can
> >> > continue doing 3.x releases off branch-3 even after we move trunk to
> 4.x
> >> (I
> >> > said it :))
> >> >
> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
> andrew.wang@cloudera.com>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > On a separate thread, a question was raised about 3.x branching and
> use
> >> > of
> >> > > feature branches going forward.
> >> > >
> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
> >> > thread
> >> > > that has spanned the years, with Vinod making this proposal
> (building
> >> on
> >> > > ideas from others who also commented in the email thread):
> >> > >
> >> > >
> >> > >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >> > >
> >> > > Pasting here for ease:
> >> > >
> >> > > On an unrelated note, offline I was pitching to a bunch of
> >> > > contributors another idea to deal
> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
> directly*.
> >> > >
> >> > > What this gains us is that
> >> > >  - Trunk is always nearly stable or nearly ready for releases
> >> > >  - We no longer have some code lying around in some branch (today’s
> >> > > trunk) that is not releasable
> >> > > because it gets mixed with other undesirable and incompatible
> changes.
> >> > >  - This needs to be coupled with more discipline on individual
> >> > > features - medium to large
> >> > > features are always worked upon in branches and get merged into
> trunk
> >> > > (and a nearing release!)
> >> > > when they are ready
> >> > >  - All incompatible changes go into some sort of a trunk-incompat
> >> > > branch and stay there till
> >> > > we accumulate enough of those to warrant another major release.
> >> > >
> >> > > Regarding "trunk-incompat", since we're still in the alpha stage for
> >> > 3.0.0,
> >> > > there's no need for this branch yet. This aspect of Vinod's proposal
> >> was
> >> > > still under a bit of discussion; Chris Douglas thought we should cut
> a
> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
> >> > thinking.
> >> > > This point doesn't necessarily need to be resolved now though, since
> >> > again
> >> > > we're still doing alphas.
> >> > >
> >> > > What we should get consensus on is the goal of keeping trunk stable,
> >> and
> >> > > achieving that by doing more development on feature branches and
> being
> >> > > judicious about merges. My sense from the Hadoop 3 email thread (and
> >> the
> >> > > more recent one on the async API) is that people are generally in
> favor
> >> > of
> >> > > this.
> >> > >
> >> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> >> > > appreciate everyone's timely response in this matter.
> >> > >
> >> > > Thanks,
> >> > > Andrew
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >> >
> >> >
> >>
>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Steve Loughran <st...@hortonworks.com>.
> On 10 Jun 2016, at 20:37, Anu Engineer <ae...@hortonworks.com> wrote:
> 
> I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said. 
> There is an overhead in working with branches, there are both technical costs and administrative issues 
> which discourages developers from using branches.
> 
> I think the biggest issue with branch based development is that fact that other developers do not use a branch.
> If a small feature appears as a series of commits to “”datanode.java””, the branch based developer ends up rebasing 
> and paying this price of rebasing many times. If everyone followed a model of branch + Pull request, other branches
> would not have to deal with continues rebasing to trunk commits. If we are moving to a branch based 
> development, we should probably move to that model for most development to avoid this tax on people who
> actually end up working in the branches.
> 
> I do have a question in my mind though: What is being proposed is that we move active development to branches 
> if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reason why we 
> check-in into trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we 
> have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental 
> at this stage), but if we decide to support some sort of backward-compact then willy-nilly committing to trunk 
> and still maintaining the expectation we can release Alphas from 3.0 does not look possible.
> 
> And then comes the question, once 3.0 becomes official, where do we check-in a change,  if that would break something? 
> so this will lead us back to trunk being the unstable – 3.0 being the new “branch-2”.
> 
> One more point: If we are moving to use a branch always – then we are looking at a model similar to using a git + pull 
> request model. If that is so would it make sense to modify the rules to make these branches easier to merge?
> Say for example, if all commits in a branch has followed review and checking policy – just like trunk and commits 
> have been made only after a sign off from a committer, would it be possible to merge with a 3-day voting period 
> instead of 7, or treat it just like today’s commit to trunk – but with 2 people signing-off? 
> 
> What I am suggesting is reducing the administrative overheads of using a branch to encourage use of branching.  
> Right now it feels like Apache’s process encourages committing directly to trunk than a branch
> 
> Thanks
> Anu


It's a per project process. In slider, we've used a git flow: all work goes in a feature branch, then merge in with a merge point. This gives a better history of workflow, as an individual body of work is an ordered sequence of operations, independent of everything else. This makes cherry picking a sequence easier, it even makes unrolling a series of changes easier: until the entire set of changes is committed, there is nothing to back out.

1. There's the rebase/merge problem: coping with conflicting change. Rebasing helps, but makes team dev complex. And, if there are big conflicting changes, it's often easier to take the current diff against trunk and reapply it than to try to rebase a sequence of operations. You don't always need to rebase though; an FB can repeatedly merge in trunk, for a history which may not be self-contained, but does isolate the feature dev from everyone else's work.

2. Changes don't get exposed more broadly until the feature is in. That may reduce review, but for those of us who work on downstream code it means: nothing breaks until the complete feature is in. You may not realise it, but those of us who do compile downstream things (slider, spark) against even branch-2 always fear discovering what's just broken at the API level alone. And that's "the stable branch". I haven't dared build against trunk for a while.

3. It's a real PITA trying to do development which spans >1 feature branch. Even today it's tricky with code spanning >1 patch (HADOOP-13207 and HADOOP-13208 this weekend). There I'm working in one branch and generating two separate patches. That's hard to do in a single feature branch.

4. The rules for feature branch merge. If I get a patch into trunk, it's in the codebase. If I get it into a feature branch, there's the risk the entire feature branch doesn't get in. Fix: for short-lived feature branches, we have an RTC policy strict enough that we can say "if a feature branch commit is in, it's considered good enough", even if a few more successor commits are required before the whole sequence of commits is considered stable.

5. If you do lots of incremental patches (as feature branches encourage), the patch history gets very noisy. Maybe here the patches can be rolled up for the final commit. This is how Spark works.

6. Jenkins doesn't test feature branches today. Can yetus do this if I give a name of any branch? If so, for a feature branch of > 1w we could just fork the trunk jenkins builds too, but have it only email the committers.

7. That final merge process needs to be rigorous from the regression testing perspective. the last commit on a feature branch should be the one to

Feature branches need to be short-lived to cope with change well. And if you are doing fundamental changes (e.g. core APIs), there is some incentive to get that common feature in, while you still get the full implementation stable in a feature branch. But: you'd better be confident that the stuff in trunk isn't going to break. Nobody gets to break the main build, or at least not for longer than it takes for the merge to be reverted.

I think maybe we should try doing very-short-lived feature branches, with a simple policy:

-self contained patch which delivers a complete feature/fix: single patch. These are things where it means

-something which is an intermediate step to delivering something: part of a feature branch. A branch where the process for committing patches is as rigorous as for trunk —so there's no ambiguity about *whether* a feature is merged in, only *when*
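
A minimal Python sketch of that two-way policy, for illustration only. The `Change` type, its fields, and the `EXAMPLE-n` issue keys are hypothetical and not part of any Hadoop tooling; the only rule encoded is the one above: a self-contained patch goes straight to trunk, an intermediate step goes to a feature branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A proposed change. Hypothetical type, for illustration only."""
    issue: str            # placeholder issue key, e.g. "EXAMPLE-1"
    self_contained: bool  # True if the patch delivers a complete feature/fix


def destination(change: Change) -> str:
    """Return where a change lands under the proposed policy."""
    if change.self_contained:
        return "trunk"           # complete feature/fix: single patch
    return "feature-branch"      # intermediate step: reviewed as strictly as trunk


if __name__ == "__main__":
    for c in (Change("EXAMPLE-1", True), Change("EXAMPLE-2", False)):
        print(c.issue, "->", destination(c))
```

The hard part in practice is deciding what counts as "self contained"; the sketch assumes that judgment has already been made by the reviewer.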





Re: [DISCUSS] Increased use of feature branches

Posted by Sangjin Lee <sj...@apache.org>.
Thanks for your thoughts Anu.

Regarding your question

> And then comes the question, once 3.0 becomes official, where do we
> check-in a change,  if that would break something? so this will lead us
> back to trunk being the unstable – 3.0 being the new “branch-2”.


Andrew mentioned in the original email

> Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0, there's no need for this branch yet. This aspect of Vinod's proposal
> was still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.


and I agree with that sentiment. I think even if we have a "trunk-incompat"
branch to hold future incompatible changes, the situation will change
little from today. Instead of dealing with "trunk" (where incompatible
changes may appear) and "branch-3", we would be dealing with
"trunk-incompat" and "trunk". Names are largely mnemonics then.


On Fri, Jun 10, 2016 at 12:37 PM, Anu Engineer <ae...@hortonworks.com>
wrote:

> I actively work on two branches (Diskbalancer and ozone) and I agree with
> most of what Sangjin said.
> There is an overhead in working with branches, there are both technical
> costs and administrative issues
> which discourages developers from using branches.
>
> I think the biggest issue with branch based development is that fact that
> other developers do not use a branch.
> If a small feature appears as a series of commits to “”datanode.java””,
> the branch based developer ends up rebasing
> and paying this price of rebasing many times. If everyone followed a model
> of branch + Pull request, other branches
> would not have to deal with continues rebasing to trunk commits. If we are
> moving to a branch based
> development, we should probably move to that model for most development to
> avoid this tax on people who
>  actually end up working in the branches.
>
> I do have a question in my mind though: What is being proposed is that we
> move active development to branches
> if the feature is small or incomplete, however keep the trunk open for
> check-ins. One of the biggest reason why we
> check-in into trunk and not to branch-2 is because it is a change that
> will break backward compatibility. So do we
> have an expectation of backward compatibility thru the 3.0-alpha series (I
> personally vote No, since 3.0 is experimental
> at this stage), but if we decide to support some sort of backward-compact
> then willy-nilly committing to trunk
> and still maintaining the expectation we can release Alphas from 3.0 does
> not look possible.
>
> And then comes the question, once 3.0 becomes official, where do we
> check-in a change,  if that would break something?
> so this will lead us back to trunk being the unstable – 3.0 being the new
> “branch-2”.
>
> One more point: If we are moving to use a branch always – then we are
> looking at a model similar to using a git + pull
> request model. If that is so would it make sense to modify the rules to
> make these branches easier to merge?
> Say for example, if all commits in a branch has followed review and
> checking policy – just like trunk and commits
> have been made only after a sign off from a committer, would it be
> possible to merge with a 3-day voting period
> instead of 7, or treat it just like today’s commit to trunk – but with 2
> people signing-off?
>
> What I am suggesting is reducing the administrative overheads of using a
> branch to encourage use of branching.
> Right now it feels like Apache’s process encourages committing directly to
> trunk than a branch
>
> Thanks
> Anu
>
>
> On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <
> sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:
>
> >Having worked on a major feature in a feature branch, I have some thoughts
> >and observations on feature branch development.
> >
> >IMO feature branch development v. direct commits to trunk in piecemeal is
> >really a choice of *granularity*. Do we want a series of fine-grained
> state
> >changes on trunk or fewer coarse-grained chunks of commits on trunk?
> >
> >This makes me favor a branch-based development model for any
> "decent-sized"
> >features (we'll need to define "decent-sized" of course). Once you have
> >coarse-grained changes, it's easier to reason about what made what release
> >and in what state. As importantly, it makes it easier to back out a
> >complete feature fairly easily if that becomes necessary. My totally
> >unscientific suggestion may be if a feature takes more than dozen commits
> >and longer than a month, we should probably have a bias towards a feature
> >branch.
> >
> >Branch-based development also makes you go faster if your feature is
> >larger. I wouldn't do it the other way for timeline service v.2 for
> example.
> >
> >That said, feature branches don't come for free. Now the onus is on the
> >feature developer to constantly rebase with the trunk to keep it
> reasonably
> >integrated with the trunk. More logistics is involved for the feature
> >developer. Another big question is, when a feature branch gets big and
> it's
> >time to merge, would it get as scrutinized as a series of individual
> >commits? Since the size of merge can be big, you kind of have to rely on
> >those feature committers and those who help them.
> >
> >In terms of integrating/stabilizing, I don't think branch development
> >necessarily makes it harder. It is again granularity. In case of direct
> >commits on trunk, you do a lot more fine-grained integrations. In case of
> >branch development, you do far fewer coarse-grained integrations via
> >rebasing. If more people are doing branch-based development, it makes
> >rebasing easier to manage too.
> >
> >Going back to the related topic of where to release (trunk v. branch-X), I
> >think that is more of a proxy of the real question of "how do we maintain
> >quality and stability of the trunk?". Even if we release from the trunk,
> if
> >our bar for merging to trunk is low, the quality will not improve
> >automatically. So I think we ought to tackle the quality question first.
> >
> >My 2 cents.
> >
> >
> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
> >
> >> Thanks for the notes Andrew, Junping, Karthik.
> >>
> >> Here are some of my understandings:
> >>
> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> >> Hadoop today, without legacy workloads, trunk is what he/she should use.
> >> - Therefore, each commit to trunk should be transactional -- atomic,
> >> consistent, isolated (from other uncommitted patches); I'm not so sure
> >> about durability, Hadoop might be gone in 50 years :). As a committer, I
> >> should be able to look at a patch and determine whether it's a
> >> self-contained improvement of trunk, without looking at other
> uncommitted
> >> patches.
> >> - Some comments inline:
> >>
> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
> >>
> >> > Comparing with advantages, I believe the disadvantages of shipping any
> >> > releases directly from trunk are more obvious and significant:
> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
> have
> >> > to wait to commit to trunk or put into a separated branch that could
> >> delay
> >> > feature development progress as additional vote process get involved
> even
> >> > the feature is simple and harmless.
> >> >
> >> Thanks Junping, those are valid concerns. I think we should clearly
> >> separate incompatible with  uncompleted / half-done work in this
> >> discussion. Whether people should commit incompatible changes to trunk
> is a
> >> much more tricky question (related to trunk-incompat etc.). But per my
> >> comment above, IMHO, *not committing uncompleted work to trunk* should
> be a
> >> much easier principle to agree upon.
> >>
> >>
> >> > - For small feature with only 1 or 2 commits, that need three +1 from
> >> PMCs
> >> > will increase the bar largely for contributors who just start to
> >> contribute
> >> > on Hadoop features but no such sufficient support.
> >> >
> >> Development overhead is another valid concern. I think our rule-of-thumb
> >> should be that, small-medium new features should be proposed as a single
> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> >> beyond a single JIRA/patch, use a feature branch.
> >>
> >>
> >> >
> >> > Given these concerns, I am open to other options, like: proposed by
> Vinod
> >> > or Chris, but rather than to release anything directly from trunk.
> >> >
> >> > - This point doesn't necessarily need to be resolved now though, since
> >> > again we're still doing alphas.
> >> > No. I think we have to settle down this first. Without a common agreed
> >> and
> >> > transparent release process and branches in community, any release
> >> (alpha,
> >> > beta) bits is only called a private release but not a official apache
> >> > hadoop release (even alpha).
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> > ________________________________________
> >> > From: Karthik Kambatla <ka...@cloudera.com>
> >> > Sent: Friday, June 10, 2016 7:49 AM
> >> > To: Andrew Wang
> >> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> >> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> >> > Subject: Re: [DISCUSS] Increased use of feature branches
> >> >
> >> > Thanks for restarting this thread Andrew. I really hope we can get
> this
> >> > across to a VOTE so it is clear.
> >> >
> >> > I see a few advantages shipping from trunk:
> >> >
> >> >    - The lack of need for one additional backport each time.
> >> >    - Feature rot in trunk
> >> >
> >> > Instead of creating branch-3, I recommend creating branch-3.x so we
> can
> >> > continue doing 3.x releases off branch-3 even after we move trunk to
> 4.x
> >> (I
> >> > said it :))
> >> >
> >> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <
> andrew.wang@cloudera.com>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > On a separate thread, a question was raised about 3.x branching and
> use
> >> > of
> >> > > feature branches going forward.
> >> > >
> >> > > We discussed this previously on the "Looking to a Hadoop 3 release"
> >> > thread
> >> > > that has spanned the years, with Vinod making this proposal
> (building
> >> on
> >> > > ideas from others who also commented in the email thread):
> >> > >
> >> > >
> >> > >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >> > >
> >> > > Pasting here for ease:
> >> > >
> >> > > On an unrelated note, offline I was pitching to a bunch of
> >> > > contributors another idea to deal
> >> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk
> directly*.
> >> > >
> >> > > What this gains us is that
> >> > >  - Trunk is always nearly stable or nearly ready for releases
> >> > >  - We no longer have some code lying around in some branch (today’s
> >> > > trunk) that is not releasable
> >> > > because it gets mixed with other undesirable and incompatible
> changes.
> >> > >  - This needs to be coupled with more discipline on individual
> >> > > features - medium to to large
> >> > > features are always worked upon in branches and get merged into
> trunk
> >> > > (and a nearing release!)
> >> > > when they are ready
> >> > >  - All incompatible changes go into some sort of a trunk-incompat
> >> > > branch and stay there till
> >> > > we accumulate enough of those to warrant another major release.
> >> > >
> >> > > Regarding "trunk-incompat", since we're still in the alpha stage for
> >> > 3.0.0,
> >> > > there's no need for this branch yet. This aspect of Vinod's proposal
> >> was
> >> > > still under a bit of discussion; Chris Douglas thought we should cut
> a
> >> > > branch-3 for the first 3.0.0 beta, which aligns with my original
> >> > thinking.
> >> > > This point doesn't necessarily need to be resolved now though, since
> >> > again
> >> > > we're still doing alphas.
> >> > >
> >> > > What we should get consensus on is the goal of keeping trunk stable,
> >> and
> >> > > achieving that by doing more development on feature branches and
> being
> >> > > judicious about merges. My sense from the Hadoop 3 email thread (and
> >> the
> >> > > more recent one on the async API) is that people are generally in
> favor
> >> > of
> >> > > this.
> >> > >
> >> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> >> > > appreciate everyone's timely response in this matter.
> >> > >
> >> > > Thanks,
> >> > > Andrew
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >> >
> >> >
> >>
>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Anu Engineer <ae...@hortonworks.com>.
I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said.
There is an overhead in working with branches; there are both technical costs and administrative issues
which discourage developers from using branches.

I think the biggest issue with branch-based development is the fact that other developers do not use a branch.
If a small feature appears as a series of commits to "datanode.java", the branch-based developer ends up rebasing
and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
would not have to deal with continuous rebasing onto trunk commits. If we are moving to branch-based
development, we should probably move to that model for most development to avoid this tax on people who
actually end up working in the branches.

I do have a question in my mind though: What is being proposed is that we move active development to branches
if the feature is small or incomplete, but keep the trunk open for check-ins. One of the biggest reasons why we
check in to trunk and not to branch-2 is that the change will break backward compatibility. So do we
have an expectation of backward compatibility through the 3.0-alpha series? (I personally vote No, since 3.0 is experimental
at this stage.) But if we decide to support some sort of backward compatibility, then willy-nilly committing to trunk
while still maintaining the expectation that we can release alphas from 3.0 does not look possible.

And then comes the question: once 3.0 becomes official, where do we check in a change if it would break something?
This will lead us back to trunk being unstable, with 3.0 being the new “branch-2”.

One more point: if we are moving to always using a branch, then we are looking at something similar to a git + pull
request model. If that is so, would it make sense to modify the rules to make these branches easier to merge?
Say, for example, that all commits in a branch have followed the review and checking policy, just like trunk, and commits
have been made only after a sign-off from a committer: would it be possible to merge with a 3-day voting period
instead of 7, or to treat it just like today's commit to trunk, but with 2 people signing off?
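
To make the first variant of that suggestion concrete, here is a rough Python sketch. It is not an existing Apache or Hadoop rule or tool; the function and constant names are made up for illustration, and the only numbers used are the 7-day and 3-day periods mentioned above.

```python
from typing import Sequence

DEFAULT_VOTE_DAYS = 7  # merge vote length referred to above
REDUCED_VOTE_DAYS = 3  # shorter period being suggested


def merge_vote_days(committer_signed_off: Sequence[bool]) -> int:
    """Pick the voting period for merging a feature branch.

    committer_signed_off holds one flag per commit on the branch: True if
    that commit went in only after review and a committer's sign-off.
    """
    # An empty branch history proves nothing, so it keeps the default period.
    if committer_signed_off and all(committer_signed_off):
        return REDUCED_VOTE_DAYS
    return DEFAULT_VOTE_DAYS


if __name__ == "__main__":
    print(merge_vote_days([True, True, True]))
    print(merge_vote_days([True, False, True]))
```

A single commit that skipped review puts the whole branch back on the longer vote, which is the incentive the suggestion relies on.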

What I am suggesting is reducing the administrative overhead of using a branch to encourage use of branching.
Right now it feels like Apache's process encourages committing directly to trunk rather than to a branch.

Thanks
Anu


On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:

>Having worked on a major feature in a feature branch, I have some thoughts
>and observations on feature branch development.
>
>IMO feature branch development v. direct commits to trunk in piecemeal is
>really a choice of *granularity*. Do we want a series of fine-grained state
>changes on trunk or fewer coarse-grained chunks of commits on trunk?
>
>This makes me favor a branch-based development model for any "decent-sized"
>features (we'll need to define "decent-sized" of course). Once you have
>coarse-grained changes, it's easier to reason about what made what release
>and in what state. As importantly, it makes it easier to back out a
>complete feature fairly easily if that becomes necessary. My totally
>unscientific suggestion may be if a feature takes more than dozen commits
>and longer than a month, we should probably have a bias towards a feature
>branch.
>
>Branch-based development also makes you go faster if your feature is
>larger. I wouldn't do it the other way for timeline service v.2 for example.
>
>That said, feature branches don't come for free. Now the onus is on the
>feature developer to constantly rebase with the trunk to keep it reasonably
>integrated with the trunk. More logistics is involved for the feature
>developer. Another big question is, when a feature branch gets big and it's
>time to merge, would it get as scrutinized as a series of individual
>commits? Since the size of merge can be big, you kind of have to rely on
>those feature committers and those who help them.
>
>In terms of integrating/stabilizing, I don't think branch development
>necessarily makes it harder. It is again granularity. In case of direct
>commits on trunk, you do a lot more fine-grained integrations. In case of
>branch development, you do far fewer coarse-grained integrations via
>rebasing. If more people are doing branch-based development, it makes
>rebasing easier to manage too.
>
>Going back to the related topic of where to release (trunk v. branch-X), I
>think that is more of a proxy of the real question of "how do we maintain
>quality and stability of the trunk?". Even if we release from the trunk, if
>our bar for merging to trunk is low, the quality will not improve
>automatically. So I think we ought to tackle the quality question first.
>
>My 2 cents.
>
>
>On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>
>> Thanks for the notes Andrew, Junping, Karthik.
>>
>> Here are some of my understandings:
>>
>> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>> Hadoop today, without legacy workloads, trunk is what he/she should use.
>> - Therefore, each commit to trunk should be transactional -- atomic,
>> consistent, isolated (from other uncommitted patches); I'm not so sure
>> about durability, Hadoop might be gone in 50 years :). As a committer, I
>> should be able to look at a patch and determine whether it's a
>> self-contained improvement of trunk, without looking at other uncommitted
>> patches.
>> - Some comments inline:
>>
>> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
>>
>> > Comparing with advantages, I believe the disadvantages of shipping any
>> > releases directly from trunk are more obvious and significant:
>> > - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
>> > to wait to commit to trunk or put into a separated branch that could
>> delay
>> > feature development progress as additional vote process get involved even
>> > the feature is simple and harmless.
>> >
>> Thanks Junping, those are valid concerns. I think we should clearly
>> separate incompatible with  uncompleted / half-done work in this
>> discussion. Whether people should commit incompatible changes to trunk is a
>> much more tricky question (related to trunk-incompat etc.). But per my
>> comment above, IMHO, *not committing uncompleted work to trunk* should be a
>> much easier principle to agree upon.
>>
>>
>> > - For small feature with only 1 or 2 commits, that need three +1 from
>> PMCs
>> > will increase the bar largely for contributors who just start to
>> contribute
>> > on Hadoop features but no such sufficient support.
>> >
>> Development overhead is another valid concern. I think our rule-of-thumb
>> should be that, small-medium new features should be proposed as a single
>> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
>> beyond a single JIRA/patch, use a feature branch.
>>
>>
>> >
>> > Given these concerns, I am open to other options, like: proposed by Vinod
>> > or Chris, but rather than to release anything directly from trunk.
>> >
>> > - This point doesn't necessarily need to be resolved now though, since
>> > again we're still doing alphas.
>> > No. I think we have to settle down this first. Without a common agreed
>> and
>> > transparent release process and branches in community, any release
>> (alpha,
>> > beta) bits is only called a private release but not a official apache
>> > hadoop release (even alpha).
>> >
>> >
>> > Thanks,
>> >
>> > Junping
>> > ________________________________________
>> > From: Karthik Kambatla <ka...@cloudera.com>
>> > Sent: Friday, June 10, 2016 7:49 AM
>> > To: Andrew Wang
>> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> > Subject: Re: [DISCUSS] Increased use of feature branches
>> >
>> > Thanks for restarting this thread Andrew. I really hope we can get this
>> > across to a VOTE so it is clear.
>> >
>> > I see a few advantages shipping from trunk:
>> >
>> >    - The lack of need for one additional backport each time.
>> >    - Feature rot in trunk
>> >
>> > Instead of creating branch-3, I recommend creating branch-3.x so we can
>> > continue doing 3.x releases off branch-3 even after we move trunk to 4.x
>> (I
>> > said it :))
>> >
>> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > On a separate thread, a question was raised about 3.x branching and use
>> > of
>> > > feature branches going forward.
>> > >
>> > > We discussed this previously on the "Looking to a Hadoop 3 release"
>> > thread
>> > > that has spanned the years, with Vinod making this proposal (building
>> on
>> > > ideas from others who also commented in the email thread):
>> > >
>> > >
>> > >
>> >
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>> > >
>> > > Pasting here for ease:
>> > >
>> > > On an unrelated note, offline I was pitching to a bunch of
>> > > contributors another idea to deal
>> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
>> > >
>> > > What this gains us is that
>> > >  - Trunk is always nearly stable or nearly ready for releases
>> > >  - We no longer have some code lying around in some branch (today’s
>> > > trunk) that is not releasable
>> > > because it gets mixed with other undesirable and incompatible changes.
>> > >  - This needs to be coupled with more discipline on individual
>> > > features - medium to to large
>> > > features are always worked upon in branches and get merged into trunk
>> > > (and a nearing release!)
>> > > when they are ready
>> > >  - All incompatible changes go into some sort of a trunk-incompat
>> > > branch and stay there till
>> > > we accumulate enough of those to warrant another major release.
>> > >
>> > > Regarding "trunk-incompat", since we're still in the alpha stage for
>> > 3.0.0,
>> > > there's no need for this branch yet. This aspect of Vinod's proposal
>> was
>> > > still under a bit of discussion; Chris Douglas thought we should cut a
>> > > branch-3 for the first 3.0.0 beta, which aligns with my original
>> > thinking.
>> > > This point doesn't necessarily need to be resolved now though, since
>> > again
>> > > we're still doing alphas.
>> > >
>> > > What we should get consensus on is the goal of keeping trunk stable,
>> and
>> > > achieving that by doing more development on feature branches and being
>> > > judicious about merges. My sense from the Hadoop 3 email thread (and
>> the
>> > > more recent one on the async API) is that people are generally in favor
>> > of
>> > > this.
>> > >
>> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
>> > > appreciate everyone's timely response in this matter.
>> > >
>> > > Thanks,
>> > > Andrew
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
>> >
>> >
>>


Re: [DISCUSS] Increased use of feature branches

Posted by Anu Engineer <ae...@hortonworks.com>.
I actively work on two branches (Diskbalancer and ozone) and I agree with most of what Sangjin said.
There is an overhead in working with branches; there are both technical costs and administrative issues
which discourage developers from using branches.

I think the biggest issue with branch-based development is the fact that other developers do not use a branch.
If a small feature appears as a series of commits to "datanode.java", the branch-based developer ends up rebasing
and paying this price of rebasing many times. If everyone followed a model of branch + pull request, other branches
would not have to deal with continuous rebasing onto trunk commits. If we are moving to branch-based
development, we should probably move to that model for most development to avoid this tax on people who
actually end up working in the branches.

I do have a question in my mind though: What is being proposed is that we move active development to branches 
if the feature is small or incomplete, however keep the trunk open for check-ins. One of the biggest reason why we 
check-in into trunk and not to branch-2 is because it is a change that will break backward compatibility. So do we 
have an expectation of backward compatibility thru the 3.0-alpha series (I personally vote No, since 3.0 is experimental 
at this stage), but if we decide to support some sort of backward-compact then willy-nilly committing to trunk 
and still maintaining the expectation we can release Alphas from 3.0 does not look possible.

And then comes the question: once 3.0 becomes official, where do we check in a change if it would break something?
This will lead us back to trunk being the unstable branch, with 3.0 being the new “branch-2”.

One more point: If we are moving to always using a branch, then we are looking at a model similar to a git + pull
request model. If that is so, would it make sense to modify the rules to make these branches easier to merge?
Say, for example, if all commits in a branch have followed the review and check-in policy, just like trunk, and commits
have been made only after a sign-off from a committer, would it be possible to merge with a 3-day voting period
instead of 7, or treat it just like today’s commit to trunk, but with 2 people signing off?

What I am suggesting is reducing the administrative overhead of using a branch to encourage use of branching.
Right now it feels like Apache’s process encourages committing directly to trunk rather than to a branch.

Thanks
Anu


On 6/10/16, 10:50 AM, "sjlee0@gmail.com on behalf of Sangjin Lee" <sjlee0@gmail.com on behalf of sjlee@apache.org> wrote:

>Having worked on a major feature in a feature branch, I have some thoughts
>and observations on feature branch development.
>
>IMO feature branch development v. piecemeal direct commits to trunk is
>really a choice of *granularity*. Do we want a series of fine-grained state
>changes on trunk or fewer coarse-grained chunks of commits on trunk?
>
>This makes me favor a branch-based development model for any "decent-sized"
>features (we'll need to define "decent-sized" of course). Once you have
>coarse-grained changes, it's easier to reason about what made what release
>and in what state. Just as importantly, it makes it easier to back out a
>complete feature if that becomes necessary. My totally
>unscientific suggestion is that if a feature takes more than a dozen commits
>and longer than a month, we should probably have a bias towards a feature
>branch.
>
>Branch-based development also makes you go faster if your feature is
>larger. I wouldn't do it the other way for timeline service v.2 for example.
>
>That said, feature branches don't come for free. Now the onus is on the
>feature developer to constantly rebase with the trunk to keep it reasonably
>integrated with the trunk. More logistics are involved for the feature
>developer. Another big question is, when a feature branch gets big and it's
>time to merge, would it be scrutinized as closely as a series of individual
>commits? Since the size of merge can be big, you kind of have to rely on
>those feature committers and those who help them.
>
>In terms of integrating/stabilizing, I don't think branch development
>necessarily makes it harder. It is again granularity. In case of direct
>commits on trunk, you do a lot more fine-grained integrations. In case of
>branch development, you do far fewer coarse-grained integrations via
>rebasing. If more people are doing branch-based development, it makes
>rebasing easier to manage too.
>
>Going back to the related topic of where to release (trunk v. branch-X), I
>think that is more of a proxy of the real question of "how do we maintain
>quality and stability of the trunk?". Even if we release from the trunk, if
>our bar for merging to trunk is low, the quality will not improve
>automatically. So I think we ought to tackle the quality question first.
>
>My 2 cents.
>
>
>On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:
>
>> Thanks for the notes Andrew, Junping, Karthik.
>>
>> Here are some of my understandings:
>>
>> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>> Hadoop today, without legacy workloads, trunk is what he/she should use.
>> - Therefore, each commit to trunk should be transactional -- atomic,
>> consistent, isolated (from other uncommitted patches); I'm not so sure
>> about durability, Hadoop might be gone in 50 years :). As a committer, I
>> should be able to look at a patch and determine whether it's a
>> self-contained improvement of trunk, without looking at other uncommitted
>> patches.
>> - Some comments inline:
>>
>> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
>>
>> > Comparing with advantages, I believe the disadvantages of shipping any
>> > releases directly from trunk are more obvious and significant:
>> > - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
>> > to wait to commit to trunk or put into a separated branch that could delay
>> > feature development progress as additional vote process get involved even
>> > the feature is simple and harmless.
>> >
>> Thanks Junping, those are valid concerns. I think we should clearly
>> separate incompatible from uncompleted / half-done work in this
>> discussion. Whether people should commit incompatible changes to trunk is a
>> much more tricky question (related to trunk-incompat etc.). But per my
>> comment above, IMHO, *not committing uncompleted work to trunk* should be a
>> much easier principle to agree upon.
>>
>>
>> > - For small feature with only 1 or 2 commits, that need three +1 from PMCs
>> > will increase the bar largely for contributors who just start to contribute
>> > on Hadoop features but no such sufficient support.
>> >
>> Development overhead is another valid concern. I think our rule-of-thumb
>> should be that, small-medium new features should be proposed as a single
>> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
>> beyond a single JIRA/patch, use a feature branch.
>>
>>
>> >
>> > Given these concerns, I am open to other options, like: proposed by Vinod
>> > or Chris, but rather than to release anything directly from trunk.
>> >
>> > - This point doesn't necessarily need to be resolved now though, since
>> > again we're still doing alphas.
>> > No. I think we have to settle down this first. Without a common agreed and
>> > transparent release process and branches in community, any release (alpha,
>> > beta) bits is only called a private release but not a official apache
>> > hadoop release (even alpha).
>> >
>> >
>> > Thanks,
>> >
>> > Junping
>> > ________________________________________
>> > From: Karthik Kambatla <ka...@cloudera.com>
>> > Sent: Friday, June 10, 2016 7:49 AM
>> > To: Andrew Wang
>> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> > Subject: Re: [DISCUSS] Increased use of feature branches
>> >
>> > Thanks for restarting this thread Andrew. I really hope we can get this
>> > across to a VOTE so it is clear.
>> >
>> > I see a few advantages shipping from trunk:
>> >
>> >    - The lack of need for one additional backport each time.
>> >    - Feature rot in trunk
>> >
>> > Instead of creating branch-3, I recommend creating branch-3.x so we can
>> > continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
>> > said it :))
>> >
>> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > On a separate thread, a question was raised about 3.x branching and use of
>> > > feature branches going forward.
>> > >
>> > > We discussed this previously on the "Looking to a Hadoop 3 release" thread
>> > > that has spanned the years, with Vinod making this proposal (building on
>> > > ideas from others who also commented in the email thread):
>> > >
>> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>> > >
>> > > Pasting here for ease:
>> > >
>> > > On an unrelated note, offline I was pitching to a bunch of contributors
>> > > another idea to deal with rotting trunk post 3.x: *Make 3.x releases off
>> > > of trunk directly*.
>> > >
>> > > What this gains us is that
>> > >  - Trunk is always nearly stable or nearly ready for releases
>> > >  - We no longer have some code lying around in some branch (today’s
>> > >    trunk) that is not releasable because it gets mixed with other
>> > >    undesirable and incompatible changes.
>> > >  - This needs to be coupled with more discipline on individual features -
>> > >    medium to large features are always worked upon in branches and get
>> > >    merged into trunk (and a nearing release!) when they are ready
>> > >  - All incompatible changes go into some sort of a trunk-incompat branch
>> > >    and stay there till we accumulate enough of those to warrant another
>> > >    major release.
>> > >
>> > > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
>> > > there's no need for this branch yet. This aspect of Vinod's proposal was
>> > > still under a bit of discussion; Chris Douglas thought we should cut a
>> > > branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
>> > > This point doesn't necessarily need to be resolved now though, since again
>> > > we're still doing alphas.
>> > >
>> > > What we should get consensus on is the goal of keeping trunk stable, and
>> > > achieving that by doing more development on feature branches and being
>> > > judicious about merges. My sense from the Hadoop 3 email thread (and the
>> > > more recent one on the async API) is that people are generally in favor of
>> > > this.
>> > >
>> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
>> > > appreciate everyone's timely response in this matter.
>> > >
>> > > Thanks,
>> > > Andrew
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
>> >
>> >
>>


Re: [DISCUSS] Increased use of feature branches

Posted by Sangjin Lee <sj...@apache.org>.
Having worked on a major feature in a feature branch, I have some thoughts
and observations on feature branch development.

IMO feature branch development v. piecemeal direct commits to trunk is
really a choice of *granularity*. Do we want a series of fine-grained state
changes on trunk or fewer coarse-grained chunks of commits on trunk?

This makes me favor a branch-based development model for any "decent-sized"
features (we'll need to define "decent-sized" of course). Once you have
coarse-grained changes, it's easier to reason about what made what release
and in what state. Just as importantly, it makes it easier to back out a
complete feature if that becomes necessary. My totally unscientific
suggestion is that if a feature takes more than a dozen commits and longer
than a month, we should probably have a bias towards a feature branch.
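
A minimal sketch of that rule of thumb, for illustration only. The names
FeatureEstimate and prefers_feature_branch are hypothetical, and treating
"a dozen" as 12 commits and "a month" as 30 days is an assumption; the
thresholds are a suggested bias, not an agreed policy.

```python
from dataclasses import dataclass

# Assumed readings of "a dozen commits" and "a month"; not project policy.
COMMIT_THRESHOLD = 12
DAYS_THRESHOLD = 30


@dataclass
class FeatureEstimate:
    """Rough up-front estimate for a proposed feature (hypothetical type)."""
    expected_commits: int
    expected_days: int


def prefers_feature_branch(estimate: FeatureEstimate) -> bool:
    """Return True when both thresholds are exceeded.

    The rule of thumb joins the two conditions with "and", so a feature
    must be both large (commits) and long-running (days) to qualify.
    """
    return (estimate.expected_commits > COMMIT_THRESHOLD
            and estimate.expected_days > DAYS_THRESHOLD)
```

Under these assumptions, a feature estimated at 40 commits over six months
would lean towards a branch, while a two-commit change would go to trunk as
an ordinary patch.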

Branch-based development also makes you go faster if your feature is
larger. I wouldn't have done timeline service v.2 any other way, for example.

That said, feature branches don't come for free. Now the onus is on the
feature developer to constantly rebase on trunk to keep the branch
reasonably integrated with it. More logistics are involved for the feature
developer. Another big question is, when a feature branch gets big and it's
time to merge, would it get scrutinized as closely as a series of individual
commits? Since the merge can be big, you kind of have to rely on the
feature committers and those who help them.

In terms of integrating/stabilizing, I don't think branch development
necessarily makes it harder. It is again a matter of granularity. In the
case of direct commits on trunk, you do many more fine-grained integrations.
In the case of branch development, you do far fewer coarse-grained
integrations via rebasing. If more people are doing branch-based
development, it makes rebasing easier to manage too.

Going back to the related topic of where to release from (trunk v.
branch-X), I think that is more of a proxy for the real question of "how do
we maintain quality and stability of the trunk?". Even if we release from
the trunk, if our bar for merging to trunk is low, the quality will not
improve automatically. So I think we ought to tackle the quality question
first.

My 2 cents.


On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:

> Thanks for the notes Andrew, Junping, Karthik.
>
> Here are some of my understandings:
>
> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> Hadoop today, without legacy workloads, trunk is what he/she should use.
> - Therefore, each commit to trunk should be transactional -- atomic,
> consistent, isolated (from other uncommitted patches); I'm not so sure
> about durability, Hadoop might be gone in 50 years :). As a committer, I
> should be able to look at a patch and determine whether it's a
> self-contained improvement of trunk, without looking at other uncommitted
> patches.
> - Some comments inline:
>
> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
>
> > Comparing with advantages, I believe the disadvantages of shipping any
> > releases directly from trunk are more obvious and significant:
> > - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> > to wait to commit to trunk or put into a separated branch that could delay
> > feature development progress as additional vote process get involved even
> > the feature is simple and harmless.
> >
> Thanks Junping, those are valid concerns. I think we should clearly
> separate incompatible changes from uncompleted / half-done work in this
> discussion. Whether people should commit incompatible changes to trunk is a
> much trickier question (related to trunk-incompat etc.). But per my
> comment above, IMHO, *not committing uncompleted work to trunk* should be a
> much easier principle to agree upon.
>
>
> > - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> > will increase the bar largely for contributors who just start to contribute
> > on Hadoop features but no such sufficient support.
> >
> Development overhead is another valid concern. I think our rule of thumb
> should be that small-to-medium new features are proposed as a single
> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> beyond a single JIRA/patch, use a feature branch.
>
>
> >
> > Given these concerns, I am open to other options, like: proposed by Vinod
> > or Chris, but rather than to release anything directly from trunk.
> >
> > - This point doesn't necessarily need to be resolved now though, since
> > again we're still doing alphas.
> > No. I think we have to settle down this first. Without a common agreed and
> > transparent release process and branches in community, any release (alpha,
> > beta) bits is only called a private release but not a official apache
> > hadoop release (even alpha).
> >
> >
> > Thanks,
> >
> > Junping
> > ________________________________________
> > From: Karthik Kambatla <ka...@cloudera.com>
> > Sent: Friday, June 10, 2016 7:49 AM
> > To: Andrew Wang
> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> > Subject: Re: [DISCUSS] Increased use of feature branches
> >
> > Thanks for restarting this thread Andrew. I really hope we can get this
> > across to a VOTE so it is clear.
> >
> > I see a few advantages shipping from trunk:
> >
> >    - The lack of need for one additional backport each time.
> >    - Feature rot in trunk
> >
> > Instead of creating branch-3, I recommend creating branch-3.x so we can
> > continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> > said it :))
> >
> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > On a separate thread, a question was raised about 3.x branching and use
> > of
> > > feature branches going forward.
> > >
> > > We discussed this previously on the "Looking to a Hadoop 3 release"
> > thread
> > > that has spanned the years, with Vinod making this proposal (building
> on
> > > ideas from others who also commented in the email thread):
> > >
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> > >
> > > Pasting here for ease:
> > >
> > > On an unrelated note, offline I was pitching to a bunch of
> > > contributors another idea to deal
> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> > >
> > > What this gains us is that
> > >  - Trunk is always nearly stable or nearly ready for releases
> > >  - We no longer have some code lying around in some branch (today’s
> > > trunk) that is not releasable
> > > because it gets mixed with other undesirable and incompatible changes.
> > >  - This needs to be coupled with more discipline on individual
> > > features - medium to large
> > > features are always worked upon in branches and get merged into trunk
> > > (and a nearing release!)
> > > when they are ready
> > >  - All incompatible changes go into some sort of a trunk-incompat
> > > branch and stay there till
> > > we accumulate enough of those to warrant another major release.
> > >
> > > Regarding "trunk-incompat", since we're still in the alpha stage for
> > 3.0.0,
> > > there's no need for this branch yet. This aspect of Vinod's proposal
> was
> > > still under a bit of discussion; Chris Douglas thought we should cut a
> > > branch-3 for the first 3.0.0 beta, which aligns with my original
> > thinking.
> > > This point doesn't necessarily need to be resolved now though, since
> > again
> > > we're still doing alphas.
> > >
> > > What we should get consensus on is the goal of keeping trunk stable,
> and
> > > achieving that by doing more development on feature branches and being
> > > judicious about merges. My sense from the Hadoop 3 email thread (and
> the
> > > more recent one on the async API) is that people are generally in favor
> > of
> > > this.
> > >
> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > > appreciate everyone's timely response in this matter.
> > >
> > > Thanks,
> > > Andrew
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >
> >
>

Re: [DISCUSS] Increased use of feature branches

Posted by Sangjin Lee <sj...@apache.org>.
Having worked on a major feature in a feature branch, I have some thoughts
and observations on feature branch development.

IMO feature branch development v. direct commits to trunk in piecemeal is
really a choice of *granularity*. Do we want a series of fine-grained state
changes on trunk or fewer coarse-grained chunks of commits on trunk?

This makes me favor a branch-based development model for any "decent-sized"
features (we'll need to define "decent-sized" of course). Once you have
coarse-grained changes, it's easier to reason about what made what release
and in what state. As importantly, it makes it easier to back out a
complete feature fairly easily if that becomes necessary. My totally
unscientific suggestion may be if a feature takes more than dozen commits
and longer than a month, we should probably have a bias towards a feature
branch.

Branch-based development also makes you go faster if your feature is
larger. I wouldn't do it the other way for timeline service v.2 for example.

That said, feature branches don't come for free. Now the onus is on the
feature developer to constantly rebase with the trunk to keep it reasonably
integrated with the trunk. More logistics is involved for the feature
developer. Another big question is, when a feature branch gets big and it's
time to merge, would it get as scrutinized as a series of individual
commits? Since the size of merge can be big, you kind of have to rely on
those feature committers and those who help them.

In terms of integrating/stabilizing, I don't think branch development
necessarily makes it harder. It is again granularity. In case of direct
commits on trunk, you do a lot more fine-grained integrations. In case of
branch development, you do far fewer coarse-grained integrations via
rebasing. If more people are doing branch-based development, it makes
rebasing easier to manage too.

Going back to the related topic of where to release (trunk v. branch-X), I
think that is more of a proxy of the real question of "how do we maintain
quality and stability of the trunk?". Even if we release from the trunk, if
our bar for merging to trunk is low, the quality will not improve
automatically. So I think we ought to tackle the quality question first.

My 2 cents.


On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <zh...@apache.org> wrote:

> Thanks for the notes Andrew, Junping, Karthik.
>
> Here are some of my understandings:
>
> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> Hadoop today, without legacy workloads, trunk is what he/she should use.
> - Therefore, each commit to trunk should be transactional -- atomic,
> consistent, isolated (from other uncommitted patches); I'm not so sure
> about durability, Hadoop might be gone in 50 years :). As a committer, I
> should be able to look at a patch and determine whether it's a
> self-contained improvement of trunk, without looking at other uncommitted
> patches.
> - Some comments inline:
>
> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:
>
> > Comparing with advantages, I believe the disadvantages of shipping any
> > releases directly from trunk are more obvious and significant:
> > - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> > to wait to commit to trunk or put into a separated branch that could
> delay
> > feature development progress as additional vote process get involved even
> > the feature is simple and harmless.
> >
> Thanks Junping, those are valid concerns. I think we should clearly
> separate incompatible with  uncompleted / half-done work in this
> discussion. Whether people should commit incompatible changes to trunk is a
> much more tricky question (related to trunk-incompat etc.). But per my
> comment above, IMHO, *not committing uncompleted work to trunk* should be a
> much easier principle to agree upon.
>
>
> > - For small feature with only 1 or 2 commits, that need three +1 from
> PMCs
> > will increase the bar largely for contributors who just start to
> contribute
> > on Hadoop features but no such sufficient support.
> >
> Development overhead is another valid concern. I think our rule-of-thumb
> should be that, small-medium new features should be proposed as a single
> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> beyond a single JIRA/patch, use a feature branch.
>
>
> >
> > Given these concerns, I am open to other options, like: proposed by Vinod
> > or Chris, but rather than to release anything directly from trunk.
> >
> > - This point doesn't necessarily need to be resolved now though, since
> > again we're still doing alphas.
> > No. I think we have to settle down this first. Without a common agreed
> and
> > transparent release process and branches in community, any release
> (alpha,
> > beta) bits is only called a private release but not a official apache
> > hadoop release (even alpha).
> >
> >
> > Thanks,
> >
> > Junping
> > ________________________________________
> > From: Karthik Kambatla <ka...@cloudera.com>
> > Sent: Friday, June 10, 2016 7:49 AM
> > To: Andrew Wang
> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> > mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> > Subject: Re: [DISCUSS] Increased use of feature branches
> >
> > Thanks for restarting this thread Andrew. I really hope we can get this
> > across to a VOTE so it is clear.
> >
> > I see a few advantages shipping from trunk:
> >
> >    - The lack of need for one additional backport each time.
> >    - Feature rot in trunk
> >
> > Instead of creating branch-3, I recommend creating branch-3.x so we can
> > continue doing 3.x releases off branch-3 even after we move trunk to 4.x
> (I
> > said it :))
> >
> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > On a separate thread, a question was raised about 3.x branching and use of
> > > feature branches going forward.
> > >
> > > We discussed this previously on the "Looking to a Hadoop 3 release" thread
> > > that has spanned the years, with Vinod making this proposal (building on
> > > ideas from others who also commented in the email thread):
> > >
> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> > >
> > > Pasting here for ease:
> > >
> > > On an unrelated note, offline I was pitching to a bunch of
> > > contributors another idea to deal
> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> > >
> > > What this gains us is that
> > >  - Trunk is always nearly stable or nearly ready for releases
> > >  - We no longer have some code lying around in some branch (today’s
> > > trunk) that is not releasable
> > > because it gets mixed with other undesirable and incompatible changes.
> > >  - This needs to be coupled with more discipline on individual
> > > features - medium to to large
> > > features are always worked upon in branches and get merged into trunk
> > > (and a nearing release!)
> > > when they are ready
> > >  - All incompatible changes go into some sort of a trunk-incompat
> > > branch and stay there till
> > > we accumulate enough of those to warrant another major release.
> > >
> > > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> > > there's no need for this branch yet. This aspect of Vinod's proposal was
> > > still under a bit of discussion; Chris Douglas though we should cut a
> > > branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> > > This point doesn't necessarily need to be resolved now though, since again
> > > we're still doing alphas.
> > >
> > > What we should get consensus on is the goal of keeping trunk stable, and
> > > achieving that by doing more development on feature branches and being
> > > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > > more recent one on the async API) is that people are generally in favor of
> > > this.
> > >
> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > > appreciate everyone's timely response in this matter.
> > >
> > > Thanks,
> > > Andrew
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> > For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >
> >
>

Re: [DISCUSS] Increased use of feature branches

Posted by Zhe Zhang <zh...@apache.org>.
Thanks for the notes Andrew, Junping, Karthik.

Here are some of my understandings:

- Trunk is the "latest and greatest" of Hadoop. If a user starts using
Hadoop today, without legacy workloads, trunk is what he/she should use.
- Therefore, each commit to trunk should be transactional -- atomic,
consistent, isolated (from other uncommitted patches); I'm not so sure
about durability, Hadoop might be gone in 50 years :). As a committer, I
should be able to look at a patch and determine whether it's a
self-contained improvement of trunk, without looking at other uncommitted
patches.
- Some comments inline:

On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:

> Comparing with advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> to wait to commit to trunk or put into a separated branch that could delay
> feature development progress as additional vote process get involved even
> the feature is simple and harmless.
>
Thanks Junping, those are valid concerns. I think we should clearly
separate incompatible work from uncompleted / half-done work in this
discussion. Whether people should commit incompatible changes to trunk is a
much trickier question (related to trunk-incompat etc.). But per my
comment above, IMHO, *not committing uncompleted work to trunk* should be a
much easier principle to agree upon.


> - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> will increase the bar largely for contributors who just start to contribute
> on Hadoop features but no such sufficient support.
>
Development overhead is another valid concern. I think our rule of thumb
should be that small-to-medium new features are proposed as a single
JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
beyond a single JIRA/patch, use a feature branch.


>
> Given these concerns, I am open to other options, like: proposed by Vinod
> or Chris, but rather than to release anything directly from trunk.
>
> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle down this first. Without a common agreed and
> transparent release process and branches in community, any release (alpha,
> beta) bits is only called a private release but not a official apache
> hadoop release (even alpha).
>
>
> Thanks,
>
> Junping
> ________________________________________
> From: Karthik Kambatla <ka...@cloudera.com>
> Sent: Friday, June 10, 2016 7:49 AM
> To: Andrew Wang
> Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Increased use of feature branches
>
> Thanks for restarting this thread Andrew. I really hope we can get this
> across to a VOTE so it is clear.
>
> I see a few advantages shipping from trunk:
>
>    - The lack of need for one additional backport each time.
>    - Feature rot in trunk
>
> Instead of creating branch-3, I recommend creating branch-3.x so we can
> continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> said it :))
>
> On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi all,
> >
> > On a separate thread, a question was raised about 3.x branching and use of
> > feature branches going forward.
> >
> > We discussed this previously on the "Looking to a Hadoop 3 release" thread
> > that has spanned the years, with Vinod making this proposal (building on
> > ideas from others who also commented in the email thread):
> >
> > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >
> > Pasting here for ease:
> >
> > On an unrelated note, offline I was pitching to a bunch of
> > contributors another idea to deal
> > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> >
> > What this gains us is that
> >  - Trunk is always nearly stable or nearly ready for releases
> >  - We no longer have some code lying around in some branch (today’s
> > trunk) that is not releasable
> > because it gets mixed with other undesirable and incompatible changes.
> >  - This needs to be coupled with more discipline on individual
> > features - medium to to large
> > features are always worked upon in branches and get merged into trunk
> > (and a nearing release!)
> > when they are ready
> >  - All incompatible changes go into some sort of a trunk-incompat
> > branch and stay there till
> > we accumulate enough of those to warrant another major release.
> >
> > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> > there's no need for this branch yet. This aspect of Vinod's proposal was
> > still under a bit of discussion; Chris Douglas though we should cut a
> > branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> > This point doesn't necessarily need to be resolved now though, since again
> > we're still doing alphas.
> >
> > What we should get consensus on is the goal of keeping trunk stable, and
> > achieving that by doing more development on feature branches and being
> > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > more recent one on the async API) is that people are generally in favor of
> > this.
> >
> > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > appreciate everyone's timely response in this matter.
> >
> > Thanks,
> > Andrew
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
Inline.

On Fri, Jun 10, 2016 at 6:56 AM, Junping Du <jd...@hortonworks.com> wrote:

> Comparing with advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> to wait to commit to trunk or put into a separated branch that could delay
> feature development progress as additional vote process get involved even
> the feature is simple and harmless.
>

Including these sorts of commits in trunk is a major pain.

One example from a recent mistake I made:
YARN-2877 and YARN-1011 had some common changes. Instead of putting them in
a separate branch, I committed these common changes to trunk because, well,
we don't release from trunk, so what can go wrong? After a few days, other
contributors and committers started feeling annoyed about having to submit
two different patches for trunk and branch-2. This inconvenience led to
those patches being pulled into branch-2 even though they were not ready
for inclusion in branch-2 or a 2.x release.

I feel the major friction for feature branches comes from only some
features using them. If everyone uses feature branches and we have better
processes around quantifying the stability of a feature branch, feature
branches should make for a smoother experience for everyone.

It is not uncommon for features to get merged into trunk before being ready,
with promises of follow-up work. While that might very well be the intent
of contributors, other work items come up and things get sidelined. How
often have we seen features without HA and security?


>
> - These commits left in separated branches are isolated and get more
> chance to conflict each other, and more bugs could be involved due to
> conflicts and/or less eyes watching/bless on isolated branches.
>

Partially agree. There is a tradeoff here: if we keep putting them into
trunk, they (1) destabilize trunk, and (2) conflict with other bug fixes
and smaller improvements.


>
> - More unnecessary arguments/debates will happen on if some commits should
> land on trunk or a separated branch, just like what we have recently.
>

Again, clearly defining the requirements to be merged into trunk will make
this easier. How is this different from what we do today for branch-2? If
we still have debates, they are probably necessary. Not having them today
is actually a concern.


>
> - Because branches will get increased massively, more community efforts
> will be spent on review & vote for branches merge that means less effort
> will be spent on other commits review given our review bandwidth is quite
> short so far.
>

Yes and no. Strictly using feature branches will serialize features.
Integrating with other features is a one-time, albeit more involved,
process instead of multiple rebases/resolutions each somewhat involved.


>
> - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> will increase the bar largely for contributors who just start to contribute
> on Hadoop features but no such sufficient support.
>

If a feature/improvement is not supported by 3 committers (not PMC
members), it is probably worth looking at why. Maybe this feature should
not be included at all?

I am open to changing the requirements for a merge. What do you think of
one +1 (thorough review) and two +0s (high-level review)?

If the concern is finding enough committers, I would like for the PMC to
consider voting in more committers and increasing bandwidth.


>
> Given these concerns, I am open to other options, like: proposed by Vinod
> or Chris, but rather than to release anything directly from trunk.
>

I actually thought this was Vinod's proposal. My understanding is Andrew is
resurfacing this so we finalize things.


>
> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle down this first. Without a common agreed and
> transparent release process and branches in community, any release (alpha,
> beta) bits is only called a private release but not a official apache
> hadoop release (even alpha).
>
>
I am absolutely with Junping here. Changing this process primarily requires
a change in our mental model. I think it is pretty important that we decide
on one approach preferably before doing an alpha release.

To clarify: our current approach (trunk and branch-2) has been working
okay. The only issue I see is in the way we take merging into trunk
lightly. If we have well-defined requirements for merging to trunk and take
those seriously, I am comfortable with using the approach for 3.x. The new
proposal forces following these requirements and hence I like it more.


>
> Thanks,
>
> Junping
> ________________________________________
> From: Karthik Kambatla <ka...@cloudera.com>
> Sent: Friday, June 10, 2016 7:49 AM
> To: Andrew Wang
> Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Increased use of feature branches
>
> Thanks for restarting this thread Andrew. I really hope we can get this
> across to a VOTE so it is clear.
>
> I see a few advantages shipping from trunk:
>
>    - The lack of need for one additional backport each time.
>    - Feature rot in trunk
>
> Instead of creating branch-3, I recommend creating branch-3.x so we can
> continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> said it :))
>
> On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi all,
> >
> > On a separate thread, a question was raised about 3.x branching and use of
> > feature branches going forward.
> >
> > We discussed this previously on the "Looking to a Hadoop 3 release" thread
> > that has spanned the years, with Vinod making this proposal (building on
> > ideas from others who also commented in the email thread):
> >
> > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >
> > Pasting here for ease:
> >
> > On an unrelated note, offline I was pitching to a bunch of
> > contributors another idea to deal
> > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> >
> > What this gains us is that
> >  - Trunk is always nearly stable or nearly ready for releases
> >  - We no longer have some code lying around in some branch (today’s
> > trunk) that is not releasable
> > because it gets mixed with other undesirable and incompatible changes.
> >  - This needs to be coupled with more discipline on individual
> > features - medium to to large
> > features are always worked upon in branches and get merged into trunk
> > (and a nearing release!)
> > when they are ready
> >  - All incompatible changes go into some sort of a trunk-incompat
> > branch and stay there till
> > we accumulate enough of those to warrant another major release.
> >
> > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> > there's no need for this branch yet. This aspect of Vinod's proposal was
> > still under a bit of discussion; Chris Douglas though we should cut a
> > branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> > This point doesn't necessarily need to be resolved now though, since again
> > we're still doing alphas.
> >
> > What we should get consensus on is the goal of keeping trunk stable, and
> > achieving that by doing more development on feature branches and being
> > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > more recent one on the async API) is that people are generally in favor of
> > this.
> >
> > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > appreciate everyone's timely response in this matter.
> >
> > Thanks,
> > Andrew
> >
>

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
Inline.

On Fri, Jun 10, 2016 at 6:56 AM, Junping Du <jd...@hortonworks.com> wrote:

> Comparing with advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> to wait to commit to trunk or put into a separated branch that could delay
> feature development progress as additional vote process get involved even
> the feature is simple and harmless.
>

Including these sorts of commits in trunk is a major pain.

One example from a recent mistake I made:
YARN-2877 and YARN-1011 had some common changes. Instead of putting them in
a separate branch, I committed these common changes to trunk because, well,
we don't release from trunk, so what can go wrong? After a few days, other
contributors and committers started feeling annoyed about having to submit
two different patches for trunk and branch-2. This inconvenience led to
those patches being pulled into branch-2 even though they were not ready
for inclusion in branch-2 or a 2.x release.

I feel the major friction for feature branches comes from only some
features using them. If everyone uses feature branches and we have better
processes around quantifying the stability of a feature branch, feature
branches should make for a smoother experience for everyone.

It is not uncommon for features to get merged into trunk before being ready,
with promises of follow-up work. While that might very well be the intent
of contributors, other work items come up and things get sidelined. How
often have we seen features without HA and security?


>
> - These commits left in separated branches are isolated and get more
> chance to conflict each other, and more bugs could be involved due to
> conflicts and/or less eyes watching/bless on isolated branches.
>

Partially agree. There is a tradeoff here: if we keep putting them into
trunk, they (1) destabilize trunk, and (2) conflict with other bug fixes
and smaller improvements.


>
> - More unnecessary arguments/debates will happen on if some commits should
> land on trunk or a separated branch, just like what we have recently.
>

Again, clearly defining the requirements to be merged into trunk will make
this easier. How is this different from what we do today for branch-2? If
we still have debates, they are probably necessary. Not having them today
is actually a concern.


>
> - Because branches will get increased massively, more community efforts
> will be spent on review & vote for branches merge that means less effort
> will be spent on other commits review given our review bandwidth is quite
> short so far.
>

Yes and no. Strictly using feature branches will serialize features.
Integrating with other features is a one-time, albeit more involved,
process instead of multiple rebases/resolutions each somewhat involved.


>
> - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> will increase the bar largely for contributors who just start to contribute
> on Hadoop features but no such sufficient support.
>

If a feature/improvement is not supported by 3 committers (not PMC
members), it is probably worth looking at why. Maybe this feature should
not be included at all?

I am open to changing the requirements for a merge. What do you think of
one +1 (thorough review) and two +0s (high-level review)?

If the concern is finding enough committers, I would like for the PMC to
consider voting in more committers and increasing bandwidth.


>
> Given these concerns, I am open to other options, like: proposed by Vinod
> or Chris, but rather than to release anything directly from trunk.
>

I actually thought this was Vinod's proposal. My understanding is Andrew is
resurfacing this so we finalize things.


>
> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle down this first. Without a common agreed and
> transparent release process and branches in community, any release (alpha,
> beta) bits is only called a private release but not a official apache
> hadoop release (even alpha).
>
>
I am absolutely with Junping here. Changing this process primarily requires
a change in our mental model. I think it is pretty important that we decide
on one approach preferably before doing an alpha release.

To clarify: our current approach (trunk and branch-2) has been working
okay. The only issue I see is in the way we take merging into trunk
lightly. If we have well-defined requirements for merging to trunk and take
those seriously, I am comfortable with using the approach for 3.x. The new
proposal forces following these requirements and hence I like it more.


>
> Thanks,
>
> Junping
> ________________________________________
> From: Karthik Kambatla <ka...@cloudera.com>
> Sent: Friday, June 10, 2016 7:49 AM
> To: Andrew Wang
> Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Increased use of feature branches
>
> Thanks for restarting this thread Andrew. I really hope we can get this
> across to a VOTE so it is clear.
>
> I see a few advantages shipping from trunk:
>
>    - The lack of need for one additional backport each time.
>    - Feature rot in trunk
>
> Instead of creating branch-3, I recommend creating branch-3.x so we can
> continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> said it :))
>
> On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi all,
> >
> > On a separate thread, a question was raised about 3.x branching and use
> of
> > feature branches going forward.
> >
> > We discussed this previously on the "Looking to a Hadoop 3 release"
> thread
> > that has spanned the years, with Vinod making this proposal (building on
> > ideas from others who also commented in the email thread):
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >
> > Pasting here for ease:
> >
> > On an unrelated note, offline I was pitching to a bunch of
> > contributors another idea to deal
> > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> >
> > What this gains us is that
> >  - Trunk is always nearly stable or nearly ready for releases
> >  - We no longer have some code lying around in some branch (today’s
> > trunk) that is not releasable
> > because it gets mixed with other undesirable and incompatible changes.
> >  - This needs to be coupled with more discipline on individual
> > features - medium to large
> > features are always worked upon in branches and get merged into trunk
> > (and a nearing release!)
> > when they are ready
> >  - All incompatible changes go into some sort of a trunk-incompat
> > branch and stay there till
> > we accumulate enough of those to warrant another major release.
> >
> > Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0,
> > there's no need for this branch yet. This aspect of Vinod's proposal was
> > still under a bit of discussion; Chris Douglas thought we should cut a
> > branch-3 for the first 3.0.0 beta, which aligns with my original
> thinking.
> > This point doesn't necessarily need to be resolved now though, since
> again
> > we're still doing alphas.
> >
> > What we should get consensus on is the goal of keeping trunk stable, and
> > achieving that by doing more development on feature branches and being
> > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > more recent one on the async API) is that people are generally in favor
> of
> > this.
> >
> > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > appreciate everyone's timely response in this matter.
> >
> > Thanks,
> > Andrew
> >
>

Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
Inline.

On Fri, Jun 10, 2016 at 6:56 AM, Junping Du <jd...@hortonworks.com> wrote:

> Comparing with advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> to wait to commit to trunk or put into a separated branch that could delay
> feature development progress as additional vote process get involved even
> the feature is simple and harmless.
>

Including these sorts of commits in trunk is a major pain.

One example from a recent mistake I made:
YARN-2877 and YARN-1011 had some common changes. Instead of putting them in
a separate branch, I committed these common changes to trunk because, well,
we don't release from trunk, so what can go wrong? After a few days, other
contributors and committers started feeling annoyed about having to submit
two different patches for trunk and branch-2. This inconvenience led to
those patches being pulled into branch-2 even though they were not ready
for inclusion in branch-2 or a 2.x release.

I feel the major friction for feature branches comes from only some
features using it. If everyone uses feature branches and we have better
processes around quantifying the stability of a feature branch, feature
branches should make for a smoother experience for everyone.

It is not uncommon for features to get merged into trunk before being ready
with promises of follow-up work. While that might very well be the intent
of contributors, other work items come up and things get sidelined. How
often have we seen features without HA and security?


>
> - These commits left in separated branches are isolated and get more
> chance to conflict each other, and more bugs could be involved due to
> conflicts and/or less eyes watching/bless on isolated branches.
>

Partially agree. There is a tradeoff here: if we keep putting them into
trunk, they (1) destabilize trunk, and (2) conflict with other bug fixes
and smaller improvements.


>
> - More unnecessary arguments/debates will happen on if some commits should
> land on trunk or a separated branch, just like what we have recently.
>

Again, clearly defining the requirements to be merged into trunk will make
this easier. How is this different from what we do today for branch-2? If
we still have debates, that is probably required? Not having them today is
actually a concern?


>
> - Because branches will get increased massively, more community efforts
> will be spent on review & vote for branches merge that means less effort
> will be spent on other commits review given our review bandwidth is quite
> short so far.
>

Yes and no. Strictly using feature branches will serialize features.
Integrating with other features is a one-time, albeit more involved,
process instead of multiple rebases/resolutions, each somewhat involved.


>
> - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> will increase the bar largely for contributors who just start to contribute
> on Hadoop features but no such sufficient support.
>

If a feature/improvement is not supported by 3 committers (not PMC
members), it is probably worth looking at why. Maybe this feature should
not be included at all?

I am open to changing the requirements for a merge. What do you think of
one +1 (thorough review) and two +0s (high-level review)?

If the concern is finding enough committers, I would like for the PMC to
consider voting in more committers and increasing bandwidth.


>
> Given these concerns, I am open to other options, like: proposed by Vinod
> or Chris, but rather than to release anything directly from trunk.
>

I actually thought this was Vinod's proposal. My understanding is Andrew is
resurfacing this so we finalize things.


>
> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle down this first. Without a common agreed and
> transparent release process and branches in community, any release (alpha,
> beta) bits is only called a private release but not a official apache
> hadoop release (even alpha).
>
>
I am absolutely with Junping here. Changing this process primarily requires
a change in our mental model. I think it is pretty important that we decide
on one approach preferably before doing an alpha release.

To clarify: our current approach (trunk and branch-2) has been working
okay. The only issue I see is in the way we take merging into trunk
lightly. If we have well-defined requirements for merging to trunk and take
those seriously, I am comfortable with using the approach for 3.x. The new
proposal forces following these requirements and hence I like it more.


>
> Thanks,
>
> Junping
> ________________________________________
> From: Karthik Kambatla <ka...@cloudera.com>
> Sent: Friday, June 10, 2016 7:49 AM
> To: Andrew Wang
> Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Increased use of feature branches
>
> Thanks for restarting this thread Andrew. I really hope we can get this
> across to a VOTE so it is clear.
>
> I see a few advantages shipping from trunk:
>
>    - The lack of need for one additional backport each time.
>    - Feature rot in trunk
>
> Instead of creating branch-3, I recommend creating branch-3.x so we can
> continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> said it :))
>
> On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi all,
> >
> > On a separate thread, a question was raised about 3.x branching and use
> of
> > feature branches going forward.
> >
> > We discussed this previously on the "Looking to a Hadoop 3 release"
> thread
> > that has spanned the years, with Vinod making this proposal (building on
> > ideas from others who also commented in the email thread):
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >
> > Pasting here for ease:
> >
> > On an unrelated note, offline I was pitching to a bunch of
> > contributors another idea to deal
> > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> >
> > What this gains us is that
> >  - Trunk is always nearly stable or nearly ready for releases
> >  - We no longer have some code lying around in some branch (today’s
> > trunk) that is not releasable
> > because it gets mixed with other undesirable and incompatible changes.
> >  - This needs to be coupled with more discipline on individual
> > features - medium to large
> > features are always worked upon in branches and get merged into trunk
> > (and a nearing release!)
> > when they are ready
> >  - All incompatible changes go into some sort of a trunk-incompat
> > branch and stay there till
> > we accumulate enough of those to warrant another major release.
> >
> > Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0,
> > there's no need for this branch yet. This aspect of Vinod's proposal was
> > still under a bit of discussion; Chris Douglas thought we should cut a
> > branch-3 for the first 3.0.0 beta, which aligns with my original
> thinking.
> > This point doesn't necessarily need to be resolved now though, since
> again
> > we're still doing alphas.
> >
> > What we should get consensus on is the goal of keeping trunk stable, and
> > achieving that by doing more development on feature branches and being
> > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > more recent one on the async API) is that people are generally in favor
> of
> > this.
> >
> > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > appreciate everyone's timely response in this matter.
> >
> > Thanks,
> > Andrew
> >
>

Re: [DISCUSS] Increased use of feature branches

Posted by Zhe Zhang <zh...@apache.org>.
Thanks for the notes Andrew, Junping, Karthik.

Here are some of my understandings:

- Trunk is the "latest and greatest" of Hadoop. If a user starts using
Hadoop today, without legacy workloads, trunk is what he/she should use.
- Therefore, each commit to trunk should be transactional -- atomic,
consistent, isolated (from other uncommitted patches); I'm not so sure
about durability, Hadoop might be gone in 50 years :). As a committer, I
should be able to look at a patch and determine whether it's a
self-contained improvement of trunk, without looking at other uncommitted
patches.
- Some comments inline:

On Fri, Jun 10, 2016 at 6:56 AM Junping Du <jd...@hortonworks.com> wrote:

> Comparing with advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> to wait to commit to trunk or put into a separated branch that could delay
> feature development progress as additional vote process get involved even
> the feature is simple and harmless.
>
Thanks Junping, those are valid concerns. I think we should clearly
separate incompatible changes from uncompleted / half-done work in this
discussion. Whether people should commit incompatible changes to trunk is a
much more tricky question (related to trunk-incompat etc.). But per my
comment above, IMHO, *not committing uncompleted work to trunk* should be a
much easier principle to agree upon.


> - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> will increase the bar largely for contributors who just start to contribute
> on Hadoop features but no such sufficient support.
>
Development overhead is another valid concern. I think our rule of thumb
should be that small-to-medium new features are proposed as a single
JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
beyond a single JIRA/patch, use a feature branch.


>
> Given these concerns, I am open to other options, like: proposed by Vinod
> or Chris, but rather than to release anything directly from trunk.
>
> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle down this first. Without a common agreed and
> transparent release process and branches in community, any release (alpha,
> beta) bits is only called a private release but not a official apache
> hadoop release (even alpha).
>
>
> Thanks,
>
> Junping
> ________________________________________
> From: Karthik Kambatla <ka...@cloudera.com>
> Sent: Friday, June 10, 2016 7:49 AM
> To: Andrew Wang
> Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Increased use of feature branches
>
> Thanks for restarting this thread Andrew. I really hope we can get this
> across to a VOTE so it is clear.
>
> I see a few advantages shipping from trunk:
>
>    - The lack of need for one additional backport each time.
>    - Feature rot in trunk
>
> Instead of creating branch-3, I recommend creating branch-3.x so we can
> continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> said it :))
>
> On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
> wrote:
>
> > Hi all,
> >
> > On a separate thread, a question was raised about 3.x branching and use
> of
> > feature branches going forward.
> >
> > We discussed this previously on the "Looking to a Hadoop 3 release"
> thread
> > that has spanned the years, with Vinod making this proposal (building on
> > ideas from others who also commented in the email thread):
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> >
> > Pasting here for ease:
> >
> > On an unrelated note, offline I was pitching to a bunch of
> > contributors another idea to deal
> > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> >
> > What this gains us is that
> >  - Trunk is always nearly stable or nearly ready for releases
> >  - We no longer have some code lying around in some branch (today’s
> > trunk) that is not releasable
> > because it gets mixed with other undesirable and incompatible changes.
> >  - This needs to be coupled with more discipline on individual
> > features - medium to large
> > features are always worked upon in branches and get merged into trunk
> > (and a nearing release!)
> > when they are ready
> >  - All incompatible changes go into some sort of a trunk-incompat
> > branch and stay there till
> > we accumulate enough of those to warrant another major release.
> >
> > Regarding "trunk-incompat", since we're still in the alpha stage for
> 3.0.0,
> > there's no need for this branch yet. This aspect of Vinod's proposal was
> > still under a bit of discussion; Chris Douglas thought we should cut a
> > branch-3 for the first 3.0.0 beta, which aligns with my original
> thinking.
> > This point doesn't necessarily need to be resolved now though, since
> again
> > we're still doing alphas.
> >
> > What we should get consensus on is the goal of keeping trunk stable, and
> > achieving that by doing more development on feature branches and being
> > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > more recent one on the async API) is that people are generally in favor
> of
> > this.
> >
> > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > appreciate everyone's timely response in this matter.
> >
> > Thanks,
> > Andrew
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>

Re: [DISCUSS] Increased use of feature branches

Posted by Junping Du <jd...@hortonworks.com>.
Compared with the advantages, I believe the disadvantages of shipping any releases directly from trunk are more obvious and significant:
- A lot of commits (incompatible, risky, uncompleted features, etc.) have to wait to be committed to trunk or be put into a separate branch, which could delay feature development because an additional vote process gets involved even if the feature is simple and harmless.

- Commits left in separate branches are isolated and have more chance to conflict with each other, and more bugs could be introduced due to conflicts and/or fewer eyes watching/blessing isolated branches.

- More unnecessary arguments/debates will happen over whether some commits should land on trunk or a separate branch, just like what we have seen recently.

- Because the number of branches will increase massively, more community effort will be spent on reviewing and voting on branch merges, which means less effort will be spent on reviewing other commits, given that our review bandwidth is already quite short.

- For a small feature with only 1 or 2 commits, needing three +1s from PMC members will raise the bar considerably for contributors who are just starting to contribute Hadoop features and do not yet have such support.

Given these concerns, I am open to other options, like those proposed by Vinod or Chris, rather than releasing anything directly from trunk.

> This point doesn't necessarily need to be resolved now though, since again we're still doing alphas.
No. I think we have to settle this first. Without a commonly agreed and transparent release process and branching model in the community, any release (alpha, beta) bits can only be called a private release, not an official Apache Hadoop release (even an alpha).


Thanks,

Junping
________________________________________
From: Karthik Kambatla <ka...@cloudera.com>
Sent: Friday, June 10, 2016 7:49 AM
To: Andrew Wang
Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Increased use of feature branches

Thanks for restarting this thread Andrew. I really hope we can get this
across to a VOTE so it is clear.

I see a few advantages shipping from trunk:

   - The lack of need for one additional backport each time.
   - Feature rot in trunk

Instead of creating branch-3, I recommend creating branch-3.x so we can
continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
said it :))

On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi all,
>
> On a separate thread, a question was raised about 3.x branching and use of
> feature branches going forward.
>
> We discussed this previously on the "Looking to a Hadoop 3 release" thread
> that has spanned the years, with Vinod making this proposal (building on
> ideas from others who also commented in the email thread):
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>
> Pasting here for ease:
>
> On an unrelated note, offline I was pitching to a bunch of
> contributors another idea to deal
> with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
>
> What this gains us is that
>  - Trunk is always nearly stable or nearly ready for releases
>  - We no longer have some code lying around in some branch (today’s
> trunk) that is not releasable
> because it gets mixed with other undesirable and incompatible changes.
>  - This needs to be coupled with more discipline on individual
> features - medium to large
> features are always worked upon in branches and get merged into trunk
> (and a nearing release!)
> when they are ready
>  - All incompatible changes go into some sort of a trunk-incompat
> branch and stay there till
> we accumulate enough of those to warrant another major release.
>
> Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> there's no need for this branch yet. This aspect of Vinod's proposal was
> still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.
>
> What we should get consensus on is the goal of keeping trunk stable, and
> achieving that by doing more development on feature branches and being
> judicious about merges. My sense from the Hadoop 3 email thread (and the
> more recent one on the async API) is that people are generally in favor of
> this.
>
> We're just about ready to do the first 3.0.0 alpha, so would greatly
> appreciate everyone's timely response in this matter.
>
> Thanks,
> Andrew
>



Re: [DISCUSS] Increased use of feature branches

Posted by Junping Du <jd...@hortonworks.com>.
Compared with the advantages, I believe the disadvantages of shipping any releases directly from trunk are more obvious and significant:
- A lot of commits (incompatible, risky, incomplete features, etc.) would have to wait to be committed to trunk or be put into a separate branch. That could delay feature development, since an additional vote process gets involved even when the feature is simple and harmless.

- Commits left in separate branches are isolated and more likely to conflict with each other, and more bugs could be introduced due to conflicts and/or fewer eyes watching over isolated branches.

- More unnecessary arguments/debates will happen over whether some commits should land on trunk or on a separate branch, just like what we have had recently.

- Because the number of branches will increase massively, more community effort will be spent on reviewing and voting on branch merges, which means less effort will be spent reviewing other commits, given that our review bandwidth is already quite short.

- For a small feature with only 1 or 2 commits, needing three +1s from PMC members will raise the bar considerably for contributors who are just starting to contribute Hadoop features and do not have that level of support.

Given these concerns, I am open to other options, like those proposed by Vinod or Chris, rather than releasing anything directly from trunk.

> This point doesn't necessarily need to be resolved now though, since again we're still doing alphas.
No. I think we have to settle this first. Without a commonly agreed and transparent release process and branching model in the community, any release bits (alpha or beta) can only be called a private release, not an official Apache Hadoop release (even an alpha).


Thanks,

Junping
________________________________________
From: Karthik Kambatla <ka...@cloudera.com>
Sent: Friday, June 10, 2016 7:49 AM
To: Andrew Wang
Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Increased use of feature branches

Thanks for restarting this thread, Andrew. I really hope we can get this
across to a VOTE so it is clear.

I see a few advantages to shipping from trunk:

   - No need for one additional backport each time.
   - Less feature rot in trunk.

Instead of creating branch-3, I recommend creating branch-3.x so we can
continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
said it :))

On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi all,
>
> On a separate thread, a question was raised about 3.x branching and use of
> feature branches going forward.
>
> We discussed this previously on the "Looking to a Hadoop 3 release" thread
> that has spanned the years, with Vinod making this proposal (building on
> ideas from others who also commented in the email thread):
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>
> Pasting here for ease:
>
> On an unrelated note, offline I was pitching to a bunch of
> contributors another idea to deal
> with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
>
> What this gains us is that
>  - Trunk is always nearly stable or nearly ready for releases
>  - We no longer have some code lying around in some branch (today’s
> trunk) that is not releasable
> because it gets mixed with other undesirable and incompatible changes.
>  - This needs to be coupled with more discipline on individual
> features - medium to large
> features are always worked upon in branches and get merged into trunk
> (and a nearing release!)
> when they are ready
>  - All incompatible changes go into some sort of a trunk-incompat
> branch and stay there till
> we accumulate enough of those to warrant another major release.
>
> Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> there's no need for this branch yet. This aspect of Vinod's proposal was
> still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.
>
> What we should get consensus on is the goal of keeping trunk stable, and
> achieving that by doing more development on feature branches and being
> judicious about merges. My sense from the Hadoop 3 email thread (and the
> more recent one on the async API) is that people are generally in favor of
> this.
>
> We're just about ready to do the first 3.0.0 alpha, so would greatly
> appreciate everyone's timely response in this matter.
>
> Thanks,
> Andrew
>

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org


Re: [DISCUSS] Increased use of feature branches

Posted by Karthik Kambatla <ka...@cloudera.com>.
Thanks for restarting this thread, Andrew. I really hope we can get this
across to a VOTE so it is clear.

I see a few advantages to shipping from trunk:

   - No need for one additional backport each time.
   - Less feature rot in trunk.

Instead of creating branch-3, I recommend creating branch-3.x so we can
continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
said it :))

On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <an...@cloudera.com>
wrote:

> Hi all,
>
> On a separate thread, a question was raised about 3.x branching and use of
> feature branches going forward.
>
> We discussed this previously on the "Looking to a Hadoop 3 release" thread
> that has spanned the years, with Vinod making this proposal (building on
> ideas from others who also commented in the email thread):
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>
> Pasting here for ease:
>
> On an unrelated note, offline I was pitching to a bunch of
> contributors another idea to deal
> with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
>
> What this gains us is that
>  - Trunk is always nearly stable or nearly ready for releases
>  - We no longer have some code lying around in some branch (today’s
> trunk) that is not releasable
> because it gets mixed with other undesirable and incompatible changes.
>  - This needs to be coupled with more discipline on individual
> features - medium to large
> features are always worked upon in branches and get merged into trunk
> (and a nearing release!)
> when they are ready
>  - All incompatible changes go into some sort of a trunk-incompat
> branch and stay there till
> we accumulate enough of those to warrant another major release.
>
> Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> there's no need for this branch yet. This aspect of Vinod's proposal was
> still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.
>
> What we should get consensus on is the goal of keeping trunk stable, and
> achieving that by doing more development on feature branches and being
> judicious about merges. My sense from the Hadoop 3 email thread (and the
> more recent one on the async API) is that people are generally in favor of
> this.
>
> We're just about ready to do the first 3.0.0 alpha, so would greatly
> appreciate everyone's timely response in this matter.
>
> Thanks,
> Andrew
>
