Posted to general@hadoop.apache.org by Eli Collins <el...@cloudera.com> on 2010/06/01 23:28:18 UTC

Re: [DISCUSSION] Proposal for making core Hadoop changes

Hey Jeff,

Blueprints (a Launchpad feature) is more of an issue tracking
system (Launchpad doesn't put features/enhancements in its bug
database), e.g. Drizzle has lots of blueprints, including blueprints for
cleaning up code, adding config flags, etc. We'll use JIRA for that
kind of thing; the HEP is for larger changes that need more upfront
discussion.

Thanks,
Eli

On Mon, May 31, 2010 at 10:16 AM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
> A far more lightweight example of multi-issue feature planning in an open
> source project comes from Drizzle and their "blueprints":
> https://blueprints.launchpad.net/drizzle.
>
> Each "spec" has a drafter, an approver, and an assignee; declares the other
> specs on which it depends; points to the relevant branches in the source
> tree and issues in the issue tracker; and has a priority, definition state,
> and implementation state.
>
> I don't know how it's working out for them in practice, but on paper it
> looks quite nice.
>
> On Wed, May 26, 2010 at 9:13 AM, Eli Collins <el...@cloudera.com> wrote:
>
>> > No, but I'd estimate the cost of merging at 1-2 days work a week just to
>> > pull in the code *and identify why the tests are failing*. Git may be better
>> > at merging in changes, but if Hadoop doesn't work on my machine after the
>> > merge, I need to identify whether it's my code, the merged code, some machine
>> > quirk, etc. It's the testing that is the problem for me, not the
>> > merge effort. That's Hadoop's own tests and my own functional test suites,
>> > the ones that bring up clusters and push work through. Those are the
>> > trouble spots, as they do things that Hadoop's own tests don't do, like ask
>> > for all the JSP pages.
>>
>> I've lived off a git branch of common/hdfs for half a year with a big
>> uncommitted patch, and it's nowhere near 1-2 days of effort per week to
>> merge in changes from trunk. If the tests are passing on trunk and
>> they fail after your merge, then those are real test failures due to
>> your change (and therefore should require effort). The issue with
>> your internal tests failing due to changes on trunk is the same
>> whether you merge or you just do an update - you have to update before
>> checking in the patch anyway - so that issue is about the state of
>> trunk when you merge or update, rather than about being on a branch.
>>
>> >
>> >> Might find the
>> >> following interesting:
>> >> http://incubator.apache.org/learn/rules-for-revolutionaries.html
>> >
>> > There's a long story behind JDD's paper, I'm glad you have read it. It does
>> > lay out what is effectively the ASF process for effecting significant change,
>> > but it doesn't imply that's the only process for making changes.
>> >
>>
>> Just to be clear, I don't mean to imply that branches are the only process
>> for making changes. Interesting that this is considered the effective
>> ASF process; it hasn't seemed to me that recent big features in Hadoop
>> have used it. The only one I'm aware of that was done on a branch was
>> append.
>>
>> > I think gradual evolution in trunk is good, it lets people play with what's
>> > coming in. Having lots of separate branches and everyone's private release
>> > being a merge of many patches that you choose is bad.
>>
>> Agreed. Personally I don't think people should release from branches.
>> And in practice I don't think you'll see lots of branches; people can
>> and would still develop on trunk. Getting changes merged from a branch
>> back to trunk before the whole branch is merged is a good thing; the
>> whole branch may never be merged and that's OK too. Branches are a
>> mechanism, releases are policy.
>>
>> Thanks,
>> Eli
>>
>

Re: [DISCUSSION] Proposal for making core Hadoop changes

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Sure, each project can choose to use the framework in the way they see fit
on Launchpad. I wanted to call out their use of metadata as being
particularly nice. We may want to consider similar fields and applications
of those fields for HEPs.
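
As a rough sketch of what such fields might look like for a HEP, the
record below mirrors the blueprint metadata described earlier in this
thread (drafter, approver, assignee, dependencies, branches, issues,
priority, definition state, implementation state). Everything here is
an assumption for illustration: the class name, the state names, and
the blocked_by helper are hypothetical, and nothing in it is an agreed
HEP format or a Launchpad API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class DefinitionState(Enum):
    # Hypothetical states; the thread does not define a list.
    DRAFT = "draft"
    UNDER_REVIEW = "under review"
    APPROVED = "approved"


class ImplementationState(Enum):
    # Hypothetical states; the thread does not define a list.
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    IMPLEMENTED = "implemented"


@dataclass
class Hep:
    """One proposal, with fields modeled on the blueprint metadata."""
    number: int
    title: str
    drafter: str
    approver: Optional[str] = None
    assignee: Optional[str] = None
    priority: str = "undefined"
    definition_state: DefinitionState = DefinitionState.DRAFT
    implementation_state: ImplementationState = ImplementationState.NOT_STARTED
    depends_on: List[int] = field(default_factory=list)  # numbers of other HEPs
    branches: List[str] = field(default_factory=list)    # source tree branches
    issues: List[str] = field(default_factory=list)      # issue tracker keys


def blocked_by(hep: Hep, all_heps: Dict[int, Hep]) -> List[int]:
    """Return the dependencies of `hep` that are not yet implemented."""
    return [
        n for n in hep.depends_on
        if all_heps[n].implementation_state is not ImplementationState.IMPLEMENTED
    ]
```

The one thing the explicit depends_on field buys over free-form text is
that a question like "what is this proposal still waiting on" becomes a
simple lookup rather than something a reader has to reconstruct from
the discussion.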
