Posted to general@hadoop.apache.org by Owen O'Malley <om...@apache.org> on 2011/05/04 19:31:55 UTC

[VOTE] Release candidate 0.20.203.0-rc1

Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. 

The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/

Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

-- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Hi Folks,

This is a release vote, let's stay focused.  On this thread I think appropriate responses are either 

+1 and some short commentary  (assuming you've tried it and it works)

or

-1 and some short commentary.  It would also be cool if you noted if you've tried it.

----

In the spirit of my feedback, I'll respond to this under another subject.

Thanks,

E14

On May 4, 2011, at 12:17 PM, Eli Collins wrote:

> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
>> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>> 
>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>> 
>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>> 
>> -- Owen
> 
> Hey Owen,
> 
> Thanks for incorporating all the feedback and additional changes. It's
> great that this release won't be a regression against our previous
> stable release.
> 
> I would like to call out that we are not just voting to adopt a
> particular release, we are starting a new version scheme for the
> project, doing new feature development on maintenance release branches
> (before trunk), and we're saying it's OK to release software that
> hasn't been reviewed by the community.
> 
> I'd like to hear from our development community not just that we want
> to do a release from this branch but that we want to adopt these other
> changes as well. Here's a summary of the major *remaining* issues and
> a recommendation on how to proceed:
> 
> 1. There are about 50 changes that have jiras that are committed to
> the branch that are not yet in trunk. The next release (0.22) will be
> a regression against this release, with respect to these particular
> changes. Recommendation: we should get these changes in trunk before
> releasing so that new features do not show up in maintenance branches
> first.
> 
> 2. There are 192 patches that were committed to the branch without
> reference to any Jira in the commit message. Some of these may have
> already been forward ported, but it is very difficult to match them up
> and evaluate which ones have been committed. Some are troublesome:
> when spot checking the commits I found some that were made by
> non-committers with no public review and that introduced apparent
> performance regressions (eg see HADOOP-7255). Recommendation: we
> should update the commit log to make sure there is a jira for each
> issue, and all changes have been reviewed/committed. This is the way
> we've always done releases.
> 
> 3. In the new versioning scheme, major.minor.point.X, the new "X"
> component allows for new feature development on point releases.
> Recommendation: we should discuss in a separate thread whether we want
> to do new feature development on maintenance branches and, if so,
> whether to adopt this new version scheme.
> 
> Thanks,
> Eli


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

Hey Owen,

Thanks for incorporating all the feedback and additional changes. It's
great that this release won't be a regression against our previous
stable release.

I would like to call out that we are not just voting to adopt a
particular release, we are starting a new version scheme for the
project, doing new feature development on maintenance release branches
(before trunk), and we're saying it's OK to release software that
hasn't been reviewed by the community.

I'd like to hear from our development community not just that we want
to do a release from this branch but that we want to adopt these other
changes as well. Here's a summary of the major *remaining* issues and
a recommendation on how to proceed:

1. There are about 50 changes that have jiras that are committed to
the branch that are not yet in trunk. The next release (0.22) will be
a regression against this release, with respect to these particular
changes. Recommendation: we should get these changes in trunk before
releasing so that new features do not show up in maintenance branches
first.

2. There are 192 patches that were committed to the branch without
reference to any Jira in the commit message. Some of these may have
already been forward ported, but it is very difficult to match them up
and evaluate which ones have been committed. Some are troublesome:
when spot checking the commits I found some that were made by
non-committers with no public review and that introduced apparent
performance regressions (eg see HADOOP-7255). Recommendation: we
should update the commit log to make sure there is a jira for each
issue, and all changes have been reviewed/committed. This is the way
we've always done releases.

3. In the new versioning scheme, major.minor.point.X, the new "X"
component allows for new feature development on point releases.
Recommendation: we should discuss in a separate thread whether we want
to do new feature development on maintenance branches and, if so,
whether to adopt this new version scheme.

Thanks,
Eli
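
Point 2 above, matching commits to Jira issues, is the kind of check that can be scripted. A minimal sketch in Python, assuming the commit messages are already available as plain strings: the project prefixes in the pattern (HADOOP, HDFS, MAPREDUCE), the function name, and the sample messages are illustrative assumptions, not taken from the branch's actual log.

```python
import re

# Assumed Jira project prefixes; adjust to the projects actually in use.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def split_by_jira_reference(messages):
    """Return (with_key, without_key) lists of commit messages."""
    with_key, without_key = [], []
    for message in messages:
        if JIRA_KEY.search(message):
            with_key.append(message)
        else:
            without_key.append(message)
    return with_key, without_key


# Hypothetical commit messages, for illustration only.
sample_log = [
    "MAPREDUCE-0001. Hypothetical change with a Jira key.",
    "BZ-0000002. Hypothetical change tracked only in an internal bug system.",
    "HADOOP-0003. Another hypothetical change.",
]

referenced, unreferenced = split_by_jira_reference(sample_log)
print(len(referenced), "with a Jira key;", len(unreferenced), "without")
```

A check like this only finds commits that lack a key. It says nothing about whether a referenced issue was actually reviewed, which is the concern raised in this message.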

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
> Entertaining concerns like a one-to-one
> correspondence between commits and JIRA issues is bizarre in this
> context.

It's not about whether there's a jira, it's about whether the code was
reviewed.  We think code should be reviewed and voted on by the
community before releasing it. That's how we've always rolled.

Everyone agrees releases are too infrequent, that's not an excuse for
steamrolling the community.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Chris Douglas <cd...@apache.org>.
I'm +1 on releasing rc1. The signature and hashes match on the
artifact, ran some of the more aggressive MR tests. Reviewed changes
from rc0.

It looks like we need a FAQ for this release, if only to prevent the
same questions from being asked and answered across different threads
and lists. Reservations, regressions, and pending work can also be
documented there.

Right now, Apache Hadoop releases are not recommended by its
community. Instead, not only our end users, but other Apache projects
run Cloudera's distribution. From all those wearing their Apache hat,
I would like to see more effort directed toward a release that we can
recommend soon and less time spent compiling tasks to delay it.

Releasing this will complicate the documented process. However, that
process *has not produced a usable release* for the last two out of
six years. This is failure. Entertaining concerns like a one-to-one
correspondence between commits and JIRA issues is bizarre in this
context. Let's find a way to make progress instead of tossing
pharisaic accusations of illegitimacy. -C

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Mahadev Konar <ma...@apache.org>.
+1 for the release.

I downloaded the release, verified checksums, built and deployed. Ran
randomwriter jobs on it.

Everything passes.

-- 
thanks
mahadev
@mahadevkonar

On Wed, May 4, 2011 at 3:05 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
> On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:
>
>> Here's an updated release candidate for 0.20.203.0. I've incorporated the
>> feedback and included all of the patches from 0.20.2, which is the last
>> stable release. I also fixed the eclipse-plugin problem.
>>
>> The candidate is at:
>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>>
>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>>
>
> +1
>
> Downloaded release, checked checksums, built, deployed single-node cluster.
>
> Arun
>
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:

> Here's an updated release candidate for 0.20.203.0. I've  
> incorporated the feedback and included all of the patches from  
> 0.20.2, which is the last stable release. I also fixed the eclipse- 
> plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly,  
> I'm +1.
>

+1

Downloaded release, checked checksums, built, deployed single-node  
cluster.

Arun


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Todd Lipcon <to...@cloudera.com>.
-1 for the same reasons I outlined in my email yesterday. This is not a
community artifact following the community's processes, and thus should not
be an official release until those issues are addressed.

On Wed, May 4, 2011 at 3:17 PM, Doug Cutting <cu...@apache.org> wrote:

> -1
>
> This candidate has lots of patches that are not in trunk, potentially
> adding regressions to 0.22 and 0.23.  This should be addressed before we
> release from 0.20-security.  We should also not move to four-component
> version numbering.  A release from the 0.20-security branch should
> perhaps be called 0.20.100.
>
> Doug
>
> On 05/04/2011 10:31 AM, Owen O'Malley wrote:
> > Here's an updated release candidate for 0.20.203.0. I've incorporated the
> feedback and included all of the patches from 0.20.2, which is the last
> stable release. I also fixed the eclipse-plugin problem.
> >
> > The candidate is at:
> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
> >
> > Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
> >
> > -- Owen
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Doug Cutting <cu...@apache.org>.
-1

This candidate has lots of patches that are not in trunk, potentially
adding regressions to 0.22 and 0.23.  This should be addressed before we
release from 0.20-security.  We should also not move to four-component
version numbering.  A release from the 0.20-security branch should
perhaps be called 0.20.100.

Doug

On 05/04/2011 10:31 AM, Owen O'Malley wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. 
> 
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
> 
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
> 
> -- Owen
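
The ordering implied by the version numbers discussed here can be checked with a short sketch. It assumes purely numeric, dot-separated version strings (no "-rc1" style suffixes); the function name is hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.20.203.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Version numbers mentioned in this thread, in arbitrary order.
versions = ["0.22", "0.20.203.0", "0.20.2", "0.20.100"]

# Tuples compare component by component, so three- and four-component
# versions can be ordered together.
ordered = sorted(versions, key=parse_version)
print(ordered)  # ['0.20.2', '0.20.100', '0.20.203.0', '0.22']
```

Under this comparison both 0.20.100 and 0.20.203.0 sort after 0.20.2 and before 0.22, so either name keeps the release between the last stable release and the next major one.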

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Konstantin Boudnik <co...@apache.org>.
On Wed, May 4, 2011 at 15:06, Suresh Srinivas <su...@yahoo-inc.com> wrote:
> Eli,
>
> How many of these patches that you find troublesome are in CDH already?

How is that relevant to the release vote and discrepancies listed in
Eli's email?

> Regards,
> Suresh
>
>
> On 5/4/11 3:03 PM, "Eli Collins" <el...@cloudera.com> wrote:
>
>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
>>> Here's an updated release candidate for 0.20.203.0. I've incorporated the
>>> feedback and included all of the patches from 0.20.2, which is the last
>>> stable release. I also fixed the eclipse-plugin problem.
>>>
>>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>>>
>>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>>>
>>> -- Owen
>>
>> While rc2 is an improvement on rc1, I am -1 on this particular rc.  Rationale:
>>
>> This rc contains many patches not yet committed to trunk. This would
>> cause the next major release (0.22) to be a feature regression against
>> our latest stable release (203), were 0.22 released soon.
>>
>> This rc contains many patches not yet reviewed by the community via
>> the normal process (jira, patch against trunk, merge to a release
>> branch). I think we should respect the existing community process that
>> has been used for all previous releases.
>>
>> This rc introduces a new development and branching model (new feature
>> development outside trunk) and Hadoop versioning scheme without
>> sufficient discussion or proposal of these changes with the community.
>>
>> We should establish new process before the release, a release is not
>> the appropriate mechanism for changing our review and development
>> process or versioning.
>>
>> I do support a release from branch-0.20-security that follows the
>> existing, established community process.
>>
>> Thanks,
>> Eli
>
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Lars Francke <la...@gmail.com>.
> "   BZ-4182948. Add statistics logging to Fred for better visibility into
> startup time costs. (Matt Foley)"
> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
> where he decided that the version done in 203 wasn't a good approach, and
> it's done differently in trunk (not sure if done yet).

Could anyone elaborate on what this "Fred" is that has been coming up
on these threads a few times now?

And is there something like a RELEASE NOTES draft that I could look
over? I try to follow these mailing lists as best as I can but I have
lost track of all the branches and features being worked on and I
can't imagine I'm the only one.

It would be nice to get an overview of what this release is all about
where work is being done etc.

Thanks!

Cheers,
Lars

Re: [DISCUSSION] development process of Hadoop

Posted by Eric Yang <ey...@yahoo-inc.com>.
Instead of depending on the review-then-commit practice as the norm, Hadoop committers can probably take advantage of the svn jira plugin.  People can actively commit to svn as long as a jira number is referenced in the commit.  The commit message will show up in JIRA and leave a trail of activity for reference.  Future committers can refer back to the code history to see why the code was written the way it was.  This is less error prone than maintaining patch increments.  This seems like a solvable problem by tweaking the behaviors of the hadoop committers.

Regards,
Eric

On 5/4/11 11:31 PM, "Eli Collins" <el...@cloudera.com> wrote:

On Wed, May 4, 2011 at 7:39 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
> If we reflect back and see how the development community ended up in its current state for Hadoop: development is happening rapidly and being tested in all kinds of organizations.  However, Hadoop committers are only committing code that is of interest to the sponsoring companies.  People are coding defensively to ensure that only self-serving patches get committed, and helping others and merging problems are always given secondary priority.  While the world demands agility, the "review then commit" process is preventing progress from happening.  Committers are afraid to commit patches because review hasn't taken place.  By the time a patch is reviewed, it no longer applies properly.  People end up having to generate multiple versions of patches to ensure the code can be applied.  The large lag time between patch generation and review is taking a significant toll on the community and progress.
>
> Yahoo has a great team of developers who improve Hadoop at a faster pace with its own fork of the source code.  The reason that Yahoo was able to achieve faster improvement with features was the ability to use source code repository tools properly.  Unfortunately for Yahoo, their source code repository was not Apache svn trunk.  I applaud Owen and Arun's effort in manpowering the backward/forward porting of changes between yahoo github and Apache svn.  There might be some jiras that need to be merged into the Hadoop 0.20.203 branch to ensure the lineage is correct.  The community should offer to help with a detailed listing of what is missing rather than vote -1 without concise reasoning of what is missing.
>
> JIRA is meant as a discussion and collaboration tool, but the hadoop community intends to use it as the source code version control system with a man-powered diff maker.  While spending time in the incubator with other projects, the mentors have explained that it is not ASF's philosophy to use "review then commit".

ASF's policy is that projects make this decision for themselves:
http://www.apache.org/dev/project-creation.html

The Hadoop bylaws specify that code changes are lazy consensus, ie you
need a +1 from a committer. Technically the code doesn't have to be
reviewed before committing it, that's just been the norm.

I don't think jira is technically required either, it's just been the
norm. The vote for the patch has to happen on the lists, that happens
as a side effect of jira traffic going to the dev lists.

> The Hadoop community should rethink whether it is using the right tools for the right task.
>
> Use JIRA if there is a large feature set that requires brainstorming; developers should have the ability to make small incremental changes without RTC.  This will ensure developers help each other rather than policing each other.
>
> Any thoughts?
>

I think you can move quickly with RTC or CTR, I've worked on RTC
projects that have moved quickly. It requires people dedicate
bandwidth to reviewing changes. If you do want all your code reviewed
(at some point) then you're ultimately limited by review bandwidth,
with either RTC or CTR.

The time it takes to file a jira is normally insignificant compared to
the time to create and test a change. The idea with using jira is that
you propose/discuss a change before creating code. You could do that
on the lists too. I agree using just a code review tool for small
stuff would be faster, eg things that don't require a bug #, release
note, etc.

Thanks,
Eli


Re: [DISCUSSION] development process of Hadoop

Posted by Eli Collins <el...@cloudera.com>.
On Wed, May 4, 2011 at 7:39 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
> If we reflect back and see how the development community ended up in its current state for Hadoop: development is happening rapidly and being tested in all kinds of organizations.  However, Hadoop committers are only committing code that is of interest to the sponsoring companies.  People are coding defensively to ensure that only self-serving patches get committed, and helping others and merging problems are always given secondary priority.  While the world demands agility, the "review then commit" process is preventing progress from happening.  Committers are afraid to commit patches because review hasn't taken place.  By the time a patch is reviewed, it no longer applies properly.  People end up having to generate multiple versions of patches to ensure the code can be applied.  The large lag time between patch generation and review is taking a significant toll on the community and progress.
>
> Yahoo has a great team of developers who improve Hadoop at a faster pace with its own fork of the source code.  The reason that Yahoo was able to achieve faster improvement with features was the ability to use source code repository tools properly.  Unfortunately for Yahoo, their source code repository was not Apache svn trunk.  I applaud Owen and Arun's effort in manpowering the backward/forward porting of changes between yahoo github and Apache svn.  There might be some jiras that need to be merged into the Hadoop 0.20.203 branch to ensure the lineage is correct.  The community should offer to help with a detailed listing of what is missing rather than vote -1 without concise reasoning of what is missing.
>
> JIRA is meant as a discussion and collaboration tool, but the hadoop community intends to use it as the source code version control system with a man-powered diff maker.  While spending time in the incubator with other projects, the mentors have explained that it is not ASF's philosophy to use "review then commit".

ASF's policy is that projects make this decision for themselves:
http://www.apache.org/dev/project-creation.html

The Hadoop bylaws specify that code changes are lazy consensus, ie you
need a +1 from a committer. Technically the code doesn't have to be
reviewed before committing it, that's just been the norm.

I don't think jira is technically required either, it's just been the
norm. The vote for the patch has to happen on the lists, that happens
as a side effect of jira traffic going to the dev lists.

> The Hadoop community should rethink whether it is using the right tools for the right task.
>
> Use JIRA if there is a large feature set that requires brainstorming; developers should have the ability to make small incremental changes without RTC.  This will ensure developers help each other rather than policing each other.
>
> Any thoughts?
>

I think you can move quickly with RTC or CTR, I've worked on RTC
projects that have moved quickly. It requires people dedicate
bandwidth to reviewing changes. If you do want all your code reviewed
(at some point) then you're ultimately limited by review bandwidth,
with either RTC or CTR.

The time it takes to file a jira is normally insignificant compared to
the time to create and test a change. The idea with using jira is that
you propose/discuss a change before creating code. You could do that
on the lists too. I agree using just a code review tool for small
stuff would be faster, eg things that don't require a bug #, release
note, etc.

Thanks,
Eli

[DISCUSSION] development process of Hadoop

Posted by Eric Yang <ey...@yahoo-inc.com>.
If we reflect back and see how the development community ended up in its current state for Hadoop: development is happening rapidly and being tested in all kinds of organizations.  However, Hadoop committers are only committing code that is of interest to the sponsoring companies.  People are coding defensively to ensure that only self-serving patches get committed, and helping others and merging problems are always given secondary priority.  While the world demands agility, the "review then commit" process is preventing progress from happening.  Committers are afraid to commit patches because review hasn't taken place.  By the time a patch is reviewed, it no longer applies properly.  People end up having to generate multiple versions of patches to ensure the code can be applied.  The large lag time between patch generation and review is taking a significant toll on the community and progress.

Yahoo has a great team of developers who improve Hadoop at a faster pace with its own fork of the source code.  The reason that Yahoo was able to achieve faster improvement with features was the ability to use source code repository tools properly.  Unfortunately for Yahoo, their source code repository was not Apache svn trunk.  I applaud Owen and Arun's effort in manpowering the backward/forward porting of changes between yahoo github and Apache svn.  There might be some jiras that need to be merged into the Hadoop 0.20.203 branch to ensure the lineage is correct.  The community should offer to help with a detailed listing of what is missing rather than vote -1 without concise reasoning of what is missing.

JIRA is meant as a discussion and collaboration tool, but the hadoop community intends to use it as the source code version control system with a man-powered diff maker.  While spending time in the incubator with other projects, the mentors have explained that it is not ASF's philosophy to use "review then commit".  The Hadoop community should rethink whether it is using the right tools for the right task.

Use JIRA if there is a large feature set that requires brainstorming; developers should have the ability to make small incremental changes without RTC.  This will ensure developers help each other rather than policing each other.

Any thoughts?

Regards,
Eric

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
On Wed, May 4, 2011 at 6:18 PM, Eric Baldeschwieler
<er...@yahoo-inc.com> wrote:
> Ok. I'll bite.
>
> The point of a vote is to learn what everyone thinks. So far we have learned:
>
> 1 - the team that is trying to contribute code and do a release thinks it is ready.
>
> 2 - Cloudera does not think the release is a good idea.
>

I don't think that's true.  There's a difference between not
supporting a given rc and not supporting a release from this branch in
general.

With both of my hats on, I want code to be reviewed before being
released, I want releases to not regress against previous releases, I
don't want the next major release to regress against a stable release,
I want the community to discuss new version schemes and development
models vs adopting them by accident just because we voted on a
particular release.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Ok. I'll bite.

The point of a vote is to learn what everyone thinks. So far we have learned:

1 - the team that is trying to contribute code and do a release thinks it is ready. 

2 - Cloudera does not think the release is a good idea. 

No more talk between the team contributing code and Cloudera will educate us further. Let's hear from the rest of the community. 

In parallel on other threads, let's work out how to address concerns. That will be useful however the vote goes. I promise to continue to work with everyone to help drive releases. 

We've called a vote, so let it proceed. That is how apache works. 

Thanks!

---
E14 - typing on glass

PS this is my last comment on this thread. Start new ones if you are not casting a vote. 

On May 4, 2011, at 5:45 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

> I tend to agree. Changing the release model of the Apache Hadoop train isn't
> something that should be done in haste or as a part of release
> voting.
> 
> If these questions aren't addressed - let's postpone the vote and
> discuss all the complications or implications until they are sorted out or
> a consensus/compromise is reached.
> 
> Cos
> 
> On Wed, May 4, 2011 at 17:39, Eli Collins <el...@cloudera.com> wrote:
>> The point is that these discussions should be sorted out, ie you don't
>> change your development and release model on a release VOTE thread,
>> you change it on a DISCUSSION thread.
>> 
>> Ie before we release this we should understand what that means. What
>> is being proposed is not just another release from branch-0.20 or
>> branch-0.22.
>> 
>> Thanks,
>> Eli
>> 
>> On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar <ma...@apache.org> wrote:
>>> Eli,
>>>  I think the intent from the email was to just vote on this thread,
>>> which I agree with.
>>>  Discussions should be done in a separate threads. Hopefully we can
>>> all stick to just voting!
>>> 
>>> thanks
>>> mahadev
>>> 
>>> On Wed, May 4, 2011 at 5:22 PM, Eli Collins <el...@cloudera.com> wrote:
>>>> Good suggestion, it would be helpful to hash out the issues around
>>>> compatibility, feature branches, version numbers, how to contribute at
>>>> Apache before putting up new votes, ie the vote
>>>> would go much smoother if all the issues with the previous vote were
>>>> addressed before starting a new one.
>>>> 
>>>> Thanks,
>>>> Eli
>>>> 
>>>> On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler
>>>> <er...@yahoo-inc.com> wrote:
>>>>> Hi folks,
>>>>> 
>>>>> Let's stay focused. Let's take the other threads onto other threads. This is a vote.
>>>>> 
>>>>> To the extent naming is a problem, let's take that to a thread and find an acceptable proposal.
>>>>> 
>>>>> To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work.
>>>>> 
>>>>> If you've voted, you don't need to comment further on this thread, no matter what company you work for!
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> ---
>>>>> E14 - typing on glass
>>>>> 
>>>>> On May 4, 2011, at 4:46 PM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>>>> 
>>>>>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>>>>> 
>>>>>>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>>>>>> 
>>>>>>> The list seems highly inaccurate.  Checked the first few N/A items.  All
>>>>>>>> are
>>>>>>>> false positives.
>>>>>>>> 
>>>>>>>> 
>>>>>>> Also,  can you please provide a list on features which are not related to
>>>>>>> gridmix benchmarks or herriot tests?
>>>>>>> 
>>>>>> 
>>>>>> Here are a few I quickly pulled up:
>>>>>> MAPREDUCE-2316 (docs for improved capacity scheduler)
>>>>>> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
>>>>>> 
>>>>>> "   BZ-4182948. Add statistics logging to Fred for better visibility into
>>>>>> startup time costs. (Matt Foley)"
>>>>>> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
>>>>>> where he decided that the version done in 203 wasn't a good approach, and
>>>>>> it's done differently in trunk (not sure if done yet).
>>>>>> 
>>>>>> MAPREDUCE-2364 (important bug fix for localization)
>>>>>> - in fact most of localization is different in this branch compared to trunk
>>>>>> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
>>>>>> the "yahoo-merge" branch.
>>>>>> 
>>>>>> "New cunters for FileInput/OutputFormat. New Counter
>>>>>>        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
>>>>>> 4217546"
>>>>>> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
>>>>>> committed.
>>>>>> 
>>>>>> - MAPREDUCE-1904, committed without JIRA as:
>>>>>> "        . Reducing new Path(), RawFileStatus() creation overhead in
>>>>>> LocalDirAllocator"
>>>>>> not in trunk
>>>>>> 
>>>>>> +    BZ4101537 .  When a queue is built without any access rights we explain
>>>>>> the
>>>>>> +    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
>>>>>> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
>>>>>> the JIRA there being resolved (based on looking at QueueManager in trunk)
>>>>>> 
>>>>>> "        . Remove unnecessary reference to user configuration from
>>>>>> TaskDistributedCacheManager causing memory leaks"
>>>>>> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
>>>>>> 
>>>>>> Major new feature: MAPREDUCE-323 - very large rework of how job history
>>>>>> files are managed
>>>>>> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
>>>>>> probably will be attacked by different JIRAs
>>>>>> Major new ops-visible feature: "metrics2" system
>>>>>> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
>>>>>> a separate server
>>>>>> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
>>>>>> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
>>>>>> 
>>>>>> I have code to work on, so I won't keep going, but this is from looking at
>>>>>> the last couple months of 203.
>>>>>> 
>>>>>> -Todd
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> thanks
>>> mahadev
>>> @mahadevkonar
>>> 
>> 

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Konstantin Boudnik <co...@apache.org>.
I tend to agree. Changing the release model of the Apache Hadoop train isn't
something that should be done in haste or as part of a release
vote.

If these questions aren't addressed, let's postpone the vote and
discuss all the complications or implications until they are sorted out or
a consensus/compromise is reached.

Cos

On Wed, May 4, 2011 at 17:39, Eli Collins <el...@cloudera.com> wrote:
> The point is that these discussion should be sorted out, ie you don't
> change your development and release model on a release VOTE thread,
> you change it on a DISCUSSION thread.
>
> Ie before we release this we should understand what that means. What
> is being proposed is not just another release from branch-0.20 or
> branch-0.22.
>
> Thanks,
> Eli
>
> On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar <ma...@apache.org> wrote:
>> Eli,
>>  I think the intent from the email was to just vote on this thread,
>> which I agree with.
>>  Discussions should be done in a separate threads. Hopefully we can
>> all stick to just voting!
>>
>> thanks
>> mahadev
>>
>> On Wed, May 4, 2011 at 5:22 PM, Eli Collins <el...@cloudera.com> wrote:
>>> Good suggestion, it would be helpful to hash out the issues around
>>> compatibility, feature branches, version numbers, how to contribute at
>>> Apache before putting up new votes that would be helpful, ie the vote
>>> would go much smoother if all the issues with the previous vote were
>>> addressed before starting a new one.
>>>
>>> Thanks,
>>> Eli
>>>
>>> On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler
>>> <er...@yahoo-inc.com> wrote:
>>>> Hi folks,
>>>>
>>>> Let's stay focused. Let's take the other threads onto other threads. This is a vote.
>>>>
>>>> To the extent naming is a problem, let's take that to a thread and find an acceptable proposal.
>>>>
>>>> To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work.
>>>>
>>>> If you've voted, you don't need to comment further on this thread, no matter what company you work for!
>>>>
>>>> Thanks,
>>>>
>>>> ---
>>>> E14 - typing on glass
>>>>
>>>> On May 4, 2011, at 4:46 PM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>>>
>>>>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>>>>
>>>>>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>>>>>
>>>>>> The list seems highly inaccurate.  Checked the first few N/A items.  All
>>>>>>> are
>>>>>>> false positives.
>>>>>>>
>>>>>>>
>>>>>> Also,  can you please provide a list on features which are not related to
>>>>>> gridmix benchmarks or herriot tests?
>>>>>>
>>>>>
>>>>> Here are a few I quickly pulled up:
>>>>> MAPREDUCE-2316 (docs for improved capacity scheduler)
>>>>> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
>>>>>
>>>>> "   BZ-4182948. Add statistics logging to Fred for better visibility into
>>>>> startup time costs. (Matt Foley)"
>>>>> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
>>>>> where he decided that the version done in 203 wasn't a good approach, and
>>>>> it's done differently in trunk (not sure if done yet).
>>>>>
>>>>> MAPREDUCE-2364 (important bug fix for localization)
>>>>> - in fact most of localization is different in this branch compared to trunk
>>>>> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
>>>>> the "yahoo-merge" branch,.
>>>>>
>>>>> "New cunters for FileInput/OutputFormat. New Counter
>>>>>        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
>>>>> 4217546"
>>>>> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
>>>>> committed.
>>>>>
>>>>> - MAPREDUCE-1904, committed without JIRA as:
>>>>> "        . Reducing new Path(), RawFileStatus() creation overhead in
>>>>> LocalDirAllocator"
>>>>> not in trunk
>>>>>
>>>>> +    BZ4101537 .  When a queue is built without any access rights we explain
>>>>> the
>>>>> +    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
>>>>> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
>>>>> the JIRA there being resolved (based on looking at QueueManager in trunk)
>>>>>
>>>>> "        . Remove unnecessary reference to user configuration from
>>>>> TaskDistributedCacheManager causing memory leaks"
>>>>> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
>>>>>
>>>>> Major new feature: MAPREDUCE-323 - very large rework of how job history
>>>>> files are managed
>>>>> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
>>>>> probably will be attacked by different JIRAs
>>>>> Major new ops-visible feature: "metrics2" system
>>>>> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
>>>>> a separate server
>>>>> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
>>>>> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
>>>>>
>>>>> I have code to work on, so I won't keep going, but this is from looking at
>>>>> the last couple months of 203.
>>>>>
>>>>> -Todd
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>
>>>
>>
>>
>>
>> --
>> thanks
>> mahadev
>> @mahadevkonar
>>
>

RE: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Jane Chen <Ja...@marklogic.com>.
Agree.  As a newcomer, I had trouble figuring out which version to adopt -- 0.20.2 vs. 0.21. This new release candidate seems to add more confusion for general users.

Jane

-----Original Message-----
From: Matei Zaharia [mailto:matei@eecs.berkeley.edu] 
Sent: Wednesday, May 04, 2011 11:21 PM
To: general@hadoop.apache.org
Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1

I'm not going to cast a vote, but I'm concerned about this for the same reasons Eli brought up -- in particular, compatibility with 0.22. I'm an author of several patches that have gone into 0.21 and trunk, only to stay on hiatus for 2 years because the project hasn't made a stable release since 0.20. (Today, many of these patches are being used through CDH, which is great, but it would be nice to see them in an Apache release too.) This push of features into 0.20.203 makes a widely used 0.22 seem even more distant. Can we at least get a confirmation that these changes will be included in 0.22, as well as a timeline?

To support a vibrant developer community, Apache Hadoop should not just be a mechanism for Yahoo and Cloudera to publish patches. It should include a well-defined process for smaller third-party contributors to push changes that will make it into a stable release within a reasonable time horizon. The lack of such a process has been a major cause for the slowdown in the project in my perspective.

Matei



On May 4, 2011, at 10:47 PM, Eric Sammer wrote:

> (non-binding) -1 for similar reasons to what Jeff and others have laid out,
> and certainly if we're going to change the development process as a side
> effect of a release vote.
> 
> On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher <ha...@cloudera.com>wrote:
> 
>> -1.
>> 
>> As Roy says, "whatever gets released will define the new norm by which
>> policies are assumed", and I certainly don't want this project to change
>> its
>> norms to accommodate bad practices. In particular, Eli presented three very
>> reasonable technical objections to this release. To summarize:
>> 
>> 1) Let's get the JIRAs that are going into this release into trunk first.
>> 2) Let's create a JIRA for each issue in the release.
>> 3) Let's stick to the release numbering conventions established for this
>> project.
>> 
>> I know the folks at Yahoo! are all professional engineers and done
>> tremendous work to help get the project to this point. There's no doubt in
>> my mind they understand the validity of the above three technical
>> objections. In fact, many of them helped author our "How to Contribute"
>> page, which established these conventions:
>> wiki.apache.org/hadoop/HowToContribute. We develop new features against
>> trunk, we create JIRAs for each issue, we review code before it goes into
>> trunk, and we only update old releases with bug fixes.
>> 
>> I couldn't be more excited to have Yahoo! once again doing development in
>> Apache, and I hope that we can work together to get the work that you've
>> done in this branch into one of our upcoming feature releases.
>> 
>> I hope those who voted +1 before Roy clarified what a release vote will
>> mean
>> for future project norms will reconsider their votes.
>> 
>> While there may be many competing agendas in this community, we all wish to
>> see Apache Hadoop releases of the highest quality. Changing our norms to
>> allow huge, unreviewed patch sets introducing new features into a past
>> release is a step in the wrong direction.
>> 
>> With a little bit of elbow grease, we can get the work done in this branch
>> into trunk, get 0.22 out the door, and be ready for a great 0.23 release.
>> 
>> Later,
>> Jeff
>> 
>> On Wed, May 4, 2011 at 9:17 PM, Nigel Daley <nd...@mac.com> wrote:
>> 
>>> I'm really not sure yet how to vote here.  I was going to vote +1 for
>> what
>>> I was told by a number of Yahoo! committers would be a one time release
>> as
>>> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended
>>> their own distribution.  Clearly this code was not all developed as a
>>> community process, but I was going to support a one time release of what
>>> they had developed in exclusion.
>>> 
>>> Then I read Roy's email, which confused me.  Why would he or I or anyone
>>> else support this release setting precedent or policy since it would walk
>>> all over our bylaws, community process, and the consensus nature of our
>>> foundation?  This release vote is a lazy majority of the PMC, but other
>>> decisions rolled up in this are supposed to be lazy majority of active
>>> committers or, in the case of code changes, a lazy consensus.  Setting
>>> policy by this release means any sufficiently large group of committers
>>> could go off and develop on their own and then commit it to a branch and
>>> call a release.
>>> 
>>> Furthermore, it now sounds like this is possibly the first in a line of
>>> feature releases off this branch.  Bug fixes releases, sure.  But feature
>>> releases?  What's wrong with trunk?
>>> 
>>> Nige
>>> 
>>> On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:
>>> 
>>>> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
>>>> 
>>>>> The point is that these discussion should be sorted out, ie you don't
>>>>> change your development and release model on a release VOTE thread,
>>>>> you change it on a DISCUSSION thread.
>>>> 
>>>> That is no different than saying you have a right to veto a
>>>> release until the issue is addressed, which you don't have.
>>>> 
>>>> A release vote is a majority decision.  If the majority
>>>> decides to release, then whatever gets released will define
>>>> the new norm by which policies are assumed.  If not released,
>>>> then I suggest collaborating more on the policies before
>>>> trying to vote again.
>>>> 
>>>> Either way, we don't hold up a vote for the sake of a
>>>> policy discussion because voting is a more efficient
>>>> means of discovering if the policy really matters.
>>>> 
>>>> ....Roy
>>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.
I'm not going to cast a vote, but I'm concerned about this for the same reasons Eli brought up -- in particular, compatibility with 0.22. I'm an author of several patches that have gone into 0.21 and trunk, only to stay on hiatus for 2 years because the project hasn't made a stable release since 0.20. (Today, many of these patches are being used through CDH, which is great, but it would be nice to see them in an Apache release too.) This push of features into 0.20.203 makes a widely used 0.22 seem even more distant. Can we at least get a confirmation that these changes will be included in 0.22, as well as a timeline?

To support a vibrant developer community, Apache Hadoop should not just be a mechanism for Yahoo and Cloudera to publish patches. It should include a well-defined process for smaller third-party contributors to push changes that will make it into a stable release within a reasonable time horizon. The lack of such a process has been a major cause of the slowdown in the project, from my perspective.

Matei



On May 4, 2011, at 10:47 PM, Eric Sammer wrote:

> (non-binding) -1 for similar reasons to what Jeff and others have laid out,
> and certainly if we're going to change the development process as a side
> effect of a release vote.
> 
> On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher <ha...@cloudera.com>wrote:
> 
>> -1.
>> 
>> As Roy says, "whatever gets released will define the new norm by which
>> policies are assumed", and I certainly don't want this project to change
>> its
>> norms to accommodate bad practices. In particular, Eli presented three very
>> reasonable technical objections to this release. To summarize:
>> 
>> 1) Let's get the JIRAs that are going into this release into trunk first.
>> 2) Let's create a JIRA for each issue in the release.
>> 3) Let's stick to the release numbering conventions established for this
>> project.
>> 
>> I know the folks at Yahoo! are all professional engineers and done
>> tremendous work to help get the project to this point. There's no doubt in
>> my mind they understand the validity of the above three technical
>> objections. In fact, many of them helped author our "How to Contribute"
>> page, which established these conventions:
>> wiki.apache.org/hadoop/HowToContribute. We develop new features against
>> trunk, we create JIRAs for each issue, we review code before it goes into
>> trunk, and we only update old releases with bug fixes.
>> 
>> I couldn't be more excited to have Yahoo! once again doing development in
>> Apache, and I hope that we can work together to get the work that you've
>> done in this branch into one of our upcoming feature releases.
>> 
>> I hope those who voted +1 before Roy clarified what a release vote will
>> mean
>> for future project norms will reconsider their votes.
>> 
>> While there may be many competing agendas in this community, we all wish to
>> see Apache Hadoop releases of the highest quality. Changing our norms to
>> allow huge, unreviewed patch sets introducing new features into a past
>> release is a step in the wrong direction.
>> 
>> With a little bit of elbow grease, we can get the work done in this branch
>> into trunk, get 0.22 out the door, and be ready for a great 0.23 release.
>> 
>> Later,
>> Jeff
>> 
>> On Wed, May 4, 2011 at 9:17 PM, Nigel Daley <nd...@mac.com> wrote:
>> 
>>> I'm really not sure yet how to vote here.  I was going to vote +1 for
>> what
>>> I was told by a number of Yahoo! committers would be a one time release
>> as
>>> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended
>>> their own distribution.  Clearly this code was not all developed as a
>>> community process, but I was going to support a one time release of what
>>> they had developed in exclusion.
>>> 
>>> Then I read Roy's email, which confused me.  Why would he or I or anyone
>>> else support this release setting precedent or policy since it would walk
>>> all over our bylaws, community process, and the consensus nature of our
>>> foundation?  This release vote is a lazy majority of the PMC, but other
>>> decisions rolled up in this are supposed to be lazy majority of active
>>> committers or, in the case of code changes, a lazy consensus.  Setting
>>> policy by this release means any sufficiently large group of committers
>>> could go off and develop on their own and then commit it to a branch and
>>> call a release.
>>> 
>>> Furthermore, it now sounds like this is possibly the first in a line of
>>> feature releases off this branch.  Bug fixes releases, sure.  But feature
>>> releases?  What's wrong with trunk?
>>> 
>>> Nige
>>> 
>>> On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:
>>> 
>>>> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
>>>> 
>>>>> The point is that these discussion should be sorted out, ie you don't
>>>>> change your development and release model on a release VOTE thread,
>>>>> you change it on a DISCUSSION thread.
>>>> 
>>>> That is no different than saying you have a right to veto a
>>>> release until the issue is addressed, which you don't have.
>>>> 
>>>> A release vote is a majority decision.  If the majority
>>>> decides to release, then whatever gets released will define
>>>> the new norm by which policies are assumed.  If not released,
>>>> then I suggest collaborating more on the policies before
>>>> trying to vote again.
>>>> 
>>>> Either way, we don't hold up a vote for the sake of a
>>>> policy discussion because voting is a more efficient
>>>> means of discovering if the policy really matters.
>>>> 
>>>> ....Roy
>>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Sammer <es...@cloudera.com>.
(non-binding) -1 for similar reasons to what Jeff and others have laid out,
and certainly if we're going to change the development process as a side
effect of a release vote.

On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher <ha...@cloudera.com>wrote:

> -1.
>
> As Roy says, "whatever gets released will define the new norm by which
> policies are assumed", and I certainly don't want this project to change
> its
> norms to accommodate bad practices. In particular, Eli presented three very
> reasonable technical objections to this release. To summarize:
>
> 1) Let's get the JIRAs that are going into this release into trunk first.
> 2) Let's create a JIRA for each issue in the release.
> 3) Let's stick to the release numbering conventions established for this
> project.
>
> I know the folks at Yahoo! are all professional engineers and done
> tremendous work to help get the project to this point. There's no doubt in
> my mind they understand the validity of the above three technical
> objections. In fact, many of them helped author our "How to Contribute"
> page, which established these conventions:
> wiki.apache.org/hadoop/HowToContribute. We develop new features against
> trunk, we create JIRAs for each issue, we review code before it goes into
> trunk, and we only update old releases with bug fixes.
>
> I couldn't be more excited to have Yahoo! once again doing development in
> Apache, and I hope that we can work together to get the work that you've
> done in this branch into one of our upcoming feature releases.
>
> I hope those who voted +1 before Roy clarified what a release vote will
> mean
> for future project norms will reconsider their votes.
>
> While there may be many competing agendas in this community, we all wish to
> see Apache Hadoop releases of the highest quality. Changing our norms to
> allow huge, unreviewed patch sets introducing new features into a past
> release is a step in the wrong direction.
>
> With a little bit of elbow grease, we can get the work done in this branch
> into trunk, get 0.22 out the door, and be ready for a great 0.23 release.
>
> Later,
> Jeff
>
> On Wed, May 4, 2011 at 9:17 PM, Nigel Daley <nd...@mac.com> wrote:
>
> > I'm really not sure yet how to vote here.  I was going to vote +1 for
> what
> > I was told by a number of Yahoo! committers would be a one time release
> as
> > Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended
> > their own distribution.  Clearly this code was not all developed as a
> > community process, but I was going to support a one time release of what
> > they had developed in exclusion.
> >
> > Then I read Roy's email, which confused me.  Why would he or I or anyone
> > else support this release setting precedent or policy since it would walk
> > all over our bylaws, community process, and the consensus nature of our
> > foundation?  This release vote is a lazy majority of the PMC, but other
> > decisions rolled up in this are supposed to be lazy majority of active
> > committers or, in the case of code changes, a lazy consensus.  Setting
> > policy by this release means any sufficiently large group of committers
> > could go off and develop on their own and then commit it to a branch and
> > call a release.
> >
> > Furthermore, it now sounds like this is possibly the first in a line of
> > feature releases off this branch.  Bug fixes releases, sure.  But feature
> > releases?  What's wrong with trunk?
> >
> > Nige
> >
> > On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:
> >
> > > On May 4, 2011, at 5:39 PM, Eli Collins wrote:
> > >
> > >> The point is that these discussion should be sorted out, ie you don't
> > >> change your development and release model on a release VOTE thread,
> > >> you change it on a DISCUSSION thread.
> > >
> > > That is no different than saying you have a right to veto a
> > > release until the issue is addressed, which you don't have.
> > >
> > > A release vote is a majority decision.  If the majority
> > > decides to release, then whatever gets released will define
> > > the new norm by which policies are assumed.  If not released,
> > > then I suggest collaborating more on the policies before
> > > trying to vote again.
> > >
> > > Either way, we don't hold up a vote for the sake of a
> > > policy discussion because voting is a more efficient
> > > means of discovering if the policy really matters.
> > >
> > > ....Roy
> > >
> >
> >
>



-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
-1.

As Roy says, "whatever gets released will define the new norm by which
policies are assumed", and I certainly don't want this project to change its
norms to accommodate bad practices. In particular, Eli presented three very
reasonable technical objections to this release. To summarize:

1) Let's get the JIRAs that are going into this release into trunk first.
2) Let's create a JIRA for each issue in the release.
3) Let's stick to the release numbering conventions established for this
project.

I know the folks at Yahoo! are all professional engineers and have done
tremendous work to help get the project to this point. There's no doubt in
my mind they understand the validity of the above three technical
objections. In fact, many of them helped author our "How to Contribute"
page, which established these conventions:
wiki.apache.org/hadoop/HowToContribute. We develop new features against
trunk, we create JIRAs for each issue, we review code before it goes into
trunk, and we only update old releases with bug fixes.

I couldn't be more excited to have Yahoo! once again doing development in
Apache, and I hope that we can work together to get the work that you've
done in this branch into one of our upcoming feature releases.

I hope those who voted +1 before Roy clarified what a release vote will mean
for future project norms will reconsider their votes.

While there may be many competing agendas in this community, we all wish to
see Apache Hadoop releases of the highest quality. Changing our norms to
allow huge, unreviewed patch sets introducing new features into a past
release is a step in the wrong direction.

With a little bit of elbow grease, we can get the work done in this branch
into trunk, get 0.22 out the door, and be ready for a great 0.23 release.

Later,
Jeff

On Wed, May 4, 2011 at 9:17 PM, Nigel Daley <nd...@mac.com> wrote:

> I'm really not sure yet how to vote here.  I was going to vote +1 for what
> I was told by a number of Yahoo! committers would be a one time release as
> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended
> their own distribution.  Clearly this code was not all developed as a
> community process, but I was going to support a one time release of what
> they had developed in exclusion.
>
> Then I read Roy's email, which confused me.  Why would he or I or anyone
> else support this release setting precedent or policy since it would walk
> all over our bylaws, community process, and the consensus nature of our
> foundation?  This release vote is a lazy majority of the PMC, but other
> decisions rolled up in this are supposed to be lazy majority of active
> committers or, in the case of code changes, a lazy consensus.  Setting
> policy by this release means any sufficiently large group of committers
> could go off and develop on their own and then commit it to a branch and
> call a release.
>
> Furthermore, it now sounds like this is possibly the first in a line of
> feature releases off this branch.  Bug fixes releases, sure.  But feature
> releases?  What's wrong with trunk?
>
> Nige
>
> On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:
>
> > On May 4, 2011, at 5:39 PM, Eli Collins wrote:
> >
> >> The point is that these discussion should be sorted out, ie you don't
> >> change your development and release model on a release VOTE thread,
> >> you change it on a DISCUSSION thread.
> >
> > That is no different than saying you have a right to veto a
> > release until the issue is addressed, which you don't have.
> >
> > A release vote is a majority decision.  If the majority
> > decides to release, then whatever gets released will define
> > the new norm by which policies are assumed.  If not released,
> > then I suggest collaborating more on the policies before
> > trying to vote again.
> >
> > Either way, we don't hold up a vote for the sake of a
> > policy discussion because voting is a more efficient
> > means of discovering if the policy really matters.
> >
> > ....Roy
> >
>
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Nigel Daley <nd...@mac.com>.
I'm really not sure yet how to vote here.  I was going to vote +1 for what I was told by a number of Yahoo! committers would be a one time release as Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended their own distribution.  Clearly this code was not all developed as a community process, but I was going to support a one time release of what they had developed in exclusion.

Then I read Roy's email, which confused me.  Why would he or I or anyone else support this release setting precedent or policy since it would walk all over our bylaws, community process, and the consensus nature of our foundation?  This release vote is a lazy majority of the PMC, but other decisions rolled up in this are supposed to be lazy majority of active committers or, in the case of code changes, a lazy consensus.  Setting policy by this release means any sufficiently large group of committers could go off and develop on their own and then commit it to a branch and call a release.

Furthermore, it now sounds like this is possibly the first in a line of feature releases off this branch.  Bug fix releases, sure.  But feature releases?  What's wrong with trunk?

Nige

On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:

> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
> 
>> The point is that these discussion should be sorted out, ie you don't
>> change your development and release model on a release VOTE thread,
>> you change it on a DISCUSSION thread.
> 
> That is no different than saying you have a right to veto a
> release until the issue is addressed, which you don't have.
> 
> A release vote is a majority decision.  If the majority
> decides to release, then whatever gets released will define
> the new norm by which policies are assumed.  If not released,
> then I suggest collaborating more on the policies before
> trying to vote again.
> 
> Either way, we don't hold up a vote for the sake of a
> policy discussion because voting is a more efficient
> means of discovering if the policy really matters.
> 
> ....Roy
> 


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Dhruba Borthakur <dh...@gmail.com>.
+1.

I downloaded the bits, compiled and ran unit tests. Also, looked at the
source code to some extent. Looks good.

-dhruba

On Wed, May 4, 2011 at 6:56 PM, Roy T. Fielding <fi...@gbiv.com> wrote:

> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
>
> > The point is that these discussion should be sorted out, ie you don't
> > change your development and release model on a release VOTE thread,
> > you change it on a DISCUSSION thread.
>
> That is no different than saying you have a right to veto a
> release until the issue is addressed, which you don't have.
>
> A release vote is a majority decision.  If the majority
> decides to release, then whatever gets released will define
> the new norm by which policies are assumed.  If not released,
> then I suggest collaborating more on the policies before
> trying to vote again.
>
> Either way, we don't hold up a vote for the sake of a
> policy discussion because voting is a more efficient
> means of discovering if the policy really matters.
>
> ....Roy
>
>


-- 
Connect to me at http://www.facebook.com/dhruba

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On May 4, 2011, at 5:39 PM, Eli Collins wrote:

> The point is that these discussion should be sorted out, ie you don't
> change your development and release model on a release VOTE thread,
> you change it on a DISCUSSION thread.

That is no different than saying you have a right to veto a
release until the issue is addressed, which you don't have.

A release vote is a majority decision.  If the majority
decides to release, then whatever gets released will define
the new norm by which policies are assumed.  If not released,
then I suggest collaborating more on the policies before
trying to vote again.

Either way, we don't hold up a vote for the sake of a
policy discussion because voting is a more efficient
means of discovering if the policy really matters.

....Roy


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
The point is that these discussions should be sorted out, ie you don't
change your development and release model on a release VOTE thread,
you change it on a DISCUSSION thread.

Ie before we release this we should understand what that means. What
is being proposed is not just another release from branch-0.20 or
branch-0.22.

Thanks,
Eli

On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar <ma...@apache.org> wrote:
> Eli,
>  I think the intent from the email was to just vote on this thread,
> which I agree with.
>  Discussions should be done in a separate threads. Hopefully we can
> all stick to just voting!
>
> thanks
> mahadev
>
> On Wed, May 4, 2011 at 5:22 PM, Eli Collins <el...@cloudera.com> wrote:
>> Good suggestion, it would be helpful to hash out the issues around
>> compatibility, feature branches, version numbers, how to contribute at
>> Apache before putting up new votes that would be helpful, ie the vote
>> would go much smoother if all the issues with the previous vote were
>> addressed before starting a new one.
>>
>> Thanks,
>> Eli
>>
>> On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler
>> <er...@yahoo-inc.com> wrote:
>>> Hi folks,
>>>
>>> Let's stay focused. Let's take the other threads onto other threads. This is a vote.
>>>
>>> To the extent naming is a problem, let's take that to a thread and find an acceptable proposal.
>>>
>>> To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work.
>>>
>>> If you've voted, you don't need to comment further on this thread, no matter what company you work for!
>>>
>>> Thanks,
>>>
>>> ---
>>> E14 - typing on glass
>>>
>>> On May 4, 2011, at 4:46 PM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>>
>>>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>>>
>>>>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>>>>
>>>>> The list seems highly inaccurate.  Checked the first few N/A items.  All
>>>>>> are
>>>>>> false positives.
>>>>>>
>>>>>>
>>>>> Also,  can you please provide a list on features which are not related to
>>>>> gridmix benchmarks or herriot tests?
>>>>>
>>>>
>>>> Here are a few I quickly pulled up:
>>>> MAPREDUCE-2316 (docs for improved capacity scheduler)
>>>> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
>>>>
>>>> "   BZ-4182948. Add statistics logging to Fred for better visibility into
>>>> startup time costs. (Matt Foley)"
>>>> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
>>>> where he decided that the version done in 203 wasn't a good approach, and
>>>> it's done differently in trunk (not sure if done yet).
>>>>
>>>> MAPREDUCE-2364 (important bug fix for localization)
>>>> - in fact most of localization is different in this branch compared to trunk
>>>> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
>>>> the "yahoo-merge" branch,.
>>>>
>>>> "New cunters for FileInput/OutputFormat. New Counter
>>>>        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
>>>> 4217546"
>>>> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
>>>> committed.
>>>>
>>>> - MAPREDUCE-1904, committed without JIRA as:
>>>> "        . Reducing new Path(), RawFileStatus() creation overhead in
>>>> LocalDirAllocator"
>>>> not in trunk
>>>>
>>>> +    BZ4101537 .  When a queue is built without any access rights we explain
>>>> the
>>>> +    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
>>>> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
>>>> the JIRA there being resolved (based on looking at QueueManager in trunk)
>>>>
>>>> "        . Remove unnecessary reference to user configuration from
>>>> TaskDistributedCacheManager causing memory leaks"
>>>> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
>>>>
>>>> Major new feature: MAPREDUCE-323 - very large rework of how job history
>>>> files are managed
>>>> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
>>>> probably will be attacked by different JIRAs
>>>> Major new ops-visible feature: "metrics2" system
>>>> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
>>>> a separate server
>>>> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
>>>> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
>>>>
>>>> I have code to work on, so I won't keep going, but this is from looking at
>>>> the last couple months of 203.
>>>>
>>>> -Todd
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>
>
>
>
> --
> thanks
> mahadev
> @mahadevkonar
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Mahadev Konar <ma...@apache.org>.
Eli,
  I think the intent from the email was to just vote on this thread,
which I agree with.
 Discussions should be done in separate threads. Hopefully we can
all stick to just voting!

thanks
mahadev

On Wed, May 4, 2011 at 5:22 PM, Eli Collins <el...@cloudera.com> wrote:
> Good suggestion, it would be helpful to hash out the issues around
> compatibility, feature branches, version numbers, how to contribute at
> Apache before putting up new votes that would be helpful, ie the vote
> would go much smoother if all the issues with the previous vote were
> addressed before starting a new one.
>
> Thanks,
> Eli
>
> On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler
> <er...@yahoo-inc.com> wrote:
>> Hi folks,
>>
>> Let's stay focused. Let's take the other threads onto other threads. This is a vote.
>>
>> To the extent naming is a problem, let's take that to a thread and find an acceptable proposal.
>>
>> To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work.
>>
>> If you've voted, you don't need to comment further on this thread, no matter what company you work for!
>>
>> Thanks,
>>
>> ---
>> E14 - typing on glass
>>
>> On May 4, 2011, at 4:46 PM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>
>>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>>
>>>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>>>
>>>> The list seems highly inaccurate.  Checked the first few N/A items.  All
>>>>> are
>>>>> false positives.
>>>>>
>>>>>
>>>> Also,  can you please provide a list on features which are not related to
>>>> gridmix benchmarks or herriot tests?
>>>>
>>>
>>> Here are a few I quickly pulled up:
>>> MAPREDUCE-2316 (docs for improved capacity scheduler)
>>> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
>>>
>>> "   BZ-4182948. Add statistics logging to Fred for better visibility into
>>> startup time costs. (Matt Foley)"
>>> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
>>> where he decided that the version done in 203 wasn't a good approach, and
>>> it's done differently in trunk (not sure if done yet).
>>>
>>> MAPREDUCE-2364 (important bug fix for localization)
>>> - in fact most of localization is different in this branch compared to trunk
>>> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
>>> the "yahoo-merge" branch,.
>>>
>>> "New cunters for FileInput/OutputFormat. New Counter
>>>        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
>>> 4217546"
>>> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
>>> committed.
>>>
>>> - MAPREDUCE-1904, committed without JIRA as:
>>> "        . Reducing new Path(), RawFileStatus() creation overhead in
>>> LocalDirAllocator"
>>> not in trunk
>>>
>>> +    BZ4101537 .  When a queue is built without any access rights we explain
>>> the
>>> +    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
>>> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
>>> the JIRA there being resolved (based on looking at QueueManager in trunk)
>>>
>>> "        . Remove unnecessary reference to user configuration from
>>> TaskDistributedCacheManager causing memory leaks"
>>> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
>>>
>>> Major new feature: MAPREDUCE-323 - very large rework of how job history
>>> files are managed
>>> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
>>> probably will be attacked by different JIRAs
>>> Major new ops-visible feature: "metrics2" system
>>> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
>>> a separate server
>>> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
>>> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
>>>
>>> I have code to work on, so I won't keep going, but this is from looking at
>>> the last couple months of 203.
>>>
>>> -Todd
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>



-- 
thanks
mahadev
@mahadevkonar

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
Good suggestion. It would be helpful to hash out the issues around
compatibility, feature branches, version numbers, and how to contribute at
Apache before putting up new votes, i.e. the vote
would go much smoother if all the issues with the previous vote were
addressed before starting a new one.

Thanks,
Eli

On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler
<er...@yahoo-inc.com> wrote:
> Hi folks,
>
> Let's stay focused. Let's take the other threads onto other threads. This is a vote.
>
> To the extent naming is a problem, let's take that to a thread and find an acceptable proposal.
>
> To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work.
>
> If you've voted, you don't need to comment further on this thread, no matter what company you work for!
>
> Thanks,
>
> ---
> E14 - typing on glass
>
> On May 4, 2011, at 4:46 PM, "Todd Lipcon" <to...@cloudera.com> wrote:
>
>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>
>>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>>
>>> The list seems highly inaccurate.  Checked the first few N/A items.  All
>>>> are
>>>> false positives.
>>>>
>>>>
>>> Also,  can you please provide a list on features which are not related to
>>> gridmix benchmarks or herriot tests?
>>>
>>
>> Here are a few I quickly pulled up:
>> MAPREDUCE-2316 (docs for improved capacity scheduler)
>> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
>>
>> "   BZ-4182948. Add statistics logging to Fred for better visibility into
>> startup time costs. (Matt Foley)"
>> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
>> where he decided that the version done in 203 wasn't a good approach, and
>> it's done differently in trunk (not sure if done yet).
>>
>> MAPREDUCE-2364 (important bug fix for localization)
>> - in fact most of localization is different in this branch compared to trunk
>> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
>> the "yahoo-merge" branch,.
>>
>> "New cunters for FileInput/OutputFormat. New Counter
>>        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
>> 4217546"
>> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
>> committed.
>>
>> - MAPREDUCE-1904, committed without JIRA as:
>> "        . Reducing new Path(), RawFileStatus() creation overhead in
>> LocalDirAllocator"
>> not in trunk
>>
>> +    BZ4101537 .  When a queue is built without any access rights we explain
>> the
>> +    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
>> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
>> the JIRA there being resolved (based on looking at QueueManager in trunk)
>>
>> "        . Remove unnecessary reference to user configuration from
>> TaskDistributedCacheManager causing memory leaks"
>> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
>>
>> Major new feature: MAPREDUCE-323 - very large rework of how job history
>> files are managed
>> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
>> probably will be attacked by different JIRAs
>> Major new ops-visible feature: "metrics2" system
>> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
>> a separate server
>> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
>> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
>>
>> I have code to work on, so I won't keep going, but this is from looking at
>> the last couple months of 203.
>>
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Hi folks,

Let's stay focused. Let's take the other threads onto other threads. This is a vote. 

To the extent naming is a problem, let's take that to a thread and find an acceptable proposal. 

To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work. 

If you've voted, you don't need to comment further on this thread, no matter what company you work for!

Thanks,

---
E14 - typing on glass

On May 4, 2011, at 4:46 PM, "Todd Lipcon" <to...@cloudera.com> wrote:

> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
> 
>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>> 
>> The list seems highly inaccurate.  Checked the first few N/A items.  All
>>> are
>>> false positives.
>>> 
>>> 
>> Also,  can you please provide a list on features which are not related to
>> gridmix benchmarks or herriot tests?
>> 
> 
> Here are a few I quickly pulled up:
> MAPREDUCE-2316 (docs for improved capacity scheduler)
> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)
> 
> "   BZ-4182948. Add statistics logging to Fred for better visibility into
> startup time costs. (Matt Foley)"
> - I believe I saw a note from Matt on the JIRA yesterday about this feature,
> where he decided that the version done in 203 wasn't a good approach, and
> it's done differently in trunk (not sure if done yet).
> 
> MAPREDUCE-2364 (important bug fix for localization)
> - in fact most of localization is different in this branch compared to trunk
> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
> the "yahoo-merge" branch,.
> 
> "New cunters for FileInput/OutputFormat. New Counter
>        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
> 4217546"
> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
> committed.
> 
> - MAPREDUCE-1904, committed without JIRA as:
> "        . Reducing new Path(), RawFileStatus() creation overhead in
> LocalDirAllocator"
> not in trunk
> 
> +    BZ4101537 .  When a queue is built without any access rights we explain
> the
> +    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
> seems to be on trunk as MR-2411, but not committed, best I can tell, despite
> the JIRA there being resolved (based on looking at QueueManager in trunk)
> 
> "        . Remove unnecessary reference to user configuration from
> TaskDistributedCacheManager causing memory leaks"
> Not in trunk, not sure which JIRA it might be.. probably part of 2178.
> 
> Major new feature: MAPREDUCE-323 - very large rework of how job history
> files are managed
> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
> probably will be attacked by different JIRAs
> Major new ops-visible feature: "metrics2" system
> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
> a separate server
> Major new set of user-visible configurations: MAPREDUCE-1943 and friends
> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)
> 
> I have code to work on, so I won't keep going, but this is from looking at
> the last couple months of 203.
> 
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On May 4, 2011, at 4:44 PM, Todd Lipcon wrote:

> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com>  
> wrote:
>
>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>>
>> The list seems highly inaccurate.  Checked the first few N/A  
>> items.  All
>>> are
>>> false positives.
>>>
>>>
>> Also,  can you please provide a list on features which are not  
>> related to
>> gridmix benchmarks or herriot tests?
>>
>
> Here are a few I quickly pulled up:

So, it's around 10? Approximately? Also, the ones you put up were  
reviewed via jira.

Please note that several of the ones you are pointing out are already
in the y-merge branch, which is nearly trunk, including MR-2378 as you
pointed out.

Thanks for the list, I'll ensure we work on forward porting them.

Arun


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:

> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:
>
>  The list seems highly inaccurate.  Checked the first few N/A items.  All
>> are
>> false positives.
>>
>>
> Also,  can you please provide a list on features which are not related to
> gridmix benchmarks or herriot tests?
>

Here are a few I quickly pulled up:
MAPREDUCE-2316 (docs for improved capacity scheduler)
MAPREDUCE-2355 (adds new config for heartbeat dampening in MR)

"   BZ-4182948. Add statistics logging to Fred for better visibility into
startup time costs. (Matt Foley)"
- I believe I saw a note from Matt on the JIRA yesterday about this feature,
where he decided that the version done in 203 wasn't a good approach, and
it's done differently in trunk (not sure if done yet).

MAPREDUCE-2364 (important bug fix for localization)
- in fact most of localization is different in this branch compared to trunk
due to inclusion of MAPREDUCE-2378, the trunk version of which is still on
the "yahoo-merge" branch.

"New cunters for FileInput/OutputFormat. New Counter
        MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543,
4217546"
- not sure which JIRA this is, I think I've seen a JIRA for trunk, but not
committed.

- MAPREDUCE-1904, committed without JIRA as:
"        . Reducing new Path(), RawFileStatus() creation overhead in
LocalDirAllocator"
not in trunk

+    BZ4101537 .  When a queue is built without any access rights we explain
the
+    problem.  (dking, rvw ramach)  [attachment of 2010-11-24]
seems to be on trunk as MR-2411, but not committed, best I can tell, despite
the JIRA there being resolved (based on looking at QueueManager in trunk)

"        . Remove unnecessary reference to user configuration from
TaskDistributedCacheManager causing memory leaks"
Not in trunk, not sure which JIRA it might be.. probably part of 2178.

Major new feature: MAPREDUCE-323 - very large rework of how job history
files are managed
Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though
probably will be attacked by different JIRAs
Major new ops-visible feature: "metrics2" system
Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from
a separate server
Major new set of user-visible configurations: MAPREDUCE-1943 and friends
which implement new limits in MapReduce (eg MAPREDUCE-1872 as well)

I have code to work on, so I won't keep going, but this is from looking at
the last couple months of 203.
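
A comparison like the one above can be roughly sketched as a set difference over the JIRA ids mentioned in two change logs. This is only an illustration, not the way the list in this thread was produced: the function names and the sample texts are made up, and the regular expression only recognizes HADOOP/HDFS/MAPREDUCE ids. As noted elsewhere in the thread, a patch that was forward-ported under a different JIRA id shows up here as a false positive, and entries committed without a JIRA id are missed entirely.

```python
import re

# Matches ids such as MAPREDUCE-2316 or HADOOP-6304 (assumed id format).
JIRA_ID = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def jira_ids(changes_text):
    """Return the set of JIRA ids mentioned in a change-log style text."""
    return set(JIRA_ID.findall(changes_text))


def only_in_branch(branch_changes, trunk_changes):
    """Return the sorted JIRA ids listed for the branch but not for trunk."""
    return sorted(jira_ids(branch_changes) - jira_ids(trunk_changes))


# Invented sample texts, only to show the shape of the input.
branch = """
MAPREDUCE-2316. Placeholder entry.
MAPREDUCE-2355. Placeholder entry.
HADOOP-6304. Placeholder entry.
"""
trunk = """
HADOOP-6304. Placeholder entry.
"""

print(only_in_branch(branch, trunk))
```

Any id the sketch reports still needs a manual check against trunk before it is called missing.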

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote:

> The list seems highly inaccurate.  Checked the first few N/A items.   
> All are
> false positives.
>

Also,  can you please provide a list on features which are not related  
to gridmix benchmarks or herriot tests?

Please remember, and I have said this on list and off-list, that many  
of the forward ports obviated the need for multiple patches which show  
up in the commit logs.

thanks,
Arun

> < HADOOP-6304 N/A -- fixed in trunk via HADOOP-7110 (Todd, it was  
> fixed by you.
> Forgot?)
> < HADOOP-6598 N/A -- moved to HADOOP-6763 and committed to trunk
> < HADOOP-6653 N/A -- not applicable in trunk
> < HADOOP-6716 N/A -- as part of HADOOP-6815 which was committed to  
> trunk
> < HADOOP-6718 N/A -- Incorporated in HADOOP-6706 for 0.22.
> < HADOOP-6776 N/A -- Tom White said "This is fixed in trunk, so can  
> be closed."
>
> Regards,
> Nicholas
>
>
>
>
>
> ________________________________
> From: Eli Collins <el...@cloudera.com>
> To: general@hadoop.apache.org
> Sent: Wed, May 4, 2011 3:36:16 PM
> Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1
>
> On Wed, May 4, 2011 at 3:29 PM, Jakob Homan <jg...@gmail.com> wrote:
>> @Eli >> This rc contains many patches not yet committed to trunk.
>> If you've compiled this list, can you post it?
>>
>
> Here's the list Todd posted yesterday:
>
> http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTimKKbkuPCz61TU=8-nO8Z6PYhf7Sg@mail.gmail.com%3E
>
>
> Thanks,
> Eli


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
On Wed, May 4, 2011 at 4:09 PM, Tsz Wo (Nicholas), Sze
<s2...@yahoo.com> wrote:
> The list seems highly inaccurate.  Checked the first few N/A items.  All are
> false positives.

Yes, that's why those are marked N/A, i.e. "Not applicable". Check out
the non-N/A ones.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by "Tsz Wo (Nicholas), Sze" <s2...@yahoo.com>.
The list seems highly inaccurate.  Checked the first few N/A items.  All are 
false positives.

< HADOOP-6304 N/A -- fixed in trunk via HADOOP-7110 (Todd, it was fixed by you. 
Forgot?)
< HADOOP-6598 N/A -- moved to HADOOP-6763 and committed to trunk
< HADOOP-6653 N/A -- not applicable in trunk
< HADOOP-6716 N/A -- as part of HADOOP-6815 which was committed to trunk
< HADOOP-6718 N/A -- Incorporated in HADOOP-6706 for 0.22.
< HADOOP-6776 N/A -- Tom White said "This is fixed in trunk, so can be closed."

Regards,
Nicholas





________________________________
From: Eli Collins <el...@cloudera.com>
To: general@hadoop.apache.org
Sent: Wed, May 4, 2011 3:36:16 PM
Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1

On Wed, May 4, 2011 at 3:29 PM, Jakob Homan <jg...@gmail.com> wrote:
> @Eli >> This rc contains many patches not yet committed to trunk.
> If you've compiled this list, can you post it?
>

Here's the list Todd posted yesterday:

http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTimKKbkuPCz61TU=8-nO8Z6PYhf7Sg@mail.gmail.com%3E


Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
On Wed, May 4, 2011 at 3:29 PM, Jakob Homan <jg...@gmail.com> wrote:
> @Eli >> This rc contains many patches not yet committed to trunk.
> If you've compiled this list, can you post it?
>

Here's the list Todd posted yesterday:

http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTimKKbkuPCz61TU=8-nO8Z6PYhf7Sg@mail.gmail.com%3E

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Jakob Homan <jg...@gmail.com>.
@Eli >> This rc contains many patches not yet committed to trunk.
If you've compiled this list, can you post it?

On Wed, May 4, 2011 at 3:24 PM, Eli Collins <el...@cloudera.com> wrote:
> With my Cloudera hat on..
>
> When we went through the 10x and 20x patches we only pulled a subset
> of them, primarily for security and the general improvements that we
> thought were good.  We found both incompatible changes and some
> sketchy changes that we did not pull in from a quality perspective.
> There is a big difference between a patch set that's acceptable for
> Yahoo!'s user base and one that's a more general artifact.
>
> When we evaluated the YDH patch sets we were using that frame of mind.
>  I'm now looking it in terms of an Apache release. And the place to
> review changes for an Apache release is on jira.
>
> CDH3 is based on the latest stable Apache release (20.2) so it doesn't
> regress against it.  I'm nervous about rebasing future releases on 203
> because of the compatibility and quality implications.
>
> Thanks,
> Eli
>
>
> On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <su...@yahoo-inc.com> wrote:
>> Eli,
>>
>> How many of these patches that you find troublesome are in CDH already?
>>
>> Regards,
>> Suresh
>>
>>
>> On 5/4/11 3:03 PM, "Eli Collins" <el...@cloudera.com> wrote:
>>
>>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
>>>> Here's an updated release candidate for 0.20.203.0. I've incorporated the
>>>> feedback and included all of the patches from 0.20.2, which is the last
>>>> stable release. I also fixed the eclipse-plugin problem.
>>>>
>>>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>>>>
>>>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>>>>
>>>> -- Owen
>>>
>>> While rc2 is an improvement on rc1, I am -1 on this particular rc.  Rationale:
>>>
>>> This rc contains many patches not yet committed to trunk. This would
>>> cause the next major release (0.22) to be a feature regression against
>>> our latest stable release (203), were 0.22 released soon.
>>>
>>> This rc contains many patches not yet reviewed by the community via
>>> the normal process (jira, patch against trunk, merge to a release
>>> branch). I think we should respect the existing community process that
>>> has been used for all previous releases.
>>>
>>> This rc introduces a new development and braching model (new feature
>>> development outside trunk) and Hadoop versioning scheme without
>>> sufficient discussion or proposal of these changes with the community.
>>>
>>> We should establish new process before the release, a release is not
>>> the appropriate mechanism for changing our review and development
>>> process or versioning .
>>>
>>> I do support a release from branch-0.20-security that follows the
>>> existing, established community process.
>>>
>>> Thanks,
>>> Eli
>>
>>
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 6, 2011, at 11:18 PM, Milind Bhandarkar wrote:

> Allen, there are per job limits, and per user limits in this branch. (So,
> max capacity of -1 is for the queue, but within the queue, the per user
> limits come into picture.) If I remember right, the defaults were based on
> a certain assumption of how many users would be on a queue simultaneously.
> Of course this would need to be set in the site-specific config.

	Yes, I'm aware of the changes.  What I'm basically saying is that even with those new limits taken into consideration, the math doesn't seem to hold up.



Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Milind Bhandarkar <mb...@linkedin.com>.
Allen, there are per-job limits and per-user limits in this branch. (So,
max capacity of -1 is for the queue, but within the queue, the per-user
limits come into the picture.) If I remember right, the defaults were based on
a certain assumption of how many users would be on a queue simultaneously.
Of course this would need to be set in the site-specific config.
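
To make the interaction concrete, here is a deliberately simplified toy model of the two layers described above: a queue-level cap where -1 is read as "no cap", and a per-user percentage applied inside the queue. It is not the capacity scheduler's actual algorithm; the function names, the parameters, and the integer rounding are all assumptions made for illustration.

```python
def queue_slot_cap(cluster_slots, max_capacity_pct=-1):
    """Toy upper bound on the slots a queue may use.

    A max_capacity_pct of -1 is treated as "no queue-level cap", so the
    queue may grow up to the whole cluster (an assumption of this model).
    """
    if max_capacity_pct == -1:
        return cluster_slots
    return cluster_slots * max_capacity_pct // 100


def user_slot_cap(queue_cap, user_limit_pct):
    """Toy per-user bound, taken as a percentage of the queue's cap."""
    return queue_cap * user_limit_pct // 100


# Hypothetical numbers: 400 slots, uncapped queue, 25% per-user limit.
uncapped = queue_slot_cap(400)
print(uncapped)
print(user_slot_cap(uncapped, 25))
```

In this model a single user on an uncapped queue is still held to the per-user percentage, which is the kind of effect to rule out before concluding that the -1 handling itself is broken.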

- milind

-- 
Milind Bhandarkar
mbhandarkar@linkedin.com
+1-650-776-3167






On 5/6/11 7:31 PM, "Allen Wittenauer" <aw...@linkedin.com> wrote:

>
>On May 6, 2011, at 6:43 PM, Todd Papaioannou wrote:
>
>> Allen,
>> 
>> Can you provide some more details into what issues you are seeing with
>>the
>> capacity scheduler? Is it just the docs don't match the code, or are you
>> seeing real issues with job scheduling?
>
>	Jobs are definitely not getting the maximum number of task slots they
>should be getting.  I'm suspecting a bug with how max-limit of -1 queues
>are handled.  I'll actually be in the office next week to try and see if
>I can figure out where things are going haywire.
>
>	[I filed a bug on this a few weeks ago before I left for vacation.  It
>was basically ignored.]


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 6, 2011, at 6:43 PM, Todd Papaioannou wrote:

> Allen,
> 
> Can you provide some more details into what issues you are seeing with the
> capacity scheduler? Is it just the docs don't match the code, or are you
> seeing real issues with job scheduling?

	Jobs are definitely not getting the maximum number of task slots they should be getting.  I'm suspecting a bug with how max-limit of -1 queues are handled.  I'll actually be in the office next week to try and see if I can figure out where things are going haywire.

	[I filed a bug on this a few weeks ago before I left for vacation.  It was basically ignored.]

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Todd Papaioannou <to...@yahoo-inc.com>.
Allen,

Can you provide some more details on what issues you are seeing with the
capacity scheduler? Is it just the docs don't match the code, or are you
seeing real issues with job scheduling?

Thanks

ToddP

On 5/6/11 5:49 PM, "Allen Wittenauer" <aw...@linkedin.com> wrote:

>
>On May 5, 2011, at 1:56 PM, Jakob Homan wrote:
>
>> +1
>> 
>> Downloaded, verified, tested on single node cluster to my
>> satisfaction.  We've also brought this release up on a sizable cluster
>> and checked its basic sanity.
>
>	All of you people doing single node tests are missing stuff.  For
>example, the regression in how the secondary namenode addr stuff works
>vs. 0.20.  
>
>	By far, the biggest problem we've found is that the capacity scheduler
>documentation doesn't actually match what the code does.  I have a hunch
>that the unit tests were written/change to match the outcome, rather than
>test what is supposed to happen.  For us, this breakage makes it unusable
>out of the box and we'll likely either go back to our (relatively stable)
>backport of 0.21's cap sched, try to fix the 0.20.203 code, or maybe even
>switch to a completely different scheduler.


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 5, 2011, at 1:56 PM, Jakob Homan wrote:

> +1
> 
> Downloaded, verified, tested on single node cluster to my
> satisfaction.  We've also brought this release up on a sizable cluster
> and checked its basic sanity.

	All of you people doing single node tests are missing stuff.  For example, the regression in how the secondary namenode addr stuff works vs. 0.20.  

	By far, the biggest problem we've found is that the capacity scheduler documentation doesn't actually match what the code does.  I have a hunch that the unit tests were written/changed to match the outcome, rather than test what is supposed to happen.  For us, this breakage makes it unusable out of the box and we'll likely either go back to our (relatively stable) backport of 0.21's cap sched, try to fix the 0.20.203 code, or maybe even switch to a completely different scheduler.

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Jakob Homan <jg...@gmail.com>.
+1

Downloaded, verified, tested on single node cluster to my
satisfaction.  We've also brought this release up on a sizable cluster
and checked its basic sanity.

Regardless of the difficult path we've had over the past year, this is
a good chunk of code to get out to the community.  I'd much rather
explain a convoluted numbering system or what is or isn't in this
release than continue to apologize for having no release at all.

-Jakob


On Thu, May 5, 2011 at 12:02 PM, Suresh Srinivas <su...@yahoo-inc.com> wrote:
>
> +1
>
> Downloaded the release, validated checksums, deployed a single-node
> cluster, and ran some HDFS sanity tests, Web UI tests and mapreduce
> examples.
>
> Regards,
> Suresh
>
>
>
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Suresh Srinivas <su...@yahoo-inc.com>.
+1

Downloaded the release, validated checksums, deployed a single-node
cluster, and ran some HDFS sanity tests, Web UI tests and mapreduce
examples.
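
For anyone repeating the checksum step, a minimal sketch follows. It assumes the expected digest has been copied by hand from whatever checksum file accompanies the candidate; the digest algorithm is a parameter because this thread does not say which one was published, and the function names are illustrative.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex digest, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A matching digest only shows the download is intact; checking the release signature is a separate step.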

Regards,
Suresh




Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
> Your -1 vote essentially blocks the changes that are already available in
> CDH to be available from Apache open source!

As Eric mentioned, this thread is about an Apache release, not CDH.

My -1 vote does not block these changes from being released via
Apache. You cannot veto a release. Releases are lazy majority: the
release is only blocked if there are more -1 votes than +1 votes.
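
The rule as stated here can be written down in a few lines. This sketch encodes only that sentence (blocked when -1 votes outnumber +1 votes); it ignores which votes are binding and any minimum number of +1 votes a project may require, and the function name is made up.

```python
def release_blocked(votes):
    """Return True when -1 votes outnumber +1 votes; other votes are ignored."""
    plus = sum(1 for v in votes if v == "+1")
    minus = sum(1 for v in votes if v == "-1")
    return minus > plus


print(release_blocked(["+1", "+1", "-1", "+1"]))
print(release_blocked(["-1", "-1", "+1"]))
```

Under this reading a single -1 never blocks a release on its own.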

If these changes are contributed on jira, discussed and reviewed, and
committed to trunk I'm happy to support the release.  There's a big
difference between asking that a release respect the Apache community
process and blocking it. If you want to get the release out, how about
contributing the work via the normal means so the community can review
it like we review all other code changes?

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Suresh Srinivas <su...@yahoo-inc.com>.
Here is a snippet from your blog -
http://www.cloudera.com/blog/2010/10/cdh3-beta-3-now-available/

------

Security Enhancements
As one of the primary contributors and largest production users of Hadoop,
Yahoo! publishes the source tree for the version of Hadoop that they run on
their production clusters. We are pleased to announce that we have merged
Yahoo's source tree into CDH3b3. This merge brings many improvements
developed at Yahoo! into CDH, including improvements for MapReduce
scalability on 1000+-node clusters and several new tools for benchmarking
and testing Hadoop.
------

It would be great if you could list how many of the 192 changes were
reviewed and became part of CDH.

Your -1 vote essentially blocks changes that are already available in
CDH from being available from Apache open source!


On 5/4/11 3:30 PM, "Todd Lipcon" <to...@cloudera.com> wrote:

> With Cloudera hat on, I agree with Eli's assessment.
> 
> With Apache hat on, I don't see how this is at all relevant to the task at
> hand. I would make the same arguments against taking CDH3 and releasing it
> as an ASF artifact -- we'd also have a certain amount of work to do to make
> sure that all of the patches are in trunk, first. Additionally, I'd want to
> outline what the inclusion criteria would be for that branch.
> 
> -Todd
> 
> On Wed, May 4, 2011 at 3:24 PM, Eli Collins <el...@cloudera.com> wrote:
> 
>> With my Cloudera hat on..
>> 
>> When we went through the 10x and 20x patches we only pulled a subset
>> of them, primarily for security and the general improvements that we
>> thought were good.  We found both incompatible changes and some
>> sketchy changes that we did not pull in from a quality perspective.
>> There is a big difference between a patch set that's acceptable for
>> Yahoo!'s user base and one that's a more general artifact.
>> 
>> When we evaluated the YDH patch sets we were using that frame of mind.
>>  I'm now looking at it in terms of an Apache release. And the place to
>> review changes for an Apache release is on jira.
>> 
>> CDH3 is based on the latest stable Apache release (20.2) so it doesn't
>> regress against it.  I'm nervous about rebasing future releases on 203
>> because of the compatibility and quality implications.
>> 
>> Thanks,
>> Eli
>> 
>> 
>> On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <su...@yahoo-inc.com>
>> wrote:
>>> Eli,
>>> 
>>> How many of these patches that you find troublesome are in CDH already?
>>> 
>>> Regards,
>>> Suresh
>>> 
>>> 
>>> On 5/4/11 3:03 PM, "Eli Collins" <el...@cloudera.com> wrote:
>>> 
>>>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org>
>> wrote:
>>>>> Here's an updated release candidate for 0.20.203.0. I've incorporated
>> the
>>>>> feedback and included all of the patches from 0.20.2, which is the last
>>>>> stable release. I also fixed the eclipse-plugin problem.
>>>>> 
>>>>> The candidate is at:
>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>>>>> 
>>>>> Please download it, inspect it, compile it, and test it. Clearly, I'm
>> +1.
>>>>> 
>>>>> -- Owen
>>>> 
>>>> While rc2 is an improvement on rc1, I am -1 on this particular rc.
>>  Rationale:
>>>> 
>>>> This rc contains many patches not yet committed to trunk. This would
>>>> cause the next major release (0.22) to be a feature regression against
>>>> our latest stable release (203), were 0.22 released soon.
>>>> 
>>>> This rc contains many patches not yet reviewed by the community via
>>>> the normal process (jira, patch against trunk, merge to a release
>>>> branch). I think we should respect the existing community process that
>>>> has been used for all previous releases.
>>>> 
>>>> This rc introduces a new development and branching model (new feature
>>>> development outside trunk) and Hadoop versioning scheme without
>>>> sufficient discussion or proposal of these changes with the community.
>>>> 
>>>> We should establish a new process before the release; a release is not
>>>> the appropriate mechanism for changing our review and development
>>>> process or versioning.
>>>> 
>>>> I do support a release from branch-0.20-security that follows the
>>>> existing, established community process.
>>>> 
>>>> Thanks,
>>>> Eli
>>> 
>>> 
>> 
> 
> 


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Todd Lipcon <to...@cloudera.com>.
With Cloudera hat on, I agree with Eli's assessment.

With Apache hat on, I don't see how this is at all relevant to the task at
hand. I would make the same arguments against taking CDH3 and releasing it
as an ASF artifact -- we'd also have a certain amount of work to do to make
sure that all of the patches are in trunk, first. Additionally, I'd want to
outline what the inclusion criteria would be for that branch.

-Todd

On Wed, May 4, 2011 at 3:24 PM, Eli Collins <el...@cloudera.com> wrote:

> With my Cloudera hat on..
>
> When we went through the 10x and 20x patches we only pulled a subset
> of them, primarily for security and the general improvements that we
> thought were good.  We found both incompatible changes and some
> sketchy changes that we did not pull in from a quality perspective.
> There is a big difference between a patch set that's acceptable for
> Yahoo!'s user base and one that's a more general artifact.
>
> When we evaluated the YDH patch sets we were using that frame of mind.
>  I'm now looking at it in terms of an Apache release. And the place to
> review changes for an Apache release is on jira.
>
> CDH3 is based on the latest stable Apache release (20.2) so it doesn't
> regress against it.  I'm nervous about rebasing future releases on 203
> because of the compatibility and quality implications.
>
> Thanks,
> Eli
>
>
> On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <su...@yahoo-inc.com>
> wrote:
> > Eli,
> >
> > How many of these patches that you find troublesome are in CDH already?
> >
> > Regards,
> > Suresh
> >
> >
> > On 5/4/11 3:03 PM, "Eli Collins" <el...@cloudera.com> wrote:
> >
> >> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org>
> wrote:
> >>> Here's an updated release candidate for 0.20.203.0. I've incorporated
> the
> >>> feedback and included all of the patches from 0.20.2, which is the last
> >>> stable release. I also fixed the eclipse-plugin problem.
> >>>
> >>> The candidate is at:
> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
> >>>
> >>> Please download it, inspect it, compile it, and test it. Clearly, I'm
> +1.
> >>>
> >>> -- Owen
> >>
> >> While rc2 is an improvement on rc1, I am -1 on this particular rc.
>  Rationale:
> >>
> >> This rc contains many patches not yet committed to trunk. This would
> >> cause the next major release (0.22) to be a feature regression against
> >> our latest stable release (203), were 0.22 released soon.
> >>
> >> This rc contains many patches not yet reviewed by the community via
> >> the normal process (jira, patch against trunk, merge to a release
> >> branch). I think we should respect the existing community process that
> >> has been used for all previous releases.
> >>
> >> This rc introduces a new development and branching model (new feature
> >> development outside trunk) and Hadoop versioning scheme without
> >> sufficient discussion or proposal of these changes with the community.
> >>
> >> We should establish a new process before the release; a release is not
> >> the appropriate mechanism for changing our review and development
> >> process or versioning.
> >>
> >> I do support a release from branch-0.20-security that follows the
> >> existing, established community process.
> >>
> >> Thanks,
> >> Eli
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
With my Cloudera hat on..

When we went through the 10x and 20x patches we only pulled a subset
of them, primarily for security and the general improvements that we
thought were good.  We found both incompatible changes and some
sketchy changes that we did not pull in from a quality perspective.
There is a big difference between a patch set that's acceptable for
Yahoo!'s user base and one that's a more general artifact.

When we evaluated the YDH patch sets we were using that frame of mind.
I'm now looking at it in terms of an Apache release. And the place to
review changes for an Apache release is on jira.

CDH3 is based on the latest stable Apache release (20.2) so it doesn't
regress against it.  I'm nervous about rebasing future releases on 203
because of the compatibility and quality implications.

Thanks,
Eli


On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas <su...@yahoo-inc.com> wrote:
> Eli,
>
> How many of these patches that you find troublesome are in CDH already?
>
> Regards,
> Suresh
>
>
> On 5/4/11 3:03 PM, "Eli Collins" <el...@cloudera.com> wrote:
>
>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
>>> Here's an updated release candidate for 0.20.203.0. I've incorporated the
>>> feedback and included all of the patches from 0.20.2, which is the last
>>> stable release. I also fixed the eclipse-plugin problem.
>>>
>>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>>>
>>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>>>
>>> -- Owen
>>
>> While rc2 is an improvement on rc1, I am -1 on this particular rc.  Rationale:
>>
>> This rc contains many patches not yet committed to trunk. This would
>> cause the next major release (0.22) to be a feature regression against
>> our latest stable release (203), were 0.22 released soon.
>>
>> This rc contains many patches not yet reviewed by the community via
>> the normal process (jira, patch against trunk, merge to a release
>> branch). I think we should respect the existing community process that
>> has been used for all previous releases.
>>
>> This rc introduces a new development and branching model (new feature
>> development outside trunk) and Hadoop versioning scheme without
>> sufficient discussion or proposal of these changes with the community.
>>
>> We should establish a new process before the release; a release is not
>> the appropriate mechanism for changing our review and development
>> process or versioning.
>>
>> I do support a release from branch-0.20-security that follows the
>> existing, established community process.
>>
>> Thanks,
>> Eli
>
>

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Suresh Srinivas <su...@yahoo-inc.com>.
Eli,

How many of these patches that you find troublesome are in CDH already?

Regards,
Suresh


On 5/4/11 3:03 PM, "Eli Collins" <el...@cloudera.com> wrote:

> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
>> Here's an updated release candidate for 0.20.203.0. I've incorporated the
>> feedback and included all of the patches from 0.20.2, which is the last
>> stable release. I also fixed the eclipse-plugin problem.
>> 
>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>> 
>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>> 
>> -- Owen
> 
> While rc2 is an improvement on rc1, I am -1 on this particular rc.  Rationale:
> 
> This rc contains many patches not yet committed to trunk. This would
> cause the next major release (0.22) to be a feature regression against
> our latest stable release (203), were 0.22 released soon.
> 
> This rc contains many patches not yet reviewed by the community via
> the normal process (jira, patch against trunk, merge to a release
> branch). I think we should respect the existing community process that
> has been used for all previous releases.
> 
> This rc introduces a new development and branching model (new feature
> development outside trunk) and Hadoop versioning scheme without
> sufficient discussion or proposal of these changes with the community.
> 
> We should establish a new process before the release; a release is not
> the appropriate mechanism for changing our review and development
> process or versioning.
> 
> I do support a release from branch-0.20-security that follows the
> existing, established community process.
> 
> Thanks,
> Eli


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

While rc2 is an improvement on rc1, I am -1 on this particular rc.  Rationale:

This rc contains many patches not yet committed to trunk. This would
cause the next major release (0.22) to be a feature regression against
our latest stable release (203), were 0.22 released soon.

This rc contains many patches not yet reviewed by the community via
the normal process (jira, patch against trunk, merge to a release
branch). I think we should respect the existing community process that
has been used for all previous releases.

This rc introduces a new development and branching model (new feature
development outside trunk) and Hadoop versioning scheme without
sufficient discussion or proposal of these changes with the community.

We should establish a new process before the release; a release is not
the appropriate mechanism for changing our review and development
process or versioning.

I do support a release from branch-0.20-security that follows the
existing, established community process.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Owen O'Malley <om...@apache.org>.
On May 4, 2011, at 1:17 PM, Allen Wittenauer wrote:

> 	Am I misreading this, or are the MR protocols out of sync between 0.20.203 and 0.21?  It would also appear that this is marked stable in 0.21. What is the user impact?

The names of the protocols were changed, but the names of the protocols aren't user-facing. The protocols themselves also changed, as with all Hadoop major versions. (We need to switch to protobuf or something for RPC to provide wire compatibility.) 
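
To illustrate what "wire compatibility" buys, here is a minimal sketch of a tagged-field encoding in which a reader skips fields it does not recognize. It is neither protobuf's actual wire format nor Hadoop's RPC; the record layout, tag numbers, and function names are assumptions made up for the example.

```python
import struct

# Hypothetical record header: 2-byte tag, 4-byte value length, big-endian.
HEADER = ">HI"
HEADER_SIZE = struct.calcsize(HEADER)


def encode(fields):
    """Encode {tag: bytes} as a sequence of (tag, length, value) records."""
    out = bytearray()
    for tag, value in sorted(fields.items()):
        out += struct.pack(HEADER, tag, len(value)) + value
    return bytes(out)


def decode(data, known_tags):
    """Decode records, keeping known tags and silently skipping unknown ones."""
    fields, offset = {}, 0
    while offset < len(data):
        tag, length = struct.unpack_from(HEADER, data, offset)
        offset += HEADER_SIZE
        if tag in known_tags:
            fields[tag] = data[offset:offset + length]
        offset += length  # advance past the value whether or not it was kept
    return fields
```

Because every value carries its own length, an older reader can step over a field added by a newer writer, and a newer reader simply finds a field absent when talking to an older writer. That property is what lets two versions interoperate without changing in lockstep.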

-- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:

> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. 
> 
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
> 
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.


	Am I misreading this, or are the MR protocols out of sync between 0.20.203 and 0.21?  It would also appear that this is marked stable in 0.21. What is the user impact?



Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
+1
  downloaded, built, deployed on one node cluster.


sanjay

On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:

> Here's an updated release candidate for 0.20.203.0. I've
> incorporated the feedback and included all of the patches from
> 0.20.2, which is the last stable release. I also fixed the
> eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly,  
> I'm +1.
>
> -- Owen


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Stack <st...@duboce.net>.
I abstain: +/- 0.

I can't vote against the good work done by the crew at Y!

But I can't vote for a 0.20.clusterbomb release that railroads over
precedent, further compounding the confusion that already
exists around the state of Hadoop.

Thanks,
St.Ack


On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On May 4, 2011, at 11:52 PM, Jean-Daniel Cryans wrote:

> Roy,
> 
> On Wed, May 4, 2011 at 7:22 PM, Roy T. Fielding <fi...@gbiv.com> wrote:
>> The ASF is a vehicle for whomever wishes to collaborate on a
>> given project.  Collaboration means helping do the work.  Those
>> who do the work may do so for whatever reasons that they think
>> are good, whether it is because they feel like being charitable
>> today, they get paid a salary and the big boss said "work on
>> this part", or because they just have an itch worth scratching.
>> 
>> Apache does not care why people choose to collaborate or
>> how they choose to apply their own intellectual efforts.  We
>> welcome all forms of contribution under the terms of our license.
> 
> I don't think I was arguing against the contribution of the code in
> that branch, it's very welcome, but I'm questioning (and ranting
> about) the motivation for releasing a version that even just by name
> is a weird hula-hoop around the usual development practices that
> Hadoop has had in the past (not that it's set in stone).

Yes, and I said that kind of questioning is not appropriate.
You are not responsible for other people's motivation.

> So I wanted to contribute my negative non-binding vote to highlight
> that this release is probably very confusing for the general user.
> This is 0.20, but it's not. Also it has more numbers, and it starts at
> 203. Why do this at all instead of just moving on with 0.22? Or is
> 0.22 bound to be like 0.21? It almost begs the question if this should
> be called 0.22.0 then.

Yes, I already made that same point.  You don't need to talk about
motivation in order to do so.

If I had a vote, I would have voted -1 just because version numbers
do matter to users, the three number form is well-known, and minting
new versions is far cheaper than adding extra numbers.  I'd have cut
the release candidate as 0.30.  However, I did not do that work.
The person who did the work chose 0.20.203.0.  Anyone who doesn't
like that should vote accordingly and, preferably, make your
communication about such things more open in the future so
that you don't waste others' time on extra builds.  And if the
majority thinks releasing these bits is more important than my
concerns, then I have to accept that as the will of the project.

Please note also that policies are not technical discussions.
Likewise, version numbers are not technical. If those were
technical changes then anyone on the PMC could veto them,
which would effectively mean anyone could veto a release
and the project would quickly devolve into tyranny by minority.

Likewise, just because I said that a successful release defines
its own set of precedents (and therein policy), that doesn't
mean the project can't vote on a new policy the next day or
make another release that sets it again moving forward.
Progress is in the doing.

....Roy

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Roy,

On Wed, May 4, 2011 at 7:22 PM, Roy T. Fielding <fi...@gbiv.com> wrote:
> The ASF is a vehicle for whomever wishes to collaborate on a
> given project.  Collaboration means helping do the work.  Those
> who do the work may do so for whatever reasons that they think
> are good, whether it is because they feel like being charitable
> today, they get paid a salary and the big boss said "work on
> this part", or because they just have an itch worth scratching.
>
> Apache does not care why people choose to collaborate or
> how they choose to apply their own intellectual efforts.  We
> welcome all forms of contribution under the terms of our license.

I don't think I was arguing against the contribution of the code in
that branch, it's very welcome, but I'm questioning (and ranting
about) the motivation for releasing a version that even just by name
is a weird hula-hoop around the usual development practices that
Hadoop has had in the past (not that it's set in stone).

So I wanted to contribute my negative non-binding vote to highlight
that this release is probably very confusing for the general user.
This is 0.20, but it's not. Also it has more numbers, and it starts at
203. Why do this at all instead of just moving on with 0.22? Or is
0.22 bound to be like 0.21? It almost begs the question if this should
be called 0.22.0 then.
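
The naming concern can be made concrete with a small sketch of how dotted version strings order when compared numerically, component by component. The parsing helper is hypothetical; the version strings are the ones discussed in this thread.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers for comparison."""
    return tuple(int(part) for part in text.split("."))


versions = ["0.21.0", "0.20.203.0", "0.20.2", "0.22.0"]
ordered = sorted(versions, key=parse_version)
print(ordered)
```

Under this ordering 0.20.203.0 sorts after 0.20.2 but before 0.21.0, even though it carries the security work that, per the discussion elsewhere in this thread, is not in 0.21. The number alone therefore does not tell a user which release has more features.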

>
> What we do require is a certain amount of civility regarding
> our voting procedures and an emphasis on individual responsibility
> for your votes.  Anyone caught *voting* a particular way just
> because the boss says so will be dealt with severely.  Votes
> are how we do quality control and make decisions, and no other
> company can be allowed to make decisions for our non-profit.

Yeah, I don't think that's a problem here; everyone seems to have their
very own strong opinions.

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Milind Bhandarkar <mb...@linkedin.com>.
Thanks, Cos. Archived mailing lists come in handy. Many thanks to Apache for
having http://mail-archives.apache.org/mod_mbox/hadoop-general/.

- milind
-- 
Milind Bhandarkar
mbhandarkar@linkedin.com
+1-650-776-3167






On 5/6/11 11:57 PM, "Konstantin Boudnik" <co...@apache.org> wrote:

>Wow! Great compilation, Milind! Very nice to have the sequence of events
>handy.
>
>Thanks,
>  Cos
>
>On Fri, May 6, 2011 at 23:55, Milind Bhandarkar
><mb...@linkedin.com> wrote:
>> [I am not on PMC, but seeing that PMC may be busy with other issues, I
>> will try to answer your questions.]
>>
>> Eric,
>>
>> I think the thread
>> "http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
>> will answer your questions. Here is the timeline as I see it:
>>
>> 1. Arun proposes to create a release from the security patchset. Says Doug
>> has proposed this earlier
>> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
>> April 23, 2010) ("This has been proposed
>> earlier by Doug and did not get far due to concerns about the effect this
>> would have on development on trunk.") (August 24, 2010)
>>
>> 2. Lots of +1s, between August 24 to August 30 2010. One particular
>> comment is from Tom White: "I think it would be good to have a shared 0.20
>> Apache security branch.
>> Since security isn't in 0.21, and the 0.22 release is a some way off
>> as you mention, this would be useful for folks who want the security
>> features sooner (and want to use an Apache release)."
>>
>> 3. Arun volunteers to create a release (August 30, 2010)
>>
>> 4. Doug reminds Arun. (October 15, 2010)
>>
>> 5. Arun apologizes for not creating a branch because he was busy, because
>> he had a baby. (January 11, 2011)
>>
>> 6. Lots of discussion about what to call it (the release, not the baby,
>> although I had a good laugh at Patrick Angeles's email: "You're gonna call
>> your kid 20.100?" ;-).
>>
>> 7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how about
>> something like 20.100 to show that it's a big jump? Anything else?" Jan
>> 12, 2011
>>
>> 8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
>> 12, 2011.
>>
>> So, as you can see, even if this release is called 0.20.x, the community
>> agreed that these are valuable patches to have and that, despite backward
>> incompatibility, they should still go into a minor release.
>>
>> - milind
>>
>> --
>> Milind Bhandarkar
>> mbhandarkar@linkedin.com
>> +1-650-776-3167
>>
>>
>>
>>
>>
>>
>> On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>>
>>>On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>>>
>>>I understand Eli's concerns that putting stuff in there that hasn't gone
>>>into trunk yet is a danger. However, as the team makes no guarantees of 100%
>>>compatibility between releases, I don't think it's critical. It's just
>>>something that needs to be addressed, which can be done after this release
>>>has shipped.
>>>
>>>
>>>I was under the impression that the community has been extremely strict
>>>about compatibility between minor version bumps in the past. I thought there
>>>were specific guarantees and that was one of the reasons certain behaviors
>>>have persisted so long.
>>>
>>>Does this mean API changes can be made in minor releases and it can be made
>>>backward compatible in future releases? That seems very, very counter to
>>>various conversations that have happened in the past. I'm of the mind that
>>>we should continue to promise what we've always promised and if that's
>>>changing, let's make with the refactoring party!
>>>
>>>Can some PMC'ers clarify this one for me?
>>>
>>>TIA.
>>>Sammer
>>>
>>>
>>>
>>>-Steve
>>
>>


Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Konstantin Boudnik <co...@apache.org>.
Wow! Great compilation, Milind! Very nice to have the sequence of events handy.

Thanks,
  Cos

On Fri, May 6, 2011 at 23:55, Milind Bhandarkar
<mb...@linkedin.com> wrote:
> [I am not on PMC, but seeing that PMC may be busy with other issues, I
> will try to answer your questions.]
>
> Eric,
>
> I think the thread
> "http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
> will answer your questions. Here is the timeline as I see it:
>
> 1. Arun proposes to create a release from the security patchset. Says Doug
> has proposed this earlier
> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
> April 23, 2010) ("This has been proposed
> earlier by Doug and did not get far due to concerns about the effect this
> would have on development on trunk.") (August 24, 2010)
>
> 2. Lots of +1s, between August 24 to August 30 2010. One particular
> comment is from Tom White: "I think it would be good to have a shared 0.20
> Apache security branch.
> Since security isn't in 0.21, and the 0.22 release is a some way off
> as you mention, this would be useful for folks who want the security
> features sooner (and want to use an Apache release)."
>
> 3. Arun volunteers to create a release (August 30, 2010)
>
> 4. Doug reminds Arun. (October 15, 2010)
>
> 5. Arun apologizes for not creating a branch because he was busy, because
> he had a baby. (January 11, 2011)
>
> 6. Lots of discussion about what to call it (the release, not the baby,
> although I had a good laugh at Patrick Angeles's email: "You're gonna call
> your kid 20.100?" ;-).
>
> 7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how about
> something like 20.100 to show that it's a big jump? Anything else?" Jan
> 12, 2011
>
> 8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
> 12, 2011.
>
> So, as you can see, even if this release is called 0.20.x, the community
> agreed that these are valuable patches to have and that, despite backward
> incompatibility, they should still go into a minor release.
>
> - milind
>
> --
> Milind Bhandarkar
> mbhandarkar@linkedin.com
> +1-650-776-3167
>
>
>
>
>
>
> On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>
>>On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>>
>>I understand Eli's concerns that putting stuff in there that hasn't gone
>>into trunk yet is a danger. However, as the team makes no guarantees of 100%
>>compatibility between releases, I don't think it's critical. It's just
>>something that needs to be addressed, which can be done after this release
>>has shipped.
>>
>>
>>I was under the impression that the community has been extremely strict
>>about compatibility between minor version bumps in the past. I thought there
>>were specific guarantees and that was one of the reasons certain behaviors
>>have persisted so long.
>>
>>Does this mean API changes can be made in minor releases and it can be made
>>backward compatible in future releases? That seems very, very counter to
>>various conversations that have happened in the past. I'm of the mind that
>>we should continue to promise what we've always promised and if that's
>>changing, let's make with the refactoring party!
>>
>>Can some PMC'ers clarify this one for me?
>>
>>TIA.
>>Sammer
>>
>>
>>
>>-Steve
>
>

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eli Collins <el...@cloudera.com>.
On Sat, May 7, 2011 at 6:36 PM, Ian Holsman <ha...@holsman.net> wrote:
>
> On May 8, 2011, at 9:50 AM, Eric Sammer wrote:
>
>> do we permit
>> backward incompatible changes between 0.22.0 and 0.22.1 or is this
>> something we've allowed just for the 203 release?
>
> Good question.
> Do we allow incompatible (smallish) features to be added to a 20.x release,
> hoping that they will eventually be put into trunk at a later stage?
> And do we need a process or something around it, or will we just act on good faith that it will occur?

We do allow it, as the 203 release shows. I don't think we need an
official process, our existing practice of filing a jira with the
appropriate fix version is sufficient. And this seems to be happening
already for all changes to future 20x releases.

Compatibility is just one reason it's important that features be
developed on trunk first. Security and other enhancements were
developed on 20 and forward ported to trunk. Almost two years later
the forward porting is still not complete - if we released from trunk
today, it would not support security. Ditto for append. Some people
who work on HBase consider the code in branch-20-append to be more
reliable than the code on trunk. This is the primary reason why people
are currently on 20.x releases.

People are of course free to contribute to Apache on whatever branch
they please, but I think as a development community we need to try to
make sure a future release off trunk is not a regression against
previous releases. This won't happen unless we invest in trunk. I
support a release from branch-20-security, I just wanted to see the
work go into trunk first, i.e. I think the staging matters. (And I agree
with Roy wrt the version scheme, but that's independent.)

Fortunately, I don't think we're far off. The vast majority of
security work is in trunk (thanks to all those who did the porting),
there's probably only 50 to 100 bugs/enhancements in
branch-20-security not yet in trunk, and the append code (or rather
sync) to support HBase just needs some tests and debugging.

I agree with Eric that it is the desire to put improvements into users'
hands more quickly that drives orgs to produce releases of Hadoop
outside Apache. I think the best way to address this is to get solid
releases coming off trunk on a regular basis again. Hence the push to
close out 22 and work on getting trunk in shape. Also, as a
development community, we're at our best when collaborating on trunk.

Thanks,
Eli

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Ian Holsman <ha...@holsman.net>.
On May 8, 2011, at 9:50 AM, Eric Sammer wrote:

> do we permit
> backward incompatible changes between 0.22.0 and 0.22.1 or is this
> something we've allowed just for the 203 release?

Good question.
Do we allow incompatible (smallish) features to be added to a 20.x release,
hoping that they will eventually be put into trunk at a later stage?
And do we need a process or something around it, or will we just act on good faith that it will occur?

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On May 10, 2011, at 10:49 PM, Aaron T. Myers wrote:

> On Tue, May 10, 2011 at 10:24 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:
>
> .....
>
> By far the most significant incompatibility that I've seen from a user
> perspective is that setting hadoop.job.ugi no longer has any effect.
> Granted, this interface wasn't used by a large percentage of users, but
> those that were using it have no other alternative that is as flexible as
> this was. There was discussion about this incompatibility last September on
> the mailing lists[1]. The conclusion there was that supporting backward
> compatibility for this interface was too difficult, semantically and
> technically, to warrant support. This incompatibility is present whether or
> not Kerberos support is enabled on the cluster.


That is why we added the interface audience and stability classification.
From the very beginning (see my early proposals on HADOOP-5073) the security
UGI interfaces were targeted to be marked as limited private.
We completed the interface annotation in release 21, so some users did not
realize that they should not be using such interfaces.

sanjay


Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by "Aaron T. Myers" <at...@cloudera.com>.
On Tue, May 10, 2011 at 10:24 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:

> I can't think of any major user-facing incompatibilities other than the
> users having to run a 'kinit' when they are working with a secure Hadoop
> cluster (of course the admins need to do more work in order to set up a
> secure cluster).


By far the most significant incompatibility that I've seen from a user
perspective is that setting hadoop.job.ugi no longer has any effect.
Granted, this interface wasn't used by a large percentage of users, but
those that were using it have no other alternative that is as flexible as
this was. There was discussion about this incompatibility last September on
the mailing lists[1]. The conclusion there was that supporting backward
compatibility for this interface was too difficult, semantically and
technically, to warrant support. This incompatibility is present whether or
not Kerberos support is enabled on the cluster.

I totally agree that the pain of upgrading to 0.20 security was felt
substantially more by admins/operators than by users.

--
Aaron T. Myers
Software Engineer, Cloudera

[1]
http://mail-archives.apache.org/mod_mbox/hadoop-general/201009.mbox/%3CAANLkTiknJ_SzRux7KhjhxVfUU9FBkNgvYnkpbz3G_+a4@mail.gmail.com%3E

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, May 10, 2011 at 10:24 PM, Devaraj Das <dd...@yahoo-inc.com> wrote:

> Just so that everyone is on the same page w.r.t the compatibility between
> 20.2 & 20.203 (don't think this is documented anywhere yet)..
>
> The aim of the team working on Hadoop Security at Yahoo! was to make Hadoop
> *secure*, and with *minimal* disruption to existing apps. I can't think of
> any major user-facing incompatibilities other than the users having to run a
> 'kinit' when they are working with a secure Hadoop cluster (of course the
> admins need to do more work in order to set up a secure cluster). Also,
> security can switched off, and all the other enhancements (job limits, etc.)
> are still available.. As per users/Operations/Solutions at Yahoo!,
> 20.security was one of the smoothest upgrades ever.
>

And I think you guys did a commendable job with this, given the scope of the
project! :)

But there were certainly plenty of bugs introduced along the way that
affected both secure and non-secure, and even now the security-able branches
don't function on any non-Sun JVM.

Again, I think for this particular case, most of the developers agreed on
the risk/reward trade-off, so I didn't want to start a discussion about
security being a good or bad decision to backport on to 0.20.

But, I'd love to know what our framework is for making such decisions in the
future, if we plan to maintain branches with feature backports as part of
Apache (e.g., what scope of change requires what type of vote, and when).

-Todd


>
> On 5/10/11 2:28 PM, "Todd Lipcon" <to...@cloudera.com> wrote:
>
> On Tue, May 10, 2011 at 12:41 PM, Scott Carey <scott@richrelevance.com>
> wrote:
>
> >
> > As an observer, this is a very important observation.  Sure, the default
> > is that dot releases are bugfix-onl.  But exceptions to these rules are
> > sometimes required and often beneficial to the health of the project.
> > Performance enhancements, minor features, and other items are sometimes
> > very low risk and the barrier to getting them to users earlier should be
> > lower.
> >
>
> I agree whole-heartedly.
>
>
> > These issues are the sort of things that get into non-Apache releases
> > quickly and drive the community away from the Apache release.  Its been
> > well proven through those vehicles that back-porting minor features and
> > improvements from trunk to an old release can be done safely.
>
>
> However, one shouldn't understate the difficulty of agreeing on the
> risk-reward tradeoff here. While risk is mostly technical, reward may vary
> widely based on the userbase or organization.
>
> For example, everyone would agree that security was a very risky feature to
> add to 20, with known backward compatibilities and a lot of fallout. For
> some people (both CDH and YDH), the security features were an absolute
> necessity on a tight timeline, so the risk-reward decision was clear -- I've
> heard from many users, though, that they saw none of the reward from
> security and wished they hadn't had to endure the resulting changes and bugs
> within the 0.20 series.
>
> Another example is the 0.20-append patch series, which is indispensable for
> the HBase community but seen as overly risky by those who do not use HBase.
>
> So, while I'm in favor of "sustaining" release series like 0.20-security in
> theory, I also think we need a clear inclusion criteria for such branches.
> As I said in a previous email, the criteria used to be "low risk compatible
> bug fixes only" with a vote process for any exceptions. 0.20-security is
> obviously entirely different, but as yet remains undefined (it's way more
> than just "security").
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Just so that everyone is on the same page w.r.t. the compatibility between 20.2 & 20.203 (I don't think this is documented anywhere yet).

The aim of the team working on Hadoop Security at Yahoo! was to make Hadoop *secure*, and with *minimal* disruption to existing apps. I can't think of any major user-facing incompatibilities other than the users having to run a 'kinit' when they are working with a secure Hadoop cluster (of course the admins need to do more work in order to set up a secure cluster). Also, security can be switched off, and all the other enhancements (job limits, etc.) are still available. As per users/Operations/Solutions at Yahoo!, 20.security was one of the smoothest upgrades ever.


On 5/10/11 2:28 PM, "Todd Lipcon" <to...@cloudera.com> wrote:

On Tue, May 10, 2011 at 12:41 PM, Scott Carey <sc...@richrelevance.com> wrote:

>
> As an observer, this is a very important observation.  Sure, the default
> is that dot releases are bugfix-onl.  But exceptions to these rules are
> sometimes required and often beneficial to the health of the project.
> Performance enhancements, minor features, and other items are sometimes
> very low risk and the barrier to getting them to users earlier should be
> lower.
>

I agree whole-heartedly.


> These issues are the sort of things that get into non-Apache releases
> quickly and drive the community away from the Apache release.  Its been
> well proven through those vehicles that back-porting minor features and
> improvements from trunk to an old release can be done safely.


However, one shouldn't understate the difficulty of agreeing on the
risk-reward tradeoff here. While risk is mostly technical, reward may vary
widely based on the userbase or organization.

For example, everyone would agree that security was a very risky feature to
add to 20, with known backward compatibilities and a lot of fallout. For
some people (both CDH and YDH), the security features were an absolute
necessity on a tight timeline, so the risk-reward decision was clear -- I've
heard from many users, though, that they saw none of the reward from
security and wished they hadn't had to endure the resulting changes and bugs
within the 0.20 series.

Another example is the 0.20-append patch series, which is indispensable for
the HBase community but seen as overly risky by those who do not use HBase.

So, while I'm in favor of "sustaining" release series like 0.20-security in
theory, I also think we need a clear inclusion criteria for such branches.
As I said in a previous email, the criteria used to be "low risk compatible
bug fixes only" with a vote process for any exceptions. 0.20-security is
obviously entirely different, but as yet remains undefined (it's way more
than just "security").

-Todd
--
Todd Lipcon
Software Engineer, Cloudera


Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, May 10, 2011 at 12:41 PM, Scott Carey <sc...@richrelevance.com> wrote:

>
> As an observer, this is a very important observation.  Sure, the default
> is that dot releases are bugfix-onl.  But exceptions to these rules are
> sometimes required and often beneficial to the health of the project.
> Performance enhancements, minor features, and other items are sometimes
> very low risk and the barrier to getting them to users earlier should be
> lower.
>

I agree whole-heartedly.


> These issues are the sort of things that get into non-Apache releases
> quickly and drive the community away from the Apache release.  Its been
> well proven through those vehicles that back-porting minor features and
> improvements from trunk to an old release can be done safely.


However, one shouldn't understate the difficulty of agreeing on the
risk-reward tradeoff here. While risk is mostly technical, reward may vary
widely based on the userbase or organization.

For example, everyone would agree that security was a very risky feature to
add to 20, with known backward incompatibilities and a lot of fallout. For
some people (both CDH and YDH), the security features were an absolute
necessity on a tight timeline, so the risk-reward decision was clear -- I've
heard from many users, though, that they saw none of the reward from
security and wished they hadn't had to endure the resulting changes and bugs
within the 0.20 series.

Another example is the 0.20-append patch series, which is indispensable for
the HBase community but seen as overly risky by those who do not use HBase.

So, while I'm in favor of "sustaining" release series like 0.20-security in
theory, I also think we need clear inclusion criteria for such branches.
As I said in a previous email, the criteria used to be "low risk compatible
bug fixes only" with a vote process for any exceptions. 0.20-security is
obviously entirely different, but as yet remains undefined (it's way more
than just "security").

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Scott Carey <sc...@richrelevance.com>.

On 5/8/11 11:10 AM, "Eric Baldeschwieler" <er...@yahoo-inc.com> wrote:

>I'd agree with this too. [same disclaimer as milind, not on PMC]
>
>In general one would not expect to see an incompatible change added in a
>dot release (0.24.1 0.24.2).  I'd expect anything like that to require
>community discussion and support.
>
>As milind summarized, we seem to have support for the addition of
>security to 20.  The existing mechanism of the required release vote will
>confirm or deny that.
>
>I think it is important that compatible enhancements to hadoop are
>allowed into dot releases.  This is something that we've discussed but
>never finalized in the community.  It is the desire to put improvements
>into users hands more quickly that the next major release that drives
>orgs to produce private releases of hadoop.  In general, I think it is
>fair that such changes go into trunk first.  Exceptions to that also need
>discussion and support IMO.

As an observer, this is a very important observation.  Sure, the default
is that dot releases are bugfix-only.  But exceptions to these rules are
sometimes required and often beneficial to the health of the project.
Performance enhancements, minor features, and other items are sometimes
very low risk and the barrier to getting them to users earlier should be
lower.
These issues are the sort of things that get into non-Apache releases
quickly and drive the community away from the Apache release.  It's been
well proven through those vehicles that back-porting minor features and
improvements from trunk to an old release can be done safely.

>
>I think the key to making progress is discussion and the idea that
>majority support, not consensus is what is needed to make exceptions to
>our process.  Process is useful, it reduces friction.  Process without
>exception is stifling.

Absolutely -- for a subset of process exceptions, a lazy majority would be
much more useful than consensus.  Others are much more dangerous
(backwards compatibility breakage).

>
>On May 7, 2011, at 10:52 PM, Milind Bhandarkar wrote:
>
>> [Mentioning again: I am not on the PMC, and this email contains
>> non-binding opinions based on my reading the general@hadoop.apache.org
>> emails.]
>> 
>> It is my understanding that, from the beginning, the 0.20+security was
>> always treated as an exception to the normal (I.e. Pre-0.20) release
>> process. (This has been confirmed by the mailing list threads, in which
>> many of those who are objecting to this release now - stating that it has
>> violated norms - have consented, actually argued for, breaking the norms.)
>> 
>> For whatever I have read on this mailing list before the vote for this
>> release, it looked like most of the community agreed that what Yahoo! Had
>> produced on their own branch, outside of Apache trunk, was important
>> contribution, and a release based on that would be a good idea, and that a
>> one-time release should proceed. (After all, whichever organization the
>> contributors belong to, many seem to indicate that they feel ashamed not
>> having an Apache release in more than a year.)
>> 
>> From many emails on this thread, it has been clear to me, that it is a one
>> time concession given for parting ways from the normal process, and I hope
>> everyone understands that this is supposed to make Apache Hadoop releases
>> relevant once again.
>> 
>> So, to cut it short, the 0.20.203 backward incompatibilities etc have no
>> bearing on the "normal" process, in which no backward incompatibilities
>> should be allowed in minor releases. To answer your specific question, I
>> have no reason to believe that 0.22.1 could be backward incompatible with
>> 0.22.0. 
>> 
>> - milind
>> 
>> -- 
>> Milind Bhandarkar
>> mbhandarkar@linkedin.com
>> +1-650-776-3167
>> 
>> 
>> 
>> 
>> 
>> 
>> On 5/7/11 4:50 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>> 
>>> Milind:
>>> 
>>> Thanks for the pointer. I remember this thread. I guess my question
>>> was unrelated to the specific release and more about the general mode
>>> of development under normal release circumstances (ie. do we permit
>>> backward incompatible changes between 0.22.0 and 0.22.1 or is this
>>> something we've allowed just for the 203 release?).
>>> 
>>> I think it's important to be clear about what the MO is so end users
>>> can plan upgrades appropriately.
>>> 
>>> Thanks!
>>> Sammer
>>> 
>>> On May 6, 2011, at 11:52 PM, Milind Bhandarkar
>>><mb...@linkedin.com>
>>> wrote:
>>> 
>>>> [I am not on PMC, but seeing that PMC may be busy with other issues, I
>>>> will try to answer your questions.]
>>>> 
>>>> Eric,
>>>> 
>>>> I think the thread
>>>> "http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
>>>> will answer your questions. Here is the timeline as I see it:
>>>> 
>>>> 1. Arun proposes to create a release from the security patchset. Says
>>>> Doug has proposed this earlier
>>>> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
>>>> April 23, 2010) ("This has been proposed earlier by Doug and did not get
>>>> far due to concerns about the effect this would have on development on
>>>> trunk.") (August 24, 2010)
>>>> 
>>>> 2. Lots of +1s, between August 24 to August 30 2010. One particular
>>>> comment is from Tom White: "I think it would be good to have a shared
>>>> 0.20 Apache security branch.
>>>> Since security isn't in 0.21, and the 0.22 release is a some way off
>>>> as you mention, this would be useful for folks who want the security
>>>> features sooner (and want to use an Apache release)."
>>>> 
>>>> 3. Arun volunteers to create a release (August 30, 2010)
>>>> 
>>>> 4. Doug reminds Arun. (October 15, 2010)
>>>> 
>>>> 5. Arun apologizes for not creating a branch because he was busy,
>>>> because he had a baby. (January 11, 2011)
>>>> 
>>>> 6. Lots of discussion about what to call it (the release, not the baby,
>>>> although I had a good laugh at Patrick Angeles's email: "You're gonna
>>>> call your kid 20.100?" ;-).
>>>> 
>>>> 7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how
>>>> about something like 20.100 to show that it's a big jump? Anything else?"
>>>> Jan 12, 2011
>>>> 
>>>> 8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
>>>> 12, 2011.
>>>> 
>>>> So, as you can see, even if this release is called 0.20.x, the community
>>>> agreed that these are valuable patches to have, and despite backward
>>>> incompatibility, still have them in minor release.
>>>> 
>>>> - milind
>>>> 
>>>> --
>>>> Milind Bhandarkar
>>>> mbhandarkar@linkedin.com
>>>> +1-650-776-3167
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>>>> 
>>>>> On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>>>>> 
>>>>> I understand Eli's concerns that putting stuff in there that hasn't
>>>>> gone into trunk yet is danger. However, as the team makes no guarantees
>>>>> of 100% compatibility between releases, I don't think it's critical.
>>>>> It's just something that needs to be addressed -which can be done after
>>>>> this release has shipped.
>>>>> 
>>>>> 
>>>>> I was under the impression that the community has been extremely strict
>>>>> about compatibility between minor version bumps in the past. I though
>>>>> there were specific guarantees and that was one of the reasons certain
>>>>> behaviors have persisted so long.
>>>>> 
>>>>> Does this mean API changes can be made in minor releases and it can be
>>>>> made backward compatible in future releases? That seems very, very
>>>>> counter to various conversations that have happened in the past. I'm of
>>>>> the mind that we should continue to promise what we've always promised
>>>>> and if that's changing, let's make with the refactoring party!
>>>>> 
>>>>> Can some PMC'ers clarify this one for me?
>>>>> 
>>>>> TIA.
>>>>> Sammer
>>>>> 
>>>>> 
>>>>> 
>>>>> -Steve
>>>> 
>> 
>


Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
I'd agree with this too. [same disclaimer as milind, not on PMC]

In general one would not expect to see an incompatible change added in a dot release (0.24.1, 0.24.2).  I'd expect anything like that to require community discussion and support.

As milind summarized, we seem to have support for the addition of security to 20.  The existing mechanism of the required release vote will confirm or deny that.

I think it is important that compatible enhancements to hadoop are allowed into dot releases.  This is something that we've discussed but never finalized in the community.  It is the desire to put improvements into users' hands more quickly than the next major release that drives orgs to produce private releases of hadoop.  In general, I think it is fair that such changes go into trunk first.  Exceptions to that also need discussion and support IMO.

I think the key to making progress is discussion and the idea that majority support, not consensus, is what is needed to make exceptions to our process.  Process is useful; it reduces friction.  Process without exception is stifling.

On May 7, 2011, at 10:52 PM, Milind Bhandarkar wrote:

> [Mentioning again: I am not on the PMC, and this email contains
> non-binding opinions based on my reading the general@hadoop.apache.org
> emails.]
> 
> It is my understanding that, from the beginning, the 0.20+security was
> always treated as an exception to the normal (I.e. Pre-0.20) release
> process. (This has been confirmed by the mailing list threads, in which
> many of those who are objecting to this release now - stating that it has
> violated norms - have consented, actually argued for, breaking the norms.)
> 
> For whatever I have read on this mailing list before the vote for this
> release, it looked like most of the community agreed that what Yahoo! Had
> produced on their own branch, outside of Apache trunk, was important
> contribution, and a release based on that would be a good idea, and that a
> one-time release should proceed. (After all, whichever organization the
> contributors belong to, many seem to indicate that they feel ashamed not
> having an Apache release in more than a year.)
> 
> From many emails on this thread, it has been clear to me, that it is a one
> time concession given for parting ways from the normal process, and I hope
> everyone understands that this is supposed to make Apache Hadoop releases
> relevant once again.
> 
> So, to cut it short, the 0.20.203 backward incompatibilities etc have no
> bearing on the "normal" process, in which no backward incompatibilities
> should be allowed in minor releases. To answer your specific question, I
> have no reason to believe that 0.22.1 could be backward incompatible with
> 0.22.0. 
> 
> - milind
> 
> -- 
> Milind Bhandarkar
> mbhandarkar@linkedin.com
> +1-650-776-3167
> 
> 
> 
> 
> 
> 
> On 5/7/11 4:50 PM, "Eric Sammer" <es...@cloudera.com> wrote:
> 
>> Milind:
>> 
>> Thanks for the pointer. I remember this thread. I guess my question
>> was unrelated to the specific release and more about the general mode
>> of development under normal release circumstances (ie. do we permit
>> backward incompatible changes between 0.22.0 and 0.22.1 or is this
>> something we've allowed just for the 203 release?).
>> 
>> I think it's important to be clear about what the MO is so end users
>> can plan upgrades appropriately.
>> 
>> Thanks!
>> Sammer
>> 
>> On May 6, 2011, at 11:52 PM, Milind Bhandarkar <mb...@linkedin.com>
>> wrote:
>> 
>>> [I am not on PMC, but seeing that PMC may be busy with other issues, I
>>> will try to answer your questions.]
>>> 
>>> Eric,
>>> 
>>> I think the thread
>>> "http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
>>> will answer your questions. Here is the timeline as I see it:
>>> 
>>> 1. Arun proposes to create a release from the security patchset. Says
>>> Doug has proposed this earlier
>>> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
>>> April 23, 2010) ("This has been proposed earlier by Doug and did not get
>>> far due to concerns about the effect this would have on development on
>>> trunk.") (August 24, 2010)
>>> 
>>> 2. Lots of +1s, between August 24 to August 30 2010. One particular
>>> comment is from Tom White: "I think it would be good to have a shared
>>> 0.20
>>> Apache security branch.
>>> Since security isn't in 0.21, and the 0.22 release is a some way off
>>> as you mention, this would be useful for folks who want the security
>>> features sooner (and want to use an Apache release)."
>>> 
>>> 3. Arun volunteers to create a release (August 30, 2010)
>>> 
>>> 4. Doug reminds Arun. (October 15, 2010)
>>> 
>>> 5. Arun apologizes for not creating a branch because he was busy,
>>> because
>>> he had a baby. (January 11, 2011)
>>> 
>>> 6. Lots of discussion about what to call it (the release, not the baby,
>>> although I had a good laugh at Patrick Angeles's email: "You're gonna
>>> call
>>> your kid 20.100?" ;-).
>>> 
>>> 7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how
>>> about
>>> something like 20.100 to show that it's a big jump? Anything else?" Jan
>>> 12, 2011
>>> 
>>> 8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
>>> 12, 2011.
>>> 
>>> So, as you can see, even if this release is called 0.20.x, the community
>>> agreed that these are valuable patches to have, and despite backward
>>> incompatibility, still have them in minor release.
>>> 
>>> - milind
>>> 
>>> --
>>> Milind Bhandarkar
>>> mbhandarkar@linkedin.com
>>> +1-650-776-3167
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>>> 
>>>> On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>>>> 
>>>> I understand Eli's concerns that putting stuff in there that hasn't
>>>> gone
>>>> into trunk yet is danger. However, as the team makes no guarantees of
>>>> 100%
>>>> compatibility between releases, I don't think it's critical. It's just
>>>> something that needs to be addressed -which can be done after this
>>>> release
>>>> has shipped.
>>>> 
>>>> 
>>>> I was under the impression that the community has been extremely strict
>>>> about compatibility between minor version bumps in the past. I though
>>>> there
>>>> were specific guarantees and that was one of the reasons certain
>>>> behaviors
>>>> have persisted so long.
>>>> 
>>>> Does this mean API changes can be made in minor releases and it can be
>>>> made
>>>> backward compatible in future releases? That seems very, very counter
>>>> to
>>>> various conversations that have happened in the past. I'm of the mind
>>>> that
>>>> we should continue to promise what we've always promised and if that's
>>>> changing, let's make with the refactoring party!
>>>> 
>>>> Can some PMC'ers clarify this one for me?
>>>> 
>>>> TIA.
>>>> Sammer
>>>> 
>>>> 
>>>> 
>>>> -Steve
>>> 
> 


Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Milind Bhandarkar <mb...@linkedin.com>.
[Mentioning again: I am not on the PMC, and this email contains
non-binding opinions based on my reading the general@hadoop.apache.org
emails.]

It is my understanding that, from the beginning, the 0.20+security was
always treated as an exception to the normal (i.e. pre-0.20) release
process. (This has been confirmed by the mailing list threads, in which
many of those who are objecting to this release now - stating that it has
violated norms - have consented to, actually argued for, breaking the norms.)

From whatever I have read on this mailing list before the vote for this
release, it looked like most of the community agreed that what Yahoo! had
produced on their own branch, outside of Apache trunk, was an important
contribution, and a release based on that would be a good idea, and that a
one-time release should proceed. (After all, whichever organization the
contributors belong to, many seem to indicate that they feel ashamed not
having an Apache release in more than a year.)

From many emails on this thread, it has been clear to me that it is a
one-time concession given for parting ways from the normal process, and I hope
everyone understands that this is supposed to make Apache Hadoop releases
relevant once again.

So, to cut it short, the 0.20.203 backward incompatibilities etc have no
bearing on the "normal" process, in which no backward incompatibilities
should be allowed in minor releases. To answer your specific question, I
have no reason to believe that 0.22.1 could be backward incompatible with
0.22.0. 
 
- milind

-- 
Milind Bhandarkar
mbhandarkar@linkedin.com
+1-650-776-3167






On 5/7/11 4:50 PM, "Eric Sammer" <es...@cloudera.com> wrote:

>Milind:
>
>Thanks for the pointer. I remember this thread. I guess my question
>was unrelated to the specific release and more about the general mode
>of development under normal release circumstances (ie. do we permit
>backward incompatible changes between 0.22.0 and 0.22.1 or is this
>something we've allowed just for the 203 release?).
>
>I think it's important to be clear about what the MO is so end users
>can plan upgrades appropriately.
>
>Thanks!
>Sammer
>
>On May 6, 2011, at 11:52 PM, Milind Bhandarkar <mb...@linkedin.com>
>wrote:
>
>> [I am not on PMC, but seeing that PMC may be busy with other issues, I
>> will try to answer your questions.]
>>
>> Eric,
>>
>> I think the thread
>> "http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
>> will answer your questions. Here is the timeline as I see it:
>>
>> 1. Arun proposes to create a release from the security patchset. Says
>> Doug has proposed this earlier
>> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
>> April 23, 2010) ("This has been proposed earlier by Doug and did not get
>> far due to concerns about the effect this would have on development on
>> trunk.") (August 24, 2010)
>>
>> 2. Lots of +1s, between August 24 to August 30 2010. One particular
>> comment is from Tom White: "I think it would be good to have a shared
>>0.20
>> Apache security branch.
>> Since security isn't in 0.21, and the 0.22 release is a some way off
>> as you mention, this would be useful for folks who want the security
>> features sooner (and want to use an Apache release)."
>>
>> 3. Arun volunteers to create a release (August 30, 2010)
>>
>> 4. Doug reminds Arun. (October 15, 2010)
>>
>> 5. Arun apologizes for not creating a branch because he was busy,
>>because
>> he had a baby. (January 11, 2011)
>>
>> 6. Lots of discussion about what to call it (the release, not the baby,
>> although I had a good laugh at Patrick Angeles's email: "You're gonna
>>call
>> your kid 20.100?" ;-).
>>
>> 7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how
>>about
>> something like 20.100 to show that it's a big jump? Anything else?" Jan
>> 12, 2011
>>
>> 8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
>> 12, 2011.
>>
>> So, as you can see, even if this release is called 0.20.x, the community
>> agreed that these are valuable patches to have, and despite backward
>> incompatibility, still have them in minor release.
>>
>> - milind
>>
>> --
>> Milind Bhandarkar
>> mbhandarkar@linkedin.com
>> +1-650-776-3167
>>
>>
>>
>>
>>
>>
>> On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>>
>>> On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>>>
>>> I understand Eli's concerns that putting stuff in there that hasn't
>>>gone
>>> into trunk yet is danger. However, as the team makes no guarantees of
>>>100%
>>> compatibility between releases, I don't think it's critical. It's just
>>> something that needs to be addressed -which can be done after this
>>>release
>>> has shipped.
>>>
>>>
>>> I was under the impression that the community has been extremely strict
>>> about compatibility between minor version bumps in the past. I thought there
>>> were specific guarantees and that was one of the reasons certain behaviors
>>> have persisted so long.
>>>
>>> Does this mean API changes can be made in minor releases and it can be made
>>> backward compatible in future releases? That seems very, very counter to
>>> various conversations that have happened in the past. I'm of the mind that
>>> we should continue to promise what we've always promised and if that's
>>> changing, let's make with the refactoring party!
>>>
>>> Can some PMC'ers clarify this one for me?
>>>
>>> TIA.
>>> Sammer
>>>
>>>
>>>
>>> -Steve
>>


Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Sammer <es...@cloudera.com>.
Milind:

Thanks for the pointer. I remember this thread. I guess my question
was unrelated to the specific release and more about the general mode
of development under normal release circumstances (i.e., do we permit
backward-incompatible changes between 0.22.0 and 0.22.1 or is this
something we've allowed just for the 203 release?).

I think it's important to be clear about what the MO is so end users
can plan upgrades appropriately.

Thanks!
Sammer

On May 6, 2011, at 11:52 PM, Milind Bhandarkar <mb...@linkedin.com> wrote:

> [I am not on PMC, but seeing that PMC may be busy with other issues, I
> will try to answer your questions.]
>
> Eric,
>
> I think the thread
> "http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
> will answer your questions. Here is the timeline as I see it:
>
> 1. Arun proposes to create a release from the security patchset. Says Doug
> has proposed this earlier
> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
> April 23, 2010) ("This has been proposed
> earlier by Doug and did not get far due to concerns about the effect this
> would have on development on trunk.") (August 24, 2010)
>
> 2. Lots of +1s, between August 24 and August 30, 2010. One particular
> comment is from Tom White: "I think it would be good to have a shared 0.20
> Apache security branch.
> Since security isn't in 0.21, and the 0.22 release is a some way off
> as you mention, this would be useful for folks who want the security
> features sooner (and want to use an Apache release)."
>
> 3. Arun volunteers to create a release (August 30, 2010)
>
> 4. Doug reminds Arun. (October 15, 2010)
>
> 5. Arun apologizes for not creating a branch because he was busy, because
> he had a baby. (January 11, 2011)
>
> 6. Lots of discussion about what to call it (the release, not the baby,
> although I had a good laugh at Patrick Angeles's email: "You're gonna call
> your kid 20.100?" ;-).
>
> 7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how about
> something like 20.100 to show that it's a big jump? Anything else?" Jan
> 12, 2011
>
> 8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
> 12, 2011.
>
> So, as you can see, even if this release is called 0.20.x, the community
> agreed that these are valuable patches to have and, despite the backward
> incompatibility, to still have them in a minor release.
>
> - milind
>
> --
> Milind Bhandarkar
> mbhandarkar@linkedin.com
> +1-650-776-3167
>
>
>
>
>
>
> On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:
>
>> On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>>
>> I understand Eli's concerns that putting stuff in there that hasn't gone
>> into trunk yet is dangerous. However, as the team makes no guarantees of 100%
>> compatibility between releases, I don't think it's critical. It's just
>> something that needs to be addressed - which can be done after this release
>> has shipped.
>>
>>
>> I was under the impression that the community has been extremely strict
>> about compatibility between minor version bumps in the past. I thought there
>> were specific guarantees and that was one of the reasons certain behaviors
>> have persisted so long.
>>
>> Does this mean API changes can be made in minor releases and it can be made
>> backward compatible in future releases? That seems very, very counter to
>> various conversations that have happened in the past. I'm of the mind that
>> we should continue to promise what we've always promised and if that's
>> changing, let's make with the refactoring party!
>>
>> Can some PMC'ers clarify this one for me?
>>
>> TIA.
>> Sammer
>>
>>
>>
>> -Steve
>

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Milind Bhandarkar <mb...@linkedin.com>.
[I am not on PMC, but seeing that PMC may be busy with other issues, I
will try to answer your questions.]

Eric,

I think the thread
"http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5C999-4680-4684-BC55-A430C40FD746@yahoo-inc.com%3E"
will answer your questions. Here is the timeline as I see it:

1. Arun proposes to create a release from the security patchset. Says Doug
has proposed this earlier
(http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1DFEA.5020908@apache.org%3E
April 23, 2010) ("This has been proposed
earlier by Doug and did not get far due to concerns about the effect this
would have on development on trunk.") (August 24, 2010)

2. Lots of +1s, between August 24 and August 30, 2010. One particular
comment is from Tom White: "I think it would be good to have a shared 0.20
Apache security branch.
Since security isn't in 0.21, and the 0.22 release is a some way off
as you mention, this would be useful for folks who want the security
features sooner (and want to use an Apache release)."

3. Arun volunteers to create a release (August 30, 2010)

4. Doug reminds Arun. (October 15, 2010)

5. Arun apologizes for not creating a branch because he was busy, because
he had a baby. (January 11, 2011)
 
6. Lots of discussion about what to call it (the release, not the baby,
although I had a good laugh at Patrick Angeles's email: "You're gonna call
your kid 20.100?" ;-).

7. Arun proposes to call it 0.20.100: "I'm open to suggestions - how about
something like 20.100 to show that it's a big jump? Anything else?" Jan
12, 2011

8. Among others, Eli says: "+1 on 0.20.x   (where x is a J > 3)" on Jan
12, 2011.

So, as you can see, even if this release is called 0.20.x, the community
agreed that these are valuable patches to have and, despite the backward
incompatibility, to still have them in a minor release.

- milind

-- 
Milind Bhandarkar
mbhandarkar@linkedin.com
+1-650-776-3167






On 5/6/11 11:14 PM, "Eric Sammer" <es...@cloudera.com> wrote:

>On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:
>
>I understand Eli's concerns that putting stuff in there that hasn't gone
>into trunk yet is dangerous. However, as the team makes no guarantees of 100%
>compatibility between releases, I don't think it's critical. It's just
>something that needs to be addressed - which can be done after this release
>has shipped.
>
>
>I was under the impression that the community has been extremely strict
>about compatibility between minor version bumps in the past. I thought there
>were specific guarantees and that was one of the reasons certain behaviors
>have persisted so long.
>
>Does this mean API changes can be made in minor releases and it can be made
>backward compatible in future releases? That seems very, very counter to
>various conversations that have happened in the past. I'm of the mind that
>we should continue to promise what we've always promised and if that's
>changing, let's make with the refactoring party!
>
>Can some PMC'ers clarify this one for me?
>
>TIA.
>Sammer
>
>
>
>-Steve


Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Eric Sammer <es...@cloudera.com>.
On May 6, 2011, at 4:53 AM, Steve Loughran <st...@apache.org> wrote:

I understand Eli's concerns that putting stuff in there that hasn't gone
into trunk yet is dangerous. However, as the team makes no guarantees of 100%
compatibility between releases, I don't think it's critical. It's just
something that needs to be addressed - which can be done after this release
has shipped.


I was under the impression that the community has been extremely strict
about compatibility between minor version bumps in the past. I thought there
were specific guarantees and that was one of the reasons certain behaviors
have persisted so long.

Does this mean API changes can be made in minor releases and it can be made
backward compatible in future releases? That seems very, very counter to
various conversations that have happened in the past. I'm of the mind that
we should continue to promise what we've always promised and if that's
changing, let's make with the refactoring party!

Can some PMC'ers clarify this one for me?

TIA.
Sammer



-Steve

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Andrew Purtell <ap...@apache.org>.
> -we've been using Y! releases, and this brings their branch
> back into the apache fold, meaning we can say "the official
> Apache release of Apache Hadoop is something you can use in
> production".

And it is worth consideration that there is some measure of competition between Apache Hadoop and other projects, for example: 

  http://twitter.com/#!/CurtMonash/status/66343706743160832
  "@SethGrimes The @DataStax story is "Cassandra is mature,
  unlike Apache Hadoop, and hence will rescue Hadoop"" I've
  requested proof points..."

and they are happy to exploit any perception of division, lack of forward progress, and such. Perhaps this is additional, and I pray sufficient, motivation to cease the proxy battles, bury the hatchet, etc.

   - Andy


--- On Fri, 5/6/11, Steve Loughran <st...@apache.org> wrote:

> From: Steve Loughran <st...@apache.org>
> Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1
> To: general@hadoop.apache.org
> Date: Friday, May 6, 2011, 4:52 AM
> 
> Vote: +1
> 
> I'm a committer not PMC, so it's non binding.
> 
> 
> Why:
> -looks and works OK on my desktop
> -we've been using Y! releases, and this brings their branch
> back into the apache fold, meaning we can say "the official
> Apache release of Apache Hadoop is something you can use in
> production".
> -I'm confident that the Y! team have tested this well.
> 
> I understand Eli's concerns that putting stuff in there
> that hasn't gone into trunk yet is dangerous. However, as the
> team makes no guarantees of 100% compatibility between
> releases, I don't think it's critical. It's just something
> that needs to be addressed - which can be done after this
> release has shipped.
> 
> 
> -Steve
> 
> 

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Steve Loughran <st...@apache.org>.
Vote: +1

I'm a committer not PMC, so it's non binding.


Why:
-looks and works OK on my desktop
-we've been using Y! releases, and this brings their branch back into 
the apache fold, meaning we can say "the official Apache release of 
Apache Hadoop is something you can use in production".
-I'm confident that the Y! team have tested this well.

I understand Eli's concerns that putting stuff in there that hasn't gone
into trunk yet is dangerous. However, as the team makes no guarantees of
100% compatibility between releases, I don't think it's critical. It's
just something that needs to be addressed - which can be done after this
release has shipped.


-Steve


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Andrew Purtell <ap...@apache.org>.
Non-binding.

  - Andy

--- On Wed, 5/4/11, Ian Holsman <ha...@holsman.net> wrote:
> Just as a tally, we have
> 6 +1's (Andy, is yours binding? If so, 7)
> and 3 -1's.
> 
> So according to the votes so far we are releasing, but
> according to our bylaws we need to wait 7 days for
> everyone to chime in.
> 
> --I


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Ian Holsman <ha...@holsman.net>.
Just as a tally, we have
6 +1's (Andy, is yours binding? If so, 7)
and 3 -1's.

So according to the votes so far we are releasing, but according to our bylaws we need to wait 7 days for everyone to chime in.

--I
On May 5, 2011, at 12:22 PM, Roy T. Fielding wrote:

> On May 4, 2011, at 6:24 PM, Jean-Daniel Cryans wrote:
> 
>> Non-binding -1.
>> 
>> I did download it and checked it out, but when I look at the
>> documentation I see it says "Hadoop 0.20 documentation" in the tab on
>> top. From what I can tell this isn't the branch 0.20 so I think it's
>> an error and from a user point of view this looks more like something
>> I would call 0.22 (although yes I understand this is 0.20 +security
>> +whatever).
>> 
>> Why a single company would push so hard to go against the "normal"
>> release process just for "the benefit of putting our work in the hands
>> of all hadoop users" is beyond me. It's not like people were begging
>> on the mailing lists to be able to get their hands on such a release
>> to the point where an emergency point release including tons of new
>> features is needed.
>> 
>> So to me the more logical reason would be monetary gains, that I would
>> understand better from a for-profit company. But then why go through
>> the hurdles of having such an ASF release when Y! isn't even selling
>> anything remotely related to Hadoop services? And why now?
>> 
>> But then there's this spinoff thing and it suddenly makes a lot more sense.
>> 
>> E14 said earlier that "That is how apache works."
>> 
>> I would say yes, maybe this is how it works, but I'm not sure I want
>> to see it working like _that_. The ASF shouldn't be the vehicle for a
>> single (future) company's wishes.
> 
> The ASF is a vehicle for whoever wishes to collaborate on a
> given project.  Collaboration means helping do the work.  Those
> who do the work may do so for whatever reasons that they think
> are good, whether it is because they feel like being charitable
> today, they get paid a salary and the big boss said "work on
> this part", or because they just have an itch worth scratching.
> 
> Apache does not care why people choose to collaborate or
> how they choose to apply their own intellectual efforts.  We
> welcome all forms of contribution under the terms of our license.
> 
> What we do require is a certain amount of civility regarding
> our voting procedures and an emphasis on individual responsibility
> for your votes.  Anyone caught *voting* a particular way just
> because the boss says so will be dealt with severely.  Votes
> are how we do quality control and make decisions, and no other
> company can be allowed to make decisions for our non-profit.
> 
> ....Roy


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On May 4, 2011, at 6:24 PM, Jean-Daniel Cryans wrote:

> Non-binding -1.
> 
> I did download it and checked it out, but when I look at the
> documentation I see it says "Hadoop 0.20 documentation" in the tab on
> top. From what I can tell this isn't the branch 0.20 so I think it's
> an error and from a user point of view this looks more like something
> I would call 0.22 (although yes I understand this is 0.20 +security
> +whatever).
> 
> Why a single company would push so hard to go against the "normal"
> release process just for "the benefit of putting our work in the hands
> of all hadoop users" is beyond me. It's not like people were begging
> on the mailing lists to be able to get their hands on such a release
> to the point where an emergency point release including tons of new
> features is needed.
> 
> So to me the more logical reason would be monetary gains, that I would
> understand better from a for-profit company. But then why go through
> the hurdles of having such an ASF release when Y! isn't even selling
> anything remotely related to Hadoop services? And why now?
> 
> But then there's this spinoff thing and it suddenly makes a lot more sense.
> 
> E14 said earlier that "That is how apache works."
> 
> I would say yes, maybe this is how it works, but I'm not sure I want
> to see it working like _that_. The ASF shouldn't be the vehicle for a
> single (future) company's wishes.

The ASF is a vehicle for whoever wishes to collaborate on a
given project.  Collaboration means helping do the work.  Those
who do the work may do so for whatever reasons that they think
are good, whether it is because they feel like being charitable
today, they get paid a salary and the big boss said "work on
this part", or because they just have an itch worth scratching.

Apache does not care why people choose to collaborate or
how they choose to apply their own intellectual efforts.  We
welcome all forms of contribution under the terms of our license.

What we do require is a certain amount of civility regarding
our voting procedures and an emphasis on individual responsibility
for your votes.  Anyone caught *voting* a particular way just
because the boss says so will be dealt with severely.  Votes
are how we do quality control and make decisions, and no other
company can be allowed to make decisions for our non-profit.

....Roy

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Non-binding -1.

I did download it and checked it out, but when I look at the
documentation I see it says "Hadoop 0.20 documentation" in the tab on
top. From what I can tell this isn't the branch 0.20 so I think it's
an error and from a user point of view this looks more like something
I would call 0.22 (although yes I understand this is 0.20 +security
+whatever).

Why a single company would push so hard to go against the "normal"
release process just for "the benefit of putting our work in the hands
of all hadoop users" is beyond me. It's not like people were begging
on the mailing lists to be able to get their hands on such a release
to the point where an emergency point release including tons of new
features is needed.

So to me the more logical reason would be monetary gains, that I would
understand better from a for-profit company. But then why go through
the hurdles of having such an ASF release when Y! isn't even selling
anything remotely related to Hadoop services? And why now?

But then there's this spinoff thing and it suddenly makes a lot more sense.

E14 said earlier that "That is how apache works."

I would say yes, maybe this is how it works, but I'm not sure I want
to see it working like _that_. The ASF shouldn't be the vehicle for a
single (future) company's wishes.

J-D

On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley <om...@apache.org> wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Milind Bhandarkar <mb...@linkedin.com>.
My (non-binding) vote for 0.20.203.0-rc1 is +1.

I downloaded it, compiled it, ran the tests, and ran my bigrams example; all
ran perfectly.
(I did a single node test without security on.)

The voting criteria I used are:

1. Is this a working release? : Yes
2. Does it take the codebase forward? : Yes
3. Does it have features that the user community might find valuable? : Yes

- milind

-- 
Milind Bhandarkar
mbhandarkar@linkedin.com
+1-650-776-3167






On 5/4/11 6:10 PM, "Devaraj Das" <dd...@yahoo-inc.com> wrote:

>+1 based on some single node tests I did (with security ON).
>
>
>On 5/4/11 10:31 AM, "Owen O'Malley" <om...@apache.org> wrote:
>
>Here's an updated release candidate for 0.20.203.0. I've incorporated the
>feedback and included all of the patches from 0.20.2, which is the last
>stable release. I also fixed the eclipse-plugin problem.
>
>The candidate is at:
>http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
>Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
>-- Owen
>


Re: [VOTE] Release candidate 0.20.203.0-rc1

Posted by Devaraj Das <dd...@yahoo-inc.com>.
+1 based on some single node tests I did (with security ON).


On 5/4/11 10:31 AM, "Owen O'Malley" <om...@apache.org> wrote:

Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.

The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/

Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

-- Owen