Posted to general@hadoop.apache.org by Ian Holsman <ha...@holsman.net> on 2011/05/02 21:15:48 UTC

Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

moving this thread to general@

On May 3, 2011, at 3:58 AM, Doug Cutting wrote:

>> Should we release
>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
> 
> The patch selection process for this branch did not appear to be a
> community process.  A massive patch set was committed en-masse with no
> public discussion before or after about its specific composition.

guys...
1. do we agree this is an issue
2. if it is, how do we get the communication & discussion on list?

what do people think are the major issues that are stopping people talking about stuff on list?

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by Eli Collins <el...@cloudera.com>.
Hey Eric,

I don't have any objections to a release from
branch-0.20-security-203.  However when I examined the specific patch
set I noticed there are important implications with respect to
compatibility (for 0.20.2 and 0.22), a question about project model
(eg not reviewing patches on jira before committing them, not having
patches go through trunk, etc), and some open questions for users (eg
is this the next dot release of the stable branch?).

I agree this is a valuable artifact, but that doesn't mean it's OK to
ignore compatibility concerns, etc.

I've listed specific questions/comments here:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201105.mbox/%3CBANLkTinZ=xb6kJ5PTeLN5KKD9b-cwaM0OQ@mail.gmail.com%3E

Thanks,
Eli

On Mon, May 2, 2011 at 1:05 PM, Eric Baldeschwieler
<er...@yahoo-inc.com> wrote:
>
> Hi folks,
>
> This strikes me as a bit odd. I think we have already discussed this at length and agreed that a release could proceed.
>
> Since then, Arun and Owen have worked actively to incorporated community feedback into this release.
>
> All parties making Hadoop releases other then Apache have already incorporated most of the patches in this release into their products, including doug's organization. I don't see how Hadoop's users benefit from Apache not incorporating them into an Apache release.
>
> As previously discussed, all parties are welcome to champion altenative releases from Apache if they want to invest in making Apache Hadoop better.
>
> Thanks!!
>
> E14
>
> ---
> E14 - typing on glass
>
> On May 2, 2011, at 12:16 PM, "Ian Holsman" <ha...@holsman.net> wrote:
>
>> moving this thread to general@
>>
>> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
>>
>>>> Should we release
>>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>>>
>>> The patch selection process for this branch did not appear to be a
>>> community process.  A massive patch set was committed en-masse with no
>>> public discussion before or after about its specific composition.
>>
>> guys...
>> 1. do we agree this is an issue
>> 2. if it is, how we do get the communication & discussion on list?
>>
>> what do people think are the major issues that are stopping people talking about stuff on list are?
>

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by James Seigel <ja...@tynt.com>.
Hello!

I guess I am concerned as a user of hadoop that the only way to get an “endorsed” up-to-date version of hadoop one has to abandon the community and “trust” a commercial release with its special sauce.

I am just hoping that the community can put together a nice stable up-to-date patched version.  That’d be nice.  It probably won’t change my commercial deploy, but it would give me something to compare with :)

Just my $0.02 (CND)
Cheers
James.

On 2011-05-02, at 2:51 PM, Doug Cutting wrote:

> On 05/02/2011 01:05 PM, Eric Baldeschwieler wrote:
>> As previously discussed, all parties are welcome to champion
>> altenative releases from Apache if they want to invest in making
>> Apache Hadoop better.
> 
> I do not believe that different organizations should release their own
> versions of Hadoop posing as Apache releases.  If folks wish to release
> their own versions, then they should call them something else and
> release them themselves.  The Apache Hadoop project should create
> releases collaboratively, through an open process.  The standard means
> is to start a branch from trunk or a prior release and propose patches
> to that branch, one-by-one.  This candidate diverged sufficiently from
> this pattern that, for me, it doesn't qualify as a community release.
> 
> Cheers,
> 
> Doug


Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by Doug Cutting <cu...@apache.org>.
On 05/02/2011 01:05 PM, Eric Baldeschwieler wrote:
> As previously discussed, all parties are welcome to champion
> altenative releases from Apache if they want to invest in making
> Apache Hadoop better.

I do not believe that different organizations should release their own
versions of Hadoop posing as Apache releases.  If folks wish to release
their own versions, then they should call them something else and
release them themselves.  The Apache Hadoop project should create
releases collaboratively, through an open process.  The standard means
is to start a branch from trunk or a prior release and propose patches
to that branch, one-by-one.  This candidate diverged sufficiently from
this pattern that, for me, it doesn't qualify as a community release.

Cheers,

Doug

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Hi folks,

This strikes me as a bit odd. I think we have already discussed this at length and agreed that a release could proceed. 

Since then, Arun and Owen have worked actively to incorporate community feedback into this release.

All parties making Hadoop releases other than Apache have already incorporated most of the patches in this release into their products, including Doug's organization. I don't see how Hadoop's users benefit from Apache not incorporating them into an Apache release.

As previously discussed, all parties are welcome to champion alternative releases from Apache if they want to invest in making Apache Hadoop better.

Thanks!!

E14

---
E14 - typing on glass

On May 2, 2011, at 12:16 PM, "Ian Holsman" <ha...@holsman.net> wrote:

> moving this thread to general@
> 
> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
> 
>>> Should we release
>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>> 
>> The patch selection process for this branch did not appear to be a
>> community process.  A massive patch set was committed en-masse with no
>> public discussion before or after about its specific composition.
> 
> guys...
> 1. do we agree this is an issue
> 2. if it is, how we do get the communication & discussion on list?
> 
> what do people think are the major issues that are stopping people talking about stuff on list are?

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by Steve Loughran <st...@apache.org>.
On 03/05/11 01:41, Roy T. Fielding wrote:
> I am constantly amazed at how
> quiet it is in this project, at least until I remember that
> most of the work is done exclusively via jira, unlike any of
> my other followed projects that use jira.  I'd suggest that
> the right place to hold any discussion is on the dev list,
> but I am not on that list because it receives way too many
> automated notifications.  Maybe it would help discussion on
> dev if notices were sent elsewhere and only discussions were
> held on dev.

I've seen this before on the Maven lists, where there's mostly a stream 
of JIRA changes above anything else:
http://mail-archives.apache.org/mod_mbox/maven-dev/200510.mbox/browser

however, they've got no JIRA issues in their list now, which may imply 
all changes aren't going to the list, or they aren't using it so much:
http://mail-archives.apache.org/mod_mbox/maven-dev/201104.mbox/browser

(pause: bisecting their list shows that in 1.mar.06 they forked JIRA to 
a separate list to hide the details of ongoing work)

In some ways it's a means of dealing with a large and fast moving 
codebase: you subscribe to the issues that matter to you, all the 
discussions on a specific feature are archived, etc.

However, it has some flaws
  -discouragement of community, you become a group of people working on 
JIRA issues, rather than on a large integrated project
  -with work spread across common, hdfs and mapreduce JIRAs and mailing 
lists, it's hard to keep all the things in your head -it is pretty much 
a full time job to do so. And I don't know about the others, but I don't 
have the time.
  -we need a way of gently moving people from those who use hadoop to 
those who develop it. To me, every end user is a warm engineering 
resource we just need to point at a problem that they care about. The 
scale of the project, its complexity, JIRA change rate and testing 
difficulties are all barriers to entry -you end up needing a team of people
  * someone to track all the issues and keep the design in their head
  * 1+ person to test
  * 1+ person to code
I don't know about others, but I can't do this on my own.

The attempt to split up into HDFS+MAPREDUCE was one tactic to deal with 
this, but it hasn't worked, we just have more mailing lists to track (or 
in my case, fall behind on).

votewise:

-I'm in favour of shipping an apache release of 20.x that has the patches 
that Y! and others have added to deal with scale and availability -and 
which has been tested by them. This will provide an apache release for 
people to use in production systems -because the official apache 
releases have lagged the CDH and Y! releases.

-I'd like to see all the changes integrated into trunk too, as it 
doesn't make sense for a patch in this branch not to be in trunk.

Steve

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
PLEASE NOTE

Voting +1 for a release means that you have downloaded the
source code package, verified its signatures, compiled it
on your platform of choice, and checked to your satisfaction
that it matches the source code we have in subversion and that
it is better (in your opinion) than the last Apache release
of the same name.

The ASF relies on that minimum amount of peer review to make
sure that we don't release trojan horses, license violations,
or other things that might get us sued as a foundation or as
individuals.  If you don't have time to do it yourself, then
vote +0 (with happy feelings) and hope that there are at least
three members of the PMC who do have that time.

DO NOT +1 a release just because it seems like progress.
Progress is in the doing, not the talking.

....Roy
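
The checklist in the message above can be written down as a small sketch. This is only an illustration, not an ASF tool: the class name ReleaseReview, its field names, and the function suggested_vote are hypothetical, and the sketch models only the +1/+0 choice described above (a -1 for a release you reviewed and object to is not modeled).

```python
from dataclasses import dataclass, fields


@dataclass
class ReleaseReview:
    """Hypothetical record of the review steps listed in the message."""
    downloaded_source: bool = False
    verified_signatures: bool = False
    compiled_on_my_platform: bool = False
    matches_subversion: bool = False
    better_than_last_release: bool = False


def suggested_vote(review: ReleaseReview) -> str:
    """Return '+1' only if every review step was done, otherwise '+0'."""
    all_done = all(getattr(review, f.name) for f in fields(review))
    return "+1" if all_done else "+0"
```

A reviewer who skipped any step, for example never verified the signatures, ends up at +0 under this sketch rather than +1.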

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by Milind Bhandarkar <mb...@linkedin.com>.
>It is perfectly reasonable for Doug (or anyone else) to vote
>on a release based on a lack of version history, adequate
>description of the sweet meats, or anything else that others
>might consider non-technical.  This is a release vote!
>It does not require consensus.  It requires minimal review
>(usually meaning three +1s) and a majority opinion of those
>on the PMC who choose to review the proposed release and vote.

Roy,

Thanks for reminding everyone that a release does not require consensus.

Regarding this release, I think anyone who runs a multi-tenant Hadoop
cluster will appreciate the user-limits feature that goes a long way to
ensure that an errant job does not take the entire cluster down. Your
operations and support people will thank you for deploying this release.

Recently I was discussing with operations folks at a company that operates
a Hadoop cluster based on a commercial distribution of Hadoop, and they
were excited to hear that they will have a way of making sure that their
cluster will not be taken down by an errant user/job, because that's one
big fear that keeps them awake.

FWIW, I am +1 for this release.

Arun, can you include a document that gives more details about what the
limits are, and how to modify your jobs to stay below these limits (I know
it is a cut-paste for you :-)?

- milind


Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On May 2, 2011, at 12:15 PM, Ian Holsman wrote:

> moving this thread to general@
> 
> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
> 
>>> Should we release
>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>> 
>> The patch selection process for this branch did not appear to be a
>> community process.  A massive patch set was committed en-masse with no
>> public discussion before or after about its specific composition.
> 
> guys...
> 1. do we agree this is an issue

Of course it is an issue.  Anyone can make it an issue -- no
agreement is necessary.

> 2. if it is, how we do get the communication & discussion on list?

By communicating and discussing on list.  Like, for example,
by proposing a release vote and people objecting to it, followed
by a polite collaboration on ways to reduce objections if that
is needed to get a release out the door.

> what do people think are the major issues that are stopping people talking about stuff on list are?

The fact that people can vote on individual issues via jira,
which means that there is effectively no discussion of the
product as a whole on list.  I am constantly amazed at how
quiet it is in this project, at least until I remember that
most of the work is done exclusively via jira, unlike any of
my other followed projects that use jira.  I'd suggest that
the right place to hold any discussion is on the dev list,
but I am not on that list because it receives way too many
automated notifications.  Maybe it would help discussion on
dev if notices were sent elsewhere and only discussions were
held on dev.

By all means, produce a tarball and let the entire PMC vote
on it as the next release.  My personal preference is to not
allow anything that deviates from the major.minor.patch release
numbering that most software projects follow, but I don't have
a vote here.

It is perfectly reasonable for Doug (or anyone else) to vote
on a release based on a lack of version history, adequate
description of the sweet meats, or anything else that others
might consider non-technical.  This is a release vote!
It does not require consensus.  It requires minimal review
(usually meaning three +1s) and a majority opinion of those
on the PMC who choose to review the proposed release and vote.

....Roy
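
As a rough sketch of the release-vote rule described above (at least three +1s and a majority of those who vote), one could tally votes as follows. The function name release_vote_passes and the min_plus_ones parameter are hypothetical, and the sketch assumes the list already contains only votes from PMC members; it is an illustration, not an official procedure.

```python
from collections import Counter


def release_vote_passes(votes, min_plus_ones=3):
    """Decide a release vote from strings such as '+1', '+0', '-0', '-1'.

    Assumption: every entry is a vote cast by a PMC member.
    Passes when there are at least min_plus_ones '+1' votes and
    more '+1' votes than '-1' votes. No consensus is required.
    """
    tally = Counter(votes)
    plus, minus = tally["+1"], tally["-1"]
    return plus >= min_plus_ones and plus > minus
```

Under this sketch a single -1 does not block a release: three +1s against one -1 still passes, while two +1s alone do not.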
