Posted to general@hadoop.apache.org by Stack <st...@duboce.net> on 2010/12/23 00:30:50 UTC

DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

I propose cutting a release from the tip of the branch-0.20-append
branch [1].  I suggest the release be called hadoop-0.20.0-append.  I
volunteer to run the release process. Are folks OK with this?

Here's some background.

The branch-0.20-append was forked from branch-0.20 a few months ago by
Dhruba to add an append/sync to 0.20.x era HDFS.  The added append
facility is made of the patches attached to HDFS-200 and then a bunch
of fixup patches done by Dhruba, Hairong, Nicolas, Todd, and others.
For a complete list of differences from the tip of the Hadoop
branch-0.20, see the CHANGES.txt file in branch-0.20-append [2].  The
HDFS-200 append/sync is not the same as the append/sync implementation
that is in hadoop 0.21.x and hadoop TRUNK.

The branch-0.20-append is a relatively small deviation from hadoop
0.20.x for those who want an append/sync in an (Apache) hadoop 0.20.x
[3].  It's for those unwilling to upgrade their clusters to hadoop
0.21.0 and for those who can't wait on the coming hadoop 0.22.0.  For
applications like HBase [4], an application that runs on HDFS and
"loses data" if there is no working append/sync, it's critical that
there is an Apache release with a working append/sync.

A few of us have been playing with this branch for a while and it
seems to do the right thing.  It's fairly close to what FB runs
internally (correct me if I'm wrong in this last statement, Dhruba).

Thanks,
St.Ack

1. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/
2. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt?view=markup
3. Cloudera's CDH3Beta2/3 already include an append/sync based off the
HDFS-200++ work.   There is no 'official' Apache hadoop 0.20.x with a
working append/sync.
4. http://hbase.apache.org

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Owen O'Malley <om...@apache.org>.
On Wed, Dec 22, 2010 at 4:05 PM, Ian Holsman <ha...@holsman.net> wrote:

> Hi St.Ack.
>
> In general I'm opposed to such a thing.
>
> There are already 5 Hadoop 20.x releases out there, I don't think there is
> a need for another.
>

Stack is trying to get an Apache version of Hadoop that solves his problem.
He's been asking for it for a year now. If the 20-append branch is stable,
we should release it.


>
> Is there a reason why we couldn't create a hadoop 0.20.3 release that has
> this patch inside of it, as well as other fixes that have been applied since
> 0.20.2 (~26 patches)?


It is of course possible, but no one has done the work to build the version
and test it out. As to what to call it, I'm a little hesitant to call it
0.20.3 at this point, since the last time I asked it was considered a fairly
risky change. If it goes badly, it would mean an incompatible revert onto a
branch that has been stable for 1.5 years. I'd much rather call it
0.20-append.0 and use 0.20.3 for straight bugfixes.

-- Owen

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ted Yu <yu...@gmail.com>.
> Thats why I think we should go to 0.22 ASAP and get companies to build
> their new features on trunk against that.

There was a thread in Nov: 'Caution using Hadoop 0.21'.
It would be helpful to see the response to 0.22.


> >
> > Thanks for getting the discussion off the ground,
> > St.Ack
>
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:
> If I remember right, there were also protocol changes in the append branch,
> which was another reason we didn't want to put it directly into the 0.20
> branch.
>

That is indeed the case, Owen.
St.Ack

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Dec 23, 2010 at 10:15 AM, M. C. Srivas <mc...@gmail.com> wrote:

> Regardless, there will still be 2 incompatible "branches". And that is only
> the beginning.
>
> Some future features will be done only on branch 1 (since company 1 uses
> that), and other features on branch 2 (by company 2, since they prefer
> branch 2),  thereby further separating the two branches.
>
> If the goal is to avoid the split, then there are only 2 choices:
>  (a) merge both
>  (b) abandon one or the other.
>
>
The 0.20 append solution has never been seen as a fork. It's a stop-gap
fixup of the 0.20 append feature, but we don't intend to forward-port that
append implementation into trunk. From an API perspective it's very close to
the 0.22 version, and I think everyone fully intends to abandon the
0.20-append work once 0.22 append has been heavily tested for HBase
workloads.


>
> >
> > The Promised Land that we say we're all trying to get to is regular,
> > timely, feature-complete, tested, innovative but stable releases of
> > new versions of Apache Hadoop.  Missing out any one of those criteria
> > discovered will continue (and has continued) the current situation
> > where quasi-official branches and outside distributions fill the void
> > such a release should.  The effort to maintain this offical branch and
> > fix the bugs that will be discovered could be better spent moving us
> > closer to that goal.
> >
>

+1. Interestingly, the work on 0.20-append uncovered a number of bugs that
also will apply to 0.22's implementation. So it wasn't all a wasted effort
;-)

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by "M. C. Srivas" <mc...@gmail.com>.
On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:

> It's difficult to support this proposal knowing how much time would be
> spent preparing an official release, continuing to support it and
> continuing to two support two separate implementations of append.  I
> believe that effort would be better spent getting out a kick-ass 22
> (or, barring that, a *really* kick-ass 23).
>

Regardless, there will still be 2 incompatible "branches". And that is only
the beginning.

Some future features will be done only on branch 1 (since company 1 uses
that), and other features on branch 2 (by company 2, since they prefer
branch 2),  thereby further separating the two branches.

If the goal is to avoid the split, then there are only 2 choices:
  (a) merge both
  (b) abandon one or the other.

Which one is one willing to stomach?




>
> The Promised Land that we say we're all trying to get to is regular,
> timely, feature-complete, tested, innovative but stable releases of
> new versions of Apache Hadoop.  Missing out any one of those criteria
> discovered will continue (and has continued) the current situation
> where quasi-official branches and outside distributions fill the void
> such a release should.  The effort to maintain this offical branch and
> fix the bugs that will be discovered could be better spent moving us
> closer to that goal.
>
> I'm certainly sympathetic to the difficult position our quagmire has
> placed HBase into.  However, the current proposal would hurt HDFS to
> help HBase. The best solution for that project, as well as for HDFS,
> is to get HDFS back to a healthy release cycle; not prolong or codify
> the current ad-hoc state of affairs.  Let's stop digging this hole.
> -jakob
>
> On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com> wrote:
> > [ Sorry if this is be-laboring the obvious ]
> >
> > There are two append solutions floating around, and they are incompatible
> > with each other. Thus, the two "branches" will forever remain incompatible
> > with each other, regardless of how they are numbered (0.22,  0.23,  0.20.3,
> > e.t.c.)
> >
> > Unless both are merged into one branch, and a switch provided to  "use
> > HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop into
> > two.
> >
> >
> > On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:
> >
> >> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
> >> wrote:
> >>
> >> > Features are not release version tags.  If there is a security bug
> >> > found then we would have to release a new version of the append
> >> > version, and a round of severe trout slapping would result.
> >> >
> >>
> >> Yeah, it isn't a perfect solution and it doesn't scale to a second tag, but
> >> the problem is that this is effectively a release branch between 0.20 and
> >> 0.21. Of course I agree that any critical bugs would need to be fixed in the
> >> append branch as well as the 0.20 and 0.21 branches.
> >>
> >> If you want to stick to pure numbers and we want to leave ourselves a way to
> >> bugfix the 0.20 branch without append, we'd could use a version string like
> >> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering and
> >> suggest a version jump.
> >>
> >> If I remember right, there were also protocol changes in the append branch,
> >> which was another reason we didn't want to put it directly into the 0.20
> >> branch.
> >>
> >> -- Owen
> >>
> >
>
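
As an aside on the version-string suggestion quoted above: the appeal of a
name like 0.20.100 is that anything comparing versions component by component
as integers places it after every 0.20.x bugfix release and before 0.21.0,
whereas a suffix such as 0.20.0-append has no place in that ordering.  A
minimal Python sketch of that comparison, for illustration only; the helper
name version_key is hypothetical and is not part of any Hadoop tooling:

```python
def version_key(version: str) -> tuple:
    """Split a dotted version such as '0.20.100' into integers for comparison.

    Hypothetical helper for illustration; raises ValueError on a
    non-numeric component such as '0-append'.
    """
    return tuple(int(part) for part in version.split("."))


releases = ["0.21.0", "0.20.100", "0.20.2", "0.20.3"]

# Numeric ordering: 0.20.100 sorts after 0.20.3 and before 0.21.0.
numeric_order = sorted(releases, key=version_key)

# Plain string ordering misplaces it, because '0.20.100' < '0.20.2'
# when compared character by character.
string_order = sorted(releases)
```

Under this scheme a later 0.20.4 bugfix release would still sort below
0.20.100, which is the "version jump" the suggestion relies on.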

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
That sounds like a reasonable solution to me: the HBase team bears the
burden of cutting and maintaining the release, while Hadoop Core can proceed
with 0.22. HBase had its own version of ZooKeeper in there for a while, if I
recall correctly, so it's not without precedent. No funky version numbers
have to be floating around Hadoop-land, and hopefully HBase can move back to
HDFS when 0.22 is released. It's not ideal, but potentially the best
solution given the current constraints.

On Fri, Dec 24, 2010 at 11:06 AM, Stack <st...@duboce.net> wrote:

> On Fri, Dec 24, 2010 at 10:57 AM, Chris Douglas <cd...@apache.org>
> wrote:
> > Does anything go wrong if HBase were to release the 0.20-append branch
> > as its own product?
> >
>
> This is an interesting notion.  We'd host it at hbase.apache.org
> alongside our download?  Would that be OK with others?
> St.Ack
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
I have a feeling we are thinking too much here. 

The reality is that the Hadoop community was in favor of porting append (fixes?) back to a branch based off hadoop-0.20 a while ago (Dhruba proposed the branch).

I see no reason we can't release it now, under a reasonable release name.

As Stack and Ryan have pointed out, we have severely hampered HBase so far. It behooves us to facilitate users of the entire stack to easily access Apache releases. Plus, Stack and co. are volunteering their own time (thanks!).

Arun

On Dec 24, 2010, at 11:33 AM, "Chris Douglas" <cd...@apache.org> wrote:

> Calling it something other than Hadoop would avoid confusing users
> (and HBase could then release bug fixes, etc. on its own schedule),
> but from how it's been described: this is acknowledging the reality of
> the situation, not proposing something radical.
> 
> HBase can be backed by the HBase FooFS and HDFS. If the former can be
> retired as a legacy platform that'd be ideal, but Hadoop will have to
> earn it. -C
> 
> On Fri, Dec 24, 2010 at 11:06 AM, Stack <st...@duboce.net> wrote:
>> On Fri, Dec 24, 2010 at 10:57 AM, Chris Douglas <cd...@apache.org> wrote:
>>> Does anything go wrong if HBase were to release the 0.20-append branch
>>> as its own product?
>>> 
>> 
>> This is an interesting notion.  We'd host it at hbase.apache.org
>> alongside our download?  Would that be OK with others?
>> St.Ack
>> 

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
Chris:

What you say seems sensible enough but it's not what I want (smile).
My sense is that for HBase, unless the package we host at
hbase.apache.org is called hadoop-0.20.0-append -- so it's clear it's an
untainted bundle made from the tip of the append branch -- then we're
only going to confuse; we'll be spending our time quelling queries
about the "hbase version" of hadoop.

I'm going to pass on trying to offer an append release bundle off the
branch-0.20-append branch.  For the next HBase (imminent) release,
we'll just keep on with telling folks to build their own hadoop from the
append branch or go get CDH3 (the HBase release after that will be
about getting us up on hadoop 0.22).

Thanks all,
St.Ack


On Mon, Dec 27, 2010 at 11:20 AM, Chris Douglas <cd...@apache.org> wrote:
>> On Fri, Dec 24, 2010 at 11:31 AM, Chris Douglas <cd...@apache.org> wrote:
>> Calling it other than Hadoop would only confuse the situation even
>> more; "Trust all your data to fooFS!".  It'd also reeks of HDFS 'fork'
>> (HBase is not yet up for taking on such a burden).
>
> Unless I'm missing something, it is a fork. It's a temporary, friendly
> fork, but it's what the HBase project has been using and supporting
> for months. It hasn't had a label assigned to it, but it's a product
> (a feature with a mostly-shared implementation across other forks, at
> any rate).
>
>> I liked my original reading of your suggestion Chris -- even if it was
>> perhaps not what you intended -- where HBase would host
>> hadoop-0.20-append.  Thats not on?
>
> Your original reading was what I intended. The obstacle to releasing a
> variant of Hadoop from the HBase project is the name. I'd be surprised
> if TLPs were permitted to release under another project's name, even
> if the other endorsed it. If that assumption is not a real constraint,
> then I agree that there's no point in calling it something else. -C
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Chris Douglas <cd...@apache.org>.
> On Fri, Dec 24, 2010 at 11:31 AM, Chris Douglas <cd...@apache.org> wrote:
> Calling it other than Hadoop would only confuse the situation even
> more; "Trust all your data to fooFS!".  It'd also reeks of HDFS 'fork'
> (HBase is not yet up for taking on such a burden).

Unless I'm missing something, it is a fork. It's a temporary, friendly
fork, but it's what the HBase project has been using and supporting
for months. It hasn't had a label assigned to it, but it's a product
(a feature with a mostly-shared implementation across other forks, at
any rate).

> I liked my original reading of your suggestion Chris -- even if it was
> perhaps not what you intended -- where HBase would host
> hadoop-0.20-append.  Thats not on?

Your original reading was what I intended. The obstacle to releasing a
variant of Hadoop from the HBase project is the name. I'd be surprised
if TLPs were permitted to release under another project's name, even
if the other endorsed it. If that assumption is not a real constraint,
then I agree that there's no point in calling it something else. -C

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
On Fri, Dec 24, 2010 at 11:31 AM, Chris Douglas <cd...@apache.org> wrote:
> Calling it something other than Hadoop would avoid confusing users
> (and HBase could then release bug fixes, etc. on its own schedule),
> but from how it's been described: this is acknowledging the reality of
> the situation, not proposing something radical.
>

Calling it other than Hadoop would only confuse the situation even
more; "Trust all your data to fooFS!".  It'd also reek of an HDFS 'fork'
(HBase is not yet up for taking on such a burden).

I liked my original reading of your suggestion Chris -- even if it was
perhaps not what you intended -- where HBase would host
hadoop-0.20-append.  That's not on?

St.Ack
P.S. Arun, I agree.

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Chris Douglas <cd...@apache.org>.
Calling it something other than Hadoop would avoid confusing users
(and HBase could then release bug fixes, etc. on its own schedule),
but from how it's been described: this is acknowledging the reality of
the situation, not proposing something radical.

HBase can be backed by the HBase FooFS and HDFS. If the former can be
retired as a legacy platform that'd be ideal, but Hadoop will have to
earn it. -C

On Fri, Dec 24, 2010 at 11:06 AM, Stack <st...@duboce.net> wrote:
> On Fri, Dec 24, 2010 at 10:57 AM, Chris Douglas <cd...@apache.org> wrote:
>> Does anything go wrong if HBase were to release the 0.20-append branch
>> as its own product?
>>
>
> This is an interesting notion.  We'd host it at hbase.apache.org
> alongside our download?  Would that be OK with others?
> St.Ack
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
On Fri, Dec 24, 2010 at 10:57 AM, Chris Douglas <cd...@apache.org> wrote:
> Does anything go wrong if HBase were to release the 0.20-append branch
> as its own product?
>

This is an interesting notion.  We'd host it at hbase.apache.org
alongside our download?  Would that be OK with others?
St.Ack

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Chris Douglas <cd...@apache.org>.
Does anything go wrong if HBase were to release the 0.20-append branch
as its own product?

Even if it were short-lived, it sounds like that would give HBase
users a working append, the HBase project could decide when to retire
that work (and support it concurrently with post-0.20 append), and it
sidesteps the versioning issue. -C

On Thu, Dec 23, 2010 at 11:52 PM, Stack <st...@duboce.net> wrote:
> The intent of the proposed release off the branch-0.20-append was
> never to derail, “hurt”, or distract from the Hadoop 0.22 effort.  The
> HBase crew are up for helping out testing and debugging and the intent
> is to run atop the 0.22 version of append as well as 0.20’s append.  A
> release off the branch-0.20-append branch was more about a ‘stop-gap’,
> see Todd’s explication above, or a ‘fig-leaf’ as Andrew describes it
> while 0.22 is stabilizing.
>
> Suggestions that projects like HBase hibernate until 0.22 don’t help
> (See Ryan’s comments for a sense of why). We can just keep on with
> what we’ve been doing up to this if the feeling is that an append
> release could somehow jeopardize the 0.22 effort.  Its kinda hokey
> having to point users at some random looking branch [1] telling them
> build their own but thankfully this is not their only option.
>
> I’ve enjoyed the healthy back and forth,
>
> St.Ack
> 1. To be clear, 'random looking branch' is fruit of a bunch of
> hardwork by Facebookers and Clouderians.
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
The intent of the proposed release off the branch-0.20-append was
never to derail, “hurt”, or distract from the Hadoop 0.22 effort.  The
HBase crew are up for helping out testing and debugging and the intent
is to run atop the 0.22 version of append as well as 0.20’s append.  A
release off the branch-0.20-append branch was more about a ‘stop-gap’,
see Todd’s explication above, or a ‘fig-leaf’ as Andrew describes it
while 0.22 is stabilizing.

Suggestions that projects like HBase hibernate until 0.22 don’t help
(see Ryan’s comments for a sense of why). We can just keep on with
what we’ve been doing up to now if the feeling is that an append
release could somehow jeopardize the 0.22 effort.  It's kinda hokey
having to point users at some random looking branch [1], telling them
to build their own, but thankfully this is not their only option.

I’ve enjoyed the healthy back and forth,

St.Ack
1. To be clear, the 'random looking branch' is the fruit of a bunch of
hard work by Facebookers and Clouderians.

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
After reading through the reasoning on both sides of this issue, I agree
with Ian, Konstantin, and Jakob. Nigel has already volunteered to run the
0.22 release process; let's put our energy there. Stack, the energy you
would have put into the 0.20-append release could help ensure the 0.22
release makes it out in short order. That way HBase will be able to take
advantage of both append (don't lose data) and security (don't give it
away), and we won't derail the Hadoop Core release process, which has
actually been regaining some momentum over the past several months: we got
0.21 out the door! we have a release manager for 0.22!

As Roy points out, the Apache Hadoop release train has already passed 0.20;
for those that require a 0.20-based HDFS with append, there are multiple
places in the open source world to retrieve such bits, including the
0.20-append branch of the HDFS project. If the HBase community requires an
ASF project to release such an artifact, as Roy points out, it can certainly
be done as a new project separate from HDFS.

On Thu, Dec 23, 2010 at 4:36 PM, Andrew Purtell <ap...@apache.org> wrote:

> I hope that 22 will be an answer. I think I would be more comfortable with
> that answer if Hadoop Core were not so obviously internally conflicted and
> sclerotic. Potential HBase/Hadoop adopters have confidence in 20 seeing the
> production deployments of it. 21 was to all indications I have seen a dud.
> There is no reasonable basis as of yet to presume 22 will be "kick ass".
>
> I, at least, was hoping that promoting 0.20-append from its de-facto status
> to something official could be a fig leaf for HBase while Hadoop Core gets
> its house in order.
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back.
>  - Piet Hein (via Tom White)
>
>
> --- On Thu, 12/23/10, Ryan Rawson <ry...@gmail.com> wrote:
>
> > From: Ryan Rawson <ry...@gmail.com>
> > Subject: Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?
> > To: general@hadoop.apache.org
> > Date: Thursday, December 23, 2010, 2:39 PM
> > How does stack volunteering his time to release an existing branch
> > divert resources?
> >
> > Without an ASF release of 0.20-append I will keep having to
> > recommend an external vendor's release of Hadoop.
> >
> >
> > On Thu, Dec 23, 2010 at 2:18 PM, Konstantin Shvachko <sh...@gmail.com> wrote:
> > > I also think building 0.20-append will be a major distraction from moving
> > > 0.22 forward with all the great new features, including the new append
> > > implementation, sitting on the bench because we are delaying the release.
> > > It seems to be beneficial for the entire community to focus on 0.22 rather
> > > than chasing both birds.
> > >
> > > I hear a concern that 0.22 will lack large scale testing as was the case
> > > with 0.21.
> > > I'd like to volunteer to put as many large scale resources, as I can grasp,
> > > into stabilizing of 0.22. Under Nigel's management of course.
> > > This should get us to production quality in 3-6 months rather than
> > > "another 12-15". I also hope it can go even faster/better if others
> > > could join the effort. I see > 100 companies claiming they are powered by
> > > Apache Hadoop.
> > >
> > > I also hope with this effort HBase will be able to start moving to the new
> > > append implementation in the next 2-3 months, which in turn will help 0.22
> > > HDFS rather than divert resources from it as it would have be with
> > > 0.20-append.
> > >
> > > Stack, will this plan will work for HBase survival?
> > >
> > > One other thought. Apache Hadoop community is not in control of external
> > > releases and distributions, but we should not fork our own releases by
> > > introducing competing apis. If we can keep the dev line relatively straight
> > > the external releases will follow.
> > >
> > > Thanks,
> > > --Konstantin
> > >
> > >
> > > On Thu, Dec 23, 2010 at 11:40 AM, Ryan Rawson <ry...@gmail.com> wrote:
> > >
> > >> The append solution in 0.22 that you are referring to was supposed to
> > >> be out 13-15 months ago.  Pardon if I look for solutions that deploy 4
> > >> months ago (as the 0.20 append branch did).
> > >>
> > >> Another 12-15 months of delay is not exactly helping HDFS either.
> > >>
> > >> -ryan
> > >>
> > >> On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:
> > >> > It's difficult to support this proposal knowing how much time would be
> > >> > spent preparing an official release, continuing to support it and
> > >> > continuing to two support two separate implementations of append.  I
> > >> > believe that effort would be better spent getting out a kick-ass 22
> > >> > (or, barring that, a *really* kick-ass 23).
> > >> >
> > >> > The Promised Land that we say we're all trying to get to is regular,
> > >> > timely, feature-complete, tested, innovative but stable releases of
> > >> > new versions of Apache Hadoop.  Missing out any one of those criteria
> > >> > discovered will continue (and has continued) the current situation
> > >> > where quasi-official branches and outside distributions fill the void
> > >> > such a release should.  The effort to maintain this offical branch and
> > >> > fix the bugs that will be discovered could be better spent moving us
> > >> > closer to that goal.
> > >> >
> > >> > I'm certainly sympathetic to the difficult position our quagmire has
> > >> > placed HBase into.  However, the current proposal would hurt HDFS to
> > >> > help HBase. The best solution for that project, as well as for HDFS,
> > >> > is to get HDFS back to a healthy release cycle; not prolong or codify
> > >> > the current ad-hoc state of affairs.  Let's stop digging this hole.
> > >> > -jakob
> > >> >
> > >> > On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com> wrote:
> > >> >> [ Sorry if this is be-laboring the obvious ]
> > >> >>
> > >> >> There are two append solutions floating around, and they are incompatible
> > >> >> with each other. Thus, the two "branches" will forever remain incompatible
> > >> >> with each other, regardless of how they are numbered (0.22,  0.23,  0.20.3,
> > >> >> e.t.c.)
> > >> >>
> > >> >> Unless both are merged into one branch, and a switch provided to  "use
> > >> >> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop into
> > >> >> two.
> > >> >>
> > >> >>
> > >> >> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:
> > >> >>
> > >> >>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
> > >> >>> wrote:
> > >> >>>
> > >> >>> > Features are not release version tags.  If there is a security bug
> > >> >>> > found then we would have to release a new version of the append
> > >> >>> > version, and a round of severe trout slapping would result.
> > >> >>> >
> > >> >>>
> > >> >>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag, but
> > >> >>> the problem is that this is effectively a release branch between 0.20 and
> > >> >>> 0.21. Of course I agree that any critical bugs would need to be fixed in the
> > >> >>> append branch as well as the 0.20 and 0.21 branches.
> > >> >>>
> > >> >>> If you want to stick to pure numbers and we want to leave ourselves a way to
> > >> >>> bugfix the 0.20 branch without append, we'd could use a version string like
> > >> >>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering and
> > >> >>> suggest a version jump.
> > >> >>>
> > >> >>> If I remember right, there were also protocol changes in the append branch,
> > >> >>> which was another reason we didn't want to put it directly into the 0.20
> > >> >>> branch.
> > >> >>>
> > >> >>> -- Owen
> > >> >>>
> > >> >>
> > >> >
> > >>
> > >
> >
>
>
>
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Andrew Purtell <ap...@apache.org>.
I hope that 22 will be an answer. I think I would be more comfortable with that answer if Hadoop Core were not so obviously internally conflicted and sclerotic. Potential HBase/Hadoop adopters have confidence in 20 seeing the production deployments of it. 21 was to all indications I have seen a dud. There is no reasonable basis as of yet to presume 22 will be "kick ass". 

I, at least, was hoping that promoting 0.20-append from its de-facto status to something official could be a fig leaf for HBase while Hadoop Core gets its house in order.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)


--- On Thu, 12/23/10, Ryan Rawson <ry...@gmail.com> wrote:

> From: Ryan Rawson <ry...@gmail.com>
> Subject: Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?
> To: general@hadoop.apache.org
> Date: Thursday, December 23, 2010, 2:39 PM
> How does stack volunteering his time to release an existing branch
> divert resources?
>
> Without an ASF release of 0.20-append I will keep having to
> recommend an external vendor's release of Hadoop.
>
>
> On Thu, Dec 23, 2010 at 2:18 PM, Konstantin Shvachko <sh...@gmail.com> wrote:
> > I also think building 0.20-append will be a major distraction from moving
> > 0.22 forward with all the great new features, including the new append
> > implementation, sitting on the bench because we are delaying the release.
> > It seems to be beneficial for the entire community to focus on 0.22 rather
> > than chasing both birds.
> >
> > I hear a concern that 0.22 will lack large scale testing as was the case
> > with 0.21.
> > I'd like to volunteer to put as many large scale resources, as I can grasp,
> > into stabilizing of 0.22. Under Nigel's management of course.
> > This should get us to production quality in 3-6 months rather than
> > "another 12-15". I also hope it can go even faster/better if others
> > could join the effort. I see > 100 companies claiming they are powered by
> > Apache Hadoop.
> >
> > I also hope with this effort HBase will be able to start moving to the new
> > append implementation in the next 2-3 months, which in turn will help 0.22
> > HDFS rather than divert resources from it as it would have be with
> > 0.20-append.
> >
> > Stack, will this plan will work for HBase survival?
> >
> > One other thought. Apache Hadoop community is not in control of external
> > releases and distributions, but we should not fork our own releases by
> > introducing competing apis. If we can keep the dev line relatively straight
> > the external releases will follow.
> >
> > Thanks,
> > --Konstantin
> >
> >
> > On Thu, Dec 23, 2010 at 11:40 AM, Ryan Rawson <ry...@gmail.com> wrote:
> >
> >> The append solution in 0.22 that you are referring to was supposed to
> >> be out 13-15 months ago.  Pardon if I look for solutions that deploy 4
> >> months ago (as the 0.20 append branch did).
> >>
> >> Another 12-15 months of delay is not exactly helping HDFS either.
> >>
> >> -ryan
> >>
> >> On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:
> >> > It's difficult to support this proposal knowing how much time would be
> >> > spent preparing an official release, continuing to support it and
> >> > continuing to two support two separate implementations of append.  I
> >> > believe that effort would be better spent getting out a kick-ass 22
> >> > (or, barring that, a *really* kick-ass 23).
> >> >
> >> > The Promised Land that we say we're all trying to get to is regular,
> >> > timely, feature-complete, tested, innovative but stable releases of
> >> > new versions of Apache Hadoop.  Missing out any one of those criteria
> >> > discovered will continue (and has continued) the current situation
> >> > where quasi-official branches and outside distributions fill the void
> >> > such a release should.  The effort to maintain this offical branch and
> >> > fix the bugs that will be discovered could be better spent moving us
> >> > closer to that goal.
> >> >
> >> > I'm certainly sympathetic to the difficult position our quagmire has
> >> > placed HBase into.  However, the current proposal would hurt HDFS to
> >> > help HBase. The best solution for that project, as well as for HDFS,
> >> > is to get HDFS back to a healthy release cycle; not prolong or codify
> >> > the current ad-hoc state of affairs.  Let's stop digging this hole.
> >> > -jakob
> >> >
> >> > On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com> wrote:
> >> >> [ Sorry if this is be-laboring the obvious ]
> >> >>
> >> >> There are two append solutions floating around, and they are
> >> >> incompatible with each other. Thus, the two "branches" will forever
> >> >> remain incompatible with each other, regardless of how they are numbered
> >> >> (0.22, 0.23, 0.20.3, etc.)
> >> >>
> >> >> Unless both are merged into one branch, and a switch provided to "use
> >> >> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop
> >> >> into two.
> >> >>
> >> >>
> >> >> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:
> >> >>
> >> >>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com> wrote:
> >> >>>
> >> >>> > Features are not release version tags.  If there is a security bug
> >> >>> > found then we would have to release a new version of the append
> >> >>> > version, and a round of severe trout slapping would result.
> >> >>> >
> >> >>>
> >> >>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag,
> >> >>> but the problem is that this is effectively a release branch between
> >> >>> 0.20 and 0.21. Of course I agree that any critical bugs would need to be
> >> >>> fixed in the append branch as well as the 0.20 and 0.21 branches.
> >> >>>
> >> >>> If you want to stick to pure numbers and we want to leave ourselves a
> >> >>> way to bugfix the 0.20 branch without append, we could use a version
> >> >>> string like 0.20.100, etc. Not pretty, but it does preserve the numeric
> >> >>> ordering and suggest a version jump.
> >> >>>
> >> >>> If I remember right, there were also protocol changes in the append
> >> >>> branch, which was another reason we didn't want to put it directly into
> >> >>> the 0.20 branch.
> >> >>>
> >> >>> -- Owen
> >> >>>
> >> >>
> >> >
> >>
> >
> 


      

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Dec 23, 2010 at 3:33 PM, Ian Holsman <ha...@holsman.net> wrote:

> The release is one issue, but ongoing maintenance of it is another, which
> is the point roy raised.
>
> It's a concern if we have a security issue, and who will patch it (and test
> it) going forward.
>

The nice thing is that Hadoop 0.20.x (without security patches) has no
guarantees as to security. So, I can't imagine any security issue that could
possibly exist that would be worth addressing. It doesn't run as root so
root escalation is only possible with a JVM bug, and it's trivially possible
to read anyone's data since 0.20 has no strong authentication.

Thanks
-Todd



>
> ---
> Ian Holsman - 703 879-3128
>
> I saw the angel in the marble and carved until I set him free --
> Michelangelo
>
> On 24/12/2010, at 9:39 AM, Ryan Rawson <ry...@gmail.com> wrote:
>
> > How does stack volunteering his time to release an existing branch
> > divert resources?
> >
> > Without an ASF release of 0.20-append I will keep having to recommend
> > an external vendor's release of Hadoop.
> >
> >
> > On Thu, Dec 23, 2010 at 2:18 PM, Konstantin Shvachko
> > <sh...@gmail.com> wrote:
> >> I also think building 0.20-append will be a major distraction from
> moving
> >> 0.22 forward with all the great new features, including the new append
> >> implementation, sitting on the bench because we are delaying the
> release.
> >> It seems to be beneficial for the entire community to focus on 0.22
> rather
> >> than chasing both birds.
> >>
> >> I hear a concern that 0.22 will lack large scale testing as was the case
> >> with 0.21.
> >> I'd like to volunteer to put as many large scale resources, as I can
> grasp,
> >> into stabilizing of 0.22. Under Nigel's management of course.
> >> This should get us to production quality in 3-6 months rather than
> >> "another 12-15". I also hope it can go even faster/better if others
> >> could join the effort. I see > 100 companies claiming they are powered
> by
> >> Apache Hadoop.
> >>
> >> I also hope with this effort HBase will be able to start moving to the
> new
> >> append implementation in the next 2-3 months, which in turn will help
> 0.22
> >> HDFS
> >> rather than divert resources from it as it would have be with
> 0.20-append.
> >>
> >> Stack, will this plan will work for HBase survival?
> >>
> >> One other thought. Apache Hadoop community is not in control of external
> >> releases and distributions, but we should not fork our own releases by
> >> introducing
> >> competing apis. If we can keep the dev line relatively straight the
> external
> >> releases
> >> will follow.
> >>
> >> Thanks,
> >> --Konstantin
> >>
> >>
> >> On Thu, Dec 23, 2010 at 11:40 AM, Ryan Rawson <ry...@gmail.com>
> wrote:
> >>
> >>> The append solution in 0.22 that you are referring to was supposed to
> >>> be out 13-15 months ago.  Pardon if I look for solutions that deploy 4
> >>> months ago (as the 0.20 append branch did).
> >>>
> >>> Another 12-15 months of delay is not exactly helping HDFS either.
> >>>
> >>> -ryan
> >>>
> >>> On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com>
> wrote:
> >>>> It's difficult to support this proposal knowing how much time would be
> >>>> spent preparing an official release, continuing to support it and
> >>>> continuing to two support two separate implementations of append.  I
> >>>> believe that effort would be better spent getting out a kick-ass 22
> >>>> (or, barring that, a *really* kick-ass 23).
> >>>>
> >>>> The Promised Land that we say we're all trying to get to is regular,
> >>>> timely, feature-complete, tested, innovative but stable releases of
> >>>> new versions of Apache Hadoop.  Missing out any one of those criteria
> >>>> discovered will continue (and has continued) the current situation
> >>>> where quasi-official branches and outside distributions fill the void
> >>>> such a release should.  The effort to maintain this offical branch and
> >>>> fix the bugs that will be discovered could be better spent moving us
> >>>> closer to that goal.
> >>>>
> >>>> I'm certainly sympathetic to the difficult position our quagmire has
> >>>> placed HBase into.  However, the current proposal would hurt HDFS to
> >>>> help HBase. The best solution for that project, as well as for HDFS,
> >>>> is to get HDFS back to a healthy release cycle; not prolong or codify
> >>>> the current ad-hoc state of affairs.  Let's stop digging this hole.
> >>>> -jakob
> >>>>
> >>>> On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com>
> >>> wrote:
> >>>>> [ Sorry if this is be-laboring the obvious ]
> >>>>>
> >>>>> There are two append solutions floating around, and they are
> >>> incompatible
> >>>>> with each other. Thus, the two "branches" will forever remain
> >>> incompatible
> >>>>> with each other, regardless of how they are numbered (0.22,  0.23,
> >>>  0.20.3,
> >>>>> e.t.c.)
> >>>>>
> >>>>> Unless both are merged into one branch, and a switch provided to
>  "use
> >>>>> HDFS-200 append" or "use 0.22 append", we have effectively split
> Hadoop
> >>> into
> >>>>> two.
> >>>>>
> >>>>>
> >>>>> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org>
> >>> wrote:
> >>>>>
> >>>>>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <
> fielding@gbiv.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Features are not release version tags.  If there is a security bug
> >>>>>>> found then we would have to release a new version of the append
> >>>>>>> version, and a round of severe trout slapping would result.
> >>>>>>>
> >>>>>>
> >>>>>> Yeah, it isn't a perfect solution and it doesn't scale to a second
> tag,
> >>> but
> >>>>>> the problem is that this is effectively a release branch between
> 0.20
> >>> and
> >>>>>> 0.21. Of course I agree that any critical bugs would need to be
> fixed
> >>> in
> >>>>>> the
> >>>>>> append branch as well as the 0.20 and 0.21 branches.
> >>>>>>
> >>>>>> If you want to stick to pure numbers and we want to leave ourselves
> a
> >>> way
> >>>>>> to
> >>>>>> bugfix the 0.20 branch without append, we'd could use a version
> string
> >>> like
> >>>>>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering
> >>> and
> >>>>>> suggest a version jump.
> >>>>>>
> >>>>>> If I remember right, there were also protocol changes in the append
> >>> branch,
> >>>>>> which was another reason we didn't want to put it directly into the
> >>> 0.20
> >>>>>> branch.
> >>>>>>
> >>>>>> -- Owen
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ian Holsman <ha...@holsman.net>.
The release is one issue, but ongoing maintenance of it is another, which is the point Roy raised.

If we have a security issue, the concern is who will patch it (and test it) going forward.

---
Ian Holsman - 703 879-3128

I saw the angel in the marble and carved until I set him free -- Michelangelo

On 24/12/2010, at 9:39 AM, Ryan Rawson <ry...@gmail.com> wrote:

> How does stack volunteering his time to release an existing branch
> divert resources?
> 
> Without an ASF release of 0.20-append I will keep having to recommend
> an external vendor's release of Hadoop.
> 
> 
> On Thu, Dec 23, 2010 at 2:18 PM, Konstantin Shvachko
> <sh...@gmail.com> wrote:
>> I also think building 0.20-append will be a major distraction from moving
>> 0.22 forward with all the great new features, including the new append
>> implementation, sitting on the bench because we are delaying the release.
>> It seems to be beneficial for the entire community to focus on 0.22 rather
>> than chasing both birds.
>> 
>> I hear a concern that 0.22 will lack large scale testing as was the case
>> with 0.21.
>> I'd like to volunteer to put as many large scale resources, as I can grasp,
>> into stabilizing of 0.22. Under Nigel's management of course.
>> This should get us to production quality in 3-6 months rather than
>> "another 12-15". I also hope it can go even faster/better if others
>> could join the effort. I see > 100 companies claiming they are powered by
>> Apache Hadoop.
>> 
>> I also hope with this effort HBase will be able to start moving to the new
>> append implementation in the next 2-3 months, which in turn will help 0.22
>> HDFS
>> rather than divert resources from it as it would have be with 0.20-append.
>> 
>> Stack, will this plan will work for HBase survival?
>> 
>> One other thought. Apache Hadoop community is not in control of external
>> releases and distributions, but we should not fork our own releases by
>> introducing
>> competing apis. If we can keep the dev line relatively straight the external
>> releases
>> will follow.
>> 
>> Thanks,
>> --Konstantin
>> 
>> 
>> On Thu, Dec 23, 2010 at 11:40 AM, Ryan Rawson <ry...@gmail.com> wrote:
>> 
>>> The append solution in 0.22 that you are referring to was supposed to
>>> be out 13-15 months ago.  Pardon if I look for solutions that deploy 4
>>> months ago (as the 0.20 append branch did).
>>> 
>>> Another 12-15 months of delay is not exactly helping HDFS either.
>>> 
>>> -ryan
>>> 
>>> On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:
>>>> It's difficult to support this proposal knowing how much time would be
>>>> spent preparing an official release, continuing to support it and
>>>> continuing to two support two separate implementations of append.  I
>>>> believe that effort would be better spent getting out a kick-ass 22
>>>> (or, barring that, a *really* kick-ass 23).
>>>> 
>>>> The Promised Land that we say we're all trying to get to is regular,
>>>> timely, feature-complete, tested, innovative but stable releases of
>>>> new versions of Apache Hadoop.  Missing out any one of those criteria
>>>> discovered will continue (and has continued) the current situation
>>>> where quasi-official branches and outside distributions fill the void
>>>> such a release should.  The effort to maintain this offical branch and
>>>> fix the bugs that will be discovered could be better spent moving us
>>>> closer to that goal.
>>>> 
>>>> I'm certainly sympathetic to the difficult position our quagmire has
>>>> placed HBase into.  However, the current proposal would hurt HDFS to
>>>> help HBase. The best solution for that project, as well as for HDFS,
>>>> is to get HDFS back to a healthy release cycle; not prolong or codify
>>>> the current ad-hoc state of affairs.  Let's stop digging this hole.
>>>> -jakob
>>>> 
>>>> On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com>
>>> wrote:
>>>>> [ Sorry if this is be-laboring the obvious ]
>>>>> 
>>>>> There are two append solutions floating around, and they are
>>> incompatible
>>>>> with each other. Thus, the two "branches" will forever remain
>>> incompatible
>>>>> with each other, regardless of how they are numbered (0.22,  0.23,
>>>  0.20.3,
>>>>> e.t.c.)
>>>>> 
>>>>> Unless both are merged into one branch, and a switch provided to  "use
>>>>> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop
>>> into
>>>>> two.
>>>>> 
>>>>> 
>>>>> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org>
>>> wrote:
>>>>> 
>>>>>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Features are not release version tags.  If there is a security bug
>>>>>>> found then we would have to release a new version of the append
>>>>>>> version, and a round of severe trout slapping would result.
>>>>>>> 
>>>>>> 
>>>>>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag,
>>> but
>>>>>> the problem is that this is effectively a release branch between 0.20
>>> and
>>>>>> 0.21. Of course I agree that any critical bugs would need to be fixed
>>> in
>>>>>> the
>>>>>> append branch as well as the 0.20 and 0.21 branches.
>>>>>> 
>>>>>> If you want to stick to pure numbers and we want to leave ourselves a
>>> way
>>>>>> to
>>>>>> bugfix the 0.20 branch without append, we'd could use a version string
>>> like
>>>>>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering
>>> and
>>>>>> suggest a version jump.
>>>>>> 
>>>>>> If I remember right, there were also protocol changes in the append
>>> branch,
>>>>>> which was another reason we didn't want to put it directly into the
>>> 0.20
>>>>>> branch.
>>>>>> 
>>>>>> -- Owen
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ryan Rawson <ry...@gmail.com>.
How does Stack volunteering his time to release an existing branch
divert resources?

Without an ASF release of 0.20-append I will keep having to recommend
an external vendor's release of Hadoop.


On Thu, Dec 23, 2010 at 2:18 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> I also think building 0.20-append will be a major distraction from moving
> 0.22 forward with all the great new features, including the new append
> implementation, sitting on the bench because we are delaying the release.
> It seems to be beneficial for the entire community to focus on 0.22 rather
> than chasing both birds.
>
> I hear a concern that 0.22 will lack large scale testing as was the case
> with 0.21.
> I'd like to volunteer to put as many large scale resources, as I can grasp,
> into stabilizing of 0.22. Under Nigel's management of course.
> This should get us to production quality in 3-6 months rather than
> "another 12-15". I also hope it can go even faster/better if others
> could join the effort. I see > 100 companies claiming they are powered by
> Apache Hadoop.
>
> I also hope with this effort HBase will be able to start moving to the new
> append implementation in the next 2-3 months, which in turn will help 0.22
> HDFS
> rather than divert resources from it as it would have be with 0.20-append.
>
> Stack, will this plan will work for HBase survival?
>
> One other thought. Apache Hadoop community is not in control of external
> releases and distributions, but we should not fork our own releases by
> introducing
> competing apis. If we can keep the dev line relatively straight the external
> releases
> will follow.
>
> Thanks,
> --Konstantin
>
>
> On Thu, Dec 23, 2010 at 11:40 AM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> The append solution in 0.22 that you are referring to was supposed to
>> be out 13-15 months ago.  Pardon if I look for solutions that deploy 4
>> months ago (as the 0.20 append branch did).
>>
>> Another 12-15 months of delay is not exactly helping HDFS either.
>>
>> -ryan
>>
>> On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:
>> > It's difficult to support this proposal knowing how much time would be
>> > spent preparing an official release, continuing to support it and
>> > continuing to two support two separate implementations of append.  I
>> > believe that effort would be better spent getting out a kick-ass 22
>> > (or, barring that, a *really* kick-ass 23).
>> >
>> > The Promised Land that we say we're all trying to get to is regular,
>> > timely, feature-complete, tested, innovative but stable releases of
>> > new versions of Apache Hadoop.  Missing out any one of those criteria
>> > discovered will continue (and has continued) the current situation
>> > where quasi-official branches and outside distributions fill the void
>> > such a release should.  The effort to maintain this offical branch and
>> > fix the bugs that will be discovered could be better spent moving us
>> > closer to that goal.
>> >
>> > I'm certainly sympathetic to the difficult position our quagmire has
>> > placed HBase into.  However, the current proposal would hurt HDFS to
>> > help HBase. The best solution for that project, as well as for HDFS,
>> > is to get HDFS back to a healthy release cycle; not prolong or codify
>> > the current ad-hoc state of affairs.  Let's stop digging this hole.
>> > -jakob
>> >
>> > On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com>
>> wrote:
>> >> [ Sorry if this is be-laboring the obvious ]
>> >>
>> >> There are two append solutions floating around, and they are
>> incompatible
>> >> with each other. Thus, the two "branches" will forever remain
>> incompatible
>> >> with each other, regardless of how they are numbered (0.22,  0.23,
>>  0.20.3,
>> >> e.t.c.)
>> >>
>> >> Unless both are merged into one branch, and a switch provided to  "use
>> >> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop
>> into
>> >> two.
>> >>
>> >>
>> >> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org>
>> wrote:
>> >>
>> >>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
>> >>> wrote:
>> >>>
>> >>> > Features are not release version tags.  If there is a security bug
>> >>> > found then we would have to release a new version of the append
>> >>> > version, and a round of severe trout slapping would result.
>> >>> >
>> >>>
>> >>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag,
>> but
>> >>> the problem is that this is effectively a release branch between 0.20
>> and
>> >>> 0.21. Of course I agree that any critical bugs would need to be fixed
>> in
>> >>> the
>> >>> append branch as well as the 0.20 and 0.21 branches.
>> >>>
>> >>> If you want to stick to pure numbers and we want to leave ourselves a
>> way
>> >>> to
>> >>> bugfix the 0.20 branch without append, we'd could use a version string
>> like
>> >>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering
>> and
>> >>> suggest a version jump.
>> >>>
>> >>> If I remember right, there were also protocol changes in the append
>> branch,
>> >>> which was another reason we didn't want to put it directly into the
>> 0.20
>> >>> branch.
>> >>>
>> >>> -- Owen
>> >>>
>> >>
>> >
>>
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Konstantin Boudnik <co...@apache.org>.
On Thu, Dec 23, 2010 at 14:18, Konstantin Shvachko <sh...@gmail.com> wrote:
> I also think building 0.20-append will be a major distraction from moving
> 0.22 forward with all the great new features, including the new append
> implementation, sitting on the bench because we are delaying the release.
> It seems to be beneficial for the entire community to focus on 0.22 rather
> than chasing both birds.
>
> I hear a concern that 0.22 will lack large scale testing as was the case
> with 0.21.
> I'd like to volunteer to put as many large scale resources, as I can grasp,
> into stabilizing of 0.22. Under Nigel's management of course.
> This should get us to production quality in 3-6 months rather than
> "another 12-15". I also hope it can go even faster/better if others
> could join the effort. I see > 100 companies claiming they are powered by
> Apache Hadoop.

On a similar note, I'd like to emphasize that a significant part of
my time is going to be devoted to building system & scale testing
infrastructure which would be usable out of the box by any of those 100+
companies if they are willing to put any effort into testing of 0.22.

Cos

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Konstantin Shvachko <sh...@gmail.com>.
I also think building 0.20-append will be a major distraction from moving
0.22 forward with all the great new features, including the new append
implementation, sitting on the bench because we are delaying the release.
It seems to be beneficial for the entire community to focus on 0.22 rather
than chasing both birds.

I hear a concern that 0.22 will lack large-scale testing, as was the case
with 0.21.
I'd like to volunteer to put as many large-scale resources as I can grasp
into stabilizing 0.22, under Nigel's management of course.
This should get us to production quality in 3-6 months rather than
"another 12-15". I also hope it can go even faster/better if others
could join the effort. I see > 100 companies claiming they are powered by
Apache Hadoop.

I also hope that with this effort HBase will be able to start moving to the
new append implementation in the next 2-3 months, which in turn will help
0.22 HDFS rather than divert resources from it, as would have been the case
with 0.20-append.

Stack, will this plan work for HBase survival?

One other thought. The Apache Hadoop community is not in control of external
releases and distributions, but we should not fork our own releases by
introducing competing APIs. If we can keep the dev line relatively straight,
the external releases will follow.

Thanks,
--Konstantin


On Thu, Dec 23, 2010 at 11:40 AM, Ryan Rawson <ry...@gmail.com> wrote:

> The append solution in 0.22 that you are referring to was supposed to
> be out 13-15 months ago.  Pardon if I look for solutions that deploy 4
> months ago (as the 0.20 append branch did).
>
> Another 12-15 months of delay is not exactly helping HDFS either.
>
> -ryan
>
> On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:
> > It's difficult to support this proposal knowing how much time would be
> > spent preparing an official release, continuing to support it and
> > continuing to two support two separate implementations of append.  I
> > believe that effort would be better spent getting out a kick-ass 22
> > (or, barring that, a *really* kick-ass 23).
> >
> > The Promised Land that we say we're all trying to get to is regular,
> > timely, feature-complete, tested, innovative but stable releases of
> > new versions of Apache Hadoop.  Missing out any one of those criteria
> > discovered will continue (and has continued) the current situation
> > where quasi-official branches and outside distributions fill the void
> > such a release should.  The effort to maintain this offical branch and
> > fix the bugs that will be discovered could be better spent moving us
> > closer to that goal.
> >
> > I'm certainly sympathetic to the difficult position our quagmire has
> > placed HBase into.  However, the current proposal would hurt HDFS to
> > help HBase. The best solution for that project, as well as for HDFS,
> > is to get HDFS back to a healthy release cycle; not prolong or codify
> > the current ad-hoc state of affairs.  Let's stop digging this hole.
> > -jakob
> >
> > On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com>
> wrote:
> >> [ Sorry if this is be-laboring the obvious ]
> >>
> >> There are two append solutions floating around, and they are
> incompatible
> >> with each other. Thus, the two "branches" will forever remain
> incompatible
> >> with each other, regardless of how they are numbered (0.22,  0.23,
>  0.20.3,
> >> e.t.c.)
> >>
> >> Unless both are merged into one branch, and a switch provided to  "use
> >> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop
> into
> >> two.
> >>
> >>
> >> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org>
> wrote:
> >>
> >>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
> >>> wrote:
> >>>
> >>> > Features are not release version tags.  If there is a security bug
> >>> > found then we would have to release a new version of the append
> >>> > version, and a round of severe trout slapping would result.
> >>> >
> >>>
> >>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag,
> but
> >>> the problem is that this is effectively a release branch between 0.20
> and
> >>> 0.21. Of course I agree that any critical bugs would need to be fixed
> in
> >>> the
> >>> append branch as well as the 0.20 and 0.21 branches.
> >>>
> >>> If you want to stick to pure numbers and we want to leave ourselves a
> way
> >>> to
> >>> bugfix the 0.20 branch without append, we'd could use a version string
> like
> >>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering
> and
> >>> suggest a version jump.
> >>>
> >>> If I remember right, there were also protocol changes in the append
> branch,
> >>> which was another reason we didn't want to put it directly into the
> 0.20
> >>> branch.
> >>>
> >>> -- Owen
> >>>
> >>
> >
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ryan Rawson <ry...@gmail.com>.
The append solution in 0.22 that you are referring to was supposed to
be out 13-15 months ago.  Pardon me if I look to solutions that were
deployed 4 months ago (as the 0.20 append branch was).

Another 12-15 months of delay is not exactly helping HDFS either.

-ryan

On Thu, Dec 23, 2010 at 9:38 AM, Jakob Homan <jg...@gmail.com> wrote:
> It's difficult to support this proposal knowing how much time would be
> spent preparing an official release, continuing to support it and
> continuing to two support two separate implementations of append.  I
> believe that effort would be better spent getting out a kick-ass 22
> (or, barring that, a *really* kick-ass 23).
>
> The Promised Land that we say we're all trying to get to is regular,
> timely, feature-complete, tested, innovative but stable releases of
> new versions of Apache Hadoop.  Missing out any one of those criteria
> discovered will continue (and has continued) the current situation
> where quasi-official branches and outside distributions fill the void
> such a release should.  The effort to maintain this offical branch and
> fix the bugs that will be discovered could be better spent moving us
> closer to that goal.
>
> I'm certainly sympathetic to the difficult position our quagmire has
> placed HBase into.  However, the current proposal would hurt HDFS to
> help HBase. The best solution for that project, as well as for HDFS,
> is to get HDFS back to a healthy release cycle; not prolong or codify
> the current ad-hoc state of affairs.  Let's stop digging this hole.
> -jakob
>
> On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com> wrote:
>> [ Sorry if this is be-laboring the obvious ]
>>
>> There are two append solutions floating around, and they are incompatible
>> with each other. Thus, the two "branches" will forever remain incompatible
>> with each other, regardless of how they are numbered (0.22,  0.23,  0.20.3,
>> e.t.c.)
>>
>> Unless both are merged into one branch, and a switch provided to  "use
>> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop into
>> two.
>>
>>
>> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:
>>
>>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
>>> wrote:
>>>
>>> > Features are not release version tags.  If there is a security bug
>>> > found then we would have to release a new version of the append
>>> > version, and a round of severe trout slapping would result.
>>> >
>>>
>>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag, but
>>> the problem is that this is effectively a release branch between 0.20 and
>>> 0.21. Of course I agree that any critical bugs would need to be fixed in
>>> the
>>> append branch as well as the 0.20 and 0.21 branches.
>>>
>>> If you want to stick to pure numbers and we want to leave ourselves a way
>>> to
>>> bugfix the 0.20 branch without append, we'd could use a version string like
>>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering and
>>> suggest a version jump.
>>>
>>> If I remember right, there were also protocol changes in the append branch,
>>> which was another reason we didn't want to put it directly into the 0.20
>>> branch.
>>>
>>> -- Owen
>>>
>>
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Jakob Homan <jg...@gmail.com>.
It's difficult to support this proposal knowing how much time would be
spent preparing an official release, continuing to support it and
continuing to support two separate implementations of append.  I
believe that effort would be better spent getting out a kick-ass 22
(or, barring that, a *really* kick-ass 23).

The Promised Land that we say we're all trying to get to is regular,
timely, feature-complete, tested, innovative but stable releases of
new versions of Apache Hadoop.  Missing any one of those criteria
will continue (and has continued) the current situation
where quasi-official branches and outside distributions fill the void
such a release should.  The effort to maintain this official branch and
fix the bugs that will be discovered could be better spent moving us
closer to that goal.

I'm certainly sympathetic to the difficult position our quagmire has
placed HBase in.  However, the current proposal would hurt HDFS to
help HBase. The best solution for that project, as well as for HDFS,
is to get HDFS back to a healthy release cycle, not to prolong or codify
the current ad-hoc state of affairs.  Let's stop digging this hole.
-jakob

On Thu, Dec 23, 2010 at 9:33 AM, M. C. Srivas <mc...@gmail.com> wrote:
> [ Sorry if this is belaboring the obvious ]
>
> There are two append solutions floating around, and they are incompatible
> with each other. Thus, the two "branches" will forever remain incompatible
> with each other, regardless of how they are numbered (0.22, 0.23, 0.20.3,
> etc.)
>
> Unless both are merged into one branch, and a switch provided to  "use
> HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop into
> two.
>
>
> On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:
>
>> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
>> wrote:
>>
>> > Features are not release version tags.  If there is a security bug
>> > found then we would have to release a new version of the append
>> > version, and a round of severe trout slapping would result.
>> >
>>
>> Yeah, it isn't a perfect solution and it doesn't scale to a second tag, but
>> the problem is that this is effectively a release branch between 0.20 and
>> 0.21. Of course I agree that any critical bugs would need to be fixed in
>> the
>> append branch as well as the 0.20 and 0.21 branches.
>>
>> If you want to stick to pure numbers and we want to leave ourselves a way
>> to
>> bugfix the 0.20 branch without append, we could use a version string like
>> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering and
>> suggest a version jump.
>>
>> If I remember right, there were also protocol changes in the append branch,
>> which was another reason we didn't want to put it directly into the 0.20
>> branch.
>>
>> -- Owen
>>
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by "M. C. Srivas" <mc...@gmail.com>.
[ Sorry if this is belaboring the obvious ]

There are two append solutions floating around, and they are incompatible
with each other. Thus, the two "branches" will forever remain incompatible
with each other, regardless of how they are numbered (0.22, 0.23, 0.20.3,
etc.)

Unless both are merged into one branch, and a switch provided to  "use
HDFS-200 append" or "use 0.22 append", we have effectively split Hadoop into
two.


On Thu, Dec 23, 2010 at 12:00 AM, Owen O'Malley <om...@apache.org> wrote:

> On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com>
> wrote:
>
> > Features are not release version tags.  If there is a security bug
> > found then we would have to release a new version of the append
> > version, and a round of severe trout slapping would result.
> >
>
> Yeah, it isn't a perfect solution and it doesn't scale to a second tag, but
> the problem is that this is effectively a release branch between 0.20 and
> 0.21. Of course I agree that any critical bugs would need to be fixed in
> the
> append branch as well as the 0.20 and 0.21 branches.
>
> If you want to stick to pure numbers and we want to leave ourselves a way
> to
> bugfix the 0.20 branch without append, we could use a version string like
> 0.20.100, etc. Not pretty, but it does preserve the numeric ordering and
> suggest a version jump.
>
> If I remember right, there were also protocol changes in the append branch,
> which was another reason we didn't want to put it directly into the 0.20
> branch.
>
> -- Owen
>

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Owen O'Malley <om...@apache.org>.
On Wed, Dec 22, 2010 at 11:07 PM, Roy T. Fielding <fi...@gbiv.com> wrote:

> Features are not release version tags.  If there is a security bug
> found then we would have to release a new version of the append
> version, and a round of severe trout slapping would result.
>

Yeah, it isn't a perfect solution and it doesn't scale to a second tag, but
the problem is that this is effectively a release branch between 0.20 and
0.21. Of course I agree that any critical bugs would need to be fixed in the
append branch as well as the 0.20 and 0.21 branches.

If you want to stick to pure numbers and we want to leave ourselves a way to
bugfix the 0.20 branch without append, we could use a version string like
0.20.100, etc. Not pretty, but it does preserve the numeric ordering and
suggest a version jump.
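
To make the ordering point concrete, here is a minimal sketch (not
Hadoop's actual release tooling). The release list and the parse_version
helper are assumptions made up for illustration; 0.20.100 stands in for
the append release.

```python
def parse_version(text):
    """Turn a purely numeric dotted version such as '0.20.100' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical release list: 0.20.100 stands in for the append release.
releases = ["0.21.0", "0.20.100", "0.20.3", "0.20.2"]

# Comparing integer tuples orders 0.20.100 after 0.20.3 and before 0.21.0.
numeric_order = sorted(releases, key=parse_version)

# Plain string comparison puts 0.20.100 before 0.20.2, which is misleading.
string_order = sorted(releases)

# A feature tag such as '0.20.0-append' has no numeric position at all.
try:
    parse_version("0.20.0-append")
    tag_is_numeric = True
except ValueError:
    tag_is_numeric = False
```

Under these assumptions a numeric string keeps room for 0.20.3, 0.20.4
and so on below the append release, which a feature tag in the version
cannot express.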

If I remember right, there were also protocol changes in the append branch,
which was another reason we didn't want to put it directly into the 0.20
branch.

-- Owen

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Dec 22, 2010, at 10:24 PM, Ian Holsman wrote:

> In that case, I'm +1 on releasing a 20+append branch, but am nervous about how much effort will be put into testing it. But this option is better than the current Apache alternative out there, as you and Owen mentioned.

Features are not release version tags.  If there is a security bug
found then we would have to release a new version of the append
version, and a round of severe trout slapping would result.

If someone builds the source package and there are enough votes to
release it, then the version number is one of 0.22.0 (assuming trunk
hasn't been released yet), 0.23.0 (if 0.22.x is already published),
or 1.0.0.

That is, unless the PMC decides to make it a separate product, in
which case it would be hadoopend-0.20.0 (or something like that)
and will either die a slow and painful death or someone else will
pick it up and fork the project.

....Roy


Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ian Holsman <ha...@holsman.net>.
In that case, I'm +1 on releasing a 20+append branch, but am nervous about how much effort will be put into testing it. But this option is better than the current Apache alternative out there, as you and Owen mentioned.

---
Ian Holsman - 703 879-3128

I saw the angel in the marble and carved until I set him free -- Michelangelo

On 23/12/2010, at 5:16 PM, Stack <st...@duboce.net> wrote:

> On Wed, Dec 22, 2010 at 5:03 PM, Ian Holsman <ha...@holsman.net> wrote:
>>> Are you counting other than Apache releases?  (I see only 4 here, two
>>> of which probably should be removed:
>>> http://www.gtlib.gatech.edu/pub/apache//hadoop/core/.)
>> 
>> 
>> yes.. I was referring to the external companies who have decided to release their own version, for their own business purposes. (please don't take that as a negative).
>> 
> 
> Oh.  I was not counting those at all.
> 
> Currently, over in HBase we tell users build and 'trust' your own
> Hadoop binary from the tip of what to them probably looks like some
> random Hadoop branch OR go get the slick Cloudera pre-builts since
> Cloudera's CDH3s have append/sync.  By offering to release a
> hadoop-0.20.0-append, I was just trying to make some remiss for a
> gaping hole in the Apache Hadoop offering.
> 
> 
>>>> Is there a reason why we couldn't create a hadoop 0.20.3 release that has this patch inside of it, as well as other fixes that have been applied since 0.20.2 (~26 patches)? Would this be too much effort for you to RM?..
>>>> 
>>> ...
>> 
>> I'm open with adding it, as lack of append/sync could be seen as a bug to some. (yes i'm playing with words)
> 
> My guess is that few would see it the way you do.  Append/sync has had
> a long torturous history.  HADOOP-1700, the original append issue, was
> originally opened in August 2007.  There have been two
> implementations.  The one in branch-0.20-append is the 'deprecated'
> implementation; i.e. its not the append that is in Hadoop TRUNK
> (though IIUC the 'deprecated' append runs on the largest 'known' HDFS
> cluster).  At least once, append was part of a release and then pulled
> because it was 'destabilizing'.  It might be hard getting such a
> storied, scarred feature in as a 'bug fix'.  If it did go in, the
> append/sync is of such a reputation that it might sully the current
> good standing hadoop 0.20 branch releases hold.
> 
> That said, I'm cavalier and if others are game, I'd be up for running
> a 0.20.3 release that included it.
> 
>> Thats why I think we should go to 0.22 ASAP and get companies to build their new features on trunk against that.
>> 
> 
> Waiting on 0.22 and its adoption is not going to work for HBase.  The
> HBase project would be long dead if waiting on 0.22 were the only
> option available to us.  In fact we'd be dead already if it wasn't for
> the lifeline thrown us by the folks who hooked us up with
> branch-0.20-append.
> 
> St.Ack

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
On Wed, Dec 22, 2010 at 5:03 PM, Ian Holsman <ha...@holsman.net> wrote:
>> Are you counting other than Apache releases?  (I see only 4 here, two
>> of which probably should be removed:
>> http://www.gtlib.gatech.edu/pub/apache//hadoop/core/.)
>
>
> yes.. I was referring to the external companies who have decided to release their own version, for their own business purposes. (please don't take that as a negative).
>

Oh.  I was not counting those at all.

Currently, over in HBase we tell users to build and 'trust' their own
Hadoop binary from the tip of what to them probably looks like some
random Hadoop branch OR go get the slick Cloudera pre-builts, since
Cloudera's CDH3s have append/sync.  By offering to release a
hadoop-0.20.0-append, I was just trying to make some amends for a
gaping hole in the Apache Hadoop offering.


>>> Is there a reason why we couldn't create a hadoop 0.20.3 release that has this patch inside of it, as well as other fixes that have been applied since 0.20.2 (~26 patches)? Would this be too much effort for you to RM?..
>>>
>> ...
>
> I'm open with adding it, as lack of append/sync could be seen as a bug to some. (yes i'm playing with words)

My guess is that few would see it the way you do.  Append/sync has had
a long torturous history.  HADOOP-1700, the original append issue, was
opened in August 2007.  There have been two
implementations.  The one in branch-0.20-append is the 'deprecated'
implementation; i.e. it's not the append that is in Hadoop TRUNK
(though IIUC the 'deprecated' append runs on the largest 'known' HDFS
cluster).  At least once, append was part of a release and then pulled
because it was 'destabilizing'.  It might be hard getting such a
storied, scarred feature in as a 'bug fix'.  If it did go in, the
append/sync is of such a reputation that it might sully the current
good standing hadoop 0.20 branch releases hold.

That said, I'm cavalier and if others are game, I'd be up for running
a 0.20.3 release that included it.

> Thats why I think we should go to 0.22 ASAP and get companies to build their new features on trunk against that.
>

Waiting on 0.22 and its adoption is not going to work for HBase.  The
HBase project would be long dead if waiting on 0.22 were the only
option available to us.  In fact we'd be dead already if it wasn't for
the lifeline thrown us by the folks who hooked us up with
branch-0.20-append.

St.Ack

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ian Holsman <ha...@holsman.net>.
On Dec 23, 2010, at 12:47 PM, Andrew Purtell wrote:

> I'm on the HBase PMC.
> 
>> We will end up with Apache+Security release vs
>> Apache+Append release vs Apache+Avatar release,
> 
> The current situation is pretty close to this. 

Agreed, and I would like to make it better.
> 
> HBase has no suitable binary ASF Hadoop release to work against, currently. Vanilla version 0.20 does not have sync/append support. We recommend users adopt Cloudera's CDH3 beta 2, or compile the 0.20-append branch from source. Version 0.21 is marked as unstable, was not tested at scale by Yahoo (unlike 0.20), and has been panned by many would be adopters, if the various tweets and blog posts I have seen in that regard are any indication.
> 

I'd like to make two points here:

1. There is no substitute for your own QA team.

You can never rely on a single company to do your testing for you. While it was great Yahoo tested the initial releases, you can see by their own distribution that what they were/are running is different to what other people are running. 

It is better for people not to blindly trust that something will work for them just because company X claims to be running it, and we cannot rely on an individual or a single company to provide that service going forward. Communities don't work that way. And relying on a single company to provide your core infrastructure gratis isn't going to end well either.
 
That said, we are very lucky that Yahoo has chosen to openly contribute as much as they have, and I look forward to contributions and participation from them and other large installations going forward.

2. Hadoop is only one piece of the puzzle for most installations.

One of the other issues with 0.21 (and with future releases going forward) is that 3rd parties did not port/upgrade their software to run with our new APIs. Without major software like HBase, Pig, and Hive being able to run on the platform, major installations won't even bother looking at it.

I don't expect people to immediately upgrade to 0.22 when we release it. I expect it will take a good 3-6 months until people have the software they run available on it, and possibly a point release in which some of the problems people have found in their own testing are fixed, in our software and in others'.

Like I said, I don't mind getting 0.20.3 released with the append/sync patch applied to it (with the other 20 or so patches), but I don't think the Hadoop team is large enough to support all the different releases as-is, let alone another one.


--Ian


>> Thats why I think we should go to 0.22 ASAP and get
>> companies to build their new features on trunk against
>> that.
> 
> If Hadoop 0.22 is not vetted at high scale as was 0.20 -- this is the current situation with 0.21 -- then I fear the current situation will not change and HBase will still have to refer would-be users to a non-ASF release or a source-only branch.
> 
> Best regards,
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back.
>  - Piet Hein (via Tom White)
> 
> 
> --- On Wed, 12/22/10, Ian Holsman <ha...@holsman.net> wrote:
> 
>> From: Ian Holsman <ha...@holsman.net>
>> Subject: Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?
>> To: general@hadoop.apache.org
>> Date: Wednesday, December 22, 2010, 5:03 PM
>> 
>> On Dec 23, 2010, at 11:33 AM, Stack wrote:
>> 
>>> On Wed, Dec 22, 2010 at 4:05 PM, Ian Holsman <ha...@holsman.net>
>> wrote:
>>>> There are already 5 Hadoop 20.x releases out
>> there, I don't think there is a need for another. (personal
>> opinion, not a veto or speaking as the chair)
>>>> 
>>> 
>>> Are you counting other than Apache releases?  (I
>> see only 4 here, two
>>> of which probably should be removed:
>>> http://www.gtlib.gatech.edu/pub/apache//hadoop/core/.)
>> 
>> 
>> yes.. I was referring to the external companies who have
>> decided to release their own version, for their own business
>> purposes. (please don't take that as a negative).
>> 
>>> 
>>>> Is there a reason why we couldn't create a hadoop
>> 0.20.3 release that has this patch inside of it, as well as
>> other fixes that have been applied since 0.20.2 (~26
>> patches)? Would this be too much effort for you to RM?..
>>>> 
>>> 
>>> I'd like that but my sense is the general populace of
>> hadoopers would
>>> think the append/sync suite of patches destabilizing
>> -- append/sync
>>> has a long 'history' in hadoop -- and a violation of
>> the general
>>> principal that bug fixes only are added on a branch.
>> 
>> I'm open with adding it, as lack of append/sync could be
>> seen as a bug to some. (yes i'm playing with words)
>>> 
>>> 
>>>> I really don't want to come to a^h^h^h^hget out of
>> the situation where we have multiple releases of 0.20 each
>> with a unique feature.
>>>> 
>>> 
>>> Sure.  The notion has been broached before up on
>> these lists -- e.g.
>>> there was talk of a 0.20 Apache release that had
>> security in it -- and
>>> at the time folks seemed amenable.
>> 
>> I think that approach encourages groups of
>> individuals/companies to huddle up together to build large
>> features without taking the larger group into account and
>> then 'drop' the feature off and wait for others to thank
>> them & port it to their releases. We then become
>> multiple communities instead of a single one. 
>> 
>> We will end up with Apache+Security release vs
>> Apache+Append release vs Apache+Avatar release, with various
>> bug-fixes sprinkled into each.
>> And I'm not sure which release Pig or Hbase would target to
>> develop against.
>> 
>> Thats why I think we should go to 0.22 ASAP and get
>> companies to build their new features on trunk against
>> that.
>> 
>>> 
>>> Thanks for getting the discussion off the ground,
>>> St.Ack
>> 
>> 
> 
> 
> 


Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Andrew Purtell <ap...@apache.org>.
I'm on the HBase PMC.

> We will end up with Apache+Security release vs
> Apache+Append release vs Apache+Avatar release,

The current situation is pretty close to this. 

HBase currently has no suitable binary ASF Hadoop release to work against. Vanilla version 0.20 does not have sync/append support. We recommend users adopt Cloudera's CDH3 beta 2, or compile the 0.20-append branch from source. Version 0.21 is marked as unstable, was not tested at scale by Yahoo (unlike 0.20), and has been panned by many would-be adopters, if the various tweets and blog posts I have seen in that regard are any indication.

> Thats why I think we should go to 0.22 ASAP and get
> companies to build their new features on trunk against
> that.

If Hadoop 0.22 is not vetted at high scale as was 0.20 -- this is the current situation with 0.21 -- then I fear the current situation will not change and HBase will still have to refer would-be users to a non-ASF release or a source-only branch.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)


--- On Wed, 12/22/10, Ian Holsman <ha...@holsman.net> wrote:

> From: Ian Holsman <ha...@holsman.net>
> Subject: Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?
> To: general@hadoop.apache.org
> Date: Wednesday, December 22, 2010, 5:03 PM
> 
> On Dec 23, 2010, at 11:33 AM, Stack wrote:
> 
> > On Wed, Dec 22, 2010 at 4:05 PM, Ian Holsman <ha...@holsman.net>
> wrote:
> >> There are already 5 Hadoop 20.x releases out
> there, I don't think there is a need for another. (personal
> opinion, not a veto or speaking as the chair)
> >> 
> > 
> > Are you counting other than Apache releases?  (I
> see only 4 here, two
> > of which probably should be removed:
> > http://www.gtlib.gatech.edu/pub/apache//hadoop/core/.)
> 
> 
> yes.. I was referring to the external companies who have
> decided to release their own version, for their own business
> purposes. (please don't take that as a negative).
> 
> > 
> >> Is there a reason why we couldn't create a hadoop
> 0.20.3 release that has this patch inside of it, as well as
> other fixes that have been applied since 0.20.2 (~26
> patches)? Would this be too much effort for you to RM?..
> >> 
> > 
> > I'd like that but my sense is the general populace of
> hadoopers would
> > think the append/sync suite of patches destabilizing
> -- append/sync
> > has a long 'history' in hadoop -- and a violation of
> the general
> > principal that bug fixes only are added on a branch.
> 
> I'm open with adding it, as lack of append/sync could be
> seen as a bug to some. (yes i'm playing with words)
> > 
> > 
> >> I really don't want to come to a^h^h^h^hget out of
> the situation where we have multiple releases of 0.20 each
> with a unique feature.
> >> 
> > 
> > Sure.  The notion has been broached before up on
> these lists -- e.g.
> > there was talk of a 0.20 Apache release that had
> security in it -- and
> > at the time folks seemed amenable.
> 
> I think that approach encourages groups of
> individuals/companies to huddle up together to build large
> features without taking the larger group into account and
> then 'drop' the feature off and wait for others to thank
> them & port it to their releases. We then become
> multiple communities instead of a single one. 
> 
> We will end up with Apache+Security release vs
> Apache+Append release vs Apache+Avatar release, with various
> bug-fixes sprinkled into each.
> And I'm not sure which release Pig or Hbase would target to
> develop against.
> 
> Thats why I think we should go to 0.22 ASAP and get
> companies to build their new features on trunk against
> that.
> 
> > 
> > Thanks for getting the discussion off the ground,
> > St.Ack
> 
> 


      

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ian Holsman <ha...@holsman.net>.
On Dec 23, 2010, at 11:33 AM, Stack wrote:

> On Wed, Dec 22, 2010 at 4:05 PM, Ian Holsman <ha...@holsman.net> wrote:
>> There are already 5 Hadoop 20.x releases out there, I don't think there is a need for another. (personal opinion, not a veto or speaking as the chair)
>> 
> 
> Are you counting other than Apache releases?  (I see only 4 here, two
> of which probably should be removed:
> http://www.gtlib.gatech.edu/pub/apache//hadoop/core/.)


Yes, I was referring to the external companies who have decided to release their own versions for their own business purposes. (Please don't take that as a negative.)

> 
>> Is there a reason why we couldn't create a hadoop 0.20.3 release that has this patch inside of it, as well as other fixes that have been applied since 0.20.2 (~26 patches)? Would this be too much effort for you to RM?..
>> 
> 
> I'd like that but my sense is the general populace of hadoopers would
> think the append/sync suite of patches destabilizing -- append/sync
> has a long 'history' in hadoop -- and a violation of the general
> principal that bug fixes only are added on a branch.

I'm open to adding it, as lack of append/sync could be seen as a bug by some. (Yes, I'm playing with words.)
> 
> 
>> I really don't want to come to a^h^h^h^hget out of the situation where we have multiple releases of 0.20 each with a unique feature.
>> 
> 
> Sure.  The notion has been broached before up on these lists -- e.g.
> there was talk of a 0.20 Apache release that had security in it -- and
> at the time folks seemed amenable.

I think that approach encourages groups of individuals/companies to huddle up together to build large features without taking the larger group into account and then 'drop' the feature off and wait for others to thank them & port it to their releases. We then become multiple communities instead of a single one. 

We will end up with Apache+Security release vs Apache+Append release vs Apache+Avatar release, with various bug-fixes sprinkled into each.
And I'm not sure which release Pig or HBase would target to develop against.

That's why I think we should go to 0.22 ASAP and get companies to build their new features on trunk against that.

> 
> Thanks for getting the discussion off the ground,
> St.Ack


Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Stack <st...@duboce.net>.
On Wed, Dec 22, 2010 at 4:05 PM, Ian Holsman <ha...@holsman.net> wrote:
> There are already 5 Hadoop 20.x releases out there, I don't think there is a need for another. (personal opinion, not a veto or speaking as the chair)
>

Are you counting other than Apache releases?  (I see only 4 here, two
of which probably should be removed:
http://www.gtlib.gatech.edu/pub/apache//hadoop/core/.)

> Is there a reason why we couldn't create a hadoop 0.20.3 release that has this patch inside of it, as well as other fixes that have been applied since 0.20.2 (~26 patches)? Would this be too much effort for you to RM?..
>

I'd like that, but my sense is the general populace of hadoopers would
think the append/sync suite of patches destabilizing -- append/sync
has a long 'history' in hadoop -- and a violation of the general
principle that only bug fixes are added on a branch.


> I really don't want to come to a^h^h^h^hget out of the situation where we have multiple releases of 0.20 each with a unique feature.
>

Sure.  The notion has been broached before up on these lists -- e.g.
there was talk of a 0.20 Apache release that had security in it -- and
at the time folks seemed amenable.

Thanks for getting the discussion off the ground,
St.Ack

Re: DISCUSSION: Cut a hadoop-0.20.0-append release from the tip of branch-0.20-append branch?

Posted by Ian Holsman <ha...@holsman.net>.
Hi St.Ack.

In general I'm opposed to such a thing.

There are already 5 Hadoop 20.x releases out there, I don't think there is a need for another. (personal opinion, not a veto or speaking as the chair)

Is there a reason why we couldn't create a hadoop 0.20.3 release that has this patch inside of it, as well as other fixes that have been applied since 0.20.2 (~26 patches)? Would this be too much effort for you to RM?.. 

I understand there is a large QA effort you would be taking on if you do.

Do the append/sync semantics break or deviate too much from 0.20.2?

I really don't want to come to a^h^h^h^hget out of the situation where we have multiple releases of 0.20 each with a unique feature.


On Dec 23, 2010, at 10:30 AM, Stack wrote:

> I propose cutting a release from the tip of the branch-0.20-append
> branch [1].  I suggest the release be called hadoop-0.20.0-append.  I
> volunteer to run the release process. Are folks OK with this?
> 
> Here's some background.
> 
> The branch-0.20-append was forked from branch-0.20 a few months ago by
> Dhruba to add an append/sync to 0.20.x era HDFS.  The added append
> facility is made of the patches attached to HDFS-200 and then a bunch
> of fixup patches done by Dhruba, Hairong, Nicolas, Todd, and others.
> For a complete list of differences from the tip of the Hadoop
> branch-0.20, see the CHANGES.txt file in branch-0.20-append [2].  The
> HDFS-200 append/sync is not the same as the append/sync implementation
> that is in hadoop 0.21.x and hadoop TRUNK.
> 
> The branch-0.20-append is a relatively small deviation from hadoop
> 0.20.x for those who want an append/sync in an (Apache) hadoop 0.20.x
> [3].  It's for those unwilling to upgrade their clusters to hadoop
> 0.21.0 and for those who can't wait on the coming hadoop 0.22.0.  For
> applications like HBase [4], an application that runs on HDFS and
> "loses data" if no working append/sync, it's critical that there is an
> Apache release with a working append/sync.
> 
> A few of us have been playing with this branch for a while and it
> seems to do the right thing.  It's fairly close to what FB runs
> internally (correct me if I'm wrong in this last statement Dhruba).
> 
> Thanks,
> St.Ack
> 
> 1. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/
> 2. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt?view=markup
> 3. Cloudera's CDH3Beta2/3 already include an append/sync based off the
> HDFS-200++ work.   There is no 'official' Apache hadoop 0.20.x with a
> working append/sync.
> 4. http://hbase.apache.org