Posted to user@hbase.apache.org by Friso van Vollenhoven <fv...@xebia.com> on 2010/11/22 10:19:34 UTC

which HBase version to use?

Hi list,

I have the opportunity to reinstall a cluster from scratch. I use Hadoop and HBase (not yet any of the other tools, like Pig, Hive, Avro, Thrift, etc.). Now I wonder which versions to use. CDH3 is nice because it comes with RPMs out of the box, so the operations people know what to do with it (of course, we can build our own RPMs for versions that don't have them). It does appear, however, that CDH is mostly focused on a very good Hadoop / HDFS / MR version and that you're better off with the ASF HBase release right now. And there is also the version that SU (StumbleUpon) provides on GitHub, which has the advantage of being heavily used in a production environment by people who know what they're doing.

Any advice on this, anyone?


Thanks,
Friso


Re: which HBase version to use?

Posted by Gary Helmling <gh...@gmail.com>.
Friso,

If you're running the SU HBase on CDH3b3 Hadoop, make sure the SU branch
includes the patch for HBASE-3194, or apply it yourself. Without
it you'll get compilation errors due to the security changes in CDH3b3.

Gary


On Tue, Nov 23, 2010 at 12:26 AM, Friso van Vollenhoven <fvanvollenhoven@xebia.com> wrote:

> Hi All,
> Thanks for all the feedback. Because I need a 'works right now' version, I
> am going to go for 0.89-<something> with some patches applied (basically
> SU's version on top of CDH3b3 Hadoop), with a planned upgrade path to CDH3
> when it reaches b4 or final (or whatever state it is in when I have time to
> test it on our dev boxes).
>
> Thanks
> Friso

Re: which HBase version to use?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi All,
Thanks for all the feedback. Because I need a 'works right now' version, I am going to go for 0.89-<something> with some patches applied (basically SU's version on top of CDH3b3 Hadoop), with a planned upgrade path to CDH3 when it reaches b4 or final (or whatever state it is in when I have time to test it on our dev boxes).

Thanks
Friso


On 23 nov 2010, at 02:30, Todd Lipcon wrote:

> Yes, a reasonably stable HBase 0.90 is one of the primary gating functions
> for the next beta. The qualification of "reasonably stable" is sometimes
> hard to quantify, but in general my rubric has been to run a small test
> cluster under heavy load for 24+ hours with a configuration that stresses
> splits, compactions, and flushes, and occasionally kill -9 one of the nodes.
> 
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: which HBase version to use?

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Nov 22, 2010 at 3:44 PM, Andrew Purtell <ap...@apache.org> wrote:

> > On Mon, 11/22/10, Todd Lipcon <to...@cloudera.com> wrote:
> > Once 0.90 is released, we plan on spending a week or two to suss
> > out any possible integration issues, and then release CDH3b4
> > including 0.90.
>
> I'm sure that will make everyone happy. :-) Glad to hear the projected time
> between releases will be short. That was my concern.
>
>
Yes, a reasonably stable HBase 0.90 is one of the primary gating functions
for the next beta. The qualification of "reasonably stable" is sometimes
hard to quantify, but in general my rubric has been to run a small test
cluster under heavy load for 24+ hours with a configuration that stresses
splits, compactions, and flushes, and occasionally kill -9 one of the nodes.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: which HBase version to use?

Posted by Andrew Purtell <ap...@apache.org>.
> On Mon, 11/22/10, Todd Lipcon <to...@cloudera.com> wrote:
> Once 0.90 is released, we plan on spending a week or two to suss
> out any possible integration issues, and then release CDH3b4
> including 0.90.

I'm sure that will make everyone happy. :-) Glad to hear the projected time between releases will be short. That was my concern. 

> Right now I believe the version in CDH3b3 is more stable
> than the 0.90rc. 

Yes, I want to clarify that I believe this to be the case as well. 

Best regards,

    - Andy


      

Re: which HBase version to use?

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Nov 22, 2010 at 10:55 AM, Andrew Purtell <ap...@apache.org> wrote:

> Right now you can't get 0.90 from CDH3. It is an 0.89-<mumble>. It will not
> be a better choice than 0.90 once 0.90 is released.
>
> We are looking at deploying CDH3B3 plus a custom RPM built in house that
> updates the CDH3B3 HBase package to 0.90.
>
>
Once 0.90 is released, we plan on spending a week or two to suss out any
possible integration issues, and then release CDH3b4 including 0.90.

Right now I believe the version in CDH3b3 is more stable than the 0.90rc.
This is based on the testing I've done in-house on our clusters: assignment
issues cropped up in 0.90 that I haven't seen in 0.89.20100924. The next RC
or 0.90.0 should be better, and then we'll update CDH as well.


> I'm not sure we forgo support for our ops team just because we do that. I
> can't comment as an authority on Cloudera's business of course since I do
> not work for them, but as a customer I expect by the letter of the contract
> we cannot (and would not anyway) ask for support for HBase if using a custom
> version, but e.g. any HDFS related trouble is independent and we would be
> very unhappy if told otherwise.
>

Not going to go into specifics of our contracts on a list like this ;-) But
as an engineer I'm always happy to try to help other engineers if they can
explain their problem well - it increases the quality of the software for
everyone, and really isn't that what we're all here for?

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera

RE: which HBase version to use?

Posted by Andrew Purtell <ap...@apache.org>.
Right now you can't get 0.90 from CDH3. It is an 0.89-<mumble>. It will not be a better choice than 0.90 once 0.90 is released. 

We are looking at deploying CDH3B3 plus a custom RPM built in house that updates the CDH3B3 HBase package to 0.90. 

I'm not sure we forgo support for our ops team just because we do that. I can't comment as an authority on Cloudera's business, of course, since I do not work for them, but as a customer I expect that by the letter of the contract we cannot (and would not anyway) ask for support for HBase if using a custom version. Any HDFS-related trouble, however, is independent, and we would be very unhappy if told otherwise. 

Best regards,

    - Andy


--- On Mon, 11/22/10, Michael Segel <mi...@hotmail.com> wrote:

> From: Michael Segel <mi...@hotmail.com>
> Subject: RE: which HBase version to use?
> To: user@hbase.apache.org
> Date: Monday, November 22, 2010, 9:15 AM
> 
> Friso,
> 
> I don't disagree with Ryan; however, I think you have to
> determine which option makes the most sense for you.
> 
> Going with CDH3 you get the RPMs, and you know that
> everything in CDH3 works together.
> It's essentially one-stop shopping, and you can purchase
> production support.
> 
> If you're looking at a pure dev cluster, maybe mixing and
> matching makes sense.
> 
> If you are going to go with CDH3, skip b2 and go with b3;
> its HBase is more stable.
> If you're going to use HBase from ASF, then why not also
> get everything from ASF?
> 
> The reason we chose Cloudera (not that I'm giving them a
> free plug :-) is that Cloudera sells support, which is
> something my client wanted from the start.
> If you are going to go with Cloudera, they support HBase in
> CDH3, and you don't want to use the ASF release because it
> could violate your support contract.
> 
> HTH
> 
> -Mike

RE: which HBase version to use?

Posted by Michael Segel <mi...@hotmail.com>.
Friso,

I don't disagree with Ryan; however, I think you have to determine which option makes the most sense for you.

Going with CDH3 you get the RPMs, and you know that everything in CDH3 works together.
It's essentially one-stop shopping, and you can purchase production support.

If you're looking at a pure dev cluster, maybe mixing and matching makes sense.

If you are going to go with CDH3, skip b2 and go with b3; its HBase is more stable.
If you're going to use HBase from ASF, then why not also get everything from ASF?

The reason we chose Cloudera (not that I'm giving them a free plug :-) is that Cloudera sells support, which is something my client wanted from the start.
If you are going to go with Cloudera, they support HBase in CDH3, and you don't want to use the ASF release because it could violate your support contract.

HTH

-Mike


> Date: Mon, 22 Nov 2010 01:24:30 -0800
> Subject: Re: which HBase version to use?
> From: ryanobjc@gmail.com
> To: user@hbase.apache.org
> 
> I have to recommend ASF 0.90, which is in release candidate mode right
> now.  You'll want to run it on top of CDH3 b2 or b3, but that is up to
> you to decide.  The HBase team here at StumbleUpon has more experience
> with b2.
> 
> -ryan

Re: which HBase version to use?

Posted by Ryan Rawson <ry...@gmail.com>.
I have to recommend ASF 0.90, which is in release candidate mode right
now.  You'll want to run it on top of CDH3 b2 or b3, but that is up to
you to decide.  The HBase team here at StumbleUpon has more experience
with b2.

-ryan
