Posted to common-user@hadoop.apache.org by Rita <rm...@gmail.com> on 2011/03/23 15:29:16 UTC

CDH and Hadoop

I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
instead of the standard Hadoop distribution.

What do most people use? Is CDH free? Do they provide the tars, or do they
provide source code that I simply compile? Can I have some data nodes on CDH
and the rest on regular Hadoop?


I am asking this because so far I noticed a serious bug (IMO) in the
decommissioning process (
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
)




-- 
--- Get your facts first, then you can distort them as you please.--

Re: CDH and Hadoop

Posted by Rita <rm...@gmail.com>.
Understood. I will abandon my Y! compilation and look into CDH.




On Thu, Mar 24, 2011 at 11:08 PM, suresh srinivas <sr...@gmail.com>wrote:

> On Thu, Mar 24, 2011 at 7:04 PM, Rita <rm...@gmail.com> wrote:
>
> > Oh! Thanks for the heads up on that...
> >
> > I guess I will go with the cloudera source then
> >
> >
> > On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch <darose@darose.net
> > >wrote:
> >
> > > They do, but IIRC, they recently announced that they're going to be
> > > discontinuing it.
> > >
> > > DR
> >
>
> Yahoo! discontinued the distribution in favor of making Apache Hadoop the
> most stable and the go to place for Hadoop releases. So all the advantages
> of using Yahoo distribution, you get in Apache Hadoop release.
>
> Please see the details of announcement here:
>
>
> http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/
>



-- 
--- Get your facts first, then you can distort them as you please.--

Re: CDH and Hadoop

Posted by suresh srinivas <sr...@gmail.com>.
On Thu, Mar 24, 2011 at 7:04 PM, Rita <rm...@gmail.com> wrote:

> Oh! Thanks for the heads up on that...
>
> I guess I will go with the cloudera source then
>
>
> On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch <darose@darose.net
> >wrote:
>
> > They do, but IIRC, they recently announced that they're going to be
> > discontinuing it.
> >
> > DR
>

Yahoo! discontinued its distribution in favor of making Apache Hadoop the
most stable release and the go-to place for Hadoop releases. So you get all
the advantages of the Yahoo! distribution in the Apache Hadoop release.

Please see the details of announcement here:

http://developer.yahoo.com/blogs/hadoop/posts/2011/01/announcement-yahoo-focusing-on-apache-hadoop-discontinuing-the-yahoo-distribution-of-hadoop/

Re: CDH and Hadoop

Posted by Rita <rm...@gmail.com>.
Oh! Thanks for the heads up on that...

I guess I will go with the Cloudera source then


On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch <da...@darose.net>wrote:

> They do, but IIRC, they recently announced that they're going to be
> discontinuing it.
>
> DR
>
> On Thu, March 24, 2011 8:10 pm, Rita wrote:
> > Thanks everyone for your replies.
> >
> > I knew Cloudera had their release but never knew Y! had one too...
> >
> >
> >
> >
> >
> > On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins <el...@cloudera.com> wrote:
> >
> >> Hey Rita,
> >>
> >> All software developed by Cloudera for CDH is Apache (v2) licensed and
> >> freely available. See these docs [1,2] for more info.
> >>
> >> We publish source packages (which includes the packaging source) and
> >> source tarballs, you can find these at
> >> http://archive.cloudera.com/cdh/3/.  See the CHANGES.txt file (or the
> >> cloudera directory in the tarballs) for the specific patches that have
> >> been applied.
> >>
> >> CDH contains a number of projects (Hadoop, Pig, Hive, HBase, Oozie,
> >> Flume, Sqoop, Whirr, Hue, ZooKeeper, etc). Most have a small handful
> >> of patches applied (often there's only a couple additional patches as
> >> we've rolled an upstream dot release that folded in the delta from the
> >> previous release). The vast majority of the patches to Hadoop come
> >> from the Apache security and append [3, 4] branches. Aside from those
> >> the rest are critical backports and bug fixes. In general, we develop
> >> upstream first.
> >>
> >> Hope this clarifies things.
> >>
> >> Thanks,
> >> Eli
> >>
> >> 1. https://wiki.cloudera.com/display/DOC/Apache+License
> >> 2. https://wiki.cloudera.com/display/DOC/CDH3+Installation+Guide
> >> 3.
> >>
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security
> >> 4.
> >> http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append
> >>
> >>
> >> On Wed, Mar 23, 2011 at 7:29 AM, Rita <rm...@gmail.com> wrote:
> >> > I have been wondering if I should use CDH (
> >> http://www.cloudera.com/hadoop/)
> >> > instead of the standard Hadoop distribution.
> >> >
> >> > What do most people use? Is CDH free? do they provide the tars or does
> >> it
> >> > provide source code and I simply compile? Can I have some data nodes
> >> as
> >> CDH
> >> > and the rest as regular Hadoop?
> >> >
> >> >
> >> > I am asking this because so far I noticed a serious bug (IMO) in the
> >> > decommissioning process (
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
> >> > )
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > --- Get your facts first, then you can distort them as you please.--
> >> >
> >>
> >
> >
> >
>
>
>


-- 
--- Get your facts first, then you can distort them as you please.--

Re: CDH and Hadoop

Posted by David Rosenstrauch <da...@darose.net>.
They do, but IIRC, they recently announced that they're going to be
discontinuing it.

DR

On Thu, March 24, 2011 8:10 pm, Rita wrote:
> Thanks everyone for your replies.
>
> I knew Cloudera had their release but never knew Y! had one too...
>
>
>
>
>
> On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins <el...@cloudera.com> wrote:
>
>> Hey Rita,
>>
>> All software developed by Cloudera for CDH is Apache (v2) licensed and
>> freely available. See these docs [1,2] for more info.
>>
>> We publish source packages (which includes the packaging source) and
>> source tarballs, you can find these at
>> http://archive.cloudera.com/cdh/3/.  See the CHANGES.txt file (or the
>> cloudera directory in the tarballs) for the specific patches that have
>> been applied.
>>
>> CDH contains a number of projects (Hadoop, Pig, Hive, HBase, Oozie,
>> Flume, Sqoop, Whirr, Hue, ZooKeeper, etc). Most have a small handful
>> of patches applied (often there's only a couple additional patches as
>> we've rolled an upstream dot release that folded in the delta from the
>> previous release). The vast majority of the patches to Hadoop come
>> from the Apache security and append [3, 4] branches. Aside from those
>> the rest are critical backports and bug fixes. In general, we develop
>> upstream first.
>>
>> Hope this clarifies things.
>>
>> Thanks,
>> Eli
>>
>> 1. https://wiki.cloudera.com/display/DOC/Apache+License
>> 2. https://wiki.cloudera.com/display/DOC/CDH3+Installation+Guide
>> 3.
>> http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security
>> 4.
>> http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append
>>
>>
>> On Wed, Mar 23, 2011 at 7:29 AM, Rita <rm...@gmail.com> wrote:
>> > I have been wondering if I should use CDH (
>> http://www.cloudera.com/hadoop/)
>> > instead of the standard Hadoop distribution.
>> >
>> > What do most people use? Is CDH free? do they provide the tars or does
>> it
>> > provide source code and I simply compile? Can I have some data nodes
>> as
>> CDH
>> > and the rest as regular Hadoop?
>> >
>> >
>> > I am asking this because so far I noticed a serious bug (IMO) in the
>> > decommissioning process (
>> >
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
>> > )
>> >
>> >
>> >
>> >
>> > --
>> > --- Get your facts first, then you can distort them as you please.--
>> >
>>
>
>
>



Re: CDH and Hadoop

Posted by Rita <rm...@gmail.com>.
Thanks everyone for your replies.

I knew Cloudera had their release but never knew Y! had one too...





On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins <el...@cloudera.com> wrote:

> Hey Rita,
>
> All software developed by Cloudera for CDH is Apache (v2) licensed and
> freely available. See these docs [1,2] for more info.
>
> We publish source packages (which includes the packaging source) and
> source tarballs, you can find these at
> http://archive.cloudera.com/cdh/3/.  See the CHANGES.txt file (or the
> cloudera directory in the tarballs) for the specific patches that have
> been applied.
>
> CDH contains a number of projects (Hadoop, Pig, Hive, HBase, Oozie,
> Flume, Sqoop, Whirr, Hue, ZooKeeper, etc). Most have a small handful
> of patches applied (often there's only a couple additional patches as
> we've rolled an upstream dot release that folded in the delta from the
> previous release). The vast majority of the patches to Hadoop come
> from the Apache security and append [3, 4] branches. Aside from those
> the rest are critical backports and bug fixes. In general, we develop
> upstream first.
>
> Hope this clarifies things.
>
> Thanks,
> Eli
>
> 1. https://wiki.cloudera.com/display/DOC/Apache+License
> 2. https://wiki.cloudera.com/display/DOC/CDH3+Installation+Guide
> 3.
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security
> 4. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append
>
>
> On Wed, Mar 23, 2011 at 7:29 AM, Rita <rm...@gmail.com> wrote:
> > I have been wondering if I should use CDH (
> http://www.cloudera.com/hadoop/)
> > instead of the standard Hadoop distribution.
> >
> > What do most people use? Is CDH free? do they provide the tars or does it
> > provide source code and I simply compile? Can I have some data nodes as
> CDH
> > and the rest as regular Hadoop?
> >
> >
> > I am asking this because so far I noticed a serious bug (IMO) in the
> > decommissioning process (
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
> > )
> >
> >
> >
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
>



-- 
--- Get your facts first, then you can distort them as you please.--

Re: CDH and Hadoop

Posted by Eli Collins <el...@cloudera.com>.
Hey Rita,

All software developed by Cloudera for CDH is Apache (v2) licensed and
freely available. See these docs [1,2] for more info.

We publish source packages (which include the packaging source) and
source tarballs; you can find these at
http://archive.cloudera.com/cdh/3/.  See the CHANGES.txt file (or the
cloudera directory in the tarballs) for the specific patches that have
been applied.

CDH contains a number of projects (Hadoop, Pig, Hive, HBase, Oozie,
Flume, Sqoop, Whirr, Hue, ZooKeeper, etc). Most have a small handful
of patches applied (often there's only a couple additional patches as
we've rolled an upstream dot release that folded in the delta from the
previous release). The vast majority of the patches to Hadoop come
from the Apache security and append [3, 4] branches. Aside from those
the rest are critical backports and bug fixes. In general, we develop
upstream first.

Hope this clarifies things.

Thanks,
Eli

1. https://wiki.cloudera.com/display/DOC/Apache+License
2. https://wiki.cloudera.com/display/DOC/CDH3+Installation+Guide
3. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security
4. http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append


On Wed, Mar 23, 2011 at 7:29 AM, Rita <rm...@gmail.com> wrote:
> I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
> instead of the standard Hadoop distribution.
>
> What do most people use? Is CDH free? do they provide the tars or does it
> provide source code and I simply compile? Can I have some data nodes as CDH
> and the rest as regular Hadoop?
>
>
> I am asking this because so far I noticed a serious bug (IMO) in the
> decommissioning process (
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
> )
>
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>

Re: CDH and Hadoop

Posted by James Seigel <ja...@tynt.com>.
If you are using one of the supported platforms, then it is easy to get up and going fairly quickly as well.

...advice from another seigel/segel

Cheers
james.


On 2011-03-23, at 9:32 AM, Michael Segel wrote:

> 
> Rita,
> 
> It sounds like you're only using Hadoop and have no intentions to really get into the internals.
> 
> I'm like most admins/developers/IT guys and I'm pretty lazy.
> I find it easier to set up the yum repository and then issue the yum install hadoop command. 
> 
> The thing about Cloudera is that they do backport patches, so while their release is 'heavily patched',
> they are usually in some sort of sync with the Apache release. Since you're only working with HDFS and it's pretty stable, I'd say go with the Cloudera release.
> 
> HTH
> 
> -Mike
> 
> 
> ----------------------------------------
>> Date: Wed, 23 Mar 2011 11:12:30 -0400
>> Subject: Re: CDH and Hadoop
>> From: rmorgan466@gmail.com
>> To: common-user@hadoop.apache.org
>> CC: michael_segel@hotmail.com
>> 
>> Mike,
>> 
>> Thanks. This helps a lot.
>> 
>> At our lab we have close to 60 servers which only run hdfs. I don't need
>> mapreduce and other bells and whistles. We just use hdfs for storing dataset
>> results ranging from 3gb to 90gb.
>> 
>> So, what is the best practice for hdfs? should I always deploy one version
>> before? I understand that Cloudera's version is heavily patched (similar to
>> Redhat Linux kernel versus standard Linux kernel).
>> 
>> 
>> 
>> 
>> 
>> 
>> On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel
>> wrote:
>> 
>>> 
>>> Rita,
>>> 
>>> Short answer...
>>> 
>>> Cloudera's release is free, and they do also offer a support contract if
>>> you want support from them.
>>> Cloudera has sources, but most use yum (redhat/centos) to download an
>>> already built release.
>>> 
>>> Should you use it?
>>> Depends on what you want to do.
>>> 
>>> If your goal is to get up and running with Hadoop and then focus on *using*
>>> Hadoop/HBase/Hive/Pig/etc... then it makes sense.
>>> 
>>> If your goal is to do a deep dive in to Hadoop and get your hands dirty
>>> mucking around with the latest and greatest in trunk? Then no. You're better
>>> off building your own off the official Apache release.
>>> 
>>> Many companies choose Cloudera's release for the following reasons:
>>> * Paid support is available.
>>> * Companies focus on using a tech not developing the tech, so Cloudera does
>>> the heavy lifting while Client Companies focus on 'USING' Hadoop.
>>> * Cloudera's release makes sure that the versions in the release work
>>> together. That is, when you download CDH3B4, you get a version of
>>> Hadoop that will work with the included version of HBase, Hive, etc ...
>>> 
>>> And no, it's never a good idea to try and mix and match Hadoop from
>>> different environments and versions in a cluster.
>>> (I think it will barf on you.)
>>> 
>>> Does that help?
>>> 
>>> -Mike
>>> 
>>> 
>>> ----------------------------------------
>>>> Date: Wed, 23 Mar 2011 10:29:16 -0400
>>>> Subject: CDH and Hadoop
>>>> From: rmorgan466@gmail.com
>>>> To: common-user@hadoop.apache.org
>>>> 
>>>> I have been wondering if I should use CDH (
>>> http://www.cloudera.com/hadoop/)
>>>> instead of the standard Hadoop distribution.
>>>> 
>>>> What do most people use? Is CDH free? do they provide the tars or does it
>>>> provide source code and I simply compile? Can I have some data nodes as
>>> CDH
>>>> and the rest as regular Hadoop?
>>>> 
>>>> 
>>>> I am asking this because so far I noticed a serious bug (IMO) in the
>>>> decommissioning process (
>>>> 
>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
>>>> )
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> --- Get your facts first, then you can distort them as you please.--
>>> 
>>> 
>> 
>> 
>> 
>> --
>> --- Get your facts first, then you can distort them as you please.--
> 		 	   		  


Re: CDH and Hadoop

Posted by Steve Loughran <st...@apache.org>.
On 23/03/11 15:32, Michael Segel wrote:
>
> Rita,
>
> It sounds like you're only using Hadoop and have no intentions to really get into the internals.
>
> I'm like most admins/developers/IT guys and I'm pretty lazy.
> I find it easier to set up the yum repository and then issue the yum install hadoop command.
>
> The thing about Cloudera is that they do backport patches, so while their release is 'heavily patched',
> they are usually in some sort of sync with the Apache release. Since you're only working with HDFS and it's pretty stable, I'd say go with the Cloudera release.

To be fair, the Y! version of 0.20.x has all the backports to do with 
scale; on a large cluster I'd pick that one, with the understanding 
that if you have support problems, you can't pay Cloudera to hold your hand.

If you have any plans to get involved in the Hadoop & friends code, to 
move from a user to contributor, you should get with the official 
releases. Similarly, if you have some problem and want to file a bug, 
you should get the latest official release and test with that, as
  -that will be the first question on the bug report "is it still there?"
  -you'll need to help debug it.

Going forward, there are plans to do RPM and ideally deb artifacts of 
0.22 and later versions of Hadoop, making them easier to install. This 
still leaves the question of who supports it, the answers being you, or 
anyone you pay to; that is the way open source works.

-steve


RE: CDH and Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
Rita,

It sounds like you're only using Hadoop and have no intentions to really get into the internals.

I'm like most admins/developers/IT guys and I'm pretty lazy.
I find it easier to set up the yum repository and then issue the yum install hadoop command. 

The thing about Cloudera is that they do backport patches, so while their release is 'heavily patched',
they are usually in some sort of sync with the Apache release. Since you're only working with HDFS and it's pretty stable, I'd say go with the Cloudera release.

HTH

-Mike


----------------------------------------
> Date: Wed, 23 Mar 2011 11:12:30 -0400
> Subject: Re: CDH and Hadoop
> From: rmorgan466@gmail.com
> To: common-user@hadoop.apache.org
> CC: michael_segel@hotmail.com
>
> Mike,
>
> Thanks. This helps a lot.
>
> At our lab we have close to 60 servers which only run hdfs. I don't need
> mapreduce and other bells and whistles. We just use hdfs for storing dataset
> results ranging from 3gb to 90gb.
>
> So, what is the best practice for hdfs? should I always deploy one version
> before? I understand that Cloudera's version is heavily patched (similar to
> Redhat Linux kernel versus standard Linux kernel).
>
>
>
>
>
>
> On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel
> wrote:
>
> >
> > Rita,
> >
> > Short answer...
> >
> > Cloudera's release is free, and they do also offer a support contract if
> > you want support from them.
> > Cloudera has sources, but most use yum (redhat/centos) to download an
> > already built release.
> >
> > Should you use it?
> > Depends on what you want to do.
> >
> > If your goal is to get up and running with Hadoop and then focus on *using*
> > Hadoop/HBase/Hive/Pig/etc... then it makes sense.
> >
> > If your goal is to do a deep dive in to Hadoop and get your hands dirty
> > mucking around with the latest and greatest in trunk? Then no. You're better
> > off building your own off the official Apache release.
> >
> > Many companies choose Cloudera's release for the following reasons:
> > * Paid support is available.
> > * Companies focus on using a tech not developing the tech, so Cloudera does
> > the heavy lifting while Client Companies focus on 'USING' Hadoop.
> > * Cloudera's release makes sure that the versions in the release work
> > together. That is, when you download CDH3B4, you get a version of
> > Hadoop that will work with the included version of HBase, Hive, etc ...
> >
> > And no, it's never a good idea to try and mix and match Hadoop from
> > different environments and versions in a cluster.
> > (I think it will barf on you.)
> >
> > Does that help?
> >
> > -Mike
> >
> >
> > ----------------------------------------
> > > Date: Wed, 23 Mar 2011 10:29:16 -0400
> > > Subject: CDH and Hadoop
> > > From: rmorgan466@gmail.com
> > > To: common-user@hadoop.apache.org
> > >
> > > I have been wondering if I should use CDH (
> > http://www.cloudera.com/hadoop/)
> > > instead of the standard Hadoop distribution.
> > >
> > > What do most people use? Is CDH free? do they provide the tars or does it
> > > provide source code and I simply compile? Can I have some data nodes as
> > CDH
> > > and the rest as regular Hadoop?
> > >
> > >
> > > I am asking this because so far I noticed a serious bug (IMO) in the
> > > decommissioning process (
> > >
> > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
> > > )
> > >
> > >
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> >
> >
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
 		 	   		  

Re: CDH and Hadoop

Posted by Rita <rm...@gmail.com>.
Mike,

Thanks. This helps a lot.

At our lab we have close to 60 servers which only run HDFS. I don't need
MapReduce and other bells and whistles. We just use HDFS for storing dataset
results ranging from 3 GB to 90 GB.

So, what is the best practice for HDFS? Should I always deploy one version
behind? I understand that Cloudera's version is heavily patched (similar to
the Red Hat Linux kernel versus the standard Linux kernel).






On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel
<mi...@hotmail.com>wrote:

>
> Rita,
>
> Short answer...
>
> Cloudera's release is free, and they do also offer a support contract if
> you want support from them.
> Cloudera has sources, but most use yum (redhat/centos) to download an
> already built release.
>
> Should you use it?
> Depends on what you want to do.
>
> If your goal is to get up and running with Hadoop and then focus on *using*
> Hadoop/HBase/Hive/Pig/etc... then it makes sense.
>
> If your goal is to do a deep dive in to Hadoop and get your hands dirty
> mucking around with the latest and greatest in trunk? Then no. You're better
> off building your own off the official Apache release.
>
> Many companies choose Cloudera's release for the following reasons:
> * Paid support is available.
> * Companies focus on using a tech not developing the tech, so Cloudera does
> the heavy lifting while Client Companies focus on  'USING' Hadoop.
> * Cloudera's release makes sure that the versions in the release work
> together. That is, when you download CDH3B4, you get a version of
> Hadoop that will work with the included version of HBase, Hive, etc ...
>
> And no, it's never a good idea to try and mix and match Hadoop from
> different environments and versions in a cluster.
> (I think it will barf on you.)
>
> Does that help?
>
> -Mike
>
>
> ----------------------------------------
> > Date: Wed, 23 Mar 2011 10:29:16 -0400
> > Subject: CDH and Hadoop
> > From: rmorgan466@gmail.com
> > To: common-user@hadoop.apache.org
> >
> > I have been wondering if I should use CDH (
> http://www.cloudera.com/hadoop/)
> > instead of the standard Hadoop distribution.
> >
> > What do most people use? Is CDH free? do they provide the tars or does it
> > provide source code and I simply compile? Can I have some data nodes as
> CDH
> > and the rest as regular Hadoop?
> >
> >
> > I am asking this because so far I noticed a serious bug (IMO) in the
> > decommissioning process (
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
> > )
> >
> >
> >
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
>
>



-- 
--- Get your facts first, then you can distort them as you please.--

RE: CDH and Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
Rita,

Short answer...

Cloudera's release is free, and they do also offer a support contract if you want support from them.
Cloudera has sources, but most use yum (redhat/centos) to download an already built release.

Should you use it?
Depends on what you want to do. 

If your goal is to get up and running with Hadoop and then focus on *using* Hadoop/HBase/Hive/Pig/etc... then it makes sense.

If your goal is to do a deep dive in to Hadoop and get your hands dirty mucking around with the latest and greatest in trunk? Then no. You're better off building your own off the official Apache release.

Many companies choose Cloudera's release for the following reasons:
* Paid support is available.
* Companies focus on using a tech not developing the tech, so Cloudera does the heavy lifting while Client Companies focus on 'USING' Hadoop.
* Cloudera's release makes sure that the versions in the release work together. That is, when you download CDH3B4, you get a version of Hadoop that will work with the included version of HBase, Hive, etc ...

And no, it's never a good idea to try and mix and match Hadoop from different environments and versions in a cluster.
(I think it will barf on you.)
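
If you want to check whether a cluster is already mixed, something like the rough sketch below might help. It assumes you have collected, by whatever means, the version string each node reports (for example from the first line of `hadoop version` on each host). The hostnames, the version strings, and both function names are made up for illustration; nothing here is a Hadoop or Cloudera API.

```python
from collections import defaultdict


def group_hosts_by_version(host_versions):
    """Group hostnames by the version string each one reports."""
    groups = defaultdict(list)
    for host, version in sorted(host_versions.items()):
        groups[version].append(host)
    return dict(groups)


def is_uniform(host_versions):
    """Return True when every host reports the same version string."""
    return len(set(host_versions.values())) <= 1


# Hypothetical inventory: hostnames and version strings are invented.
reported = {
    "dn01": "0.20.2-cdh3b4",
    "dn02": "0.20.2-cdh3b4",
    "dn03": "0.20.2",
}

groups = group_hosts_by_version(reported)
if not is_uniform(reported):
    # List each version next to the hosts running it, so the odd ones stand out.
    for version, hosts in groups.items():
        print(version, hosts)
```

This only compares strings, so two builds that report the same version but carry different patches would still look identical. Treat it as a first sanity check, not proof that the nodes match.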

Does that help?

-Mike


----------------------------------------
> Date: Wed, 23 Mar 2011 10:29:16 -0400
> Subject: CDH and Hadoop
> From: rmorgan466@gmail.com
> To: common-user@hadoop.apache.org
>
> I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
> instead of the standard Hadoop distribution.
>
> What do most people use? Is CDH free? do they provide the tars or does it
> provide source code and I simply compile? Can I have some data nodes as CDH
> and the rest as regular Hadoop?
>
>
> I am asking this because so far I noticed a serious bug (IMO) in the
> decommissioning process (
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/%3cAANLkTikPKGt5zw1QGLse+LPzUDP7Mom=Ty_mXfcuoPhq@mail.gmail.com%3e
> )
>
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
 		 	   		  

Re: CDH and Hadoop

Posted by Allen Wittenauer <aw...@apache.org>.
On Mar 23, 2011, at 7:29 AM, Rita wrote:

> I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
> instead of the standard Hadoop distribution.
> 
> What do most people use? Is CDH free? do they provide the tars or does it
> provide source code and I simply compile? Can I have some data nodes as CDH
> and the rest as regular Hadoop?

	 I think most of the larger sites are running some form of modified Apache release, in some cases having migrated off of a CDH release.  At LinkedIn, we've been using the Apache 0.20.2 release with 2 patches related to the capacity scheduler for over a year now.  

	In our case, I never deployed CDH, other than a test setup.  I opted not to use CDH in the CDH2 and CDH3 beta time frame due to some patches that I felt were not of a high quality as well as the potential for vendor lock-in.  But I haven't looked at it in probably a year.