Posted to mapreduce-user@hadoop.apache.org by "real great.." <gr...@gmail.com> on 2011/02/23 04:42:29 UTC

easiest way to install hadoop

Hi,
Very trivial question.
Which is the easiest way to install Hadoop?
I mean, which distribution should I go for: Apache or Cloudera?
And which is the easiest OS for Hadoop?

-- 
Regards,
R.V.

Re: easiest way to install hadoop

Posted by Sonal Goyal <so...@gmail.com>.
You can also check Apache Whirr.

Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>






Re: easiest way to install hadoop

Posted by Pavan <ya...@gmail.com>.
Yes, the Cloudera distribution helps the most if you are not yet familiar
with Hadoop and its ecosystem, and its documentation is good. CDH3 Beta 4 was
released recently. That said, if you are evaluating or testing, I suggest you
try the virtual appliance from Cloudera first, before you make any final
decisions:
http://cloudera-vm.s3.amazonaws.com/cloudera-demo-0.3.5.tar.bz2?downloads

We use Red Hat in production (so I assume CentOS would work as well) and are
quite happy with it. I personally run Cloudera on my Ubuntu laptop and am
equally happy.

Pavan Yara
@yarapavan


Re: easiest way to install hadoop

Posted by Nick Jones <da...@gmail.com>.
I found Cloudera's distribution easy to use, but it's the only thing I tried.

Nick




Re: easiest way to install hadoop

Posted by Eric Yang <ey...@yahoo-inc.com>.
Hi,

For anyone interested in installing the Yahoo-patched Hadoop 0.20.2+ (branch-0.20-security) in RPM or Debian package form, there is a new option. Owen has proposed standardizing Hadoop deployment in https://issues.apache.org/jira/browse/HADOOP-6255. I created RPM and Debian packages from the patch, and it takes two rpm/dpkg install commands to get a single-node cluster running on RHEL/CentOS 5.5 or Ubuntu 10.10.

rpm -i hadoop-0.20.100-1.i386.rpm
rpm -i hadoop-conf-pseudo-0.20.100-1.i386.rpm

If you want to deploy a real cluster, skip the conf-pseudo package and edit the configuration in /etc/hadoop instead.
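
As a rough sketch of the single-node install described above, the helper below only prints the command that would be run for a given package file, choosing rpm or dpkg from the file extension. The function name install_cmd is hypothetical and exists only for this illustration; the two .rpm file names are the ones from this message, and no .deb file names are assumed.

```shell
#!/bin/sh
# Dry-run helper: print (do not execute) the install command for a package file.
# install_cmd is a hypothetical name used only for this illustration.
install_cmd() {
    case "$1" in
        *.rpm) echo "rpm -i $1" ;;    # RPM packages (RHEL/CentOS)
        *.deb) echo "dpkg -i $1" ;;   # Debian packages (Ubuntu)
        *)     echo "unsupported package: $1" >&2; return 1 ;;
    esac
}

# Single-node setup from the message: base package first,
# then the pseudo-distributed configuration package.
for pkg in hadoop-0.20.100-1.i386.rpm hadoop-conf-pseudo-0.20.100-1.i386.rpm; do
    install_cmd "$pkg"
done
```

Printing the commands instead of running them makes it possible to review the install order before touching the system; for a real cluster the second package would be left out, as noted above.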

Prebuilt packages are also available for Pig and ZooKeeper, and HBase and Chukwa packages are under way. The test packages are located at:

http://people.apache.org/~eyang/

Hopefully the patch will become part of Apache Hadoop, so that packages can be downloaded directly from Apache in the next release.
Feedback is welcome.

Regards,
Eric


Re: easiest way to install hadoop

Posted by "real great.." <gr...@gmail.com>.
Thanks a lot..



-- 
Regards,
R.V.

Re: easiest way to install hadoop

Posted by Eric <er...@gmail.com>.
Cloudera offers a nice distribution and decent installation documentation.
When you get started, use the "old", deprecated API, as others have pointed
out before. It is the most complete and most stable for now.


RE: easiest way to install hadoop

Posted by MONTMORY Alain <al...@thalesgroup.com>.
Hi,

From my point of view this is not a trivial question.

The latest "stable release" is 0.20.2 (the version embedded in Cloudera CDH3), not 0.21.
If you started with Hadoop recently (end of 2010 for me), you find the "old API" marked as deprecated, so you start with the new API.
But in 0.20.2 not all of the new API is available under mapreduce (for example, MultipleInputs is not available), so you try version 0.21, where it is.
But 0.21 did not seem very stable to us (we hit a "null pointer exception" in the framework logs and had no idea how to solve it), so we went back to 0.20.2 and now use the "old API".

Search for "Re: Which version to choose" in the mailing list and follow the advice of Todd Lipcon.
The "old API" is not really so deprecated: it will be supported for years because thousands of jobs run on it. The "new API" can be used once a stable release that includes it is out (0.22, 0.23, ...).

This is feedback from my personal experience, where I lost time trying to use the latest 0.21 version. Since then I have used Cloudera 0.20.2+320 with the old API and have not had any problems. We also use Cascading to simplify writing MapReduce jobs, with very little performance overhead (6%) compared with native Hadoop MR jobs. Overall we gain a factor of 4.65 over a traditional RDBMS approach.

Hope this helps,

Regards,

Alain