Posted to common-user@hadoop.apache.org by Praveen Yarlagadda <pr...@gmail.com> on 2009/11/02 21:16:59 UTC

Linux Flavor

Hi all,

I have been running Hadoop on Ubuntu for a while now in distributed mode (4
node cluster). Just playing around with it.
Going forward, I am planning to have more nodes added to the cluster. Just
want to know which linux flavor is the best
to run Hadoop on? Please let me know.

Regards,
Praveen

Re: Linux Flavor

Posted by Steve Loughran <st...@apache.org>.
Tom Wheeler wrote:

> Based on what I've seen on the list, larger installations tend to use
> RedHat Enterprise Linux or one of its clones like CentOS.

One other thing to add is that a large cluster is not the place to learn 
Linux or Solaris or whatever; it helps to have a working knowledge of 
how to look after an OS before getting 80 machines that need to be kept 
in sync. Then forget every technique you learned for manually keeping a 
single OS image up to date, and learn about large-cluster admin techniques.

Re: Linux Flavor

Posted by Tom Wheeler <to...@gmail.com>.
Based on what I've seen on the list, larger installations tend to use
RedHat Enterprise Linux or one of its clones like CentOS.

On Mon, Nov 2, 2009 at 2:16 PM, Praveen Yarlagadda wrote:
> Hi all,
>
> I have been running Hadoop on Ubuntu for a while now in distributed mode (4
> node cluster). Just playing around with it.
> Going forward, I am planning to have more nodes added to the cluster. Just
> want to know which linux flavor is the best
> to run Hadoop on? Please let me know.
>
> Regards,
> Praveen
>



-- 
Tom Wheeler
http://www.tomwheeler.com/

Re: Server types

Posted by John Martyniak <jo...@beforedawnsolutions.com>.
Alex,

Thanks for the info.  (And sorry for the double send; my sent mail 
didn't reflect that this message went out. I was having some network 
trouble last night.)

I have a couple of bigger boxes that are currently running VMs. Would 
it be reasonable (advisable) to put the NameNode on a VM on one of 
the bigger machines (they are 3 TB RAID 5, 24 GB of RAM, and dual 
quad-cores)?  That way it would be "dedicated".

I was kind of thinking of that approach for ZooKeeper (for the HBase 
part): put it on a couple of VMs spread across multiple machines to 
get my 3 ZooKeeper instances.
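For reference, a three-instance ZooKeeper ensemble like that is described in zoo.cfg; a minimal sketch, with the hostnames zk1-zk3 standing in for the VMs (not names from this thread):

```ini
# zoo.cfg - identical on all three ZooKeeper instances.
# Hostnames zk1..zk3 are placeholders for the VMs.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each host additionally needs a myid file under dataDir containing its own server number (1, 2, or 3). With three instances, the ensemble stays up through the loss of any one VM.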

So when you say general system management, do you mean changing the 
Hadoop configs to reflect the capabilities of the boxes?  Or system 
administration of the OS/network/hardware?

-John

On Nov 3, 2009, at 8:29 AM, Alex McLintock wrote:

> 2009/11/2 John Martyniak <jo...@beforedawnsolutions.com>:
>> I am getting ready to set up a hadoop cluster, starting small but  
>> going to
>> add pretty quickly.  I am planning on running the following on the  
>> cluster,
>> hadoop, hdfs, hbase, nutch, and mahout.
>>
>> So far I have two Dell SC1425 dual processor (2.8 ghz), 4 GB Ram, 2  
>> 1.5 TB
>> Sata drives, on a gigabit switch.
>>
>> So first off does this seem to be a reasonable way to start, I  
>> expect to
>> double or triple the number of servers in the first month.
>
> You may find that the nameserver box needs the most memory, and could
> probably benefit from being RAID'ed. However that sounds fine to me.
>
>
>> Second do the boxes need RAID or will the HDFS take care of the  
>> redundancy?
>
> By default HDFS usually has triple redundancy - data blocks are saved
> on two other machines as well as the original one. This isn't going to
> work with just two machines.
>
> However if you think of each *disk* as a separate Hadoop node then  
> it is ok.
>
>
>> And third as I expand the cluster do all boxes have to be identical  
>> configs?
>
> No they do not have to be identical. Your basic problem is one of
> general system admin management.
>
>> If they are not what problems will I run into?
>
>
> Nodes in the cluster will fetch work to be done. The slower boxes will
> take longer, and can potentially drag out the time for a whole MapReduce
> run. But that is my understanding; I have not seen it yet.
>
> Good luck.


Re: Server types

Posted by Alex McLintock <al...@gmail.com>.
2009/11/2 John Martyniak <jo...@beforedawnsolutions.com>:
> I am getting ready to set up a hadoop cluster, starting small but going to
> add pretty quickly.  I am planning on running the following on the cluster,
> hadoop, hdfs, hbase, nutch, and mahout.
>
> So far I have two Dell SC1425 dual processor (2.8 ghz), 4 GB Ram, 2 1.5 TB
> Sata drives, on a gigabit switch.
>
> So first off does this seem to be a reasonable way to start, I expect to
> double or triple the number of servers in the first month.

You may find that the NameNode box needs the most memory, and could
probably benefit from being RAIDed. However, that sounds fine to me.


> Second do the boxes need RAID or will the HDFS take care of the redundancy?

By default HDFS usually has triple redundancy - data blocks are saved
on two other machines as well as the original one. This isn't going to
work with just two machines.

However if you think of each *disk* as a separate Hadoop node then it is ok.
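To put rough numbers on that: with the default replication factor of 3 (the dfs.replication setting), raw capacity divides by three, and each replica must land on a distinct DataNode. A sketch of the arithmetic (the function names are mine, not a Hadoop API, and this ignores non-DFS reserved space and block overhead):

```python
# Rough capacity arithmetic for a small HDFS cluster. A hedged sketch:
# it ignores non-DFS reserved space, temp space, and block overhead.
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Raw disk capacity divided by the HDFS replication factor."""
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / replication

def placement_satisfiable(nodes, replication=3):
    """HDFS places each replica on a distinct DataNode, so the default
    factor of 3 needs at least three nodes (or a lower dfs.replication)."""
    return nodes >= replication

# Two Dell SC1425s with two 1.5 TB drives each:
print(usable_capacity_tb(2, 2, 1.5))   # 2.0 TB usable at replication 3
print(placement_satisfiable(2))        # False
```

So the two-box starting point gives roughly 2 TB of usable space at the default factor, and true triple replication only becomes possible once a third node joins (or dfs.replication is lowered to 2 in the meantime).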


> And third as I expand the cluster do all boxes have to be identical configs?

No they do not have to be identical. Your basic problem is one of
general system admin management.

>  If they are not what problems will I run into?


Nodes in the cluster will fetch work to be done. The slower boxes will
take longer, and can potentially drag out the time for a whole MapReduce
run. But that is my understanding; I have not seen it yet.
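The standard mitigation for that straggler effect is speculative execution, which launches backup attempts of slow tasks on other nodes and keeps whichever finishes first. It is on by default, but can be set explicitly in mapred-site.xml; a sketch using the property names from the Hadoop 0.20 line:

```xml
<!-- mapred-site.xml fragment: speculative execution re-runs slow
     task attempts on other nodes, softening the impact of slower
     boxes on overall job time (0.20-era property names). -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>
```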

Good luck.

Server types

Posted by John Martyniak <jo...@beforedawnsolutions.com>.
I am getting ready to set up a Hadoop cluster, starting small but going 
to add pretty quickly.  I am planning on running the following on the 
cluster: Hadoop, HDFS, HBase, Nutch, and Mahout.

So far I have two Dell SC1425s, each dual processor (2.8 GHz) with 
4 GB RAM and two 1.5 TB SATA drives, on a gigabit switch.

So first off, does this seem like a reasonable way to start?  I expect 
to double or triple the number of servers in the first month.

Second, do the boxes need RAID, or will HDFS take care of the 
redundancy?

And third, as I expand the cluster, do all the boxes have to have 
identical configs?  If they are not identical, what problems will I run into?

Thanks in advance for the help.

-John

Re: Linux Flavor

Posted by Steve Loughran <st...@apache.org>.
Todd Lipcon wrote:
> We generally recommend sticking with whatever Linux is already common inside
> your organization. Hadoop itself should run equally well on CentOS 5, RHEL
> 5, or any reasonably recent Ubuntu/Debian. It will probably be OK on any
> other variety of Linux as well (eg SLES), though they are less commonly
> used.
> 
> The reason that RHEL/CentOS is most common for Hadoop is simply that it's
> most common for large production Linux deployments in general, and
> organizations are hesitant to add a new flavor for no benefit.
> 
> Personally I prefer running on reasonably recent Ubuntu/Debian since you get
> some new kernel features like per-process IO accounting (for iotop), etc.

I am busy debugging why my laptop doesn't boot since upgrading to Ubuntu 
9.10 at the weekend; they've done some odd things to the filesystem to 
produce faster boots that I'm not convinced work.

Re: Linux Flavor

Posted by Praveen Yarlagadda <pr...@gmail.com>.
Thanks, guys! That's helpful.

On Mon, Nov 2, 2009 at 1:22 PM, Todd Lipcon <to...@cloudera.com> wrote:

> We generally recommend sticking with whatever Linux is already common
> inside
> your organization. Hadoop itself should run equally well on CentOS 5, RHEL
> 5, or any reasonably recent Ubuntu/Debian. It will probably be OK on any
> other variety of Linux as well (eg SLES), though they are less commonly
> used.
>
> The reason that RHEL/CentOS is most common for Hadoop is simply that it's
> most common for large production Linux deployments in general, and
> organizations are hesitant to add a new flavor for no benefit.
>
> Personally I prefer running on reasonably recent Ubuntu/Debian since you
> get
> some new kernel features like per-process IO accounting (for iotop), etc.
>
> -Todd
>
> On Mon, Nov 2, 2009 at 12:16 PM, Praveen Yarlagadda <
> praveen.yarlagadda@gmail.com> wrote:
>
> > Hi all,
> >
> > I have been running Hadoop on Ubuntu for a while now in distributed mode
> (4
> > node cluster). Just playing around with it.
> > Going forward, I am planning to have more nodes added to the cluster.
> Just
> > want to know which linux flavor is the best
> > to run Hadoop on? Please let me know.
> >
> > Regards,
> > Praveen
> >
>



-- 
Regards,
Praveen

Re: Linux Flavor

Posted by Todd Lipcon <to...@cloudera.com>.
We generally recommend sticking with whatever Linux is already common inside
your organization. Hadoop itself should run equally well on CentOS 5, RHEL
5, or any reasonably recent Ubuntu/Debian. It will probably be OK on any
other variety of Linux as well (e.g. SLES), though they are less commonly
used.

The reason that RHEL/CentOS is most common for Hadoop is simply that it's
most common for large production Linux deployments in general, and
organizations are hesitant to add a new flavor for no benefit.

Personally I prefer running on reasonably recent Ubuntu/Debian since you get
some new kernel features like per-process IO accounting (for iotop), etc.

-Todd

On Mon, Nov 2, 2009 at 12:16 PM, Praveen Yarlagadda <
praveen.yarlagadda@gmail.com> wrote:

> Hi all,
>
> I have been running Hadoop on Ubuntu for a while now in distributed mode (4
> node cluster). Just playing around with it.
> Going forward, I am planning to have more nodes added to the cluster. Just
> want to know which linux flavor is the best
> to run Hadoop on? Please let me know.
>
> Regards,
> Praveen
>

Re: Linux Flavor

Posted by Y G <gy...@gmail.com>.
We use CentOS 5 with PXE and Kickstart to auto-install the OS, as well as
the necessary libraries and software such as gmond, the JDK, Hadoop, and the
bcfg2 client.

When a machine is ready, we have a central bcfg2 config machine to sync all
the config files.
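For anyone unfamiliar with that setup, a minimal Kickstart file for an unattended CentOS 5 node install might look like this (a sketch only: the mirror URL, partitioning, and package names are placeholders, not details from this thread):

```
# ks.cfg - minimal unattended CentOS 5 install sketch.
# The repo URL and package choices below are placeholders.
install
url --url http://mirror.example.com/centos/5/os/x86_64
lang en_US.UTF-8
keyboard us
rootpw changeme
timezone UTC
bootloader --location=mbr
clearpart --all --initlabel
autopart
reboot

%packages
@core

%post
# Pull the bcfg2 client so the config-center machine can take over
# management once the node boots (assumes a repo that provides it).
yum -y install bcfg2
```

PXE then serves this file to each new machine, so every node comes up from the same recipe instead of being hand-installed.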
-----
Happy every day,
Good health

Stephen Leacock <http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html>
- "I detest life-insurance agents: they always argue that I shall some day

2009/11/3 Praveen Yarlagadda <pr...@gmail.com>

> Hi all,
>
> I have been running Hadoop on Ubuntu for a while now in distributed mode (4
> node cluster). Just playing around with it.
> Going forward, I am planning to have more nodes added to the cluster. Just
> want to know which linux flavor is the best
> to run Hadoop on? Please let me know.
>
> Regards,
> Praveen
>