Posted to user@hbase.apache.org by Joseph Coleman <jo...@infinitecampus.com> on 2011/02/01 16:50:36 UTC

Hadoop setup question.

Hi all, I'm not sure where to ask this question, but here goes. I have been playing with Hadoop for a while in a test environment before we set up and deploy a production environment. I am currently using Hadoop 0.20.0 on Ubuntu 10.04 LTS, installed on Dell 1950s.

My question is: what RAID level should I be using for my data nodes? I haven't come across anything that clearly spells it out. I have used RAID 1 with an ext4 filesystem, but after further research I know this isn't right, and I'm not sure what to do. I will be setting up 3 masters in a cluster, which I will put on RAID. I will also have roughly 10 datanodes running HDFS and HBase, plus a separate ZooKeeper cluster. Any thoughts or recommendations on the clustering would be much appreciated.

Thanks,
Joe



RE: Hadoop setup question.

Posted by "Buttler, David" <bu...@llnl.gov>.
In addition to what the others have said, I will repeat my standard advice (gleaned from listening to this list for the last year):
If you have 10 nodes or fewer, then you want:
1 master node (NameNode, JobTracker, HBase master, ZooKeeper node)
9 slave nodes (DataNode, TaskTracker, HBase region server)

If you have more than 10 nodes, then you want 3 or 5 zookeeper nodes. Zookeeper nodes can share hardware with other services as long as they have a dedicated disk (dedicated bandwidth wouldn't hurt, but is probably not necessary). And if you have a lot of nodes, then you want an odd number of zookeeper nodes with one node per rack at minimum.
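
As a rough illustration of why 3 or 5 (an odd number) is the usual advice: ZooKeeper stays available only while a strict majority of the ensemble is up, so going from 3 to 4 servers does not let you lose any more machines than before. The sketch below only does that arithmetic; the function names are made up for this example and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for n in range(1, 7):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With 3 servers you can lose 1, and with 4 you can still only lose 1, so the fourth machine buys no extra fault tolerance; 5 servers lets you lose 2.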

I don't believe that the HBase master takes many resources, so having it share the master node is not a problem.

I have a 6 node cluster that I share with Solr, and this setup works well. So well that I am having a hard time convincing anyone to get me more hardware.

Dave





Re: Hadoop setup question.

Posted by Jeff Whiting <je...@qualtrics.com>.
With our 2950s we set up each hard drive to simply be in its own "RAID array." Basically it is just
what Ryan said.

~Jeff


-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


Re: Hadoop setup question.

Posted by Ryan Rawson <ry...@gmail.com>.
We have Dell 1950s. I didn't do the setup, but from what I recall,
you basically have no choice but to use the RAID controller. Think of
it as a super advanced SATA controller instead. The Dell 1950
RAID card did NOT support JBOD, from what I recall. You can RAID 0 it
(stripe only), and maybe you can concatenate the drives, which is not
ideal. But without doing some serious internal surgery, which might not
be possible, no JBOD. Maybe a firmware update adds it back in?
Presumably Dell figures that if you are buying a machine with a RAID
controller you'll RAID the disks, so other modes are not supported.

-ryan


Re: Hadoop setup question.

Posted by Edward Capriolo <ed...@gmail.com>.

JBOD in your case means either:
1) Do not use your RAID controller
2) Set up your RAID controller with N devices, using a 1-to-1 mapping
with physical disks
/dev/sda -> disk1
/dev/sdb -> disk2
...
Good luck.
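
One way to put the 1-to-1 mapping to use, assuming each disk gets its own filesystem and its own mount point: in Hadoop 0.20 the dfs.data.dir property in hdfs-site.xml takes a comma-separated list of directories, and the DataNode stores blocks across all of them. The sketch below just builds that value. The mount points (/mnt/disk1, /mnt/disk2), the subdirectory, and the helper name are hypothetical and chosen only for illustration.

```python
def data_dir_value(mount_points, subdir="hdfs/data"):
    """Build a comma-separated list with one data directory per disk.

    mount_points: where each physical disk is mounted (hypothetical paths).
    subdir: directory under each mount point for DataNode blocks (assumed).
    """
    if not mount_points:
        raise ValueError("need at least one mount point")
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


if __name__ == "__main__":
    # One entry per physical disk, e.g. /dev/sda and /dev/sdb mounted here.
    mounts = ["/mnt/disk1", "/mnt/disk2"]
    print(data_dir_value(mounts))
```

The resulting string would go in the value of the dfs.data.dir property. With one directory per disk, a single failed drive costs only that disk's blocks, which HDFS re-replicates from other nodes.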

Re: Hadoop setup question.

Posted by Joseph Coleman <jo...@infinitecampus.com>.
Sorry, that was a typo on the number of master nodes, although what is the
limit on how many masters you can have? Thank you for the feedback on
JBOD; however, I am a little lost on how to set it up. Looking at a Dell
1950 or 2950, I do not see it as an option in the RAID controller setup,
nor do I see it as an option when setting up Ubuntu. Is this a hardware
or software option after the fact? Do I just set up RAID 0, create an ext
root volume, and then run a command to convert to JBOD? Sorry for the
ignorance; this is just new to me and I want to get it right the first time.

Thanks






Re: Hadoop setup question.

Posted by Sean Bigdatafun <se...@gmail.com>.
- hbase-user

No RAID should be used; use JBOD instead. I do not think you can set up 3
master nodes in the current Hadoop version. Can you explain why you believe so?

Thanks,



-- 
--Sean