Posted to user@hbase.apache.org by Chalcy Raja <Ch...@careerbuilder.com> on 2013/01/17 18:24:02 UTC

Just joined the user group and have a question

Hi HBASE Gurus,



I am Chalcy Raja and I joined the HBase group yesterday.  I am already a member of the Hive and Sqoop user groups.  Looking forward to learning and sharing information about HBase here!



I have a question:  We have a cluster where we run Hive jobs and also HBase.  There are stability issues, such as region servers that just die.  We are looking into fine-tuning.  From what I have read about performance, and from what I heard from another user, the advice is to separate MapReduce from HBase.  How do I do that?  If that means running TaskTrackers on some nodes and HBase region servers on others, then we will run into data locality issues, and I believe it will perform poorly.



I am surely not the only one running into this issue.  Any thoughts on how to resolve it?



Thanks,

Chalcy

RE: Just joined the user group and have a question

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Hi Kevin,

Thanks for the reply.  We currently run 10 mappers and 10 reducers on each node.  With 32 GB of memory, we allotted 2 GB for the HBase heap size, and mapred.map.child.java.opts and mapred.reduce.child.java.opts are set to 1 GB each, so having 10 mappers and 10 reducers did not look like a bad idea.

From what you are saying, I can use only 10 MR slots (12 - 2), meaning 6 mappers and 4 reducers, or 8 and 2, or 5 and 5?  Is that not very few?

I would also like to test separating MR and HBase by running TaskTrackers on some nodes and region servers on others, as I described.  We are also thinking of separating the clusters entirely.  In that case we could get higher-CPU servers for HBase.

We started seeing stability issues only after adding HBase to the Hadoop cluster, which is why we are trying to find a working solution.
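
The per-node memory figures quoted in this message can be checked with a rough sketch like the one below. The function and parameter names are hypothetical, and the calculation counts only configured JVM heap; real process footprints are larger, and the DataNode, TaskTracker, and OS page cache also need room that is not modeled here.

```python
def heap_budget_gb(total_ram_gb, hbase_heap_gb, map_slots, reduce_slots,
                   map_heap_gb, reduce_heap_gb):
    """Return (committed_heap_gb, remaining_gb) for one worker node.

    Counts only configured JVM heap; this is an illustrative
    assumption, not a full memory model of the node.
    """
    committed = (hbase_heap_gb
                 + map_slots * map_heap_gb
                 + reduce_slots * reduce_heap_gb)
    return committed, total_ram_gb - committed


# Figures from the message: 32 GB RAM, 2 GB HBase heap,
# 10 map and 10 reduce slots at 1 GB of child heap each.
committed, remaining = heap_budget_gb(32, 2, 10, 10, 1, 1)
print(committed, remaining)  # 22 10
```

On these numbers heap alone fits in RAM, which suggests looking at CPU and disk contention as well as memory when the region servers die.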

Thanks for your time,
Chalcy

-----Original Message-----
From: Kevin O'dell [mailto:kevin.odell@cloudera.com] 
Sent: Thursday, January 17, 2013 12:33 PM
To: user@hbase.apache.org
Subject: Re: Just joined the user group and have a question

Chalcy,

  Glad to have you aboard. One thing to look at is the maximum number of map and reduce slots you currently allow. Typically, we look at the CPU architecture: without HT (hyperthreading) the ratio of cores to slots is 1:1, and with HT it is 1:1.5. With dual quad-core CPUs and no HT, you could use 8 MR slots in total, but since you have HBase you should leave yourself a couple of slots, which means using only 6 MR slots. With dual quad-core CPUs and HT, you would have 16 logical cores and could use 12 MR slots, but since you have HBase you want to leave a couple of cores, which means using only 9 or 10 slots for MR. This can relieve some of the pressure from running MR/Hive/Pig on the same cluster.

  As for separating MR and HBase: you could split your processes so that TaskTrackers run on some nodes and RegionServers run on others, but typically people set up two separate clusters.


--
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Just joined the user group and have a question

Posted by Kevin O'dell <ke...@cloudera.com>.
Chalcy,

  Glad to have you aboard. One thing to look at is the maximum number of map
and reduce slots you currently allow. Typically, we look at the CPU
architecture: without HT (hyperthreading) the ratio of cores to slots is
1:1, and with HT it is 1:1.5. With dual quad-core CPUs and no HT, you could
use 8 MR slots in total, but since you have HBase you should leave yourself
a couple of slots, which means using only 6 MR slots. With dual quad-core
CPUs and HT, you would have 16 logical cores and could use 12 MR slots, but
since you have HBase you want to leave a couple of cores, which means using
only 9 or 10 slots for MR. This can relieve some of the pressure from
running MR/Hive/Pig on the same cluster.
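
The rule of thumb above can be written out as a small sketch. The function name, its parameters, and the default of two reserved slots are hypothetical choices made for illustration; only the 1:1 and 1:1.5 ratios and the worked numbers come from the message.

```python
import math


def mr_slots(physical_cores, hyperthreaded, reserved_for_hbase=2):
    """Estimate MapReduce slots per node from the rule of thumb.

    1 slot per physical core without hyperthreading, 1.5 with it,
    minus a few slots held back for HBase (assumed default: 2).
    """
    ratio = 1.5 if hyperthreaded else 1.0
    total = math.floor(physical_cores * ratio)
    return max(total - reserved_for_hbase, 0)


# Dual quad-core (8 physical cores), as in the message.
print(mr_slots(8, hyperthreaded=False))                       # 6
print(mr_slots(8, hyperthreaded=True))                        # 10
print(mr_slots(8, hyperthreaded=True, reserved_for_hbase=3))  # 9
```

Assuming a Hadoop 1.x (MRv1) TaskTracker setup, the per-node limits would be applied through mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml. How to divide the total between map and reduce slots depends on the workload and is not settled in this thread.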

  As for separating MR and HBase: you could split your processes so that
TaskTrackers run on some nodes and RegionServers run on others, but
typically people set up two separate clusters.


-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

RE: Just joined the user group and have a question

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Thanks, Doug!  I am not absolutely new to HBase.  As in Kevin's email, contention from MapReduce (Hive) jobs causes the HBase region servers to die, and the whole of HBase goes down.

I understand that we have to somehow logically or physically separate the clusters.

--Chalcy

-----Original Message-----
From: Doug Meil [mailto:doug.meil@explorysmedical.com] 
Sent: Thursday, January 17, 2013 12:35 PM
To: user@hbase.apache.org
Subject: Re: Just joined the user group and have a question

Hi there-

If you're absolutely new to HBase, you might want to check out the architecture, performance, and troubleshooting chapters of the HBase Reference Guide first.

http://hbase.apache.org/book.html

In terms of determining why your region servers "just die", I think you need to read the background information and then provide more information on your cluster and what you're trying to do. Although there are a lot of people on this dist-list who want to help, you're not giving folks a whole lot to go on.





RE: Just joined the user group and have a question

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Thank you, Anil, for your reply.  I am beginning to get the feeling that maybe we should not run both in the same cluster.  Of the three replies, two of you gave me the same advice.

Thanks again,
Chalcy

-----Original Message-----
From: anil gupta [mailto:anilgupta84@gmail.com] 
Sent: Thursday, January 17, 2013 12:48 PM
To: user@hbase.apache.org
Subject: Re: Just joined the user group and have a question

Hi Chalcy,

In addition to the points others have made, also have a look at your disk I/O load. MapReduce jobs are disk I/O intensive. When a MapReduce job is running, there may be contention for disk I/O, which can lead to request timeouts in HBase. Hence, you will start having trouble with the HBase cluster.
It's a little tricky to get HBase and MapReduce running on the same cluster because the two are completely different in nature: MapReduce is batch processing, while HBase is near-real-time processing. If you run them on one cluster, you will have to sacrifice the performance of one of them; both cannot be optimized at once.

HTH,
Anil

--
Thanks & Regards,
Anil Gupta

Re: Just joined the user group and have a question

Posted by anil gupta <an...@gmail.com>.
Hi Chalcy,

In addition to the points others have made, also have a look at your disk
I/O load. MapReduce jobs are disk I/O intensive. When a MapReduce job is
running, there may be contention for disk I/O, which can lead to request
timeouts in HBase. Hence, you will start having trouble with the HBase
cluster.
It's a little tricky to get HBase and MapReduce running on the same cluster
because the two are completely different in nature: MapReduce is batch
processing, while HBase is near-real-time processing. If you run them on
one cluster, you will have to sacrifice the performance of one of them;
both cannot be optimized at once.

HTH,
Anil

-- 
Thanks & Regards,
Anil Gupta

Re: Just joined the user group and have a question

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

If you're absolutely new to HBase, you might want to check out the
architecture, performance, and troubleshooting chapters of the HBase
Reference Guide first.

http://hbase.apache.org/book.html

In terms of determining why your region servers "just die", I think you
need to read the background information and then provide more information
on your cluster and what you're trying to do. Although there are a lot of
people on this dist-list who want to help, you're not giving folks a whole
lot to go on.



