Posted to user@hbase.apache.org by Bishal Acharya <ba...@veriskhealth.com> on 2010/09/24 07:11:30 UTC

How to manage Hbase/Hadoop cluster

I am running a 20-node Hadoop/HBase cluster. I currently run MR jobs on
the cluster while also serving my web application directly from HBase on
the same cluster. When no MR jobs are running, the application performs
fine, but whenever MR jobs run while I am browsing the application,
latency increases. How should I manage the cluster to avoid the added
latency caused by MR jobs saturating it? In particular, I would like to
know how companies that use HBase for front-end serving, for example
StumbleUpon, handle this issue.


-- 

Sincerely,


*Bishal Acharya*

/Software Engineer | D2HawkeyeServices Pvt. Ltd. | Subsidiary of Verisk 
Health,USA/
**

Cell +977-9849378541 | bacharya@veriskhealth.com | www.d2hawkeyeservices.com

P  Request : Unless absolutely necessary, please do not print this 
e-mail. Help save environment. Thank you.




This email is intended for the recipient only. If you are not the intended
recipient please disregard, and do not use the information for any purpose.

Re: How to manage Hbase/Hadoop cluster

Posted by Matthew LeMieux <md...@mlogiciels.com>.
I am not with StumbleUpon, but I can tell you how one of my clients does it.  

We are serving the website from HBase.  We used to try running M/R jobs on the same cluster, but quickly found that this was a bad idea.  We now run only very minimal Hadoop M/R apps on the web cluster.  Every time we run an M/R job on the web cluster, we see increased latency on the web app.  It makes sense: if you do more work on the cluster, it will not be able to respond as quickly.  This isn't a new idea; traditionally, data warehousing and deep analytics tasks are kept separate from OLTP processing.  

We ended up splitting the cluster into two clusters: a web cluster and a compute cluster.  The longer jobs that run on the compute cluster have quite a few steps.  The first step pulls data from the HBase cluster, and the final step puts the results back to the HBase cluster.   

We manage our own indexes.   The only jobs that run on the HBase cluster are indexing jobs.  Anything that does any sort of analytics runs on the compute cluster. 

-Matthew




Re: How to manage Hbase/Hadoop cluster

Posted by Jean-Daniel Cryans <jd...@apache.org>.
We have multiple clusters (in multiple datacenters), and web serving
is separated from MR processing. We do have some jobs running on prod
but they are usually of lower intensity, run during low traffic hours,
and we limited the number of map and reduce slots to 1 each per node.
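
As a rough illustration of the two throttles mentioned above (one map and one reduce slot per node, and running jobs only in low-traffic hours), here is a minimal Python sketch. The property names are the Hadoop 0.20-era TaskTracker settings that normally live in mapred-site.xml; the helper functions and the 01:00-05:00 window are hypothetical and not taken from our actual setup.

```python
import xml.etree.ElementTree as ET
from datetime import time

# Hadoop 0.20-era TaskTracker slot limits, set per node in mapred-site.xml.
SLOT_PROPERTIES = {
    "mapred.tasktracker.map.tasks.maximum": 1,
    "mapred.tasktracker.reduce.tasks.maximum": 1,
}


def render_slot_config(properties=SLOT_PROPERTIES):
    """Render the slot limits as a <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def in_off_peak_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True if `now` falls in [start, end); the window may wrap midnight.

    The default window is an assumed example, not a recommendation.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```

A job launcher could call in_off_peak_window(datetime.now().time()) before submitting anything to the serving cluster and skip the run otherwise.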

The data between the clusters can be copied with whatever tool you
fancy. In 0.20 you have access to Import/Export+distcp, and in 0.89
there's a CopyTable job that copies from one HBase setup
directly into another HBase cluster. Finally, in 0.89 there's also the
cluster replication feature (that I wrote), which basically works like
MySQL master/slave replication. We currently only use it for disaster
recovery tho, as I have yet to write the multi-slave part :)
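
For the CopyTable route, a small Python sketch of how the command line might be assembled is shown below. It assumes the `--peer.adr` option in the quorum:port:znode form documented for CopyTable in later HBase releases, so check the usage output of your own version first. The table name and ZooKeeper hosts are made up for the example.

```python
def copytable_command(table, peer_quorum, peer_port=2181, peer_znode="/hbase"):
    """Build the argv for a CopyTable run that writes to a peer cluster.

    peer_quorum is the list of ZooKeeper hosts of the destination cluster.
    The peer address is formatted as quorum:port:znode (assumed format).
    """
    peer = "{}:{}:{}".format(",".join(peer_quorum), peer_port, peer_znode)
    return [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.CopyTable",
        "--peer.adr=" + peer,
        table,
    ]


# Hypothetical table and hosts, for illustration only.
print(" ".join(copytable_command("events", ["zk1", "zk2", "zk3"])))
```

The resulting list can be handed to subprocess.run on a machine that has the source cluster's HBase configuration on its classpath.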

J-D


Re: How to manage Hbase/Hadoop cluster

Posted by Jinsong Hu <ji...@hotmail.com>.
Are you running the task tracker and the region server on the same machine?
Both are CPU intensive.


Jimmy
