Posted to user@hbase.apache.org by Eric Czech <er...@nextbigsound.com> on 2012/09/05 15:25:11 UTC

Managing MapReduce jobs with concurrent client reads

Hi everyone,

Does anyone have any recommendations on how to maintain low latency for
small, individual reads from HBase while MapReduce jobs are being run?  Is
replication a good way to handle this (i.e. run small, low-latency queries
against a replicated copy of the data and run the MapReduce jobs on the
master copy)?

Re: Managing MapReduce jobs with concurrent client reads

Posted by Eric Czech <er...@nextbigsound.com>.
Neither right now -- I'm just assuming that it would be a problem
since I would definitely have to support both in a hypothetical
HBase+Hadoop installation that isn't actually built yet.

Did you ever try corralling those jobs by just reducing the number of
available map/reduce task slots, or did you find that that isn't a
reliable throttling mechanism?
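
(For concreteness, by "reducing the number of available map/reduce task
slots" I mean the MR1-era TaskTracker caps plus keeping each job's own
footprint modest -- something like the untested sketch below.  Class and
job names are made up.)

    // Rough per-job sketch (MR1-era API).  The hard per-node throttle --
    // mapred.tasktracker.map.tasks.maximum and
    // mapred.tasktracker.reduce.tasks.maximum -- lives in mapred-site.xml
    // on each TaskTracker and needs a daemon restart, so a job can't set
    // it for itself; this only keeps one job's parallelism modest.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ModestBatchJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "modest-batch-job");
        job.setNumReduceTasks(4);  // don't claim every reduce slot on the cluster
        // ... mapper/reducer/input/output setup elided ...
        // job.waitForCompletion(true);
      }
    }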

Also, is replication to that batch cluster done via HBase replication
or some other approach?
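
(If it is HBase replication, here's roughly what I'm picturing on the
source cluster -- an untested sketch against the 0.92/0.94-era client
API, assuming hbase.replication is enabled on both clusters and the
relevant column families already have REPLICATION_SCOPE => 1.  The peer
id and ZooKeeper quorum below are placeholders.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

    public class AddBatchClusterPeer {
      public static void main(String[] args) throws Exception {
        // Configuration of the OLTP (source) cluster.
        Configuration conf = HBaseConfiguration.create();
        ReplicationAdmin repAdmin = new ReplicationAdmin(conf);
        // Point peer "1" at the batch cluster's ZooKeeper quorum and parent znode.
        repAdmin.addPeer("1", "batch-zk1,batch-zk2,batch-zk3:2181:/hbase");
      }
    }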

On Thu, Sep 6, 2012 at 4:08 PM, Stack <st...@duboce.net> wrote:
>
> On Wed, Sep 5, 2012 at 6:25 AM, Eric Czech <er...@nextbigsound.com> wrote:
> > Hi everyone,
> >
> > Does anyone have any recommendations on how to maintain low latency for
> > small, individual reads from HBase while MapReduce jobs are being run?  Is
> > replication a good way to handle this (i.e. run small, low-latency queries
> > against a replicated copy of the data and run the MapReduce jobs on the
> > master copy)?
>
> Is MapReduce blowing your caches, or is higher I/O driving up latency
> when you have a cache miss?  Or is it using all the CPU?
>
> Depending on how it impinges, you could try corralling MapReduce
> (cgroups/jail), or go to an extreme and keep a low-latency OLTP cluster
> running only well-known, well-behaved MapReduce jobs, replicating into
> a batch cluster where MapReduce is allowed free rein.  (This is what we
> do where I work.  We also cgroup the MapReduce cluster even on our
> batch cluster so a random big MR job doesn't make the pagers go off
> during sleepy time.)
>
> St.Ack

Re: Managing MapReduce jobs with concurrent client reads

Posted by Stack <st...@duboce.net>.
On Wed, Sep 5, 2012 at 6:25 AM, Eric Czech <er...@nextbigsound.com> wrote:
> Hi everyone,
>
> Does anyone have any recommendations on how to maintain low latency for
> small, individual reads from HBase while MapReduce jobs are being run?  Is
> replication a good way to handle this (i.e. run small, low-latency queries
> against a replicated copy of the data and run the MapReduce jobs on the
> master copy)?

Is MapReduce blowing your caches, or is higher I/O driving up latency
when you have a cache miss?  Or is it using all the CPU?

Depending on how it impinges, you could try corralling MapReduce
(cgroups/jail), or go to an extreme and keep a low-latency OLTP cluster
running only well-known, well-behaved MapReduce jobs, replicating into
a batch cluster where MapReduce is allowed free rein.  (This is what we
do where I work.  We also cgroup the MapReduce cluster even on our
batch cluster so a random big MR job doesn't make the pagers go off
during sleepy time.)
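
If it is mostly block cache pollution, a common mitigation is to keep
the MapReduce scans out of the block cache entirely so they don't evict
the hot blocks your small reads depend on.  Untested sketch below;
table and class names are made up:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class BatchScanJob {
      // Identity mapper: just passes each row through.
      public static class PassThroughMapper
          extends TableMapper<ImmutableBytesWritable, Result> { }

      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "batch-scan");
        Scan scan = new Scan();
        scan.setCaching(500);        // fewer RPCs per map task
        scan.setCacheBlocks(false);  // don't evict hot blocks the OLTP reads need
        TableMapReduceUtil.initTableMapperJob("mytable", scan,
            PassThroughMapper.class, ImmutableBytesWritable.class,
            Result.class, job);
        // ... reducer / output format setup elided ...
        // job.waitForCompletion(true);
      }
    }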

St.Ack