Posted to mapreduce-user@hadoop.apache.org by Alex Baranau <al...@gmail.com> on 2010/09/14 19:10:15 UTC

HBase MR: run more map tasks than regions

Hello,

As far as I know, the number of map tasks for a "scan-based" MapReduce job is
equal to (never more than) the number of regions the scan covers, provided, of
course, that the cluster's map task capacity is large enough.
I have a situation where map-side processing is very heavy but uses a fairly
small number of records from the HBase table. Those records may belong to just
one or a few regions, which results in only a few map tasks running. This can
make processing very slow without utilising all of the cluster resources :(.
Is there a way to set a minimum number (or just a particular number) of map
tasks in this situation? Is enhancing TableInputFormat the only way?

Thank you,

Alex Baranau
---
http://sematext.com

Re: HBase MR: run more map tasks than regions

Posted by Stack <st...@duboce.net>.
On Tue, Sep 14, 2010 at 10:10 AM, Alex Baranau <al...@gmail.com> wrote:
>Is enhancing TableInputFormat the only way?
>

Currently, yes, you must enhance TIF or use an alternate TIF.
St.Ack
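
One way to "enhance TIF" is to subdivide each region's row-key range into
several smaller ranges and emit one input split per sub-range, so that a
single region feeds several map tasks. The sketch below shows only the key
arithmetic, in plain Java with no HBase dependency. The class name
KeyRangeSplitter and its split method are hypothetical and are not part of
HBase. It assumes a non-empty end key and treats row keys as unsigned
big-endian numbers, so the pieces are only balanced when keys are spread
fairly evenly over the range.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not an HBase class): cuts one key range into sub-ranges.
public class KeyRangeSplitter {

    /**
     * Returns up to n contiguous {start, end} pairs that together cover
     * [startKey, endKey). Fewer than n pairs come back when the range is
     * too narrow to yield n distinct boundaries.
     */
    public static List<byte[][]> split(byte[] startKey, byte[] endKey, int n) {
        if (n < 1 || endKey.length == 0) {
            throw new IllegalArgumentException("need n >= 1 and a non-empty end key");
        }
        // Right-pad both keys with zero bytes to a common length so they can
        // be compared and interpolated as unsigned integers.
        int len = Math.max(startKey.length, endKey.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start key must sort before end key");
        }
        BigInteger width = hi.subtract(lo);

        List<byte[][]> ranges = new ArrayList<>();
        byte[] prevKey = startKey;
        BigInteger prev = lo;
        for (int i = 1; i < n; i++) {
            // i-th interior boundary: lo + width * i / n (integer division).
            BigInteger b = lo.add(
                width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            if (b.compareTo(prev) <= 0) {
                continue; // range too narrow: skip a repeated boundary
            }
            byte[] key = toFixedLength(b, len);
            ranges.add(new byte[][] {prevKey, key});
            prevKey = key;
            prev = b;
        }
        // The last piece always ends at the original end key.
        ranges.add(new byte[][] {prevKey, endKey});
        return ranges;
    }

    // Converts a non-negative value to exactly len bytes, big-endian.
    private static byte[] toFixedLength(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry a leading sign byte or be short
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

In a custom input format, each returned pair would become the start and stop
row of one input split for the same table and region location. How that is
wired into the split-generation method depends on the HBase version in use,
so that part is left out here.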
