Posted to user@hbase.apache.org by Guillermo Ortiz <ko...@gmail.com> on 2014/05/07 14:34:21 UTC

Parallel Scan with TableMapReduceUtil

I am processing data from HBase with a MapReduce job. The input of my job
is a "full" scan of a table.

When I execute a full scan with TableMapReduceUtil, is the scan executed
in parallel, so that all mappers get their data in parallel, in the same way
as if I executed many range scans with threads?

Re: Parallel Scan with TableMapReduceUtil

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Guillermo,

You should see as many MR tasks as there are regions in your input table.
There will be one scan per task. They will all run in parallel if you have
enough MR slots. Otherwise, some of them will run in parallel and the others
will wait for an available slot. HBase will try to run each task on the
region server (RS) that hosts its region. Doing this on the client side with
multiple threads would therefore use more resources, since you would have a
lot of calls between the client and all the region servers.

JM
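
To make the answer above a bit more concrete, here is a minimal, HBase-free
sketch in plain Java of the two ideas in it: one scan range (one map task) per
region, and tasks running in "waves" when there are fewer slots than tasks.
Everything in it is an assumption for illustration: the class name
RegionSplitSketch, the helper names splitsFor and waves, and the region start
keys "g" and "p" are made up. In a real job the splits are computed for you by
the input format that TableMapReduceUtil configures, and tasks do not finish
in neat waves, so treat the wave count as a rough lower bound, not a model of
the scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: it does not call HBase or MapReduce.
final class RegionSplitSketch {

    // Builds one [startKey, endKey) range per region from the sorted region
    // start keys. An empty string means "open-ended", mirroring how the first
    // region has an empty start key and the last region an empty end key.
    static List<String[]> splitsFor(List<String> regionStartKeys) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String end = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1)
                    : "";
            splits.add(new String[] {start, end});
        }
        return splits;
    }

    // Simplified estimate: how many rounds of tasks are needed when at most
    // `slots` tasks can run at the same time (ceiling of tasks / slots).
    static int waves(int tasks, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        return (tasks + slots - 1) / slots;
    }
}
```

For a table with three regions starting at "", "g" and "p", splitsFor returns
three ranges, so three map tasks, each scanning only its own region. With
three or more free slots they all run at once; with two slots, waves(3, 2)
gives two rounds, which is the "others will wait for an available slot" case.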

