Posted to mapreduce-user@hadoop.apache.org by Stanley Xu <we...@gmail.com> on 2011/02/28 03:51:15 UTC

How to have multiple mappers and reducers for a MapReduce job on an HBase table with HBase 0.20.6?

Dear all,

I am writing a MapReduce task that goes through an HBase table daily to
re-calculate the entries stored in it. The number of entries will be in the
hundreds of millions. I use TableMapper as the mapper and IdentityTableReducer
as the reducer, following the example in the HBase code. I found that the job
uses only 1 mapper and 1 reducer on my test table, which has about 3 million
entries.
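
For reference, my job setup looks roughly like the following sketch (the table
name and mapper class are placeholders, and minor API details may differ in
0.20.6; this follows the org.apache.hadoop.hbase.mapreduce example):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Job job = new Job(new HBaseConfiguration(), "recalculate-entries");
job.setJarByClass(MyRecalcMapper.class);  // MyRecalcMapper is a placeholder

Scan scan = new Scan();
scan.setCaching(500);        // fetch rows in batches to speed up the full scan
scan.setCacheBlocks(false);  // avoid polluting the block cache during the scan

// MyRecalcMapper extends TableMapper<ImmutableBytesWritable, Put>
TableMapReduceUtil.initTableMapperJob("my_table", scan, MyRecalcMapper.class,
    ImmutableBytesWritable.class, Put.class, job);
TableMapReduceUtil.initTableReducerJob("my_table", IdentityTableReducer.class, job);

job.waitForCompletion(true);
```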

I am wondering how I could get multiple mappers or reducers in this case,
because I need to finish the job in a couple of minutes. Right now it takes
6 minutes to process 3 million entries, which suggests it would take about
300 minutes for 150 million entries.

I found a SimpleTotalOrderPartitioner in the 0.90.0 API, but it doesn't exist
in 0.20.6. Is there anything I could use in 0.20.6?

Thanks.

Best wishes,
Stanley Xu

Re: How to have multiple mappers and reducers for a MapReduce job on an HBase table with HBase 0.20.6?

Posted by Stanley Xu <we...@gmail.com>.
Sorry, I found the reason I got only 1 mapper: my test data lives in a single
region. I would get more mappers if the data were distributed across multiple
regions, and I can use HRegionPartitioner together with setNumReduceTasks to
increase the number of reducers.
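
Concretely, the extra setup amounts to something like this sketch (assuming the
initTableReducerJob overload that takes a partitioner class, which is in
org.apache.hadoop.hbase.mapreduce; "my_table" and numRegions are placeholders):

```java
import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

// Partition reducer input on region boundaries so each reducer
// writes to one region of the output table.
TableMapReduceUtil.initTableReducerJob("my_table", IdentityTableReducer.class,
    job, HRegionPartitioner.class);

// Raise the reducer count explicitly; reducers beyond the table's
// region count would sit idle, so cap it at the number of regions.
job.setNumReduceTasks(numRegions);
```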

Thanks.

Best wishes,
Stanley Xu


