Posted to user@pig.apache.org by Niels Basjes <Ni...@basjes.nl> on 2015/02/26 09:26:58 UTC

Tuning the number of mappers when using HBaseStorage

Hi,

I wrote a Pig script that uses HBaseStorage to read data from HBase (I
do a full table scan for this use case). The table currently has only 7
regions, and I found that the actual Pig job runs with 7 mappers to read
this data.
I increased the number of splits in this table and the number of mappers
increased to the same value.

Now increasing the number of splits 'just' to get more mappers seems a bit
strange.

Question: Is it possible to run HBaseStorage as input and use more mappers
than there are regions? Or perhaps I can explicitly state the key ranges
per mapper (i.e. ignoring the actual regions in the table)?
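
To illustrate what I mean by "key ranges per mapper", here is a rough sketch
of cutting one region's row-key range into several smaller ranges. It is
plain Python, not anything from the Pig or HBase API: the helper name
split_key_range is made up for this example, and it assumes both boundary
keys are known and non-empty and that row keys are spread fairly evenly
over the range.

```python
from typing import List, Tuple


def split_key_range(start: bytes, stop: bytes, parts: int) -> List[Tuple[bytes, bytes]]:
    """Cut the half-open row-key range [start, stop) into up to `parts` sub-ranges.

    Hypothetical helper for illustration only; it is not part of Pig or HBase.
    Keys are compared as raw bytes, the way HBase orders row keys.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if not start < stop:
        raise ValueError("start must sort strictly before stop")

    # Pad both keys to the same width so they can be treated as big-endian
    # integers; fixed-width big-endian order matches byte-wise key order.
    width = max(len(start), len(stop))
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\x00"), "big")

    # Keep the original outer boundaries and only compute the interior cuts.
    cuts = [start]
    for i in range(1, parts):
        key = (lo + (hi - lo) * i // parts).to_bytes(width, "big")
        # Skip cuts that would produce an empty sub-range.
        if cuts[-1] < key < stop:
            cuts.append(key)
    cuts.append(stop)

    # Pair consecutive boundaries: each pair would be one mapper's scan range.
    return list(zip(cuts, cuts[1:]))


if __name__ == "__main__":
    for first, last in split_key_range(b"a", b"e", 4):
        print(first, last)
```

With 7 regions and 4 sub-ranges each, this would give 28 scan ranges instead
of 7. What I do not know is whether HBaseStorage can be told to use ranges
like these as its input splits.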

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes