Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/03/21 20:03:12 UTC

split number

Hi all,
in InputFormat.getSplits(JobConf, splitNum), I thought splitNum was only a hint. The number of splits equals the number of mappers working on that file. But I get exactly the number of splits given by splitNum, and the split lengths sum to the length of the file. So splitNum does not seem to be a hint here. Is it a bug, or did I do something wrong?

Thanks,
-Gang


      

Re: split number

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
AFAIK, it is a hint. The exact number of splits is computed from the block size, the minimum split size, and this hint. So if total_size/hint is smaller than the block size but larger than the minimum split size, you should see exactly the number you asked for.
This is how I understand it; please let me know if I've got it wrong.

Amogh
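
To make the arithmetic above concrete, here is a minimal Java sketch for a single, non-empty, splittable file. It assumes the split size is max(minSize, min(total_size/hint, blockSize)) and that the last split may run up to 10% over the split size, which is how I recall the old-API FileInputFormat behaving. The class SplitSketch, its method names, and the SPLIT_SLOP constant are illustrative stand-ins, not the Hadoop source.

```java
// Sketch only: models the split-size arithmetic described in this thread.
// All names here are hypothetical; this is not Hadoop code.
class SplitSketch {
    // Assumption: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Clamp the goal size (total_size / hint) between minSize and blockSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Count the splits produced for one file of the given length.
    static int countSplits(long fileLength, long blockSize, long minSize, int numSplitsHint) {
        if (minSize < 1) {
            throw new IllegalArgumentException("minSize must be at least 1");
        }
        long goalSize = fileLength / (numSplitsHint == 0 ? 1 : numSplitsHint);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        // Whatever is left (if anything) becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1024 * mb; // 1 GB file
        long blockSize = 64 * mb;    // 64 MB blocks

        // Goal size (~10 MB) is below the block size: the hint is honored.
        System.out.println(countSplits(fileLength, blockSize, 1, 100));       // 100
        // Goal size (256 MB) exceeds the block size: one split per block.
        System.out.println(countSplits(fileLength, blockSize, 1, 4));         // 16
        // Goal size is below a 32 MB minimum: the minimum wins.
        System.out.println(countSplits(fileLength, blockSize, 32 * mb, 100)); // 32
    }
}
```

Under these assumptions, Gang's observation fits the first case: the hint only looks exact because total_size/hint fell between the minimum split size and the block size.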

