Posted to user@pig.apache.org by Sam William <sa...@stumbleupon.com> on 2012/04/24 01:30:07 UTC
Multiple maps for a small input file
I have a file on HDFS with a reduced block size; I created it by overriding the dfs.block.size parameter on the hadoop fs -put command. hadoop fsck shows that this file has 15 blocks (as opposed to the usual 1 block). I did this to force Pig to use more maps than normal. On my Pig command line I specify 'pig -Dpig.splitCombination=false' to turn off the default split-combination logic, but the job still ends up running just one mapper. How can I get multiple maps? Splitting the original file into multiple files would be my last resort.
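For reference, this is roughly what I am doing (the file name, destination path, 1 MB block size, and script name below are just placeholders):

    # upload with a reduced block size so the file spans many blocks (value is a placeholder)
    hadoop fs -Ddfs.block.size=1048576 -put data.txt /user/sam/data.txt

    # run the script with Pig's split combination disabled
    pig -Dpig.splitCombination=false myscript.pig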
Sam William
sampd@stumbleupon.com
Re: Multiple maps for a small input file
Posted by Sam William <sa...@stumbleupon.com>.
Dmitriy,
Thanks. The combination of pig.splitCombination=false and mapred.min.split.size worked.
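For reference, the working invocation looked roughly like this (the script name and the 1 MB split-size value are placeholders; the right value depends on the block size you chose, and should be no larger than it if you want one map per block):

    # disable Pig's split combination and set the minimum split size (value is a placeholder)
    pig -Dpig.splitCombination=false -Dmapred.min.split.size=1048576 myscript.pig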
Sam
On Apr 23, 2012, at 5:42 PM, Dmitriy Ryaboy wrote:
> Set mapred.min.split.size
>
> D
>
> On Mon, Apr 23, 2012 at 4:30 PM, Sam William <sa...@stumbleupon.com> wrote:
>> I have a file on HDFS with a reduced block size; I created it by overriding the dfs.block.size parameter on the hadoop fs -put command. hadoop fsck shows that this file has 15 blocks (as opposed to the usual 1 block). I did this to force Pig to use more maps than normal. On my Pig command line I specify 'pig -Dpig.splitCombination=false' to turn off the default split-combination logic, but the job still ends up running just one mapper. How can I get multiple maps? Splitting the original file into multiple files would be my last resort.
>>
>>
>>
>> Sam William
>> sampd@stumbleupon.com
>>
>>
>>
Sam William
sampd@stumbleupon.com
Re: Multiple maps for a small input file
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Set mapred.min.split.size
D
On Mon, Apr 23, 2012 at 4:30 PM, Sam William <sa...@stumbleupon.com> wrote:
> I have a file on HDFS with a reduced block size; I created it by overriding the dfs.block.size parameter on the hadoop fs -put command. hadoop fsck shows that this file has 15 blocks (as opposed to the usual 1 block). I did this to force Pig to use more maps than normal. On my Pig command line I specify 'pig -Dpig.splitCombination=false' to turn off the default split-combination logic, but the job still ends up running just one mapper. How can I get multiple maps? Splitting the original file into multiple files would be my last resort.
>
>
>
> Sam William
> sampd@stumbleupon.com
>
>
>