Posted to user@pig.apache.org by Sam William <sa...@stumbleupon.com> on 2012/04/24 01:30:07 UTC
Multiple maps for a small input file
I have a file on HDFS with a reduced block size; I created it by overriding the dfs.block.size parameter on the hadoop fs -put command. hadoop fsck shows that this file has 15 blocks (as opposed to the usual 1 block). I did this to force Pig to use more maps than normal. On my Pig command line I specify 'pig -Dpig.splitCombination=false' to turn off the default split-combination logic, but the job still ends up running just one mapper. How can I get multiple maps? Splitting the original file into multiple files would be my last resort.
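For reference, this is roughly what I am doing (the file name, destination path, 1 MB block size, and script name below are just placeholders):

    # upload with a reduced block size so the file spans many blocks (value is a placeholder)
    hadoop fs -Ddfs.block.size=1048576 -put data.txt /user/sam/data.txt

    # run the script with Pig's split combination disabled
    pig -Dpig.splitCombination=false myscript.pig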
Sam William
sampd@stumbleupon.com
Re: Multiple maps for a small input file
Posted by Sam William <sa...@stumbleupon.com>.
Dmitriy,
Thanks. The combination of pig.splitCombination=false and mapred.min.split.size worked.
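For reference, the working invocation looked roughly like this (the script name and the 1 MB split-size value are placeholders; the right value depends on the block size you chose, and should be no larger than it if you want one map per block):

    # disable Pig's split combination and set the minimum split size (value is a placeholder)
    pig -Dpig.splitCombination=false -Dmapred.min.split.size=1048576 myscript.pig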
Sam
On Apr 23, 2012, at 5:42 PM, Dmitriy Ryaboy wrote:
> Set mapred.min.split.size
>
> D
>
> On Mon, Apr 23, 2012 at 4:30 PM, Sam William <sa...@stumbleupon.com> wrote:
>> I have a file on HDFS with a reduced block size; I created it by overriding the dfs.block.size parameter on the hadoop fs -put command. hadoop fsck shows that this file has 15 blocks (as opposed to the usual 1 block). I did this to force Pig to use more maps than normal. On my Pig command line I specify 'pig -Dpig.splitCombination=false' to turn off the default split-combination logic, but the job still ends up running just one mapper. How can I get multiple maps? Splitting the original file into multiple files would be my last resort.
>>
>>
>>
>> Sam William
>> sampd@stumbleupon.com
>>
>>
>>
Sam William
sampd@stumbleupon.com
Re: Multiple maps for a small input file
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Set mapred.min.split.size
D
On Mon, Apr 23, 2012 at 4:30 PM, Sam William <sa...@stumbleupon.com> wrote:
> I have a file on HDFS with a reduced block size; I created it by overriding the dfs.block.size parameter on the hadoop fs -put command. hadoop fsck shows that this file has 15 blocks (as opposed to the usual 1 block). I did this to force Pig to use more maps than normal. On my Pig command line I specify 'pig -Dpig.splitCombination=false' to turn off the default split-combination logic, but the job still ends up running just one mapper. How can I get multiple maps? Splitting the original file into multiple files would be my last resort.
>
>
>
> Sam William
> sampd@stumbleupon.com
>
>
>