Posted to common-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2010/04/29 23:23:35 UTC

CombineFileInputFormat not producing multiple mappers

I am using CombineFileInputFormat and CombineFileSplit to group small input files before they are fed to the mappers.  The job runs properly and the output is correct, but I get only one mapper task, so I lose all my parallelization in the map stage.

I realize I'm not providing much detail yet because I'm not sure what to say.  Feel free to ask questions for clarification.

What might cause this problem, and how might I diagnose -- much less fix -- it?

Thank you.

________________________________________________________________________________
Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
  -- Homer Simpson
________________________________________________________________________________




Re: CombineFileInputFormat not producing multiple mappers

Posted by Keith Wiley <kw...@keithwiley.com>.
Yep, that was part of it.  Thank you.  Also, I was not setting
splittable to true for the combined input because I knew the contained
files themselves were not splittable.  Setting the combined input's
splittable to true appears to have been important as well.

Thank you.

On 2010, Apr 29, at 11:53 PM, Aleksandar Stupar wrote:

> Hi,
>
> if mapred.max.split.size is not set (and it's not by default), then
> CombineFileInputFormat only takes racks into account when grouping
> blocks. So if you set this property, it will also take block
> placement on machines into account, and you should get multiple
> mappers.
>
> Hope this helps,
> Aleksandar Stupar.


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com     
music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                            --  Yoda
________________________________________________________________________________


Re: CombineFileInputFormat not producing multiple mappers

Posted by Aleksandar Stupar <st...@yahoo.com>.
Hi,

if mapred.max.split.size is not set (and it's not by default), then CombineFileInputFormat
only takes racks into account when grouping blocks. So if you set this property, it will also
take block placement on machines into account, and you should get multiple mappers.

Hope this helps,
Aleksandar Stupar.
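
The effect of the cap can be illustrated with a toy model. This is a minimal sketch in plain Java, not Hadoop's implementation: the class name SplitSizeSketch and the method pack are hypothetical, the model ignores racks and nodes entirely, and it assumes only that a cap of 0 stands for an unset mapred.max.split.size and that a combined split is closed once its accumulated size reaches the cap.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, no Hadoop dependency, no rack/node logic.
final class SplitSizeSketch {

    /**
     * Greedily packs file lengths into combined splits.
     * maxSplitBytes == 0 means "no cap" (assumed to model an unset
     * mapred.max.split.size), so every file lands in a single split.
     */
    static List<List<Long>> pack(long[] fileLengths, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long len : fileLengths) {
            current.add(len);
            currentBytes += len;
            // Close the split as soon as it reaches the cap, if a cap is set.
            if (maxSplitBytes != 0 && currentBytes >= maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        // Whatever is left over forms the final split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical small input files.
        long[] files = {10 * mb, 20 * mb, 30 * mb, 40 * mb, 50 * mb};
        System.out.println("no cap:    " + pack(files, 0).size() + " split(s)");
        System.out.println("64 MB cap: " + pack(files, 64 * mb).size() + " split(s)");
    }
}
```

With the five hypothetical file sizes in main, the uncapped call packs everything into one split (one mapper), while the 64 MB cap closes the first split after four files and leaves the fifth for a second split. The 64 MB figure is only an illustrative value; a sensible cap depends on the cluster and the data.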
