Posted to common-user@hadoop.apache.org by Stijn De Weirdt <st...@ugent.be> on 2014/08/27 13:14:43 UTC

total number of map tasks

hi all,

we are tuning yarn (or trying to) on our environment (shared filesystem, 
no hdfs) using terasort, and one of the main issues we are seeing is that 
an avg map task takes < 15 sec. some tuning guides and websites suggest 
that ideally map tasks run between 40 sec and 1 or 2 minutes.

(however, it's also not very clear if the recommendations are still 
valid for yarn)

in particular, we see way more map tasks than expected, and we are 
wondering how the number of map tasks per job run is determined.

teragen created 64 output files, so we are expecting only 64 map tasks, 
each processing one input file. however, we see something like 3000 tasks.


hints are much appreciated

stijn

Re: total number of map tasks

Posted by Chris MacKenzie <st...@chrismackenziephotography.co.uk>.
Thanks for the update ;O)


Regards,

Chris MacKenzie




On 27/08/2014 17:36, "Stijn De Weirdt" <st...@ugent.be> wrote:

>hi all,
>
>someone PM'ed me suggesting i'd take a look in the input split setting,
>and indeed, the splitsize is determining the number of tasks
>
>
>
>stijn
>
>On 08/27/2014 06:23 PM, Chris MacKenzie wrote:
>> It's my understanding that you don't get map tasks as such but
>>containers.
>>
>> My experience is with version 2 +
>>
>> And if that's true containers are based on memory tuning in
>>mapred-site.xml
>>
>> Otherwise I'd love to learn more.
>>
>> Sent from my iPhone
>>
>>> On 27 Aug 2014, at 12:14, Stijn De Weirdt <st...@ugent.be>
>>>wrote:
>>>
>>> hi all,
>>>
>>> we are tuning yarn (or trying to) on our environment (shared
>>>filesystem, no hdfs) using terasort and one of the main issues we are
>>>seeing is that an avg map task takes < 15sec. some tuning guides and
>>>websites suggest that ideally map tasks run between 40sec and 1 or 2
>>>minutes.
>>>
>>> (however, it's also not very clear if the recommendations are still
>>>valid for yarn)
>>>
>>> in particular, we see way more map tasks than expected, and we are
>>>wondering how the number of map tasks per job run is determined.
>>>
>>> teragen created 64 output files, we are only expecting 64 map tasks,
>>>each processing one input file. however, we see something like 3000
>>>tasks
>>>
>>>
>>> hints are much appreciated
>>>
>>> stijn
>>
>>



Re: total number of map tasks

Posted by Stijn De Weirdt <st...@ugent.be>.
hi all,

someone PM'ed me suggesting i take a look at the input split setting, 
and indeed, the split size is what determines the number of map tasks
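
for reference, a minimal sketch of how FileInputFormat-style split counting
works, on the assumption that the job's input format follows the stock
FileInputFormat logic: split size = max(minsize, min(maxsize, blocksize)),
with a 10% slop allowed on the last split of each file. minsize and maxsize
correspond to mapreduce.input.fileinputformat.split.minsize and
mapreduce.input.fileinputformat.split.maxsize. the file size and block size
used below are made-up illustrative values, not numbers from this thread,
and the helper names are hypothetical.

```python
# Sketch of FileInputFormat-style split counting (one map task per split).
# All sizes are in bytes. The concrete numbers at the bottom are
# illustrative assumptions, not measurements from the thread.

SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% oversized

MIB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split size is the block size clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_len, split_size):
    """Number of splits produced for one non-empty, splittable file."""
    splits = 0
    remaining = file_len
    # Carve off full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    # Whatever is left becomes the final (possibly slightly larger) split.
    if remaining > 0:
        splits += 1
    return splits


def total_map_tasks(file_lens, block_size, min_size=1, max_size=2**63 - 1):
    """Total splits across all input files for the given settings."""
    split_size = compute_split_size(block_size, min_size, max_size)
    return sum(count_splits(n, split_size) for n in file_lens)


if __name__ == "__main__":
    # Hypothetical input: 64 files of 1536 MiB each.
    files = [1536 * MIB] * 64

    # With an assumed 32 MiB block size and default min/max settings,
    # every file is cut into many small splits.
    print(total_map_tasks(files, block_size=32 * MIB))

    # Raising the minimum split size to the file size yields one split
    # (and so one map task) per file.
    print(total_map_tasks(files, block_size=32 * MIB, min_size=1536 * MIB))
```

under these assumed sizes, the default settings give thousands of splits for
64 files, while a minimum split size equal to the file size gives one split
per file. on a filesystem other than hdfs, the block size is whatever the
filesystem implementation reports, so it is worth checking that value first.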



stijn

On 08/27/2014 06:23 PM, Chris MacKenzie wrote:
> It's my understanding that you don't get map tasks as such but containers.
>
> My experience is with version 2 +
>
> And if that's true containers are based on memory tuning in mapred-site.xml
>
> Otherwise I'd love to learn more.
>
> Sent from my iPhone
>
>> On 27 Aug 2014, at 12:14, Stijn De Weirdt <st...@ugent.be> wrote:
>>
>> hi all,
>>
>> we are tuning yarn (or trying to) on our environment (shared filesystem, no hdfs) using terasort and one of the main issues we are seeing is that an avg map task takes < 15sec. some tuning guides and websites suggest that ideally map tasks run between 40sec and 1 or 2 minutes.
>>
>> (however, it's also not very clear if the recommendations are still valid for yarn)
>>
>> in particular, we see way more map tasks than expected, and we are wondering how the number of map tasks per job run is determined.
>>
>> teragen created 64 output files, we are only expecting 64 map tasks, each processing one input file. however, we see something like 3000 tasks
>>
>>
>> hints are much appreciated
>>
>> stijn
>
>

Re: total number of map tasks

Posted by Chris MacKenzie <st...@chrismackenziephotography.co.uk>.
It's my understanding that you don't get map tasks as such but containers. 

My experience is with version 2+.

And if that's true, containers are based on the memory tuning in mapred-site.xml.

Otherwise I'd love to learn more. 

Sent from my iPhone

> On 27 Aug 2014, at 12:14, Stijn De Weirdt <st...@ugent.be> wrote:
> 
> hi all,
> 
> we are tuning yarn (or trying to) on our environment (shared filesystem, no hdfs) using terasort and one of the main issues we are seeing is that an avg map task takes < 15sec. some tuning guides and websites suggest that ideally map tasks run between 40sec and 1 or 2 minutes.
> 
> (however, it's also not very clear if the recommendations are still valid for yarn)
> 
> in particular, we see way more map tasks than expected, and we are wondering how the number of map tasks per job run is determined.
> 
> teragen created 64 output files, we are only expecting 64 map tasks, each processing one input file. however, we see something like 3000 tasks
> 
> 
> hints are much appreciated
> 
> stijn
