Posted to hdfs-user@hadoop.apache.org by Chris MacKenzie <st...@chrismackenziephotography.co.uk> on 2014/09/01 11:26:18 UTC

Re: total number of map tasks

Thanks for the update ;O)


Regards,

Chris MacKenzie
Expert in all aspects of photography
telephone: 0131 332 6967
email: studio@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
weddings: www.wedding.chrismackenziephotography.co.uk

On 27/08/2014 17:36, "Stijn De Weirdt" <st...@ugent.be> wrote:

>hi all,
>
>someone PM'ed me suggesting i take a look at the input split settings,
>and indeed, the split size determines the number of map tasks
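
A rough sketch of why the split size drives the task count, assuming the input format follows the usual FileInputFormat rule (split size = max(min size, min(max size, block size)), one map task per split). The file sizes and the 32 MiB block size below are invented for illustration and are not taken from this thread; the function names are hypothetical.

```python
# Sketch of FileInputFormat-style split counting (assumed to apply here).
# All sizes are in bytes. Example numbers are made up, not from the thread.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size
MIB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size is the block size clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def splits_for_file(file_size, split_size):
    """Count the splits (and so map tasks) produced for one splittable file."""
    count = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    if remaining > 0:
        count += 1  # the tail becomes the final split
    return count


def total_map_tasks(file_sizes, block_size, min_size=1, max_size=2**63 - 1):
    """Sum the splits over all input files."""
    split_size = compute_split_size(block_size, min_size, max_size)
    return sum(splits_for_file(size, split_size) for size in file_sizes)


if __name__ == "__main__":
    files = [1536 * MIB] * 64  # hypothetical: 64 input files of 1.5 GiB each
    # With a hypothetical 32 MiB block size, each file yields 48 splits.
    print(total_map_tasks(files, block_size=32 * MIB))  # 3072
    # Raising the minimum split size to the file size gives one task per file.
    print(total_map_tasks(files, block_size=32 * MIB, min_size=1536 * MIB))  # 64
```

In Hadoop 2 the minimum and maximum are set with mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize; raising the minimum is the usual way to get fewer, longer-running map tasks.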
>
>
>
>stijn
>
>On 08/27/2014 06:23 PM, Chris MacKenzie wrote:
>> It's my understanding that you don't get map tasks as such but
>>containers.
>>
>> My experience is with version 2 +
>>
>> And if that's true containers are based on memory tuning in
>>mapred-site.xml
>>
>> Otherwise I'd love to learn more.
>>
>> Sent from my iPhone
>>
>>> On 27 Aug 2014, at 12:14, Stijn De Weirdt <st...@ugent.be>
>>>wrote:
>>>
>>> hi all,
>>>
>>> we are tuning yarn (or trying to) on our environment (shared
>>>filesystem, no hdfs) using terasort, and one of the main issues we are
>>>seeing is that an average map task takes < 15sec. some tuning guides and
>>>websites suggest that ideally map tasks run for between 40sec and 1 or 2
>>>minutes.
>>>
>>> (however, it's also not very clear if the recommendations are still
>>>valid for yarn)
>>>
>>> in particular, we see way more map tasks than expected, and we are
>>>wondering how the number of map tasks per job run is determined.
>>>
>>> teragen created 64 output files, so we expected only 64 map tasks,
>>>each processing one input file. however, we see something like 3000
>>>tasks.
>>>
>>>
>>> hints are much appreciated
>>>
>>> stijn
>>
>>