Posted to user@hadoop.apache.org by es...@sina.com on 2018/09/13 00:51:26 UTC

mapreduce application which only has 17000+ map tasks runs very slow on yarn after 16000+ map completed

Hi everyone!

        I'm running a simple SQL query (select xx,xx... from viewXXX where
xxxxx) using Hive 0.13.1 on Hadoop 2.6.0 (the framework is MRv2, not
Tez). After submitting it, I found that it runs as an MR job with
17000+ map tasks and no reduce tasks.

        The job runs very quickly for the first 15 minutes (all 400+
containers on my cluster of 20+ nodes are allocated to its tasks during
this period), but becomes very slow after that, even though no other
jobs are running on the cluster.

        I ran it a couple of times and found that the number of containers
allocated to the job decreases (roughly, not strictly) as time goes on,
and after about 15 minutes it drops to 1 (the ApplicationMaster's own
container)! From then on the AM is always waiting for the RM to give it
a container to run a map task. The RM is not busy (not much GC) and has
a lot of containers available (I can see that in the RM log), but it
assigns the AM only 1 container per MINUTE. So the job finally takes
7 hours to finish. :(

        Parts of my AM and RM logs are in the attachment.

        Any help will be appreciated!

Re: mapreduce application which only has 17000+ map tasks runs very slow on yarn after 16000+ map completed

Posted by Alexey Eremikhin <a....@corp.badoo.com.INVALID>.

We experienced a similar issue with fair scheduler dynamic allocation.

In our case most of the resources were allocated to the application's
reducers, and the mappers did not have enough resources to start.

That was clearly visible on the MR ApplicationMaster page.

The solution was to set
mapreduce.job.reduce.slowstart.completedmaps to 1. Yes, it might delay
short queries a bit, but for large queries it is essential in order to
leave enough resources for the mappers.
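
For reference, a minimal sketch of applying that setting for a single Hive session, assuming the query is submitted through the Hive CLI or Beeline as in the original post. The property is the fraction of map tasks that must complete before reducers are scheduled, so 1.0 holds reducers back until every map has finished. It only has an effect on jobs that actually run reduce tasks.

```
set mapreduce.job.reduce.slowstart.completedmaps=1.0;
```

Setting it per session like this leaves the cluster-wide default untouched, so short queries from other users are not delayed.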


