Posted to user@tez.apache.org by Gopal V <go...@apache.org> on 2015/02/03 18:25:41 UTC

Re: tez map task and reduce task stay pending forever

On 1/27/15, 9:24 PM, r7raul1984@163.com wrote:
> I tested again and found that if I set mapreduce.map.cpu.vcores > 1, the job hangs. Very similar
> to https://issues.apache.org/jira/browse/TEZ-704

I suspect this might be a YARN scheduler bug.

Are you using the FairScheduler or the CapacityScheduler?

I cannot reproduce this issue on my YARN-CS cluster, but I suspect my 
capacity scheduler is automatically set up to do dominant resource 
scheduling.

Are you on FS + Fair instead of FS + DRF or CS?

Cheers,
Gopal
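The difference between the two schedulers can be pictured with a small sketch. This is illustrative only, not actual YARN scheduler code; the function names are made up, and the numbers come from the log excerpt quoted below (a request of memory:2867, vCores:3):

```python
# Illustrative sketch -- NOT Hadoop code. Models the difference between
# a memory-only resource check and a dominant-resource check.

def fits_memory_only(request, available):
    # DefaultResourceCalculator-style check: only memory is compared;
    # the vcores in the request are effectively ignored
    return request["memory"] <= available["memory"]

def fits_all_dimensions(request, available):
    # DominantResourceCalculator-style check: every dimension must fit
    return (request["memory"] <= available["memory"]
            and request["vcores"] <= available["vcores"])

# The Tez task request from the log: Capability[<memory:2867, vCores:3>]
request = {"memory": 2867, "vcores": 3}
# A node with spare memory but only one free vcore
available = {"memory": 4096, "vcores": 1}

print(fits_memory_only(request, available))     # True
print(fits_all_dimensions(request, available))  # False
```

Under a memory-only calculator the node above looks like a match even though it cannot actually satisfy the 3-vcore request, which is consistent with containers being handed back and the tasks staying pending.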

> From: r7raul1984@163.com
> Date: 2015-01-28 10:29
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Oh yeah, I fixed the problem.
> I added this config to my hive-site.xml:
> <property>
>   <name>yarn.app.mapreduce.am.resource.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
>   <value>1</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.command-opts</name>
>   <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
> </property>
> <property>
>   <name>mapreduce.map.java.opts</name>
>   <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
> </property>
> <property>
>   <name>mapreduce.reduce.java.opts</name>
>   <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
> </property>
> <property>
>   <name>mapreduce.map.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>mapreduce.map.cpu.vcores</name>
>   <value>1</value>
> </property>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>mapreduce.reduce.cpu.vcores</name>
>   <value>1</value>
> </property>
> And configured my tez-site.xml with just:
> <property>
>   <name>tez.lib.uris</name>
>   <value>${fs.defaultFS}/apps/tez-0.5.3/tez-0.5.3-minimal.tar.gz</value>
> </property>
> <property>
>   <name>tez.use.cluster.hadoop-libs</name>
>   <value>true</value>
> </property>
>
> Everything is OK now.
> I think some configs in my cluster were set too large.
>
>
>
> r7raul1984@163.com
>
> From: r7raul1984@163.com
> Date: 2015-01-28 10:24
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> No. Even with set hive.execution.engine=mr, it still hangs...
>
>
>
> r7raul1984@163.com
>
> From: Jianfeng (Jeff) Zhang
> Date: 2015-01-28 10:11
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Can you run this query successfully using Hive on MR?
>
>
>
> Best Regards,
> Jeff Zhang
>
>
> On Wed, Jan 28, 2015 at 10:01 AM, r7raul1984@163.com <r7...@163.com> wrote:
>
> I checked the Tez documentation on the HDP page: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html
>
> The tez.am.resource.memory.mb default value is 1536.
> My Hadoop yarn.app.mapreduce.am.resource.mb value is 5734 MiB.
>
> Could this configuration mismatch cause the problem?
>
>
> r7raul1984@163.com
>
> From: r7raul1984@163.com
> Date: 2015-01-27 17:59
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Sorry Gopal V, I made a mistake; my config mapreduce.map.memory.mb is 2867.
>
>
>
> r7raul1984@163.com
>
> From: r7raul1984@163.com
> Date: 2015-01-27 17:58
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
> Hello Gopal V,
> I checked my CDH config and found mapreduce.map.memory.mb is 2876.
> r7raul1984@163.com
>
> From: r7raul1984@163.com
> Date: 2015-01-27 17:31
> To: user
> Subject: Re: Re: tez map task and reduce task stay pending forever
>
> I checked the hivetez.log. No kill request was triggered by Hive.
>
>
> r7raul1984@163.com
>
> From: Gopal V
> Date: 2015-01-27 17:17
> To: user
> Cc: r7raul1984@163.com
> Subject: Re: tez map task and reduce task stay pending forever
> On 1/27/15, 12:50 AM, r7raul1984@163.com wrote:
>> Hive 0.14.0, Tez 0.5.3, Hadoop 2.3.0-cdh5.0.2
>> hive> select * from p_city order by id;
>> Query ID = zhoushugang_20150127163434_da70d957-6ac4-4b8b-a484-42b593838076
> ...
>> --------------------------------------------------------------------------------
>>
>> VERTICES   STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
>> --------------------------------------------------------------------------------
>>
>> Map 1      INITED      1          0        0        1       0       0
>> Reducer 2  INITED      1          0        0        1       0       0
>
> Looks like all container requests are pending/unresponsive.
>
> I see a container request in the log:
>
> 2015-01-27 15:43:15,434 INFO [TaskSchedulerEventHandlerThread]
> rm.YarnTaskSchedulerService: Allocation request for task:
> attempt_1419300485749_371785_1_00_000000_0 with request:
> Capability[<memory:2867, vCores:3>]Priority[2] host:
> yhd-jqhadoop11.int.yihaodian.com rack: null
> ...
> 2015-01-27 15:43:17,635 INFO [DelayedContainerManager]
> rm.YarnTaskSchedulerService: Releasing held container as either there
> are pending but  unmatched requests or this is not a session,
> containerId=container_1419300485749_371785_01_000002, pendingTasks=1,
> isSession=true. isNew=true
>
> That seems to indicate that a container allocation request was made, but
> YARN resource manager never responded with a container (or gave the
> wrong container?).
>
> Does the container size of 2867 suggest what that might be?
>
> Cheers,
> Gopal
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed
> and may contain information that is confidential, privileged and exempt from disclosure under
> applicable law. If the reader of this message is not the intended recipient, you are hereby
> notified that any printing, copying, dissemination, distribution, disclosure or forwarding
> of this communication is strictly prohibited. If you have received this communication in error,
> please contact the sender immediately and delete it from your system. Thank You.
>


Re: Re: tez map task and reduce task stay pending forever

Posted by "r7raul1984@163.com" <r7...@163.com>.
I didn't configure yarn.scheduler.capacity.resource-calculator, so I am on YARN-CS + the DefaultResourceCalculator.



r7raul1984@163.com
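For reference, the dominant-resource behaviour Gopal asked about is enabled on the CapacityScheduler with a property along these lines in capacity-scheduler.xml. This is a sketch; verify the class name against the documentation for your Hadoop version:

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  <!-- The default is DefaultResourceCalculator, which schedules on
       memory only and ignores the vcores in container requests. -->
</property>
```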
 
From: Gopal V
Date: 2015-02-04 01:25
To: r7raul1984@163.com; user
Subject: Re: tez map task and reduce task stay pending forever
On 1/27/15, 9:24 PM, r7raul1984@163.com wrote:
> I tested again and found that if I set mapreduce.map.cpu.vcores > 1, the job hangs. Very similar
> to https://issues.apache.org/jira/browse/TEZ-704
 
I suspect this might be a YARN scheduler bug.
 
Are you using the FairScheduler or the CapacityScheduler?
 
I cannot reproduce this issue on my YARN-CS cluster, but I suspect my 
capacity scheduler is automatically set up to do dominant resource 
scheduling.
 
Are you on FS + Fair instead of FS + DRF or CS?
 
Cheers,
Gopal
 