Posted to yarn-issues@hadoop.apache.org by Ravi Kumar <ra...@persistent.com> on 2015/03/04 08:13:23 UTC

Hadoop: Running jobs stuck even though resources are available on nodes

Hi

We are running some test cases in parallel using the Maven Failsafe plugin. Most of the test cases submit MR/Tez/Oozie/Hive jobs.
After some time the test cases get stuck. When we checked the job tracker, we noticed that some jobs were stuck in the RUNNING state, while others had been SUBMITTED but were not getting a chance to run.
The details are as follows:
Hadoop Version: 2.4.1
Nodes available: 3
YARN scheduler used: Capacity Scheduler
Configuration in yarn-site.xml:
yarn.nodemanager.resource.memory-mb     4096
yarn.scheduler.minimum-allocation-mb    512
mapreduce.map.memory.mb                 1536
mapreduce.reduce.memory.mb              2560
mapreduce.map.java.opts                 -Xmx512m
mapreduce.reduce.java.opts              -Xmx512m
yarn.nodemanager.vmem-pmem-ratio        5
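
As a rough illustration of what these settings imply for container sizing, here is a minimal sketch. It assumes the scheduler rounds each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb, and it ignores the ApplicationMaster container and vcores. The function names are hypothetical and only for illustration.

```python
import math

# Values copied from the configuration listed above.
NODE_MEMORY_MB = 4096      # yarn.nodemanager.resource.memory-mb
MIN_ALLOCATION_MB = 512    # yarn.scheduler.minimum-allocation-mb
MAP_MEMORY_MB = 1536       # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 2560    # mapreduce.reduce.memory.mb


def normalize(request_mb, min_allocation_mb=MIN_ALLOCATION_MB):
    """Round a request up to the next multiple of the minimum allocation.

    Assumption: this mirrors how the scheduler sizes containers.
    """
    steps = max(1, math.ceil(request_mb / min_allocation_mb))
    return steps * min_allocation_mb


def containers_per_node(request_mb, node_memory_mb=NODE_MEMORY_MB):
    """How many containers of this size fit on one otherwise empty node."""
    return node_memory_mb // normalize(request_mb)


if __name__ == "__main__":
    for name, mb in (("map", MAP_MEMORY_MB), ("reduce", REDUCE_MEMORY_MB)):
        print(name, normalize(mb), "MB per container,",
              containers_per_node(mb), "per empty node")
```

Under these assumptions a node holds at most two map containers or one reduce container at a time, before any ApplicationMaster containers are counted.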

The following memory was available on each node when the jobs were stuck (sufficient CPU capacity was also available):
Case-1:
Node-1: 2.5GB
Node-2: 1.5GB
Node-3: 2.5GB

Case-2:
Node-1: 0
Node-2: 0
Node-3: 4GB
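
To compare the free memory in the two cases with the configured container sizes, a small check like the one below can help. It is only a sketch: it assumes the figures above are exact, that 1GB = 1024MB, and that memory is the only constraint (queue limits, user limits, and ApplicationMaster limits are not modelled). The helper name is hypothetical.

```python
# Container sizes from the configuration (MB).
MAP_MEMORY_MB = 1536      # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 2560   # mapreduce.reduce.memory.mb

# Free memory per node in MB, converted from the GB figures reported above
# (assumption: 1GB = 1024MB and the reported values are exact).
CASES = {
    "Case-1": {"Node-1": 2560, "Node-2": 1536, "Node-3": 2560},
    "Case-2": {"Node-1": 0, "Node-2": 0, "Node-3": 4096},
}


def nodes_that_fit(free_mb_by_node, request_mb):
    """Return the nodes whose free memory can hold one container of request_mb."""
    return [node for node, free in free_mb_by_node.items() if free >= request_mb]


if __name__ == "__main__":
    for case, free in CASES.items():
        print(case, "map fits on:", nodes_that_fit(free, MAP_MEMORY_MB))
        print(case, "reduce fits on:", nodes_that_fit(free, REDUCE_MEMORY_MB))
```

If the reported figures are accurate, at least one node in each case has room for a map or a reduce container by memory alone, which is why the stuck jobs are puzzling.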

We are not able to work out the following:
Why are the RUNNING jobs not completing even though some resources are available?
Why is YARN not using the available resources to complete the running jobs?
Could it be that the resources required for the running jobs to complete exceed what is available, and that is why the jobs are not completing?
If so, how can we find out the resources/containers required by a job?
Are there any other properties we should check?

Any pointers or help on this would be much appreciated.

Thanks in advance.
Ravi