Posted to user@hadoop.apache.org by Siddhi Mehta <sm...@gmail.com> on 2013/06/22 01:21:56 UTC

Yarn job stuck with no application master being assigned

Hey All,

I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with one
NodeManager.

We have a map-only job that launches a Pig job on the cluster (similar to
what Oozie does).

We are seeing that the map-only job launches the Pig script, but the Pig
job is stuck in the ACCEPTED state with no tracking UI assigned.

I don't see any errors in the NodeManager logs or the ResourceManager logs
as such.


On the NodeManager I see these logs:
2013-06-21 15:05:13,084 INFO  capacity.ParentQueue - assignedContainer
queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048
cluster=memory: 5120

2013-06-21 15:05:38,898 INFO  capacity.CapacityScheduler - Application
Submission: appattempt_1371850881510_0003_000001, user: smehta queue:
default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB,
usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2,
currently active: 2

This suggests that the cluster has capacity, but still no application
master is assigned to the job.
What am I missing? Any help is appreciated.

I keep seeing these logs on the NodeManager:
2013-06-21 16:19:37,675 INFO  monitor.ContainersMonitorImpl - Memory usage
of ProcessTree 12484 for container-id
container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory
used; 590.1mb of 2.1gb virtual memory used
2013-06-21 16:19:37,696 INFO  monitor.ContainersMonitorImpl - Memory usage
of ProcessTree 12009 for container-id
container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory
used; 1.4gb of 2.1gb virtual memory used
2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
out status for container: container_id {, app_attempt_id {, application_id
{, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
out status for container: container_id {, app_attempt_id {, application_id
{, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
out status for container: container_id {, app_attempt_id {, application_id
{, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
out status for container: container_id {, app_attempt_id {, application_id
{, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
out status for container: container_id {, app_attempt_id {, application_id
{, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
out status for container: container_id {, app_attempt_id {, application_id
{, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
state: C_RUNNING, diagnostics: "", exit_status: -1000,

Here are my memory configurations:

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5120</value>
<source>yarn-site.xml</source>
</property>

<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
<source>mapred-site.xml</source>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
<source>mapred-site.xml</source>
</property>

<property>
<name>mapred.child.java.opts</name>
<value>
-Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
</value>
<source>mapred-site.xml</source>
</property>

<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
<source>mapred-site.xml</source>
</property>

Regards,
Siddhi

Re: Yarn job stuck with no application master being assigned

Posted by Arun C Murthy <ac...@hortonworks.com>.
Siddhi,

On Jun 21, 2013, at 6:07 PM, Siddhi Mehta <sm...@gmail.com> wrote:

> That solved the problem. Thanks Sandy!!
> 
> What is the optimal setting for yarn.scheduler.capacity.maximum-am-resource-percent in terms of the NodeManager?
> What are the consequences of setting it to a higher value?

This means that more AMs will be active concurrently.

One thing to remember: in terms of getting *real* work done, an AM is (currently) pure overhead, in the sense that it does not do any actual data processing - this is true of the MR AM. An AM *may* choose to do some real work of course - that depends on the implementation.

With that context: if you have a very small cluster, then with higher values for yarn.scheduler.capacity.maximum-am-resource-percent too many containers might be tied up running AMs, and overall utilization might drop. You want to be aware of that trade-off.
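
For reference, here is a sketch of how that knob might be raised in capacity-scheduler.xml (the 0.5 is just the value Sandy suggested earlier in this thread, not a general recommendation):

<property>
<!-- fraction of cluster resources that can be used to run AMs; the default is 0.1 -->
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.5</value>
</property>

Roughly speaking, the default of 0.1 on your 5120MB node budgets only ~512MB for AMs; the scheduler will still activate at least one application, but the child Pig job's AM then has to wait - which matches the ACCEPTED state you are seeing.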


> Also, I noticed that by default the application master needs 1.5GB. Are there any side effects we will face if I lower that to 1GB?
I have tried AMs with as low as 200MB for small jobs. It really depends on how many tasks you want your job to manage.
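
If you do lower it, shrink the AM's heap along with the container size so the JVM still fits inside its container - something like this in mapred-site.xml (values illustrative, not tuned):

<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
</property>
<property>
<!-- keep some headroom between the heap and the container size -->
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx768m</value>
</property>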

Arun

> 
> Siddhi
> 
> 
> On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sa...@cloudera.com> wrote:
> Hi Siddhi,
> 
> Moving this question to the CDH list.
> 
> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5 help?
> 
> Have you tried using the Fair Scheduler?
> 
> -Sandy
> 
> 
> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <sm...@gmail.com> wrote:
> Hey All,
> 
> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with one NodeManager.
> 
> We have a map-only job that launches a Pig job on the cluster (similar to what Oozie does).
> 
> We are seeing that the map-only job launches the Pig script, but the Pig job is stuck in the ACCEPTED state with no tracking UI assigned.
> 
> I don't see any errors in the NodeManager logs or the ResourceManager logs as such.
> 
> 
> On the NodeManager I see these logs:
> 2013-06-21 15:05:13,084 INFO  capacity.ParentQueue - assignedContainer queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048 cluster=memory: 5120
> 
> 2013-06-21 15:05:38,898 INFO  capacity.CapacityScheduler - Application Submission: appattempt_1371850881510_0003_000001, user: smehta queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB, usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2, currently active: 2
> 
> This suggests that the cluster has capacity, but still no application master is assigned to the job.
> What am I missing? Any help is appreciated.
> 
> I keep seeing these logs on the NodeManager:
> 2013-06-21 16:19:37,675 INFO  monitor.ContainersMonitorImpl - Memory usage of ProcessTree 12484 for container-id container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory used; 590.1mb of 2.1gb virtual memory used
> 2013-06-21 16:19:37,696 INFO  monitor.ContainersMonitorImpl - Memory usage of ProcessTree 12009 for container-id container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory used; 1.4gb of 2.1gb virtual memory used
> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 
> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 
> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 
> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 
> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 
> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 
> 
> Here are my memory configurations
> 
> <property>
> <name>yarn.nodemanager.resource.memory-mb</name>
> <value>5120</value>
> <source>yarn-site.xml</source>
> </property>
> 
> <property>
> <name>mapreduce.map.memory.mb</name>
> <value>512</value>
> <source>mapred-site.xml</source>
> </property>
> 
> <property>
> <name>mapreduce.reduce.memory.mb</name>
> <value>512</value>
> <source>mapred-site.xml</source>
> </property>
> 
> <property>
> <name>mapred.child.java.opts</name>
> <value>
> -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
> </value>
> <source>mapred-site.xml</source>
> </property>
> 
> <property>
> <name>yarn.app.mapreduce.am.resource.mb</name>
> <value>1024</value>
> <source>mapred-site.xml</source>
> </property>
> 
> Regards,
> Siddhi
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Yarn job stuck with no application master being assigned

Posted by Sandy Ryza <sa...@cloudera.com>.
I'm not aware of any negative consequences of setting a higher value.

You probably only need more than 1GB for very large jobs with 1000s of
tasks.
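
If you do experiment with the Fair Scheduler I mentioned, the switch is a single property in yarn-site.xml (the class name below is the stock Hadoop 2 one):

<property>
<!-- swap the Capacity Scheduler for the Fair Scheduler; restart the RM afterwards -->
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>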


On Fri, Jun 21, 2013 at 6:07 PM, Siddhi Mehta <sm...@gmail.com> wrote:

> That solved the problem. Thanks Sandy!!
>
> What is the optimal setting for yarn.scheduler.capacity.maximum-am-resource-percent
> in terms of the NodeManager?
> What are the consequences of setting it to a higher value?
> Also, I noticed that by default the application master needs 1.5GB. Are there
> any side effects we will face if I lower that to 1GB?
>
> Siddhi
>
>
> On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Hi Siddhi,
>>
>> Moving this question to the CDH list.
>>
>> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
>> help?
>>
>> Have you tried using the Fair Scheduler?
>>
>> -Sandy
>>
>>
>> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <sm...@gmail.com> wrote:
>>
>>> Hey All,
>>>
>>> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with one
>>> NodeManager.
>>>
>>> We have a map-only job that launches a Pig job on the cluster (similar
>>> to what Oozie does).
>>>
>>> We are seeing that the map-only job launches the Pig script, but the Pig
>>> job is stuck in the ACCEPTED state with no tracking UI assigned.
>>>
>>> I don't see any errors in the NodeManager logs or the ResourceManager
>>> logs as such.
>>>
>>>
>>> On the NodeManager I see these logs:
>>> 2013-06-21 15:05:13,084 INFO  capacity.ParentQueue - assignedContainer
>>> queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048
>>> cluster=memory: 5120
>>>
>>> 2013-06-21 15:05:38,898 INFO  capacity.CapacityScheduler - Application
>>> Submission: appattempt_1371850881510_0003_000001, user: smehta queue:
>>> default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB,
>>> usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2,
>>> currently active: 2
>>>
>>> This suggests that the cluster has capacity, but still no application
>>> master is assigned to the job.
>>> What am I missing? Any help is appreciated.
>>>
>>> I keep seeing these logs on the NodeManager:
>>> 2013-06-21 16:19:37,675 INFO  monitor.ContainersMonitorImpl - Memory
>>> usage of ProcessTree 12484 for container-id
>>> container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory
>>> used; 590.1mb of 2.1gb virtual memory used
>>> 2013-06-21 16:19:37,696 INFO  monitor.ContainersMonitorImpl - Memory
>>> usage of ProcessTree 12009 for container-id
>>> container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory
>>> used; 1.4gb of 2.1gb virtual memory used
>>> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl -
>>> Sending out status for container: container_id {, app_attempt_id {,
>>> application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1,
>>> }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl -
>>> Sending out status for container: container_id {, app_attempt_id {,
>>> application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1,
>>> }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl -
>>> Sending out status for container: container_id {, app_attempt_id {,
>>> application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1,
>>> }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl -
>>> Sending out status for container: container_id {, app_attempt_id {,
>>> application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1,
>>> }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl -
>>> Sending out status for container: container_id {, app_attempt_id {,
>>> application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1,
>>> }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl -
>>> Sending out status for container: container_id {, app_attempt_id {,
>>> application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1,
>>> }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>>
>>> Here are my memory configurations
>>>
>>> <property>
>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>> <value>5120</value>
>>> <source>yarn-site.xml</source>
>>> </property>
>>>
>>> <property>
>>> <name>mapreduce.map.memory.mb</name>
>>> <value>512</value>
>>> <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>> <name>mapreduce.reduce.memory.mb</name>
>>> <value>512</value>
>>> <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>> <name>mapred.child.java.opts</name>
>>> <value>
>>> -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
>>> </value>
>>> <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>> <name>yarn.app.mapreduce.am.resource.mb</name>
>>> <value>1024</value>
>>> <source>mapred-site.xml</source>
>>> </property>
>>>
>>> Regards,
>>> Siddhi
>>>
>>
>>
>


Re: Yarn job stuck with no application master being assigned

Posted by Siddhi Mehta <sm...@gmail.com>.
That solved the problem. Thanks Sandy!!

What is the optimal setting for
yarn.scheduler.capacity.maximum-am-resource-percent
in terms of the NodeManager?
What are the consequences of setting it to a higher value?
Also, I noticed that by default the application master needs 1.5GB. Are there
any side effects we will face if I lower that to 1GB?

Siddhi


On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Siddhi,
>
> Moving this question to the CDH list.
>
> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
> help?
>
> Have you tried using the Fair Scheduler?
>
> -Sandy
>
>
> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <sm...@gmail.com> wrote:
>
>> Hey All,
>>
>> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with one
>> NodeManager.
>>
>> We have a map-only job that launches a Pig job on the cluster (similar to
>> what Oozie does).
>>
>> We are seeing that the map-only job launches the Pig script, but the Pig
>> job is stuck in the ACCEPTED state with no tracking UI assigned.
>>
>> I don't see any errors in the NodeManager logs or the ResourceManager logs
>> as such.
>>
>>
>> On the NodeManager I see these logs:
>> 2013-06-21 15:05:13,084 INFO  capacity.ParentQueue - assignedContainer
>> queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048
>> cluster=memory: 5120
>>
>> 2013-06-21 15:05:38,898 INFO  capacity.CapacityScheduler - Application
>> Submission: appattempt_1371850881510_0003_000001, user: smehta queue:
>> default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB,
>> usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2,
>> currently active: 2
>>
>> This suggests that the cluster has capacity, but still no application
>> master is assigned to the job.
>> What am I missing? Any help is appreciated.
>>
>> I keep seeing these logs on the NodeManager:
>> 2013-06-21 16:19:37,675 INFO  monitor.ContainersMonitorImpl - Memory
>> usage of ProcessTree 12484 for container-id
>> container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory
>> used; 590.1mb of 2.1gb virtual memory used
>> 2013-06-21 16:19:37,696 INFO  monitor.ContainersMonitorImpl - Memory
>> usage of ProcessTree 12009 for container-id
>> container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory
>> used; 1.4gb of 2.1gb virtual memory used
>> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
>> out status for container: container_id {, app_attempt_id {, application_id
>> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
>> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>
>> Here are my memory configurations
>>
>> <property>
>> <name>yarn.nodemanager.resource.memory-mb</name>
>> <value>5120</value>
>> <source>yarn-site.xml</source>
>> </property>
>>
>> <property>
>> <name>mapreduce.map.memory.mb</name>
>> <value>512</value>
>> <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>> <name>mapreduce.reduce.memory.mb</name>
>> <value>512</value>
>> <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>> <name>mapred.child.java.opts</name>
>> <value>
>> -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
>> </value>
>> <source>mapred-site.xml</source>
>> </property>
>>
>> <property>
>> <name>yarn.app.mapreduce.am.resource.mb</name>
>> <value>1024</value>
>> <source>mapred-site.xml</source>
>> </property>
>>
>> Regards,
>> Siddhi
>>
>
>

Re: Yarn job stuck with no application master being assigned

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Siddhi,

Moving this question to the CDH list.

Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5 help?
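
For context, that property defaults to 0.1, so on your 5120MB node the
CapacityScheduler sets aside only about 512MB for application masters;
if I'm reading its accounting right, the first 1024MB AM is admitted
regardless, but the second one sits in ACCEPTED indefinitely. A minimal
sketch of the change, in capacity-scheduler.xml:

<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.5</value>
</property>

At 0.5 the cap becomes 2560MB, enough for both of your 1024MB AMs.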

Have you tried using the Fair Scheduler?
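
Switching is a one-line change in yarn-site.xml; a sketch, using the
stock FairScheduler class name:

<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>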

-Sandy


On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <sm...@gmail.com> wrote:

> Hey All,
>
> I am running a Hadoop 2.0 (CDH4.2.1) cluster on a single node with 1
> NodeManager.
>
> We have a Map-only job that launches a Pig job on the cluster (similar to
> what Oozie does).
>
> We are seeing that the Map-only job launches the Pig script, but the Pig
> job is stuck in the ACCEPTED state with no trackingUI assigned.
>
> I don't see any errors in the NodeManager or ResourceManager logs.
>
>
> On the NodeManager I see these logs:
> 2013-06-21 15:05:13,084 INFO  capacity.ParentQueue - assignedContainer
> queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048
> cluster=memory: 5120
>
> 2013-06-21 15:05:38,898 INFO  capacity.CapacityScheduler - Application
> Submission: appattempt_1371850881510_0003_000001, user: smehta queue:
> default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB,
> usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2,
> currently active: 2
>
> This suggests that the cluster has capacity, but still no application
> master is assigned to it.
> What am I missing? Any help is appreciated.
>
> I keep seeing these logs on the NodeManager:
> 2013-06-21 16:19:37,675 INFO  monitor.ContainersMonitorImpl - Memory usage
> of ProcessTree 12484 for container-id
> container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory
> used; 590.1mb of 2.1gb virtual memory used
> 2013-06-21 16:19:37,696 INFO  monitor.ContainersMonitorImpl - Memory usage
> of ProcessTree 12009 for container-id
> container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory
> used; 1.4gb of 2.1gb virtual memory used
> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
> out status for container: container_id {, app_attempt_id {, application_id
> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
> state: C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-06-21 16:19:37,946 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
> out status for container: container_id {, app_attempt_id {, application_id
> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
> state: C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
> out status for container: container_id {, app_attempt_id {, application_id
> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
> state: C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-06-21 16:19:38,948 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
> out status for container: container_id {, app_attempt_id {, application_id
> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
> state: C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
> out status for container: container_id {, app_attempt_id {, application_id
> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, },
> state: C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-06-21 16:19:39,950 INFO  nodemanager.NodeStatusUpdaterImpl - Sending
> out status for container: container_id {, app_attempt_id {, application_id
> {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, },
> state: C_RUNNING, diagnostics: "", exit_status: -1000,
>
> Here are my memory configurations:
>
> <property>
> <name>yarn.nodemanager.resource.memory-mb</name>
> <value>5120</value>
> <source>yarn-site.xml</source>
> </property>
>
> <property>
> <name>mapreduce.map.memory.mb</name>
> <value>512</value>
> <source>mapred-site.xml</source>
> </property>
>
> <property>
> <name>mapreduce.reduce.memory.mb</name>
> <value>512</value>
> <source>mapred-site.xml</source>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>
> -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
> </value>
> <source>mapred-site.xml</source>
> </property>
>
> <property>
> <name>yarn.app.mapreduce.am.resource.mb</name>
> <value>1024</value>
> <source>mapred-site.xml</source>
> </property>
>
> Regards,
> Siddhi
>
