You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-user@hadoop.apache.org by Anfernee Xu <an...@gmail.com> on 2016/03/10 03:14:08 UTC

Weird latency in task launching in Yarn 2.2.0

I'm running a Hadoop 2.2.0 cluster, I observed a weird latency in launching
one map task for my MR job. Here's MRAppMaster log,


2016-03-09 22:07:09,737 INFO [ContainerLauncher #6]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Launching attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:07:09,737 INFO [ContainerLauncher #6]
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
Opening proxy : slc10ank.us.oracle.com:1340
2016-03-09 22:07:09,828 INFO [ContainerLauncher #6]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle
port returned by ContainerManager for
attempt_1454695084051_2119658_m_000006_0 : 13562
2016-03-09 22:07:09,828 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt:
[attempt_1454695084051_2119658_m_000006_0] using containerId:
[container_1454695084051_2119658_01_000011 on NM: [
slc10ank.us.oracle.com:1340]
2016-03-09 22:07:09,828 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1454695084051_2119658_m_000006_0 TaskAttempt Transitioned from
ASSIGNED to RUNNING
2016-03-09 22:07:09,829 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.MRAppMetrics: Task is starting on [
slc10ank.us.oracle.com]

As you can see from above long, the task was started at 2016-03-09
22:07:09,829, then the next corresponding logging message for this task is

2016-03-09 22:21:42,406 INFO [IPC Server handler 8 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID:
jvm_1454695084051_2119658_m_000011 given task:
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:46,282 INFO [IPC Server handler 4 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:49,301 INFO [IPC Server handler 5 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:52,318 INFO [IPC Server handler 6 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:55,330 INFO [IPC Server handler 11 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:58,343 INFO [IPC Server handler 9 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:01,362 INFO [IPC Server handler 7 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:04,371 INFO [IPC Server handler 3 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:07,387 INFO [IPC Server handler 2 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:10,407 INFO [IPC Server handler 10 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:13,422 INFO [IPC Server handler 13 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0

It's almost 15 minute later, I also can prove the task was actually started
around 22:21:41  by looking at the log of the task in question as below

2016-03-09 22:21:41,585 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-03-09 22:21:41,595 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-03-09 22:21:41,880 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
2016-03-09 22:21:41,988 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2016-03-09 22:21:41,988 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
started
2016-03-09 22:21:42,002 INFO [main] org.apache.hadoop.mapred.YarnChild:
Executing with tokens:
2016-03-09 22:21:42,003 INFO [main] org.apache.hadoop.mapred.YarnChild:
Kind: mapreduce.job, Service: job_1454695084051_2119658, Ident:
(org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@7364985f)
2016-03-09 22:21:42,101 INFO [main] org.apache.hadoop.mapred.YarnChild:
Sleeping for 0ms before retrying again. Got null now.

So my question is what's the possible reason? any host specific setting
needs to tweaked? the reason I asked is this host is recently added.

Thanks for your help

-- 
--Anfernee