You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by Anfernee Xu <an...@gmail.com> on 2016/03/10 09:10:17 UTC

Why localizing takes that long?

I'm running a Hadoop 2.2.0 cluster, I observed a weird latency in launching
map task on some nodes for my MR job, by looking at Yarn UI, drill down to
the specific task, the UI showed me the task is not started yet, still in
"localized state", this will keep like 5-15 minutes, then it can proceed.
Here's MRAppMaster log,


2016-03-09 22:07:09,737 INFO [ContainerLauncher #6]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Launching attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:07:09,737 INFO [ContainerLauncher #6]
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
Opening proxy :slc10ank.us.oracle.com:1340
2016-03-09 22:07:09,828 INFO [ContainerLauncher #6]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle
port returned by ContainerManager for
attempt_1454695084051_2119658_m_000006_0 : 13562
2016-03-09 22:07:09,828 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt:
[attempt_1454695084051_2119658_m_000006_0] using containerId:
[container_1454695084051_2119658_01_000011 on NM: [
slc10ank.us.oracle.com:1340]
2016-03-09 22:07:09,828 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1454695084051_2119658_m_000006_0 TaskAttempt Transitioned from
ASSIGNED to RUNNING
2016-03-09 22:07:09,829 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.MRAppMetrics: Task is starting on [
slc10ank.us.oracle.com]

As you can see from above long, the task was started at 2016-03-09
22:07:09,829, then the next corresponding logging message for this task is

2016-03-09 22:21:42,406 INFO [IPC Server handler 8 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID:
jvm_1454695084051_2119658_m_000011 given task:
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:46,282 INFO [IPC Server handler 4 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:49,301 INFO [IPC Server handler 5 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:52,318 INFO [IPC Server handler 6 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:55,330 INFO [IPC Server handler 11 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:21:58,343 INFO [IPC Server handler 9 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:01,362 INFO [IPC Server handler 7 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:04,371 INFO [IPC Server handler 3 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:07,387 INFO [IPC Server handler 2 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:10,407 INFO [IPC Server handler 10 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0
2016-03-09 22:22:13,422 INFO [IPC Server handler 13 on 23425]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1454695084051_2119658_m_000006_0

It's almost 15 minute later, I also can prove the task was actually started
around 22:21:41  by looking at the log of the task in question as below

2016-03-09 22:21:41,585 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-03-09 22:21:41,595 WARN [main] org.apache.hadoop.conf.Configuration:
job.xml:an attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-03-09 22:21:41,880 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
2016-03-09 22:21:41,988 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2016-03-09 22:21:41,988 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
started
2016-03-09 22:21:42,002 INFO [main] org.apache.hadoop.mapred.YarnChild:
Executing with tokens:
2016-03-09 22:21:42,003 INFO [main] org.apache.hadoop.mapred.YarnChild:
Kind: mapreduce.job, Service: job_1454695084051_2119658, Ident:
(org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@7364985f)
2016-03-09 22:21:42,101 INFO [main] org.apache.hadoop.mapred.YarnChild:
Sleeping for 0ms before retrying again. Got null now.

So my question is what can cause "localizing" take that long? As fa as I
know, "localizing" is about download required binary/library/config files
from HDFS, so what might be the culprit?

Thanks for your help

-- 
--Anfernee