Posted to common-user@hadoop.apache.org by Nicolae Marasoiu <ni...@adswizz.com> on 2015/11/19 08:37:13 UTC

map task frozen from master(s) perspective, but no process is there, and task log reports completion

Hi,

I have a map task "slot" occupied by a task that has made no progress for hours and that YARN in fact reports as NEW and STARTING. (Since we use YARN / Hadoop 2, it is not a slot per se, but the resource mechanism effectively computes slots dynamically; for instance, at most 5 map and reduce tasks run concurrently in my current configuration. I assume I cannot change this while the job is still running, right?)

I have found the task's log, which shows completion:
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 63496; bufvoid = 104857600
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26201248(104804992); length = 13149/6553600
2015-11-19 04:01:14,851 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-11-19 04:01:14,858 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1447872797537_0001_m_002241_0 is done. And is in the process of committing
2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1447872797537_0001_m_002241_0' done.
2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2015-11-19 04:01:14,890 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2015-11-19 04:01:14,890 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.

My hypothesis is that the task could not report its progress or completion to the application master. But in that case, I believe the master should have timed it out?
Can I kill the task attempt in any way to allow it to restart?
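
For reference, a minimal sketch of how the attempt ID could be pulled out of the log line above and turned into a kill command. It assumes the `mapred job -kill-task` subcommand is usable on an attempt in this state, which is not verified here; the helper name `kill_command_for` is hypothetical, and the script only prints the command rather than running it.

```python
import re
import shlex

# Matches MapReduce task attempt IDs such as attempt_1447872797537_0001_m_002241_0
ATTEMPT_RE = re.compile(r"attempt_\d+_\d+_[mr]_\d+_\d+")


def kill_command_for(log_line):
    """Return the argv that would kill the attempt named in log_line, or None.

    Hypothetical helper: it only builds the command, it does not execute it.
    """
    match = ATTEMPT_RE.search(log_line)
    if match is None:
        return None
    # Assumption: `mapred job -kill-task <attempt-id>` applies to this attempt.
    return ["mapred", "job", "-kill-task", match.group(0)]


line = ("2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.mapred.Task: "
        "Task 'attempt_1447872797537_0001_m_002241_0' done.")
cmd = kill_command_for(line)
print(" ".join(shlex.quote(part) for part in cmd))
```

If the command does apply, my understanding is that an attempt killed with `-kill-task` is not counted against the task's failed attempts, whereas `-fail-task` is; that distinction may matter if the attempt limit is low.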

Please advise,
Nicu