Posted to mapreduce-issues@hadoop.apache.org by "Laxman (JIRA)" <ji...@apache.org> on 2015/05/02 18:52:06 UTC

[jira] [Created] (MAPREDUCE-6351) Reducer hung in copy phase.

Laxman created MAPREDUCE-6351:
---------------------------------

             Summary: Reducer hung in copy phase.
                 Key: MAPREDUCE-6351
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.6.0
            Reporter: Laxman


*Problem*
The reducer gets stuck in the copy phase and makes no progress for a very long time. After the task is killed manually a couple of times, it completes.

*Analysis*
- Verified GC logs. Found no memory-related issues. Attache
- Verified thread dumps. Found no thread-related problems.
- The logs show that the fetcher threads are not copying the map outputs; they are just waiting for a merge to happen.
- Merge thread is alive and in wait state.

After careful study of the logs, thread dumps and code, this looks to me like a classic multi-threading issue: the merge thread goes into the wait state after the notification has already been sent.

Here is the suspect code flow.

*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
      synchronized(pendingToBeMerged) {
        pendingToBeMerged.addLast(toMergeInputs);
        pendingToBeMerged.notifyAll();
      }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
        synchronized (pendingToBeMerged) {
          while(pendingToBeMerged.size() <= 0) {
            pendingToBeMerged.wait();
          }
          // Pickup the inputs to merge.
          inputs = pendingToBeMerged.removeFirst();
        }
{code}
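
For anyone who wants to exercise this ordering outside a running job, here is a minimal standalone Java sketch of the two code paths above. It is not the Hadoop implementation: the class name MergeQueueSketch and the methods takeNext(), runNotifyFirst() and main() are made up for illustration. Only the two synchronized blocks mirror the snippets quoted in this report. runNotifyFirst() queues one input first (so the notification is sent before anyone waits) and only then starts the waiting thread, giving it up to 5 seconds.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the queue handling quoted above; not Hadoop code.
class MergeQueueSketch<T> {
  private final LinkedList<Set<T>> pendingToBeMerged = new LinkedList<>();

  // Mirrors the quoted startMerge() block: enqueue, then notify under the lock.
  void startMerge(Set<T> toMergeInputs) {
    synchronized (pendingToBeMerged) {
      pendingToBeMerged.addLast(toMergeInputs);
      pendingToBeMerged.notifyAll();
    }
  }

  // Mirrors the quoted run() block: wait while the queue is empty, then dequeue.
  Set<T> takeNext() throws InterruptedException {
    synchronized (pendingToBeMerged) {
      while (pendingToBeMerged.size() <= 0) {
        pendingToBeMerged.wait();
      }
      return pendingToBeMerged.removeFirst();
    }
  }

  // Notification first, waiting thread second. Returns null if the waiting
  // thread has not picked up the input within 5 seconds.
  static Set<String> runNotifyFirst() throws InterruptedException {
    MergeQueueSketch<String> queue = new MergeQueueSketch<>();
    queue.startMerge(Collections.singleton("map_0"));

    AtomicReference<Set<String>> result = new AtomicReference<>();
    Thread merger = new Thread(() -> {
      try {
        result.set(queue.takeNext());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    merger.setDaemon(true); // do not keep the JVM alive if it is still waiting
    merger.start();
    merger.join(5000);
    return result.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("picked up: " + runNotifyFirst());
  }
}
```

In this isolated form the waiting side re-checks pendingToBeMerged.size() under the same lock before calling wait(), so the queued input is picked up and the call returns it. A null return would mean the consumer was still waiting after 5 seconds. The sketch covers only the two quoted blocks; the rest of MergeThread and the fetcher logic are not modelled here.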




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)