Posted to mapreduce-issues@hadoop.apache.org by "Laxman (JIRA)" <ji...@apache.org> on 2015/09/16 10:09:46 UTC
[jira] [Resolved] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laxman resolved MAPREDUCE-6351.
-------------------------------
Resolution: Duplicate
Assignee: Laxman
> Reducer hung in copy phase.
> ---------------------------
>
> Key: MAPREDUCE-6351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.6.0
> Reporter: Laxman
> Assignee: Laxman
> Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
>
> *Problem*
> A reducer gets stuck in the copy phase and makes no progress for a very long time. After the task is killed manually a couple of times, it completes.
> *Observations*
> - Verified GC logs. Found no memory-related issues. Attached the logs.
> - Verified thread dumps. Found no thread-related problems.
> - The logs show that the fetcher threads are not copying map outputs; they are just waiting for a merge to happen.
> - The merge thread is alive and in the wait state.
> *Analysis*
> After careful review of the logs, thread dumps and code, this looks to me like a classic multi-threading issue, namely a missed notification: the merge thread appears to go into the wait state after the notification has already been sent.
> Here is the suspected code flow.
> *Thread #1*
> Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized(pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2*
> Merge Thread - goes to wait state (Notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while(pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}
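The two snippets above can be exercised in isolation to check the suspected ordering. The following is a minimal sketch only, not Hadoop code: the class GuardedWaitSketch, its methods takeNext() and demo(), and the String payload are hypothetical stand-ins for MergeThread, run() and the merge inputs. It delivers the notification first and then starts the waiting thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the MergeThread hand-off; not the Hadoop class.
class GuardedWaitSketch {
    // Plays the role of pendingToBeMerged.
    private final Deque<String> pending = new ArrayDeque<>();

    // Producer side (the fetcher in the description): enqueue, then notify.
    void startMerge(String inputs) {
        synchronized (pending) {
            pending.addLast(inputs);
            pending.notifyAll();
        }
    }

    // Consumer side (the merge thread): wait only while the queue is empty.
    String takeNext() throws InterruptedException {
        synchronized (pending) {
            while (pending.isEmpty()) {
                pending.wait();
            }
            return pending.removeFirst();
        }
    }

    // Runs the "notification comes first" ordering and returns what the
    // consumer picked up, or null if it was still blocked after 5 seconds.
    static String demo() {
        GuardedWaitSketch queue = new GuardedWaitSketch();
        queue.startMerge("batch-1"); // notify before anyone is waiting

        final String[] result = new String[1];
        Thread merger = new Thread(() -> {
            try {
                result[0] = queue.takeNext();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.setDaemon(true); // do not keep the JVM alive if it blocks
        merger.start();
        try {
            merger.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "batch-1"
    }
}
```

In this sketch the consumer checks the queue under the same lock before it waits, so an early notification on its own does not leave it blocked. If the real hang follows this pattern, some additional factor not visible in the two snippets would have to be involved.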