You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/07/26 23:59:48 UTC

[jira] [Resolved] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved MAPREDUCE-5423.
-----------------------------------

    Resolution: Duplicate
    
> Rare deadlock situation when reducers try to fetch map output
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-5423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha
>            Reporter: Chu Tong
>
> During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers and log snippet illustrates this problem is below (all fetchers went into a wait state after they can't acquire more RAM beyond the memoryLimit and no fetcher is releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0 Thread started: EventFetcher for fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000017_0 sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000017_0 decomp: 27 len: 31 to MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_000017_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 27, inMemoryMapOutputs.size() -&gt; 1, commitMemory -&gt; 0, usedMemory -&gt;27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000016_0 sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000016_0 decomp: 55841282 len: 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_000016_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 55841282, inMemoryMapOutputs.size() -&gt; 2, commitMemory -&gt; 27, usedMemory -&gt;55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_000015_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 118022137, inMemoryMapOutputs.size() -&gt; 3, commitMemory -&gt; 55841309, usedMemory -&gt;173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
> 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:42,190 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000014_0 sent hash and receievd reply
> 2013-07-18 04:32:42,277 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000014_0 decomp: 140715962 len: 140715966 to MEMORY
> 2013-07-18 04:32:42,493 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 140715962 bytes from map-output for attempt_1373902166027_0622_m_000014_0
> 2013-07-18 04:32:42,493 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 140715962, inMemoryMapOutputs.size() -&gt; 4, commitMemory -&gt; 173863446, usedMemory -&gt;314579408
> 2013-07-18 04:32:42,494 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 306s
> 2013-07-18 04:32:43,192 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:43,192 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:43,192 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:43,195 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000013_0 sent hash and receievd reply
> 2013-07-18 04:32:43,280 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000013_0 decomp: 141243082 len: 141243086 to MEMORY
> 2013-07-18 04:32:43,506 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 141243082 bytes from map-output for attempt_1373902166027_0622_m_000013_0
> 2013-07-18 04:32:43,506 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 141243082, inMemoryMapOutputs.size() -&gt; 5, commitMemory -&gt; 314579408, usedMemory -&gt;455822490
> 2013-07-18 04:32:43,507 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 315s
> 2013-07-18 04:32:44,195 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:44,195 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:44,195 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:44,198 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000011_0 sent hash and receievd reply
> 2013-07-18 04:32:44,305 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000011_0 decomp: 173528412 len: 173528416 to MEMORY
> ...
> 2013-07-18 04:32:56,901 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 282474777 bytes from map-output for attempt_1373902166027_0622_m_000001_0
> 2013-07-18 04:32:56,901 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 282474777, inMemoryMapOutputs.size() -&gt; 5, commitMemory -&gt; 1179552807, usedMemory -&gt;1462027584
> 2013-07-18 04:32:56,901 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#2 in 2682s
> 2013-07-18 04:32:56,901 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 4 to fetcher#4
> 2013-07-18 04:32:56,902 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 4 of 4 to 101-09-04.sc1.verticloud.com:8080 to fetcher#4
> 2013-07-18 04:32:56,904 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000006_0,attempt_1373902166027_0622_m_000002_0,attempt_1373902166027_0622_m_000003_0,attempt_1373902166027_0622_m_000005_0 sent hash and receievd reply
> 2013-07-18 04:32:57,336 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:57,414 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#4 about to shuffle output of map attempt_1373902166027_0622_m_000006_0 decomp: 280156692 len: 280156696 to MEMORY
> 2013-07-18 04:32:57,867 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 280156692 bytes from map-output for attempt_1373902166027_0622_m_000006_0
> 2013-07-18 04:32:57,867 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -&gt; map-output of size: 280156692, inMemoryMapOutputs.size() -&gt; 6, commitMemory -&gt; 1462027584, usedMemory -&gt;1742184276
> 2013-07-18 04:32:57,900 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#4 - MergerManager returned Status.WAIT ...
> 2013-07-18 04:32:57,901 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#4 in 999s
> 2013-07-18 04:32:57,901 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 4 to fetcher#3
> 2013-07-18 04:32:57,901 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 4 of 4 to 101-09-04.sc1.verticloud.com:8080 to fetcher#3
> 2013-07-18 04:32:57,903 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000000_0,attempt_1373902166027_0622_m_000002_0,attempt_1373902166027_0622_m_000005_0,attempt_1373902166027_0622_m_000003_0 sent hash and receievd reply
> 2013-07-18 04:32:57,904 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergerManager returned Status.WAIT ...
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira