Posted to mapreduce-dev@hadoop.apache.org by "Jacob Rideout (JIRA)" <ji...@apache.org> on 2010/03/07 21:18:27 UTC

[jira] Created: (MAPREDUCE-1571) OutOfMemoryError during shuffle

OutOfMemoryError during shuffle
-------------------------------

                 Key: MAPREDUCE-1571
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1571
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2, 0.20.1
            Reporter: Jacob Rideout


An OutOfMemoryError can occur when determining whether the shuffle can be accomplished in memory.

2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 4191933 bytes (435311 raw bytes) into RAM from
attempt_201003060739_0002_m_000061_0
2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task
attempt_201003060739_0002_r_000000_0: Failed fetch #1 from
attempt_201003060739_0002_m_000202_0
2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_201003060739_0002_r_000000_0 adding host
hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous
failures
2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
attempt_201003060739_0002_r_000000_0 : Map output copy failure :
java.lang.OutOfMemoryError: Java heap space
       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)


Ted Yu identified the following potential solution:

I think there is a mismatch (in ReduceTask.java) between:
     this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
and:
       maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f

because
     copiers = new ArrayList<MapOutputCopier>(numCopiers);
each of the (default 5) copiers can hold a segment of up to 0.25 * maxSize in memory at the same time, so the total memory committed to the in-memory shuffle can reach 5 * 0.25 = 1.25 * maxSize.

A JIRA should be filed to correlate the default of 5 copiers above with
MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
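
To make the arithmetic concrete, here is a minimal sketch of the worst case described above. The class name and the 200 MB budget are illustrative only; the two constants are the 0.20.x defaults quoted from ReduceTask.java.

    // Back-of-the-envelope check of the overcommit described above.
    // The budget value is a made-up example; the constants are the
    // 0.20.x defaults quoted from ReduceTask.java.
    public class ShuffleBudgetCheck {
        public static void main(String[] args) {
            int numCopiers = 5;              // mapred.reduce.parallel.copies default
            float segmentFraction = 0.25f;   // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
            long maxSize = 200L << 20;       // example in-memory shuffle budget: 200 MB

            long maxSingleShuffleLimit = (long) (maxSize * segmentFraction);
            long worstCase = (long) numCopiers * maxSingleShuffleLimit;

            // With 5 copiers each holding a 0.25 * maxSize segment at once,
            // the reducer commits 1.25x its budget and can exhaust the heap.
            System.out.printf("budget=%d bytes, worst case=%d bytes, ratio=%.2f%n",
                    maxSize, worstCase, (double) worstCase / maxSize);
        }
    }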

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAPREDUCE-1571) OutOfMemoryError during shuffle

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas resolved MAPREDUCE-1571.
--------------------------------------

    Resolution: Duplicate

This is a duplicate of MAPREDUCE-1182.

{{MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION}} governs the maximum size of a map output segment that will be stored in memory. The aggregate limit is enforced by a separate mechanism.
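
That separate mechanism is, roughly, an aggregate accounting of in-memory segment bytes that stalls fetchers once the budget is full. A simplified sketch of the idea follows; this is illustrative Java, not the actual accounting code in ReduceTask.java.

    // Illustrative aggregate budget: copiers reserve bytes before shuffling a
    // segment into RAM and block while the pool is full, so the per-segment
    // cap and the total cap are enforced independently.
    public class InMemoryShuffleBudget {
        private final long maxSize;  // total bytes allowed in RAM across all segments
        private long used = 0;

        public InMemoryShuffleBudget(long maxSize) {
            this.maxSize = maxSize;
        }

        // Called by a copier before an in-memory shuffle; blocks until space frees up.
        public synchronized void reserve(long bytes) throws InterruptedException {
            while (used + bytes > maxSize) {
                wait();  // stall the fetch instead of overcommitting the heap
            }
            used += bytes;
        }

        // Called once a segment is merged to disk or handed to the reducer.
        public synchronized void release(long bytes) {
            used -= bytes;
            notifyAll();
        }
    }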

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.