Posted to common-user@hadoop.apache.org by Bo Shi <bs...@visiblemeasures.com> on 2010/05/08 05:08:28 UTC

Re: Shuffle In Memory OutOfMemoryError

Hey Ted, any further insights on this?  We're encountering a similar
issue (on CD2).  I'll be applying MAPREDUCE-1182 to see if that
resolves our case but it sounds like that JIRA didn't completely
eliminate the problem for some folks.

On Wed, Mar 10, 2010 at 11:54 PM, Ted Yu <yu...@gmail.com> wrote:
> I pressed the send key a bit early.
>
> I will have to dig a bit deeper.
> Hopefully someone can find a reader.close() call, after which I will look for
> another possible root cause :-)
>
>
> On Wed, Mar 10, 2010 at 7:48 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Thanks to Andy for the log he provided.
>>
>> You can see from the log below that size increased steadily from 341535057
>> to 408181692, approaching maxSize. Then OOME:
>>
>>
>> 2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
>> pos=start requestedSize=3893000 size=341535057 numPendingRequests=0
>> maxSize=417601952
>> 2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
>> pos=end requestedSize=3893000 size=345428057 numPendingRequests=0
>> maxSize=417601952
>> ...
>> 2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
>> pos=end requestedSize=635753 size=408181692 numPendingRequests=0
>> maxSize=417601952
>> 2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task
>> attempt_201003101826_0001_r_000004_0: Failed fetch #1 from
>> attempt_201003101826_0001_m_000875_0
>>
>> 2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask:
>> attempt_201003101826_0001_r_000004_0 adding host hd17.dfs.returnpath.net to penalty box, next contact in 4 seconds
>> 2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask:
>> attempt_201003101826_0001_r_000004_0: Got 1 map-outputs from previous
>> failures
>> 2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner:
>> attempt_201003101826_0001_r_000004_0 : Map output copy failure :
>> java.lang.OutOfMemoryError: Java heap space
>>         at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
>>         at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
>>         at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
>>         at
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)
>>
>> Looking at the calls to unreserve() in ReduceTask, two were for IOException
>> handling and the other was for a sanity check (line 1557), meaning they wouldn't
>> be called in the normal execution path.
>>
>> I see one call in IFile.InMemoryReader close() method:
>>       // Inform the RamManager
>>       ramManager.unreserve(bufferSize);
>>
>> And InMemoryReader is used in createInMemorySegments():
>>           Reader<K, V> reader =
>>             new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
>>                                      mo.data, 0, mo.data.length);
>>
>> But I don't see a reader.close() call in the ReduceTask file.
>>
>
>
>> On Wed, Mar 10, 2010 at 3:34 PM, Chris Douglas <ch...@yahoo-inc.com> wrote:
>>
>>> I don't think this OOM is a framework bug per se, and given the
>>> rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning the
>>> 0.20 shuffle semantics is likely not worthwhile (though data informing
>>> improvements to trunk would be excellent). Most likely (and tautologically),
>>> ReduceTask simply requires more memory than is available and the job failure
>>> can be avoided by either 0) increasing the heap size or 1) lowering
>>> mapred.job.shuffle.input.buffer.percent. Most of the tasks we run have a heap of
>>> 1GB. For a reduce fetching >200k map outputs, that's a reasonable, even
>>> stingy amount of space. -C
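
A minimal sketch of the two mitigations Chris lists, expressed as per-job settings from a 0.20-style job driver. The class name and the -Xmx value are placeholders, and the property names are the ones used elsewhere in this thread; this is an illustration, not a recommendation of specific values.

    // Sketch only: per-job settings for the two options above.
    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuningExample {
      public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        // Option 0: give each reduce task a bigger heap
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        // Option 1: keep a smaller fraction of that heap for the in-memory shuffle
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.2f);
        return conf;
      }
    }
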
>>>
>>>
>>> On Mar 10, 2010, at 5:26 AM, Ted Yu wrote:
>>>
>>>  I verified that size and maxSize are long. This means MR-1182 didn't
>>>> resolve
>>>> Andy's issue.
>>>>
>>>> According to Andy:
>>>> At the beginning of the job there are 209,754 pending map tasks and 32
>>>> pending reduce tasks
>>>>
>>>> My guess is that GC wasn't reclaiming memory fast enough, leading to OOME
>>>> because of the large number of in-memory shuffle candidates.
>>>>
>>>> My suggestion for Andy would be to:
>>>> 1. add -verbose:gc as a JVM parameter
>>>> 2. modify reserve() slightly to calculate the maximum outstanding
>>>> numPendingRequests and print the maximum.
>>>>
>>>> Based on the output from the above two items, we can discuss a solution.
>>>> My intuition is to place an upper bound on numPendingRequests beyond which
>>>> canFitInMemory() returns false.
>>>>
>>>> My two cents.
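
A standalone sketch of the instrumentation suggested in item 2 might look like the following. It is modeled on the reserve() signature quoted later in this thread; everything past the while-condition, the field names, and the use of System.out instead of the task logger are guesses, not the actual 0.20.2 ShuffleRamManager.

    // Hypothetical RamManager-like class that records the high-water mark of
    // numPendingRequests, along the lines Ted suggests.
    import java.io.InputStream;

    class InstrumentedRamManager {
      private long size = 0;
      private final long maxSize;
      private int numPendingRequests = 0;
      private int maxPendingSeen = 0;   // added: high-water mark to report

      InstrumentedRamManager(long maxSize) { this.maxSize = maxSize; }

      public synchronized boolean reserve(int requestedSize, InputStream in)
          throws InterruptedException {
        // Wait till the request can be fulfilled...
        while ((size + requestedSize) > maxSize) {
          numPendingRequests++;
          maxPendingSeen = Math.max(maxPendingSeen, numPendingRequests);
          System.out.println("reserve: pending=" + numPendingRequests
              + " maxPendingSeen=" + maxPendingSeen);
          wait();
          numPendingRequests--;
        }
        size += requestedSize;
        return true;
      }

      public synchronized void unreserve(int requestedSize) {
        size -= requestedSize;
        notifyAll();   // wake up any copiers blocked in reserve()
      }
    }
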
>>>>
>>>> On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas
>>>> <ch...@yahoo-inc.com> wrote:
>>>>
>>>>  That section of code is unmodified in MR-1182. See the patches/svn log.
>>>>> -C
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>
>>>>> On Mar 9, 2010, at 7:44 PM, "Ted Yu" <yu...@gmail.com> wrote:
>>>>>
>>>>> I just downloaded the hadoop-0.20.2 tarball from the Cloudera mirror.
>>>>>
>>>>>> This is what I see in ReduceTask (line 999):
>>>>>>   public synchronized boolean reserve(int requestedSize, InputStream
>>>>>> in)
>>>>>>
>>>>>>   throws InterruptedException {
>>>>>>     // Wait till the request can be fulfilled...
>>>>>>     while ((size + requestedSize) > maxSize) {
>>>>>>
>>>>>> I don't see the fix from MR-1182.
>>>>>>
>>>>>> That's why I suggested to Andy that he manually apply MR-1182.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins <
>>>>>> andy.sautins@returnpath.net
>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  Thanks Christopher.
>>>>>>>
>>>>>>> The heap size for reduce tasks is configured to be 640M (
>>>>>>> mapred.child.java.opts set to -Xmx640m ).
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Christopher Douglas [mailto:chrisdo@yahoo-inc.com]
>>>>>>> Sent: Tuesday, March 09, 2010 5:19 PM
>>>>>>> To: common-user@hadoop.apache.org
>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>
>>>>>>> No, MR-1182 is included in 0.20.2
>>>>>>>
>>>>>>> What heap size have you set for your reduce tasks? -C
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yu...@gmail.com> wrote:
>>>>>>>
>>>>>>> Andy:
>>>>>>>
>>>>>>>> You need to manually apply the patch.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <
>>>>>>>>
>>>>>>>>  andy.sautins@returnpath.net
>>>>>>>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
>>>>>>>>> in the
>>>>>>>>> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and
>>>>>>>>> re-ran
>>>>>>>>> the same job scenarios.  Running with mapred.reduce.parallel.copies
>>>>>>>>> set to 1, we
>>>>>>>>> continue to see the same Java heap space error.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>>>>>>>>> Sent: Tuesday, March 09, 2010 12:56 PM
>>>>>>>>> To: common-user@hadoop.apache.org
>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>>>
>>>>>>>>> This issue has been resolved in
>>>>>>>>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>>>>>>>>
>>>>>>>>> Please apply the patch
>>>>>>>>> M1182-1v20.patch
>>>>>>>>> <http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>  On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <
>>>>>>>>>
>>>>>>>>>  andy.sautins@returnpath.net
>>>>>>>>
>>>>>>>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  Thanks Ted.  Very helpful.  You are correct that I misunderstood
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>>>  code
>>>>>>>>>
>>>>>>>>>  at ReduceTask.java:1535.  I missed the fact that it's in an
>>>>>>>>>> IOException
>>>>>>>>>>
>>>>>>>>>>  catch
>>>>>>>>>
>>>>>>>>>  block.  My mistake.  That's what I get for being in a rush.
>>>>>>>>>>
>>>>>>>>>> For what it's worth I did re-run the job with
>>>>>>>>>> mapred.reduce.parallel.copies set with values from 5 all the way
>>>>>>>>>> down to
>>>>>>>>>>
>>>>>>>>>>  1.
>>>>>>>>>
>>>>>>>>>  All failed with the same error:
>>>>>>>>>>
>>>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>   at
>>>>>>>>>>
>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run
>>>>>>>>>>
>>>>>>>>> (ReduceTask.java:1195)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So from that it does seem like something else might be going on,
>>>>>>>>>> yes?
>>>>>>>>>>
>>>>>>>>>>  I
>>>>>>>>>
>>>>>>>>>  need to do some more research.
>>>>>>>>>>
>>>>>>>>>> I appreciate your insights.
>>>>>>>>>>
>>>>>>>>>> Andy
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>>>>>>>>>> Sent: Sunday, March 07, 2010 3:38 PM
>>>>>>>>>> To: common-user@hadoop.apache.org
>>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>>>>
>>>>>>>>>> My observation is based on this call chain:
>>>>>>>>>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()
>>>>>>>>>> calling
>>>>>>>>>> ramManager.canFitInMemory(decompressedLength)
>>>>>>>>>>
>>>>>>>>>> Basically ramManager.canFitInMemory() makes its decision without
>>>>>>>>>> considering
>>>>>>>>>> the
>>>>>>>>>> number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of
>>>>>>>>>> total
>>>>>>>>>>
>>>>>>>>>>  heap
>>>>>>>>>
>>>>>>>>>  may be used in shuffling if default parameters were used.
>>>>>>>>>> Of course, you should check the value for
>>>>>>>>>> mapred.reduce.parallel.copies
>>>>>>>>>>
>>>>>>>>>>  to
>>>>>>>>>
>>>>>>>>>  see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
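
To put rough numbers on this reasoning (a back-of-the-envelope check using the figures quoted elsewhere in this thread, so everything here is approximate):

    maxSize (from the log above)                 = 417601952 bytes  ~= 400 MB
    implied usable heap (maxSize / 0.70)                            ~= 570 MB  (of the -Xmx640m Andy mentions)
    maxSingleShuffleLimit = 0.25 * maxSize                          ~= 100 MB
    worst case with 5 copiers = 5 * 0.25 * maxSize                  ~= 500 MB  (i.e. 1.25 * maxSize)

By this reasoning the copiers alone could hold roughly 500 MB of a ~570 MB usable heap, leaving little room for anything else, which would make an OOME plausible even though each individual reservation passes canFitInMemory().
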
>>>>>>>>>>
>>>>>>>>>> About the ramManager.unreserve() call: ReduceTask.java from hadoop
>>>>>>>>>> 0.20.2
>>>>>>>>>>
>>>>>>>>>>  only
>>>>>>>>>
>>>>>>>>>  has 2731 lines. So I have to guess the location of the code
>>>>>>>>>> snippet you
>>>>>>>>>> provided.
>>>>>>>>>> I found this around line 1535:
>>>>>>>>>>   } catch (IOException ioe) {
>>>>>>>>>>     LOG.info("Failed to shuffle from " +
>>>>>>>>>> mapOutputLoc.getTaskAttemptId(),
>>>>>>>>>>              ioe);
>>>>>>>>>>
>>>>>>>>>>     // Inform the ram-manager
>>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
>>>>>>>>>>     ramManager.unreserve(mapOutputLength);
>>>>>>>>>>
>>>>>>>>>>     // Discard the map-output
>>>>>>>>>>     try {
>>>>>>>>>>       mapOutput.discard();
>>>>>>>>>>     } catch (IOException ignored) {
>>>>>>>>>>       LOG.info("Failed to discard map-output from " +
>>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
>>>>>>>>>>     }
>>>>>>>>>> Please confirm the line number.
>>>>>>>>>>
>>>>>>>>>> If we're looking at the same code, I am afraid I don't see how we
>>>>>>>>>> can
>>>>>>>>>> improve it. First, I assume IOException shouldn't happen that
>>>>>>>>>> often.
>>>>>>>>>> Second,
>>>>>>>>>> mapOutput.discard() just sets:
>>>>>>>>>>     data = null;
>>>>>>>>>> for the in-memory case. Even if we call mapOutput.discard() before
>>>>>>>>>> ramManager.unreserve(), we don't know when GC would kick in and
>>>>>>>>>> make more
>>>>>>>>>> memory available.
>>>>>>>>>> Of course, given the large number of map outputs in your system, it
>>>>>>>>>>
>>>>>>>>>>  became
>>>>>>>>>
>>>>>>>>>  more likely that the root cause from my reasoning made OOME happen
>>>>>>>>>>
>>>>>>>>>>  sooner.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  andy.sautins@returnpath.net
>>>>>>>>>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  Ted,
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to follow the logic in your mail and I'm not sure I'm
>>>>>>>>>>> following.  If you wouldn't mind helping me understand I would
>>>>>>>>>>> appreciate
>>>>>>>>>>>
>>>>>>>>>>>  it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Looking at the code maxSingleShuffleLimit is only used in
>>>>>>>>>>> determining
>>>>>>>>>>>
>>>>>>>>>>>  if
>>>>>>>>>>
>>>>>>>>>>  the copy _can_ fit into memory:
>>>>>>>>>>>
>>>>>>>>>>> boolean canFitInMemory(long requestedSize) {
>>>>>>>>>>>   return (requestedSize < Integer.MAX_VALUE &&
>>>>>>>>>>>           requestedSize < maxSingleShuffleLimit);
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>> It also looks like the RamManager.reserve should wait until
>>>>>>>>>>> memory
>>>>>>>>>>>
>>>>>>>>>>>  is
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  available, so it shouldn't hit a memory limit for that reason.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What does seem a little strange to me is the following (
>>>>>>>>>>>
>>>>>>>>>>>  ReduceTask.java
>>>>>>>>>>
>>>>>>>>>>  starting at 2730 ):
>>>>>>>>>>>
>>>>>>>>>>>     // Inform the ram-manager
>>>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
>>>>>>>>>>>     ramManager.unreserve(mapOutputLength);
>>>>>>>>>>>
>>>>>>>>>>>     // Discard the map-output
>>>>>>>>>>>     try {
>>>>>>>>>>>       mapOutput.discard();
>>>>>>>>>>>     } catch (IOException ignored) {
>>>>>>>>>>>       LOG.info("Failed to discard map-output from " +
>>>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
>>>>>>>>>>>     }
>>>>>>>>>>>     mapOutput = null;
>>>>>>>>>>>
>>>>>>>>>>> So to me that looks like the ramManager unreserves the memory
>>>>>>>>>>> before
>>>>>>>>>>>
>>>>>>>>>>>  the
>>>>>>>>>>
>>>>>>>>>>  mapOutput is discarded.  Shouldn't the mapOutput be discarded
>>>>>>>>>>> _before_
>>>>>>>>>>>
>>>>>>>>>>>  the
>>>>>>>>>>
>>>>>>>>>>  ramManager unreserves the memory?  If the memory is unreserved
>>>>>>>>>>> before
>>>>>>>>>>>
>>>>>>>>>>>  the
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  actual underlying data references are removed then it seems like
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  another
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  thread can try to allocate memory ( ReduceTask.java:2730 ) before
>>>>>>>>>>
>>>>>>>>>>> the
>>>>>>>>>>> previous memory is disposed ( mapOutput.discard() ).
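
For illustration only, the reordering being described would look roughly like this. It is just the ReduceTask snippet quoted above with the two halves swapped, not a proposed patch; and as Ted points out in his reply (quoted earlier in this thread), the JVM still decides when the dropped buffer is actually collected, so this by itself may not prevent the OOME.

    // Sketch: drop the in-memory map output before releasing the accounting.
    try {
      mapOutput.discard();          // for the in-memory case this just sets data = null
    } catch (IOException ignored) {
      LOG.info("Failed to discard map-output from " +
               mapOutputLoc.getTaskAttemptId(), ignored);
    }
    mapOutput = null;

    // Only now tell the ram-manager the space is free, so another copier
    // cannot reserve space that is still (logically) backed by the old buffer.
    ramManager.closeInMemoryFile(mapOutputLength);
    ramManager.unreserve(mapOutputLength);
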
>>>>>>>>>>>
>>>>>>>>>>> Not sure that makes sense.  One thing to note is that the
>>>>>>>>>>> particular
>>>>>>>>>>>
>>>>>>>>>>>  job
>>>>>>>>>>
>>>>>>>>>>  that is failing does have a good number ( 200k+ ) of map
>>>>>>>>>>> outputs.  The
>>>>>>>>>>>
>>>>>>>>>>>  large
>>>>>>>>>>
>>>>>>>>>>  number of small map outputs may be why we are triggering a
>>>>>>>>>>> problem.
>>>>>>>>>>>
>>>>>>>>>>> Thanks again for your thoughts.
>>>>>>>>>>>
>>>>>>>>>>> Andy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Jacob R Rideout [mailto:apache@jacobrideout.net]
>>>>>>>>>>> Sent: Sunday, March 07, 2010 1:21 PM
>>>>>>>>>>> To: common-user@hadoop.apache.org
>>>>>>>>>>> Cc: Andy Sautins; Ted Yu
>>>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>>>>>
>>>>>>>>>>> Ted,
>>>>>>>>>>>
>>>>>>>>>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might
>>>>>>>>>>> have
>>>>>>>>>>> some time to write a patch later this week.
>>>>>>>>>>>
>>>>>>>>>>> Jacob Rideout
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>  I think there is a mismatch (in ReduceTask.java) between:
>>>>>>>>>>>>  this.numCopiers = conf.getInt("mapred.reduce.parallel.copies",
>>>>>>>>>>>>
>>>>>>>>>>>>  5);
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>  and:
>>>>>>>>>>
>>>>>>>>>>>   maxSingleShuffleLimit = (long)(maxSize *
>>>>>>>>>>>> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>>>>>>>>>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f
>>>>>>>>>>>>
>>>>>>>>>>>> because
>>>>>>>>>>>>  copiers = new ArrayList<MapOutputCopier>(numCopiers);
>>>>>>>>>>>> so the total memory allocated for in-mem shuffle is 1.25 *
>>>>>>>>>>>> maxSize
>>>>>>>>>>>>
>>>>>>>>>>>> A JIRA should be filed to correlate the constant 5 above and
>>>>>>>>>>>> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <
>>>>>>>>>>>>
>>>>>>>>>>>>  apache@jacobrideout.net
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> We are seeing the following error in our reducers of a
>>>>>>>>>>>>> particular
>>>>>>>>>>>>>
>>>>>>>>>>>>>  job:
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>  Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run
>>>>>>>>>>
>>>>>>>>> (ReduceTask.java:1195)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> After enough reducers fail the entire job fails. This error
>>>>>>>>>>>>> occurs
>>>>>>>>>>>>> regardless of whether mapred.compress.map.output is true. We
>>>>>>>>>>>>> were
>>>>>>>>>>>>>
>>>>>>>>>>>>>  able
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>  to avoid the issue by reducing
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>  mapred.job.shuffle.input.buffer.percent
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>  to 20%. Shouldn't the framework via
>>>>>>>>>>
>>>>>>>>>>>  ShuffleRamManager.canFitInMemory
>>>>>>>>>>>>> and ShuffleRamManager.reserve correctly detect the memory
>>>>>>>>>>>>> available for allocation? I would think that with poor
>>>>>>>>>>>>> configuration
>>>>>>>>>>>>> settings (and default settings in particular) the job may not
>>>>>>>>>>>>> be as
>>>>>>>>>>>>> efficient, but wouldn't die.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is some more context from the logs; I have attached the full
>>>>>>>>>>>>> reducer log here: http://gist.github.com/323746
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010-03-06 07:54:49,621 INFO
>>>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>>>> Shuffling 4191933 bytes (435311 raw bytes) into RAM from
>>>>>>>>>>>>> attempt_201003060739_0002_m_000061_0
>>>>>>>>>>>>> 2010-03-06 07:54:50,222 INFO
>>>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Task
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>  attempt_201003060739_0002_r_000000_0: Failed fetch #1 from
>>>>>>>>>>
>>>>>>>>>>>  attempt_201003060739_0002_m_000202_0
>>>>>>>>>>>>> 2010-03-06 07:54:50,223 WARN
>>>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>>>> attempt_201003060739_0002_r_000000_0 adding host
>>>>>>>>>>>>> hd37.dfs.returnpath.net to penalty box, next contact in 4
>>>>>>>>>>>>> seconds
>>>>>>>>>>>>> 2010-03-06 07:54:50,223 INFO
>>>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>>>> attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from
>>>>>>>>>>>>>
>>>>>>>>>>>>>  previous
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>  failures
>>>>>>>>>>
>>>>>>>>>>>  2010-03-06 07:54:50,223 FATAL
>>>>>>>>>>>>> org.apache.hadoop.mapred.TaskRunner:
>>>>>>>>>>>>> attempt_201003060739_0002_r_000000_0 : Map output copy failure :
>>>>>>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>>>>   at
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>  org.apache.hadoop.mapred.ReduceTask$ReduceCopier
>>>>>>>>>>
>>>>>>>>> $MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>>
>>>>>>>>>    at
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run
>>>>>>>>>>
>>>>>>>>> (ReduceTask.java:1195)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> We tried this both in 0.20.1 and 0.20.2. We had hoped
>>>>>>>>>>>>> MAPREDUCE-1182
>>>>>>>>>>>>> would address the issue in 0.20.2, but it did not. Does anyone
>>>>>>>>>>>>> have
>>>>>>>>>>>>> any comments or suggestions? Is this a bug I should file a JIRA
>>>>>>>>>>>>> for?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jacob Rideout
>>>>>>>>>>>>> Return Path
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>
>>
>

Re: Shuffle In Memory OutOfMemoryError

Posted by Alex Kozlov <al...@cloudera.com>.
In our case there were three things that helped (a rough configuration sketch follows the list):


   - reduce mapred.reduce.parallel.copies (and incur some execution time
   penalty on large clusters)
   - increase the # of reducers (and just have the larger # of output files)
   - decrease the amount of map output (may or may not be possible in your
   case)
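
As a rough illustration of the first two knobs (the third depends entirely on the job), they might be set from a job driver like this; the values are placeholders, not recommendations:

    // Sketch only: fewer parallel fetches per reduce, and more reducers.
    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleMitigationsExample {
      public static void apply(JobConf conf) {
        conf.setInt("mapred.reduce.parallel.copies", 2);  // default is 5
        conf.setNumReduceTasks(64);                       // spread the shuffle load
      }
    }
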


Alex K


On Fri, May 7, 2010 at 8:25 PM, Ted Yu <yu...@gmail.com> wrote:

> You need to lower mapred.job.shuffle.input.buffer.percent to 20% or 25%.
> I didn't have time recently to find the root cause in 0.20.2
>
> I was told that shuffle has been rewritten in 0.21
> You may give it a try.
>
> On Fri, May 7, 2010 at 8:08 PM, Bo Shi <bs...@visiblemeasures.com> wrote:
>
> > Hey Ted, any further insights on this?  We're encountering a similar
> > issue (on CD2).  I'll be applying MAPREDUCE-1182 to see if that
> > resolves our case but it sounds like that JIRA didn't completely
> > eliminate the problem for some folks.

Re: Shuffle In Memory OutOfMemoryError

Posted by Ted Yu <yu...@gmail.com>.
You need to lower mapred.job.shuffle.input.buffer.percent to 20% or 25%.
I didn't have time recently to find the root cause in 0.20.2

I was told that shuffle has been rewritten in 0.21
You may give it a try.

On Fri, May 7, 2010 at 8:08 PM, Bo Shi <bs...@visiblemeasures.com> wrote:

> Hey Ted, any further insights on this?  We're encountering a similar
> issue (on CD2).  I'll be applying MAPREDUCE-1182 to see if that
> resolves our case but it sounds like that JIRA didn't completely
> eliminate the problem for some folks.
>
> On Wed, Mar 10, 2010 at 11:54 PM, Ted Yu <yu...@gmail.com> wrote:
> > I pressed send key a bit early.
> >
> > I will have to dig a bit deeper.
> > Hopefully someone can find reader.close() call after which I will look
> for
> > another possible root cause :-)
> >
> >
> > On Wed, Mar 10, 2010 at 7:48 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> Thanks to Andy for the log he provided.
> >>
> >> You can see from the log below that size increased steadily from
> 341535057
> >> to 408181692, approaching maxSize. Then OOME:
> >>
> >>
> >> 2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask:
> reserve:
> >> pos=start requestedSize=3893000 size=341535057 numPendingRequests=0
> >> maxSize=417601952
> >> 2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask:
> reserve:
> >> pos=end requestedSize=3893000 size=345428057 numPendingRequests=0
> >> maxSize=417601952
> >> ...
> >> 2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask:
> reserve:
> >> pos=end requestedSize=635753 size=408181692 numPendingRequests=0
> >> maxSize=417601952
> >> 2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task
> >> attempt_201003101826_0001_r_000004_0: Failed fetch #1 from
> >> attempt_201003101826_0001_m_000875_0
> >>
> >> 2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask:
> >> attempt_201003101826_0001_r_000004_0 adding host
> hd17.dfs.returnpath.netto penalty box, next contact in 4 seconds
> >> 2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask:
> >> attempt_201003101826_0001_r_000004_0: Got 1 map-outputs from previous
> >> failures
> >> 2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner:
> >> attempt_201003101826_0001_r_000004_0 : Map output copy failure :
> >> java.lang.OutOfMemoryError: Java heap space
> >>         at
> >>
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
> >>         at
> >>
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
> >>         at
> >>
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
> >>         at
> >>
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)
> >>
> >> Looking at the call to unreserve() in ReduceTask, two were for
> IOException
> >> and the other was for Sanity check (line 1557). Meaning they wouldn't be
> >> called in normal execution path.
> >>
> >> I see one call in IFile.InMemoryReader close() method:
> >>       // Inform the RamManager
> >>       ramManager.unreserve(bufferSize);
> >>
> >> And InMemoryReader is used in createInMemorySegments():
> >>           Reader<K, V> reader =
> >>             new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
> >>                                      mo.data, 0, mo.data.length);
> >>
> >> But I don't see reader.close() in ReduceTask file.
> >>
> >
> >
> >> On Wed, Mar 10, 2010 at 3:34 PM, Chris Douglas <chrisdo@yahoo-inc.com
> >wrote:
> >>
> >>> I don't think this OOM is a framework bug per se, and given the
> >>> rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning
> the
> >>> 0.20 shuffle semantics is likely not worthwhile (though data informing
> >>> improvements to trunk would be excellent). Most likely (and
> tautologically),
> >>> ReduceTask simply requires more memory than is available and the job
> failure
> >>> can be avoided by either 0) increasing the heap size or 1) lowering
> >>> mapred.shuffle.input.buffer.percent. Most of the tasks we run have a
> heap of
> >>> 1GB. For a reduce fetching >200k map outputs, that's a reasonable, even
> >>> stingy amount of space. -C
> >>>
> >>>
> >>> On Mar 10, 2010, at 5:26 AM, Ted Yu wrote:
> >>>
> >>>  I verified that size and maxSize are long. This means MR-1182 didn't
> >>>> resolve
> >>>> Andy's issue.
> >>>>
> >>>> According to Andy:
> >>>> At the beginning of the job there are 209,754 pending map tasks and 32
> >>>> pending reduce tasks
> >>>>
> >>>> My guess is that GC wasn't reclaiming memory fast enough, leading to
> OOME
> >>>> because of large number of in-memory shuffle candidates.
> >>>>
> >>>> My suggestion for Andy would be to:
> >>>> 1. add -*verbose*:*gc as JVM parameter
> >>>> 2. modify reserve() slightly to calculate the maximum outstanding
> >>>> numPendingRequests and print the maximum.
> >>>>
> >>>> Based on the output from above two items, we can discuss solution.
> >>>> My intuition is to place upperbound on numPendingRequests beyond which
> >>>> canFitInMemory() returns false.
> >>>> *
> >>>> My two cents.
> >>>>
> >>>> On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas
> >>>> <ch...@yahoo-inc.com>wrote:
> >>>>
> >>>>  That section of code is unmodified in MR-1182. See the patches/svn
> log.
> >>>>> -C
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>
> >>>>> On Mar 9, 2010, at 7:44 PM, "Ted Yu" <yu...@gmail.com> wrote:
> >>>>>
> >>>>> I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
> >>>>>
> >>>>>> This is what I see in ReduceTask (line 999):
> >>>>>>   public synchronized boolean reserve(int requestedSize, InputStream
> >>>>>> in)
> >>>>>>
> >>>>>>   throws InterruptedException {
> >>>>>>     // Wait till the request can be fulfilled...
> >>>>>>     while ((size + requestedSize) > maxSize) {
> >>>>>>
> >>>>>> I don't see the fix from MR-1182.
> >>>>>>
> >>>>>> That's why I suggested to Andy that he manually apply MR-1182.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins <
> >>>>>> andy.sautins@returnpath.net
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>  Thanks Christopher.
> >>>>>>>
> >>>>>>> The heap size for reduce tasks is configured to be 640M (
> >>>>>>> mapred.child.java.opts set to -Xmx640m ).
> >>>>>>>
> >>>>>>> Andy
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Christopher Douglas [mailto:chrisdo@yahoo-inc.com]
> >>>>>>> Sent: Tuesday, March 09, 2010 5:19 PM
> >>>>>>> To: common-user@hadoop.apache.org
> >>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>>>>>
> >>>>>>> No, MR-1182 is included in 0.20.2
> >>>>>>>
> >>>>>>> What heap size have you set for your reduce tasks? -C
> >>>>>>>
> >>>>>>> Sent from my iPhone
> >>>>>>>
> >>>>>>> On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yu...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Andy:
> >>>>>>>
> >>>>>>>> You need to manually apply the patch.
> >>>>>>>>
> >>>>>>>> Cheers
> >>>>>>>>
> >>>>>>>> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <
> >>>>>>>>
> >>>>>>>>  andy.sautins@returnpath.net
> >>>>>>>
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
> >>>>>>>>> in the
> >>>>>>>>> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend
> and
> >>>>>>>>> re-ran
> >>>>>>>>> the same job scenarios.  Running with
> mapred.reduce.parallel.copies
> >>>>>>>>> set to 1
> >>>>>>>>> and continue to have the same Java heap space error.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
> >>>>>>>>> Sent: Tuesday, March 09, 2010 12:56 PM
> >>>>>>>>> To: common-user@hadoop.apache.org
> >>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>>>>>>>
> >>>>>>>>> This issue has been resolved in
> >>>>>>>>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
> >>>>>>>>>
> >>>>>>>>> Please apply the patch
> >>>>>>>>> M1182-1v20.patch<
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>>  On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <
> >>>>>>>>>
> >>>>>>>>>  andy.sautins@returnpath.net
> >>>>>>>>
> >>>>>>>
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>  Thanks Ted.  Very helpful.  You are correct that I misunderstood
> >>>>>>>>>> the
> >>>>>>>>>>
> >>>>>>>>>>  code
> >>>>>>>>>
> >>>>>>>>>  at ReduceTask.java:1535.  I missed the fact that it's in a
> >>>>>>>>>> IOException
> >>>>>>>>>>
> >>>>>>>>>>  catch
> >>>>>>>>>
> >>>>>>>>>  block.  My mistake.  That's what I get for being in a rush.
> >>>>>>>>>>
> >>>>>>>>>> For what it's worth I did re-run the job with
> >>>>>>>>>> mapred.reduce.parallel.copies set with values from 5 all the way
> >>>>>>>>>> down to
> >>>>>>>>>>
> >>>>>>>>>>  1.
> >>>>>>>>>
> >>>>>>>>>  All failed with the same error:
> >>>>>>>>>>
> >>>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
> >>>>>>>>>>   at
> >>>>>>>>>>
> >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> >>>>>>>>>>
> >>>>>>>>> $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>>>>>>>
> >>>>>>>>>    at
> >>>>>>>>>>
> >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> >>>>>>>>>>
> >>>>>>>>> $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>>>>>>>
> >>>>>>>>>    at
> >>>>>>>>>>
> >>>>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> >>>>>>>>>>
> >>>>>>>>> $MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>>>>>>>
> >>>>>>>>>    at
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run
> >>>>>>>>>>
> >>>>>>>>> (ReduceTask.java:1195)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> So from that it does seem like something else might be going on,
> >>>>>>>>>> yes?
> >>>>>>>>>>
> >>>>>>>>>>  I
> >>>>>>>>>
> >>>>>>>>>  need to do some more research.
> >>>>>>>>>>
> >>>>>>>>>> I appreciate your insights.
> >>>>>>>>>>
> >>>>>>>>>> Andy
> >>>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
> >>>>>>>>>> Sent: Sunday, March 07, 2010 3:38 PM
> >>>>>>>>>> To: common-user@hadoop.apache.org
> >>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>>>>>>>>
> >>>>>>>>>> My observation is based on this call chain:
> >>>>>>>>>> MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
> >>>>>>>>>> ramManager.canFitInMemory(decompressedLength)
> >>>>>>>>>>
> >>>>>>>>>> Basically ramManager.canFitInMemory() makes its decision without considering
> >>>>>>>>>> the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap
> >>>>>>>>>> may be used in shuffling if the default parameters are used.
> >>>>>>>>>> Of course, you should check the value of mapred.reduce.parallel.copies to
> >>>>>>>>>> see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
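To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in plain Java (not framework code; the 512 MB heap is a made-up example, and the three settings and their defaults are the ones named in this thread). It shows how 5 copiers, each allowed a segment of up to 25% of a shuffle buffer sized at 70% of the heap, can together reserve 1.25 * maxSize, i.e. about 88% of the reduce task's heap:

    // Back-of-the-envelope sketch of the worst case described above.
    // Assumptions (illustrative only): a 512 MB reduce task heap and the
    // default values of the three settings named in the thread.
    public class ShuffleHeapEstimate {
        public static void main(String[] args) {
            double heapBytes = 512L * 1024 * 1024;       // hypothetical -Xmx512m
            double shuffleInputBufferPercent = 0.70;     // mapred.job.shuffle.input.buffer.percent
            double maxSingleShuffleFraction = 0.25;      // MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION
            int parallelCopies = 5;                      // mapred.reduce.parallel.copies

            double maxSize = heapBytes * shuffleInputBufferPercent;
            double maxSingleShuffleLimit = maxSize * maxSingleShuffleFraction;

            // If every copier passes canFitInMemory() with a near-limit segment,
            // the in-memory shuffle can grow to parallelCopies * maxSingleShuffleLimit.
            double worstCase = parallelCopies * maxSingleShuffleLimit;

            System.out.printf("worst case = %.2f * maxSize = %.0f%% of heap%n",
                              worstCase / maxSize, 100 * worstCase / heapBytes);
            // Prints: worst case = 1.25 * maxSize = 88% of heap
        }
    }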
> >>>>>>>>>>
> >>>>>>>>>> About the ramManager.unreserve() call: ReduceTask.java from Hadoop 0.20.2 only
> >>>>>>>>>> has 2731 lines, so I have to guess the location of the code snippet you provided.
> >>>>>>>>>> I found this around line 1535:
> >>>>>>>>>>   } catch (IOException ioe) {
> >>>>>>>>>>     LOG.info("Failed to shuffle from " +
> >>>>>>>>>> mapOutputLoc.getTaskAttemptId(),
> >>>>>>>>>>              ioe);
> >>>>>>>>>>
> >>>>>>>>>>     // Inform the ram-manager
> >>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
> >>>>>>>>>>     ramManager.unreserve(mapOutputLength);
> >>>>>>>>>>
> >>>>>>>>>>     // Discard the map-output
> >>>>>>>>>>     try {
> >>>>>>>>>>       mapOutput.discard();
> >>>>>>>>>>     } catch (IOException ignored) {
> >>>>>>>>>>       LOG.info("Failed to discard map-output from " +
> >>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
> >>>>>>>>>>     }
> >>>>>>>>>> Please confirm the line number.
> >>>>>>>>>>
> >>>>>>>>>> If we're looking at the same code, I am afraid I don't see how we can
> >>>>>>>>>> improve it. First, I assume the IOException shouldn't happen that often.
> >>>>>>>>>> Second, mapOutput.discard() just sets:
> >>>>>>>>>>     data = null;
> >>>>>>>>>> for the in-memory case. Even if we call mapOutput.discard() before
> >>>>>>>>>> ramManager.unreserve(), we don't know when GC would kick in and make more
> >>>>>>>>>> memory available.
> >>>>>>>>>> Of course, given the large number of map outputs in your system, it became
> >>>>>>>>>> more likely that the root cause from my reasoning made the OOME happen sooner.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <andy.sautins@returnpath.net> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Ted,
> >>>>>>>>>>>
> >>>>>>>>>>> I'm trying to follow the logic in your mail and I'm not sure I'm
> >>>>>>>>>>> following.  If you wouldn't mind helping me understand, I would appreciate it.
> >>>>>>>>>>>
> >>>>>>>>>>> Looking at the code, maxSingleShuffleLimit is only used in determining if
> >>>>>>>>>>> the copy _can_ fit into memory:
> >>>>>>>>>>>
> >>>>>>>>>>> boolean canFitInMemory(long requestedSize) {
> >>>>>>>>>>>   return (requestedSize < Integer.MAX_VALUE &&
> >>>>>>>>>>>           requestedSize < maxSingleShuffleLimit);
> >>>>>>>>>>>  }
> >>>>>>>>>>>
> >>>>>>>>>>> It also looks like RamManager.reserve should wait until memory is
> >>>>>>>>>>> available, so it shouldn't hit a memory limit for that reason.
> >>>>>>>>>>>
> >>>>>>>>>>> What does seem a little strange to me is the following
> >>>>>>>>>>> ( ReduceTask.java starting at 2730 ):
> >>>>>>>>>>>
> >>>>>>>>>>>     // Inform the ram-manager
> >>>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
> >>>>>>>>>>>     ramManager.unreserve(mapOutputLength);
> >>>>>>>>>>>
> >>>>>>>>>>>     // Discard the map-output
> >>>>>>>>>>>     try {
> >>>>>>>>>>>       mapOutput.discard();
> >>>>>>>>>>>     } catch (IOException ignored) {
> >>>>>>>>>>>       LOG.info("Failed to discard map-output from " +
> >>>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
> >>>>>>>>>>>     }
> >>>>>>>>>>>     mapOutput = null;
> >>>>>>>>>>>
> >>>>>>>>>>> So to me that looks like the ramManager unreserves the memory before the
> >>>>>>>>>>> mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_ the
> >>>>>>>>>>> ramManager unreserves the memory?  If the memory is unreserved before the
> >>>>>>>>>>> actual underlying data references are removed, then it seems like another
> >>>>>>>>>>> thread can try to allocate memory ( ReduceTask.java:2730 ) before the
> >>>>>>>>>>> previous memory is disposed ( mapOutput.discard() ).
> >>>>>>>>>>>
> >>>>>>>>>>> Not sure that makes sense.  One thing to note is that the particular job
> >>>>>>>>>>> that is failing does have a good number ( 200k+ ) of map outputs.  The large
> >>>>>>>>>>> number of small map outputs may be why we are triggering a problem.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks again for your thoughts.
> >>>>>>>>>>>
> >>>>>>>>>>> Andy
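Read straight through, the reordering Andy is asking about would look roughly like the sketch below. It only swaps the two steps from the snippet quoted above, reusing the same names; it is illustrative, not the actual ReduceTask code:

    // Hypothetical reordering of the quoted snippet: drop the in-memory buffer
    // first, then tell the RamManager the space is free, so another copier
    // cannot reserve against memory that is still referenced.
    try {
      mapOutput.discard();            // in-memory case: this just sets data = null
    } catch (IOException ignored) {
      LOG.info("Failed to discard map-output from " +
               mapOutputLoc.getTaskAttemptId(), ignored);
    }
    mapOutput = null;

    // Only now inform the ram-manager
    ramManager.closeInMemoryFile(mapOutputLength);
    ramManager.unreserve(mapOutputLength);

Note that, as Ted points out earlier in the thread, even in this order discard() only drops the reference; the bytes are not actually reclaimed until the garbage collector runs.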
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Jacob R Rideout [mailto:apache@jacobrideout.net]
> >>>>>>>>>>> Sent: Sunday, March 07, 2010 1:21 PM
> >>>>>>>>>>> To: common-user@hadoop.apache.org
> >>>>>>>>>>> Cc: Andy Sautins; Ted Yu
> >>>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>>>>>>>>>
> >>>>>>>>>>> Ted,
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
> >>>>>>>>>>> some time to write a patch later this week.
> >>>>>>>>>>>
> >>>>>>>>>>> Jacob Rideout
> >>>>>>>>>>>
> >>>>>>>>>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yu...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I think there is a mismatch (in ReduceTask.java) between:
> >>>>>>>>>>>>   this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
> >>>>>>>>>>>> and:
> >>>>>>>>>>>>   maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
> >>>>>>>>>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f
> >>>>>>>>>>>>
> >>>>>>>>>>>> because
> >>>>>>>>>>>>   copiers = new ArrayList<MapOutputCopier>(numCopiers);
> >>>>>>>>>>>> so the total memory allocated for the in-mem shuffle is 1.25 * maxSize.
> >>>>>>>>>>>>
> >>>>>>>>>>>> A JIRA should be filed to correlate the constant 5 above and
> >>>>>>>>>>>> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers
> >>>>>>>>>>>>
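One hypothetical way to correlate the two values Ted mentions (a sketch only, assuming maxSize and MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION are in scope as above; this is not a patch from any of the JIRAs referenced in this thread) is to cap each in-memory segment by the copier count as well as by the fixed fraction:

    // Sketch: never let numCopiers concurrent near-limit reservations exceed maxSize.
    int numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
    long maxSingleShuffleLimit =
        (long) Math.min(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION,
                        (double) maxSize / numCopiers);

With the defaults this keeps the worst case at maxSize instead of 1.25 * maxSize.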
> >>>>>>>>>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <
> >>>>>>>>>>>>
> >>>>>>>>>>>>  apache@jacobrideout.net
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We are seeing the following error in our reducers of a particular job:
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>> After enough reducers fail, the entire job fails. This error occurs
> >>>>>>>>>>>>> regardless of whether mapred.compress.map.output is true. We were able
> >>>>>>>>>>>>> to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
> >>>>>>>>>>>>> to 20%. Shouldn't the framework, via ShuffleRamManager.canFitInMemory
> >>>>>>>>>>>>> and ShuffleRamManager.reserve, correctly detect the memory available
> >>>>>>>>>>>>> for allocation? I would think that with poor configuration settings
> >>>>>>>>>>>>> (and default settings in particular) the job may not be as efficient,
> >>>>>>>>>>>>> but wouldn't die.
> >>>>>>>>>>>>>
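For reference, the workaround Jacob describes above is a single configuration change. Here is a minimal sketch of a job setup that applies it (the JobConf scaffolding is hypothetical; only the property name and the 20% value come from this thread):

    // Hypothetical job setup illustrating the workaround from this thread:
    // shrink the in-memory shuffle buffer from the default 70% of the reduce
    // task heap down to 20%.
    org.apache.hadoop.mapred.JobConf conf =
        new org.apache.hadoop.mapred.JobConf();
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.20f);
    // ... set input/output paths, mapper, reducer, etc., then submit the job.

The same value can also be set cluster-wide in mapred-site.xml.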
> >>>>>>>>>>>>> Here is some more context in the logs; I have attached the full
> >>>>>>>>>>>>> reducer log here: http://gist.github.com/323746
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 4191933 bytes (435311 raw bytes) into RAM from attempt_201003060739_0002_m_000061_0
> >>>>>>>>>>>>> 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201003060739_0002_r_000000_0: Failed fetch #1 from attempt_201003060739_0002_m_000202_0
> >>>>>>>>>>>>> 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0 adding host hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
> >>>>>>>>>>>>> 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous failures
> >>>>>>>>>>>>> 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201003060739_0002_r_000000_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>> We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
> >>>>>>>>>>>>> would address the issue in 0.20.2, but it did not. Does anyone have
> >>>>>>>>>>>>> any comments or suggestions? Is this a bug I should file a JIRA for?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jacob Rideout
> >>>>>>>>>>>>> Return Path
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>
> >>
> >
>