Posted to common-dev@hadoop.apache.org by Mingxi Wu <Mi...@turn.com> on 2011/12/01 08:07:54 UTC

RE: Hadoop - non disk based sorting?

Thanks Ravi.

So, why is the reducer not able to copy the map outputs when they are huge?

Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?

Thank you,

Mingxi

-----Original Message-----
From: Ravi teja ch n v [mailto:raviteja.chnv@huawei.com] 
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually occurred while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap memory, you hit the problem.
When you increased the number of reducers to 200, your map outputs were partitioned among 200 reducers, so you did not hit the problem.

By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can get past this problem.
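For example, with the old mapred API this could look roughly like the following minimal sketch (the class name and the 1 GB value are placeholders, not anything from your job):

    import org.apache.hadoop.mapred.JobConf;

    public class HeapConfigExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(HeapConfigExample.class);
            // Give every child task JVM (map and reduce) a 1 GB heap.
            // The value is passed verbatim to the child JVM command line.
            conf.set("mapred.child.java.opts", "-Xmx1024m");
        }
    }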

Regards,
Ravi teja


________________________________________
From: Mingxi Wu [Mingxi.Wu@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase uses a disk-based sort, which is not constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased my number of reducers from 100 to 200, the problem went away.
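For reference, a reducer count like that can be set with the old mapred API; a minimal sketch (the class name is just a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class ReducerCountExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ReducerCountExample.class);
            // More reducers means each one receives a smaller partition of the map output.
            conf.setNumReduceTasks(200);
        }
    }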

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi

Re: Hadoop - non disk based sorting?

Posted by Todd Lipcon <to...@cloudera.com>.
I've seen this issue in jobs with very many map tasks and small
reducer heaps. Some heap space is needed for the actual map
completion events, etc., and that isn't accounted for when determining
when to spill the fetched outputs to disk. It would be a nice patch to add
code that calculates the in-memory size of these objects during the
fetch phase and subtracts it from the heap size before multiplying
out the spill percentages, etc.
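Roughly, the in-memory shuffle budget today is derived from the child heap and mapred.job.shuffle.input.buffer.percent. A simplified sketch of that, plus the adjustment suggested above (the bookkeeping estimate is a made-up placeholder, and the exact framework code differs by version):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleBudgetSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Fraction of the reducer heap the shuffle may use for in-memory
            // map output copies (Hadoop 0.20/1.x property, default 0.70).
            float bufferPercent =
                conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
            long maxHeap = Runtime.getRuntime().maxMemory();

            // What the framework does today (simplified): budget = heap * percent.
            long shuffleBudget = (long) (maxHeap * bufferPercent);

            // Suggested change: subtract an estimate of the heap already consumed by
            // map completion events and other bookkeeping before applying the percent.
            long bookkeepingEstimate = 64L * 1024 * 1024; // hypothetical figure
            long adjustedBudget = (long) ((maxHeap - bookkeepingEstimate) * bufferPercent);

            System.out.println("current budget  = " + shuffleBudget);
            System.out.println("adjusted budget = " + adjustedBudget);
        }
    }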

-Todd

On Thu, Dec 1, 2011 at 8:14 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
> Mingxi,
>
> My understanding was that, just like the maps, when a reducer's in-memory buffer fills up it too will spill to disk as part of the sort.  In fact, I think it uses the exact same code for doing the sort as the map does.  There may be an issue where your sort buffer is somehow too large for the amount of heap that you requested as part of mapred.child.java.opts.  I have personally run a reduce that took in 300GB of data, which it successfully sorted, to test this very thing.  And no, the box did not have 300 GB of RAM.
>
> --Bobby Evans
>
> On 12/1/11 4:12 AM, "Ravi teja ch n v" <ra...@huawei.com> wrote:
>
> Hi Mingxi ,
>
> >So, why is the reducer not able to copy the map outputs when they are huge?
>
> The Reducer will copy the map output into its in-memory buffer. When the Reducer JVM doesn't have enough memory to accommodate the
> map output, it leads to an OutOfMemoryError.
>
> >Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?
>
> The maps and reducers run in separate child JVMs launched by the TaskTrackers.
> When the TaskTracker launches a map or reduce JVM, it passes mapred.child.java.opts as the JVM arguments for the new child JVM.
>
> Regards,
> Ravi Teja
> ________________________________________
> From: Mingxi Wu [Mingxi.Wu@turn.com]
> Sent: 01 December 2011 12:37:54
> To: common-dev@hadoop.apache.org
> Subject: RE: Hadoop - non disk based sorting?
>
> Thanks Ravi.
>
> So, why is the reducer not able to copy the map outputs when they are huge?
>
> Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?
>
> Thank you,
>
> Mingxi
>
> -----Original Message-----
> From: Ravi teja ch n v [mailto:raviteja.chnv@huawei.com]
> Sent: Tuesday, November 29, 2011 9:46 PM
> To: common-dev@hadoop.apache.org
> Subject: RE: Hadoop - non disk based sorting?
>
> Hi Mingxi,
>
> From your stack trace, I understand that the OutOfMemoryError actually occurred while copying the map outputs, not while sorting them.
>
> Since your map outputs are huge and your reducer does not have enough heap memory, you hit the problem.
> When you increased the number of reducers to 200, your map outputs were partitioned among 200 reducers, so you did not hit the problem.
>
> By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can get past this problem.
>
> Regards,
> Ravi teja
>
>
> ________________________________________
> From: Mingxi Wu [Mingxi.Wu@turn.com]
> Sent: 30 November 2011 05:14:49
> To: common-dev@hadoop.apache.org
> Subject: Hadoop - non disk based sorting?
>
> Hi,
>
> I have a question regarding the shuffle phase of reducer.
>
> It appears that when the map output is large (in my case, 5 billion records), I get an out-of-memory error like the one below.
>
> Error: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)
>
> However, I thought the shuffle phase uses a disk-based sort, which is not constrained by memory.
> So why would a user run into this OutOfMemoryError? After I increased my number of reducers from 100 to 200, the problem went away.
>
> Any input regarding this memory issue would be appreciated!
>
> Thanks,
>
> Mingxi
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Hadoop - non disk based sorting?

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
Hi,

Which Hadoop version are you running? Your error looks similar to
https://issues.apache.org/jira/browse/MAPREDUCE-1182


On Fri, Dec 2, 2011 at 11:17 AM, Ravi teja ch n v
<ra...@huawei.com>wrote:

> Hi Bobby,
>
>  You are right that the map outputs, when copied, will be spilled to the
> disk in case the reducer cannot accommodate the copy in memory
> (shuffleInMemory and shuffleToDisk are chosen by the RamManager based on
> the in-memory size).
>
>  But according to the stack trace provided by Mingxi,
>
>  >org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
>
> the problem occurred after the in-memory copy was chosen.
>
> Regards,
> Ravi Teja
> ________________________________________
> From: Robert Evans [evans@yahoo-inc.com]
> Sent: 01 December 2011 21:44:50
> To: common-dev@hadoop.apache.org
> Subject: Re: Hadoop - non disk based sorting?
>
> Mingxi,
>
> My understanding was that, just like the maps, when a reducer's in-memory
> buffer fills up it too will spill to disk as part of the sort.  In
> fact, I think it uses the exact same code for doing the sort as the map
> does.  There may be an issue where your sort buffer is somehow too large
> for the amount of heap that you requested as part of
> mapred.child.java.opts.  I have personally run a reduce that took in 300GB
> of data, which it successfully sorted, to test this very thing.  And no, the
> box did not have 300 GB of RAM.
>
> --Bobby Evans
>
> On 12/1/11 4:12 AM, "Ravi teja ch n v" <ra...@huawei.com> wrote:
>
> Hi Mingxi ,
>
> >So, why is the reducer not able to copy the map outputs when they are huge?
>
> The Reducer will copy the map output into its in-memory buffer. When the
> Reducer JVM doesn't have enough memory to accommodate the
> map output, it leads to an OutOfMemoryError.
>
> >Could you please explain the function of
> mapred.child.java.opts? How does it relate to the copy phase?
>
> The maps and reducers run in separate child JVMs launched by
> the TaskTrackers.
> When the TaskTracker launches a map or reduce JVM, it passes
> mapred.child.java.opts as the JVM arguments for the new child JVM.
>
> Regards,
> Ravi Teja
> ________________________________________
> From: Mingxi Wu [Mingxi.Wu@turn.com]
> Sent: 01 December 2011 12:37:54
> To: common-dev@hadoop.apache.org
> Subject: RE: Hadoop - non disk based sorting?
>
> Thanks Ravi.
>
> So, why is the reducer not able to copy the map outputs when they are huge?
>
> Could you please explain the function of
> mapred.child.java.opts? How does it relate to the copy phase?
>
> Thank you,
>
> Mingxi
>
> -----Original Message-----
> From: Ravi teja ch n v [mailto:raviteja.chnv@huawei.com]
> Sent: Tuesday, November 29, 2011 9:46 PM
> To: common-dev@hadoop.apache.org
> Subject: RE: Hadoop - non disk based sorting?
>
> Hi Mingxi,
>
> From your stack trace, I understand that the OutOfMemoryError actually
> occurred while copying the map outputs, not while sorting them.
>
> Since your map outputs are huge and your reducer does not have enough heap
> memory, you hit the problem.
> When you increased the number of reducers to 200, your map outputs were
> partitioned among 200 reducers, so you did not hit the problem.
>
> By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you
> can get past this problem.
>
> Regards,
> Ravi teja
>
>
> ________________________________________
> From: Mingxi Wu [Mingxi.Wu@turn.com]
> Sent: 30 November 2011 05:14:49
> To: common-dev@hadoop.apache.org
> Subject: Hadoop - non disk based sorting?
>
> Hi,
>
> I have a question regarding the shuffle phase of reducer.
>
> It appears that when the map output is large (in my case, 5 billion
> records), I get an out-of-memory error like the one below.
>
> Error: java.lang.OutOfMemoryError: Java heap space at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)
>
> However, I thought the shuffle phase uses a disk-based sort, which is
> not constrained by memory.
> So why would a user run into this OutOfMemoryError? After I increased my
> number of reducers from 100 to 200, the problem went away.
>
> Any input regarding this memory issue would be appreciated!
>
> Thanks,
>
> Mingxi
>



-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
Nokia India Pvt. Ltd.

RE: Hadoop - non disk based sorting?

Posted by Ravi teja ch n v <ra...@huawei.com>.
Hi Bobby,

 You are right that the map outputs, when copied, will be spilled to the disk in case the reducer cannot accommodate the copy in memory (shuffleInMemory and shuffleToDisk are chosen by the RamManager based on the in-memory size).

 But according to the stack trace provided by Mingxi,
 >org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)

the problem occurred after the in-memory copy was chosen.
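For reference, a simplified sketch, from memory, of that decision logic; the fractions are the usual 0.20-era defaults and the class and variable names are illustrative, not copied from the Hadoop source:

    public class ShuffleDecisionSketch {
        // Simplified sketch of how the 0.20-era ReduceCopier decides whether a
        // map output is fetched into memory or straight to disk.
        public static void main(String[] args) {
            long maxHeap = Runtime.getRuntime().maxMemory();
            long inMemBudget = (long) (maxHeap * 0.70);               // shuffle buffer fraction
            long maxSingleShuffleLimit = (long) (inMemBudget * 0.25); // cap for one segment

            long mapOutputSize = 512L * 1024 * 1024; // example: a 512 MB map output

            if (mapOutputSize > maxSingleShuffleLimit) {
                System.out.println("shuffleToDisk: segment too large for memory");
            } else {
                // The RamManager reserves space here; it is this in-memory path
                // (shuffleInMemory) that threw the OutOfMemoryError in the stack trace.
                System.out.println("shuffleInMemory: segment fits in the reserved budget");
            }
        }
    }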

Regards,
Ravi Teja
________________________________________
From: Robert Evans [evans@yahoo-inc.com]
Sent: 01 December 2011 21:44:50
To: common-dev@hadoop.apache.org
Subject: Re: Hadoop - non disk based sorting?

Mingxi,

My understanding was that, just like the maps, when a reducer's in-memory buffer fills up it too will spill to disk as part of the sort.  In fact, I think it uses the exact same code for doing the sort as the map does.  There may be an issue where your sort buffer is somehow too large for the amount of heap that you requested as part of mapred.child.java.opts.  I have personally run a reduce that took in 300GB of data, which it successfully sorted, to test this very thing.  And no, the box did not have 300 GB of RAM.
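For anyone tuning this, the reduce-side merge/spill behaviour is governed by a few job properties (names as in Hadoop 0.20/1.x; the defaults quoted in the comments are from memory, so verify against your version). A minimal sketch of setting them with the old mapred API, with a placeholder class name:

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceShuffleTuningSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ReduceShuffleTuningSketch.class);
            // Fraction of the reducer heap usable for in-memory map output copies (default ~0.70).
            conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.60f);
            // When in-memory copies reach this fraction of the buffer, merge them to disk (default ~0.66).
            conf.setFloat("mapred.job.shuffle.merge.percent", 0.50f);
            // Or merge to disk after this many in-memory segments, whichever comes first (default ~1000).
            conf.setInt("mapred.inmem.merge.threshold", 500);
        }
    }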

--Bobby Evans

On 12/1/11 4:12 AM, "Ravi teja ch n v" <ra...@huawei.com> wrote:

Hi Mingxi ,

>So, why is the reducer not able to copy the map outputs when they are huge?

The Reducer will copy the map output into its in-memory buffer. When the Reducer JVM doesn't have enough memory to accommodate the
map output, it leads to an OutOfMemoryError.

>Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?

The maps and reducers run in separate child JVMs launched by the TaskTrackers.
When the TaskTracker launches a map or reduce JVM, it passes mapred.child.java.opts as the JVM arguments for the new child JVM.

Regards,
Ravi Teja
________________________________________
From: Mingxi Wu [Mingxi.Wu@turn.com]
Sent: 01 December 2011 12:37:54
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Thanks Ravi.

So, why is the reducer not able to copy the map outputs when they are huge?

Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?

Thank you,

Mingxi

-----Original Message-----
From: Ravi teja ch n v [mailto:raviteja.chnv@huawei.com]
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually occurred while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap memory, you hit the problem.
When you increased the number of reducers to 200, your map outputs were partitioned among 200 reducers, so you did not hit the problem.

By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can get past this problem.

Regards,
Ravi teja


________________________________________
From: Mingxi Wu [Mingxi.Wu@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase uses a disk-based sort, which is not constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased my number of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi

RE: Hadoop - non disk based sorting?

Posted by Ravi teja ch n v <ra...@huawei.com>.
Hi Mingxi ,

>So, why is the reducer not able to copy the map outputs when they are huge?

The Reducer will copy the map output into its in-memory buffer. When the Reducer JVM doesn't have enough memory to accommodate the
map output, it leads to an OutOfMemoryError.

>Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?

The maps and reducers run in separate child JVMs launched by the TaskTrackers.
When the TaskTracker launches a map or reduce JVM, it passes mapred.child.java.opts as the JVM arguments for the new child JVM.
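As a concrete illustration, the property value simply ends up on the child JVM's command line. A minimal sketch with the old mapred API; note that the map- and reduce-specific variants shown are version-dependent (added around 0.21 and present in 1.x), so treat those names as assumptions to verify against your release:

    import org.apache.hadoop.mapred.JobConf;

    public class ChildJvmOptsSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ChildJvmOptsSketch.class);
            // Applies to both map and reduce child JVMs; the TaskTracker appends
            // this string to the java command it launches for each task attempt.
            conf.set("mapred.child.java.opts", "-Xmx512m");

            // Version-dependent: if your release supports them, these override the
            // generic setting for maps and reduces respectively.
            conf.set("mapred.map.child.java.opts", "-Xmx512m");
            conf.set("mapred.reduce.child.java.opts", "-Xmx1024m");
        }
    }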

Regards,
Ravi Teja
________________________________________
From: Mingxi Wu [Mingxi.Wu@turn.com]
Sent: 01 December 2011 12:37:54
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Thanks Ravi.

So, why is the reducer not able to copy the map outputs when they are huge?

Could you please explain the function of mapred.child.java.opts? How does it relate to the copy phase?

Thank you,

Mingxi

-----Original Message-----
From: Ravi teja ch n v [mailto:raviteja.chnv@huawei.com]
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually occurred while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap memory, you hit the problem.
When you increased the number of reducers to 200, your map outputs were partitioned among 200 reducers, so you did not hit the problem.

By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can get past this problem.

Regards,
Ravi teja


________________________________________
From: Mingxi Wu [Mingxi.Wu@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase uses a disk-based sort, which is not constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased my number of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi
