Posted to common-user@hadoop.apache.org by Christian Kunz <ck...@yahoo-inc.com> on 2008/12/08 08:29:24 UTC

How are records with equal key sorted in hadoop-0.18?

Since running with hadoop-0.18 we have many more problems with running out
of memory during the final merge process in the reduce phase, especially
when dealing with a lot of records with the same key.

Typical exception:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:278)
    at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:340)
    at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:134)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:225)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
    at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:720)
    at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:679)
    at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:227)
    at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:60)
    at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:36)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

This did not occur in earlier releases, although we used a much larger merge
fan-in, io.sort.factor (500+ versus currently just 100). Tasks are also run
with 2GB of heap space.
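For reference, the two knobs mentioned above would look like this in a job
configuration. This is only a sketch with the values described in this thread;
the property names are as of Hadoop 0.18, and `mapred.child.java.opts` is
assumed here as the usual way the 2GB child heap is configured:

```xml
<!-- Merge fan-in used by the external merge sort (500+ in 0.17, 100 now). -->
<property>
  <name>io.sort.factor</name>
  <value>100</value>
</property>

<!-- Heap size for each map/reduce child JVM (2GB as described above). -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```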

What changed in the merge algorithm between hadoop-0.17 and hadoop-0.18?

Are records with the same key getting sorted by size for some reason? That
would cause the large values to be merged at the same time.

Thanks,
Christian


Re: How are records with equal key sorted in hadoop-0.18?

Posted by Christian Kunz <ck...@yahoo-inc.com>.
They vary a lot; it also depends on the application.
Most of them are tiny (< 1kb), but there are outliers going into tens of
MB.

One application had a maximum of 30 MB per value and ran into an OOM with a
merge fan-in of 65. From the logs:

2008-12-08 00:36:56,575 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 65 segments left of total size: 2478699273 bytes

This indicates that at one point nearly all values on the heap were from the
outliers.

That's why I thought there is some systematic ordering. How are records with
equal keys compared on the merge heap? Are smaller values by any chance
processed first?

Thanks,
-Christian




On 12/8/08 10:55 AM, "Arun C Murthy" <ac...@yahoo-inc.com> wrote:

> Christian,
> 
> On Dec 7, 2008, at 11:29 PM, Christian Kunz wrote:
> 
>> Since running with hadoop-0.18 we have many more problems with
>> running out
>> of memory during the final merge process in the reduce phase,
>> especially
>> when dealing with a lot of records with the same key.
>> 
> 
> Would you have any data on the sizes of keys/values?
> 
> Arun
> 


Re: How are records with equal key sorted in hadoop-0.18?

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Christian,

On Dec 7, 2008, at 11:29 PM, Christian Kunz wrote:

> Since running with hadoop-0.18 we have many more problems with  
> running out
> of memory during the final merge process in the reduce phase,  
> especially
> when dealing with a lot of records with the same key.
>

Would you have any data on the sizes of keys/values?

Arun


Re: How are records with equal key sorted in hadoop-0.18?

Posted by Christian Kunz <ck...@yahoo-inc.com>.
Owen,

So far we have applied a couple of workarounds and got the jobs to work, and
we are now already downstream, running other applications. When we (likely)
hit the problem again, I will try to get a heap profile.

Thanks
-Christian


On 12/8/08 9:10 AM, "Owen O'Malley" <om...@apache.org> wrote:

> On Dec 8, 2008, at 8:02 AM, Christian Kunz wrote:
> 
>> Comparing hadoop-default.xml of hadoop-0.18 with hadoop-0.17, didn't
>> map.sort.class change from
>> org.apache.hadoop.mapred.MergeSorter to
>> org.apache.hadoop.util.QuickSort?
> 
> Yes, but the quick sort is only used in the mapper. The reducer
> already has sorted runs and therefore only needs an external merge sort.
> 
> The primary change in 0.18 for the reducer was HADOOP-2095. What were
> the values of io.sort.factor and io.file.buffer.size?
> 
> Christian, can you get the heap profile for one of the reduces that is
> failing?
> 
> mapred.task.profile=true
> mapred.task.profile.maps= <empty string, since we don't want any maps>
> mapred.task.profile.reduces= <number of reduce to profile>
> mapred.task.profile.params=-agentlib:hprof=heap=sites,force=n,thread=y,verbose=n,file=%s
> 
> -- Owen


Re: How are records with equal key sorted in hadoop-0.18?

Posted by Owen O'Malley <om...@apache.org>.
On Dec 8, 2008, at 8:02 AM, Christian Kunz wrote:

> Comparing hadoop-default.xml of hadoop-0.18 with hadoop-0.17, didn't
> map.sort.class change from
> org.apache.hadoop.mapred.MergeSorter to
> org.apache.hadoop.util.QuickSort?

Yes, but the quick sort is only used in the mapper. The reducer  
already has sorted runs and therefore only needs an external merge sort.
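The external merge described above can be sketched, in a much simplified form,
as a priority queue over already-sorted runs. This is a hypothetical standalone
sketch, not the actual Merger code; the point is that the comparator consults
only the key, so ties between equal keys fall out of heap position, not value
size:

```java
import java.util.*;

public class KWayMergeSketch {
    // Merge runs that are each already sorted by key, as the reducer's
    // segments are. Records are String[]{key, value} for simplicity.
    static List<String[]> merge(List<List<String[]>> runs) {
        // Heap entries are {runIndex, positionInRun}, ordered by the key
        // of the record each entry currently points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] e) -> runs.get(e[0]).get(e[1])[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) heap.add(new int[]{i, 0});
        }
        List<String[]> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(runs.get(top[0]).get(top[1]));
            // Advance this run to its next record, if any.
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return out;
    }
}
```

Because the comparator never inspects the value, records with equal keys come
out in whatever order the heap happens to surface the segments, with no
size-based ordering.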

The primary change in 0.18 for the reducer was HADOOP-2095. What were  
the values of io.sort.factor and io.file.buffer.size?

Christian, can you get the heap profile for one of the reduces that is  
failing?

mapred.task.profile=true
mapred.task.profile.maps= <empty string, since we don't want any maps>
mapred.task.profile.reduces= <number of reduce to profile>
mapred.task.profile.params=-agentlib:hprof=heap=sites,force=n,thread=y,verbose=n,file=%s

-- Owen

Re: How are records with equal key sorted in hadoop-0.18?

Posted by Christian Kunz <ck...@yahoo-inc.com>.
Devaraj,

fs.inmemory.size.mb = 200

Intermediate compression turned on. At one time we checked whether turning
it off would help, but it did not.

I do not know the typical map output size, but the reduces having trouble are
typically merging 1-3GB (compressed) directly into the reduce application
with a fan-in of 50-100. That's when the task runs out of memory, and it is
not due to the application itself, because that runs as a separate process
(pipes application).

Comparing hadoop-default.xml of hadoop-0.18 with hadoop-0.17, didn't
map.sort.class change from
org.apache.hadoop.mapred.MergeSorter to
org.apache.hadoop.util.QuickSort?

Is Hadoop's QuickSort stable? When comparing two records with the same key,
does the size of the value affect the ordering?
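To illustrate the stability question (a hypothetical standalone sketch, not
Hadoop's actual comparator or record types): a key-only comparator never sees
the value at all, so it returns 0 for any two records with equal keys and has
no way to order them by value size. Any size-correlated ordering would have
to come from elsewhere, e.g. the order in which segments happen to be read.

```java
import java.util.*;

public class KeyOnlyComparatorDemo {
    // A record as the merge heap sees it; hypothetical stand-in, not a Hadoop type.
    static class Rec {
        final String key;
        final byte[] value;
        Rec(String key, byte[] value) { this.key = key; this.value = value; }
    }

    // The comparator consults only the key, so two records with equal keys
    // compare as 0 regardless of how large their values are.
    static final Comparator<Rec> BY_KEY = Comparator.comparing(r -> r.key);
}
```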

Thanks
Christian


On 12/8/08 2:15 AM, "Devaraj Das" <dd...@yahoo-inc.com> wrote:

> Hi Christian, there is no notable change to the merge algorithm except that
> it uses IFile instead of SequenceFile for the input and output.
> Is your application running with intermediate compression on? What's the
> value configured for fs.inmemory.size.mb? What is the typical map output
> size (if you happen to know)?
> 
> Devaraj
> 
> 
> On 12/8/08 12:59 PM, "Christian Kunz" <ck...@yahoo-inc.com> wrote:
> 
>> Since running with hadoop-0.18 we have many more problems with running out
>> of memory during the final merge process in the reduce phase, especially
>> when dealing with a lot of records with the same key.
>> 
>> Typical exception:
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:278)
>>     at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:340)
>>     at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:134)
>>     at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:225)
>>     at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
>>     at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:720)
>>     at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:679)
>>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:227)
>>     at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:60)
>>     at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:36)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>> 
>> This did not occur in earlier releases although we used a much larger fan
>> factor io.sort.factor (500+ versus currently just 100). Also tasks are run
>> with 2GB of heap space.
>> 
>> What changed in the merge algorithm between hadoop-0.17 and hadoop-0.18?
>> 
>> Are records with same key getting sorted by size for some reason? This would
>> cause large values to be merged at the same time.
>> 
>> Thanks,
>> Christian
>> 
> 
> 


Re: How are records with equal key sorted in hadoop-0.18?

Posted by Devaraj Das <dd...@yahoo-inc.com>.
Hi Christian, there is no notable change to the merge algorithm except that
it uses IFile instead of SequenceFile for the input and output.
Is your application running with intermediate compression on? What's the
value configured for fs.inmemory.size.mb? What is the typical map output
size (if you happen to know)?
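The knobs asked about here are set in the job configuration. A sketch for
illustration only (property names as of Hadoop 0.18; the values are examples,
not recommendations):

```xml
<!-- Size of the in-memory buffer used to hold shuffled map outputs. -->
<property>
  <name>fs.inmemory.size.mb</name>
  <value>200</value>
</property>

<!-- Intermediate (map output) compression on/off. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
```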

Devaraj


On 12/8/08 12:59 PM, "Christian Kunz" <ck...@yahoo-inc.com> wrote:

> Since running with hadoop-0.18 we have many more problems with running out
> of memory during the final merge process in the reduce phase, especially
> when dealing with a lot of records with the same key.
> 
> Typical exception:
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:278)
>     at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:340)
>     at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:134)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:225)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:720)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:679)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:227)
>     at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:60)
>     at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:36)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> 
> This did not occur in earlier releases although we used a much larger fan
> factor io.sort.factor (500+ versus currently just 100). Also tasks are run
> with 2GB of heap space.
> 
> What changed in the merge algorithm between hadoop-0.17 and hadoop-0.18?
> 
> Are records with same key getting sorted by size for some reason? This would
> cause large values to be merged at the same time.
> 
> Thanks,
> Christian
>