Posted to user@pig.apache.org by "leiwangouc@gmail.com" <le...@gmail.com> on 2014/04/15 14:41:20 UTC

memoryjava.lang.OutOfMemoryError related with number of reducer?

I can fix this by increasing the heap size.
But what confuses me is that when I change the reducer number from 24 to 84, the error no longer occurs.

Any insight on this?

Thanks
Lei
Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2786)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
	at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
	at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
	at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
	at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
	at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
	at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
	at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
	at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)


leiwangouc@gmail.com

Re: Re: java.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Found the root cause.
The nested DISTINCT operation relies on RAM to compute the unique values.
As described here: http://stackoverflow.com/questions/10732456/how-to-optimize-a-group-by-statement-in-pig-latin 
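
For reference, a minimal sketch of that workaround in Pig Latin, assuming the data_filter relation and field names from the script quoted below (the relation names here are illustrative): replace the nested DISTINCT with a top-level DISTINCT plus a second GROUP, so duplicate elimination happens in the shuffle instead of in a reducer-side bag.

-- Distinct (key, ip) pairs are computed by MapReduce, not held in reducer RAM:
key_ip       = FOREACH data_filter GENERATE custid, domain, level, device, ip;
key_ip_dist  = DISTINCT key_ip;
key_ip_group = GROUP key_ip_dist BY (custid, domain, level, device);
ip_counts    = FOREACH key_ip_group GENERATE FLATTEN(group), COUNT_STAR(key_ip_dist) AS distinct_ips;
-- Repeat the same pattern for userid, then JOIN the per-key counts back together.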

Thanks,
Lei


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-16 11:58
To: user; th; german.fl; user
Subject: Re: Re: java.lang.OutOfMemoryError related with number of reducer?
Hi German & Thomas,

    It seems I found the data that causes the error, but I still don't know the exact reason.

    I just do a GROUP in Pig Latin: 
    
domain_device_group = GROUP data_filter BY (custid, domain, level, device);
domain_device = FOREACH domain_device_group {
        distinct_ip = DISTINCT data_filter.ip;
        distinct_userid = DISTINCT data_filter.userid;
        GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid);
}
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device) is significantly skewed: about 42% (58,621,533 / 138,455,355) of the records share the same key, and only the reducer that handles this key failed.
But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort , I still have no idea why it causes an OOM. It explains neither how a skewed key is handled nor how different keys in the same reducer are merged.
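
One knob that is sometimes suggested for oversized bags on a single reducer (an assumption to verify on your Pig version, not something taken from that chapter): lower the fraction of the heap Pig lets in-memory bags use, so that large bags spill to disk earlier.

SET pig.cachedbag.memusage 0.1;  -- default is 0.2; lower values make bags spill sooner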


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found in Hadoop: The Definitive Guide by Tom White. 
Here is an excerpt that explains the behavior on the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Another question: I have no idea what "Failed to merge in memory" means. Is the 'merge' the shuffle phase on the reducer side? Why is it in memory?
Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers, they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
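
In Pig, the reducer count can also be set per statement with the PARALLEL clause instead of job-wide; a minimal sketch against the GROUP from the script above:

domain_device_group = GROUP data_filter BY (custid, domain, level, device) PARALLEL 84;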
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by increasing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, the error no longer occurs.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: java.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Hi German & Thomas,

    It seems I found the data that causes the error, but I still don't know the exact reason.

    I just do a GROUP in Pig Latin: 
    
domain_device_group = GROUP data_filter BY (custid, domain, level, device);
domain_device = FOREACH domain_device_group {
        distinct_ip = DISTINCT data_filter.ip;
        distinct_userid = DISTINCT data_filter.userid;
        GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid);
}
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device) is significantly skewed: about 42% (58,621,533 / 138,455,355) of the records share the same key, and only the reducer that handles this key failed.
But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort , I still have no idea why it causes an OOM. It explains neither how a skewed key is handled nor how different keys in the same reducer are merged.


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found in Hadoop: The Definitive Guide by Tom White. 
Here is an excerpt that explains the behavior on the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Another question: I have no idea what "Failed to merge in memory" means. Is the 'merge' the shuffle phase on the reducer side? Why is it in memory?
Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers, they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by increasing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, the error no longer occurs.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: java.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Hi German & Thomas,

    Seems i found the data that causes the error, but i still don't know the exactly reason.

    I just do a group with pig latin: 
    
domain_device_group = GROUP data_filter BY (custid, domain, level, device); 
domain_device = FOREACH domain_device_group { 
distinct_ip = DISTINCT data_filter.ip; 
        distinct_userid = DISTINCT data_filter.userid; 
        GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid); 
} 
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device)  is significantly skewed,  about 42% (58,621,533 / 138,455,355) of the records are the same key, and only the reducer which handle this key failed.
But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort ,  I still have no idea why it cause an OOM.  It doesn't tell how skewed key will be handled, neither how different keys in same reducer will be merged. 


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found on the Hadoop The Definitive Guide by Tom White. 
Here is an excerpt that explains a bit the behavior at the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Anohter question.  I have no idea what is "Failed to merge in memory".  Does the 'merge' is the shuffle phase in reducer side?  Why it is in memory?
Except the two methods(increase reducer number and increase heap size),  is there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing heap size.
> But what confuse me is that when i change the reducer number from 24
> to 84, there's no this error.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: java.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Hi German & Thomas,

    Seems i found the data that causes the error, but i still don't know the exactly reason.

    I just do a group with pig latin: 
    
domain_device_group = GROUP data_filter BY (custid, domain, level, device); 
domain_device = FOREACH domain_device_group { 
distinct_ip = DISTINCT data_filter.ip; 
        distinct_userid = DISTINCT data_filter.userid; 
        GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid); 
} 
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device)  is significantly skewed,  about 42% (58,621,533 / 138,455,355) of the records are the same key, and only the reducer which handle this key failed.
But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort ,  I still have no idea why it cause an OOM.  It doesn't tell how skewed key will be handled, neither how different keys in same reducer will be merged. 


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found on the Hadoop The Definitive Guide by Tom White. 
Here is an excerpt that explains a bit the behavior at the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Anohter question.  I have no idea what is "Failed to merge in memory".  Does the 'merge' is the shuffle phase in reducer side?  Why it is in memory?
Except the two methods(increase reducer number and increase heap size),  is there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing heap size.
> But what confuse me is that when i change the reducer number from 24
> to 84, there's no this error.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: java.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Hi German & Thomas,

    Seems i found the data that causes the error, but i still don't know the exactly reason.

    I just do a group with pig latin: 
    
domain_device_group = GROUP data_filter BY (custid, domain, level, device); 
domain_device = FOREACH domain_device_group { 
distinct_ip = DISTINCT data_filter.ip; 
        distinct_userid = DISTINCT data_filter.userid; 
        GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid); 
} 
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device)  is significantly skewed,  about 42% (58,621,533 / 138,455,355) of the records are the same key, and only the reducer which handle this key failed.
But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort ,  I still have no idea why it cause an OOM.  It doesn't tell how skewed key will be handled, neither how different keys in same reducer will be merged. 


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found on the Hadoop The Definitive Guide by Tom White. 
Here is an excerpt that explains a bit the behavior at the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Anohter question.  I have no idea what is "Failed to merge in memory".  Does the 'merge' is the shuffle phase in reducer side?  Why it is in memory?
Except the two methods(increase reducer number and increase heap size),  is there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing heap size.
> But what confuse me is that when i change the reducer number from 24
> to 84, there's no this error.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: java.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Hi German & Thomas,

    Seems I found the data that causes the error, but I still don't know the exact reason.

    I just do a group with Pig Latin: 
    
domain_device_group = GROUP data_filter BY (custid, domain, level, device); 
domain_device = FOREACH domain_device_group { 
        distinct_ip = DISTINCT data_filter.ip; 
        distinct_userid = DISTINCT data_filter.userid; 
        GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid); 
} 
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device) is significantly skewed: about 42% (58,621,533 / 138,455,355) of the records share the same key, and only the reducer that handles this key failed.
But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort , I still have no idea why it causes an OOM.  It doesn't explain how a skewed key is handled, nor how different keys in the same reducer are merged. 
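
To confirm which keys are skewed before rerunning, a small diagnostic pass helps; a sketch reusing the relation above (untested):

key_groups = GROUP data_filter BY (custid, domain, level, device); 
key_counts = FOREACH key_groups GENERATE FLATTEN(group), COUNT_STAR(data_filter) AS cnt; 
key_sorted = ORDER key_counts BY cnt DESC; 
top_keys   = LIMIT key_sorted 10; 
DUMP top_keys;   -- the 58M-record key should surface at the top 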


leiwangouc@gmail.com
 
From: leiwangouc@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found in Hadoop: The Definitive Guide by Tom White. 
Here is an excerpt that explains the behavior on the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Another question.  I have no idea what "Failed to merge in memory" means.  Is the 'merge' the shuffle phase on the reducer side?  Why is it in memory?
Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, this error does not occur.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Thanks, let me take a careful look at it. 



leiwangouc@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
Lei
A good explanation of this can be found in Hadoop: The Definitive Guide by Tom White. 
Here is an excerpt that explains the behavior on the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Another question.  I have no idea what "Failed to merge in memory" means.  Is the 'merge' the shuffle phase on the reducer side?  Why is it in memory?
Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, this error does not occur.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by German Florez-Larrahondo <ge...@samsung.com>.
Lei
A good explanation of this can be found in Hadoop: The Definitive Guide by Tom White. 
Here is an excerpt that explains the behavior on the reduce side and some possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
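 
For reference, the shuffle knobs that chapter discusses can also be set from inside a Pig script. A sketch with Hadoop 1.x property names (the values shown are illustrative assumptions, not tuned recommendations):
 
SET mapred.job.shuffle.input.buffer.percent 0.50;  -- fraction of reducer heap that buffers map outputs 
SET mapred.job.shuffle.merge.percent 0.50;         -- buffer usage threshold that triggers an in-memory merge 
SET mapred.inmem.merge.threshold 500;              -- or merge once this many map outputs have accumulated 
SET mapred.child.java.opts '-Xmx2048m';            -- heap of the task JVMs themselves 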

 

 

 

From: leiwangouc@gmail.com [mailto:leiwangouc@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
 
Thanks Thomas. 
 
Another question.  I have no idea what "Failed to merge in memory" means.  Is the 'merge' the shuffle phase on the reducer side?  Why is it in memory?
Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 

leiwangouc@gmail.com
 
From: Thomas Bentsen <ma...@bentzn.com> 
Date: 2014-04-15 21:53
To: user <ma...@hadoop.apache.org> 
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, this error does not occur.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 


Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Thanks Thomas. 

Another question.  I have no idea what "Failed to merge in memory" means.  Is the 'merge' the shuffle phase on the reducer side?  Why is it in memory?
Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? 

Thanks a lot.
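
One more alternative, sketched below with standard Pig parallelism controls (84 is just the number from this thread, not a recommendation):

SET default_parallel 84; 
-- or per statement: 
domain_device_group = GROUP data_filter BY (custid, domain, level, device) PARALLEL 84; 

Note that with one dominant key, extra reducers only shrink the rest of the load - the skewed key still lands on a single reducer.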




leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, this error does not occur.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Thanks Thomas. 

Anohter question.  I have no idea what is "Failed to merge in memory".  Does the 'merge' is the shuffle phase in reducer side?  Why it is in memory?
Except the two methods(increase reducer number and increase heap size),  is there any other alternatives to fix this issue? 

Thanks a lot.




leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing heap size.
> But what confuse me is that when i change the reducer number from 24
> to 84, there's no this error.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
 
 

Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Thanks Thomas. 

Anohter question.  I have no idea what is "Failed to merge in memory".  Does the 'merge' is the shuffle phase in reducer side?  Why it is in memory?
Except the two methods(increase reducer number and increase heap size),  is there any other alternatives to fix this issue? 

Thanks a lot.




leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing heap size.
> But what confuse me is that when i change the reducer number from 24
> to 84, there's no this error.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
> at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
> at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
> at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
> at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
> at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
> at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
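
For reference, both fixes discussed so far can be set directly in a Pig
script. A minimal sketch, assuming a Hadoop 1.x cluster (as the mapred.*
property name from that era suggests) and a hypothetical relation named
'logs':

    -- give each task JVM a larger heap; the exact -Xmx value must fit
    -- within what the cluster's task slots allow
    SET mapred.child.java.opts '-Xmx2048m';

    -- raise the default reducer count for all blocking operators
    SET default_parallel 84;

    -- or set parallelism per operator with the PARALLEL clause
    grouped = GROUP logs BY user PARALLEL 84;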
 
 

Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by "leiwangouc@gmail.com" <le...@gmail.com>.
Thanks Thomas. 

Another question: I have no idea what "Failed to merge in memory" means. Is the 'merge' the shuffle phase on the reducer side? Why does it happen in memory?
Apart from the two methods (increasing the reducer count and increasing the heap size), are there any other alternatives to fix this issue?

Thanks a lot.




leiwangouc@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers, each one has less data to work
with, provided the data is distributed evenly between them - in this case
about a third (24/84) of the original work.
It is essentially the same thing as increasing the heap size - the memory is
just distributed across more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, the error does not occur.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> [... rest of stack trace snipped; identical to the trace in the original message ...]
> 
> ______________________________________________________________________
> leiwangouc@gmail.com
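
One further alternative, assuming the OOM comes from a nested DISTINCT
inside a FOREACH (a common cause of this combiner/merge stack trace, since
the per-group bag must fit in RAM): rewrite it as a flat DISTINCT followed
by a regroup, so deduplication is spread across reducers instead of done
per group in memory. A sketch with hypothetical relation and field names:

    -- RAM-heavy form: each group's bag is materialized in reducer memory
    -- grouped = GROUP logs BY user;
    -- counts  = FOREACH grouped {
    --               urls = DISTINCT logs.url;
    --               GENERATE group, COUNT(urls);
    --           };

    -- spill-friendly rewrite: dedupe (user, url) pairs first, then count
    pairs   = FOREACH logs GENERATE user, url;
    uniq    = DISTINCT pairs;
    by_user = GROUP uniq BY user;
    counts  = FOREACH by_user GENERATE group, COUNT(uniq) AS distinct_urls;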
 
 

Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by Thomas Bentsen <th...@bentzn.com>.
When you increase the number of reducers, each one has less data to work
with, provided the data is distributed evenly between them - in this case
about a third (24/84) of the original work.
It is essentially the same thing as increasing the heap size - the memory is
just distributed across more reducers.

/th



On Tue, 2014-04-15 at 20:41 +0800, leiwangouc@gmail.com wrote:
> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24
> to 84, the error does not occur.
> 
> 
> Any insight on this?
> 
> 
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> [... rest of stack trace snipped; identical to the trace in the original message ...]
> 
> ______________________________________________________________________
> leiwangouc@gmail.com



Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

Posted by Serega Sheypak <se...@gmail.com>.
You get ~3.5 (84/24) times less data on each reducer when increasing their
quantity from 24 to 84. That's good. But sometimes the data is skewed, and a
simple bump in reducer quantity doesn't help.


2014-04-15 16:41 GMT+04:00 leiwangouc@gmail.com <le...@gmail.com>:

> I can fix this by changing the heap size.
> But what confuses me is that when I change the reducer number from 24 to
> 84, the error does not occur.
>
> Any insight on this?
>
> Thanks
> Lei
> Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
> [... rest of stack trace snipped; identical to the trace in the original message ...]
>
>
> leiwangouc@gmail.com
>
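
When a plain increase in reducer count does not help because of skew, it is
worth checking whether a few hot keys dominate the data. A quick diagnostic
sketch, again with hypothetical relation and field names:

    -- count records per key and inspect the heaviest keys; a handful of
    -- keys holding most of the records confirms skew
    g     = GROUP logs BY user;
    sizes = FOREACH g GENERATE group AS user, COUNT(logs) AS n;
    srt   = ORDER sizes BY n DESC;
    top20 = LIMIT srt 20;
    DUMP top20;

For skewed joins (though not for GROUP BY), Pig also offers
JOIN ... USING 'skewed', which samples the keys and splits the hot ones
across reducers.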
