Posted to user@pig.apache.org by Vivek Veeramani <vi...@gmail.com> on 2016/08/08 15:25:51 UTC

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Hi,

I'm relatively new to Pig and ran into this issue while trying to do a
basic GROUP BY and LIMIT on a sample dataset.

What I'm doing here is:

sample_set = LOAD 's3n://<bucket>/<dev_dir>/000*-part.gz' USING
PigStorage(',') AS (col1:chararray,col2:chararray..col23:chararray);
sample_set_group_by_col1 = GROUP sample_set BY col1;
sample_set_group_by_col1_10 = LIMIT sample_set_group_by_col1 10;
DUMP sample_set_group_by_col1_10;
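
To illustrate the structure of what I'm running, here is a stripped-down
local version of the same steps. The file path, data, and three-column
schema below are only placeholders; my real input is the gzipped S3 data
above with 23 chararray columns:

-- sample.csv (made-up data), each line is "key,val1,val2":
--   a,foo,bar
--   a,baz,qux
--   b,one,two
sample_set = LOAD 'sample.csv' USING PigStorage(',')
    AS (col1:chararray, col2:chararray, col3:chararray);
-- GROUP produces one record per distinct col1 value, holding a bag of all matching rows
sample_set_group_by_col1 = GROUP sample_set BY col1;
-- LIMIT keeps at most 10 of those (group, bag) records
sample_set_group_by_col1_10 = LIMIT sample_set_group_by_col1 10;
DUMP sample_set_group_by_col1_10;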

This job fails with the following error:

2016-08-08 14:28:59,622 FATAL [main]
org.apache.hadoop.mapred.YarnChild: Error running child :
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.util.Arrays.copyOf(Arrays.java:2271)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
	at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:580)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
	at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:641)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:474)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
	at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
	at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
	at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:198)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:1696)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1180)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)


Has anybody come across this error before? If so, what is the recommended
way to fix it?


Best,
Vivek Veeramani

cell : +91 - 9632 975 975
        +91 - 9895 277 101