Posted to user@pig.apache.org by wi...@thomsonreuters.com on 2011/06/10 20:15:30 UTC
workaround for java.lang.OutOfMemoryError: Java heap space?
I have a pig script that is working well for small test data sets but fails on a run over realistic-sized data. Logs show
INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201106061024_0331 has failed!
…
job_201106061024_0331 CitedItemsGrpByDocId,DedupTCPerDocId GROUP_BY,COMBINER Message: Job failed!
…
attempt_201106061024_0331_m_000198_0 […] Error: java.lang.OutOfMemoryError: Java heap space
and similarly for all attempts at a few of the other (many) map tasks for this job.
I believe this job corresponds to these lines in my pig script:
    CitedItemsGrpByDocId = group CitedItems by citeddocid;
    DedupTCPerDocId =
        foreach CitedItemsGrpByDocId {
            CitingDocids = CitedItems.citingdocid;
            UniqCitingDocids = distinct CitingDocids;
            generate group, COUNT(UniqCitingDocids) as tc;
        };
I tried increasing mapred.child.java.opts but the job failed in a setup stage with
Error occurred during initialization of VM
Could not reserve enough space for object heap
Are there job configurations/parameters for Hadoop or pig I can set to get around this? Is there a Pig Latin circumlocution, or better way to express what I want, that is not as memory-hungry?
Thanks in advance,
Will
William F Dowling
Sr Technical Specialist, Software Engineering
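
[Editor's note: the VM-initialization error quoted above ("Could not reserve enough space for object heap") typically means the -Xmx value passed via mapred.child.java.opts asked for more memory than the node could actually reserve per task slot, so the child JVMs never started. If retrying that route, a more modest setting can at least be passed from within the Pig script itself; the figure below is purely illustrative, not a recommendation:

    set mapred.child.java.opts '-Xmx1024m';

As the thread goes on to show, though, the real fix here is on the query side, not the heap size.]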
RE: workaround for java.lang.OutOfMemoryError: Java heap space?
Posted by wi...@thomsonreuters.com.
Thank you Thejas! Turning off the combiner let the job run to completion. Next I can try the two-level approach to see what the performance penalty is. Kind regards,
Will
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
Re: workaround for java.lang.OutOfMemoryError: Java heap space?
Posted by Thejas M Nair <te...@yahoo-inc.com>.
I have seen this happen when there is a very large number of distinct values
for a set of group keys. When the combiner gets used, the input records for the
reduce task already contain partial distinct bags, and these large records can
cause MR to run out of memory while trying to load them.
You can modify the query the way it is described in comment #1 of
https://issues.apache.org/jira/browse/PIG-1846
Or you can add the following to your script to disable the combiner:
set pig.exec.nocombiner true;
Thanks,
Thejas
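
[Editor's note: the two-level rewrite that comment #1 of PIG-1846 points at amounts to de-duplicating the (citeddocid, citingdocid) pairs in a first grouping step, so that the final COUNT is over plain tuples rather than a nested distinct bag, and the combiner can safely emit partial counts. A sketch, reusing the relation and field names from Will's script; the exact form is an assumption based on the JIRA description, not a quote from it:

    -- project just the two ids, then dedupe the pairs up front
    -- (costs an extra MR job, but every record stays small)
    CitedPairs = foreach CitedItems generate citeddocid, citingdocid;
    UniqCitedPairs = distinct CitedPairs;
    -- each group now holds only unique citing docids, and COUNT is
    -- algebraic, so the combiner can do partial aggregation
    PairsByDocId = group UniqCitedPairs by citeddocid;
    DedupTCPerDocId = foreach PairsByDocId
                      generate group, COUNT(UniqCitedPairs) as tc;

This trades one extra job for bounded per-record memory, which is usually the better side of the trade once the distinct sets no longer fit in a single bag.]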