Posted to user@pig.apache.org by wi...@thomsonreuters.com on 2011/06/10 20:15:30 UTC

workaround for java.lang.OutOfMemoryError: Java heap space?

I have a pig script that is working well for small test data sets but fails on a run over realistic-sized data. Logs show
  INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201106061024_0331 has failed!
  …
  job_201106061024_0331   CitedItemsGrpByDocId,DedupTCPerDocId    GROUP_BY,COMBINER       Message: Job failed!
  …
 attempt_201106061024_0331_m_000198_0  […]   Error: java.lang.OutOfMemoryError: Java heap space
  and similar errors for all attempts at a few of the other (many) map tasks for this job.

I believe this job corresponds to these lines in my pig script:

 CitedItemsGrpByDocId = group CitedItems by citeddocid;
 DedupTCPerDocId =
     foreach CitedItemsGrpByDocId {
         CitingDocids =  CitedItems.citingdocid;
         UniqCitingDocids = distinct CitingDocids;
         generate group, COUNT(UniqCitingDocids) as tc;
      };

I tried increasing mapred.child.java.opts but the job failed in a setup stage with 
  Error occurred during initialization of VM
  Could not reserve enough space for object heap
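
(For reference: the "Could not reserve enough space" failure means the child
JVM was asked for more heap than the task machine could actually provide. A
per-job override of this kind is usually set from the script itself; a
minimal sketch, with an illustrative heap size:)

 -- illustrative value only; it must fit within the task node's free
 -- physical memory, or the child JVM fails at startup as above
 set mapred.child.java.opts '-Xmx1024m';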

Are there job configurations/parameters for Hadoop or Pig I can set to get around this? Is there a Pig Latin circumlocution, or a better way to express what I want, that is not as memory-hungry?

Thanks in advance,

Will

William F Dowling
Sr Technical Specialist, Software Engineering



RE: workaround for java.lang.OutOfMemoryError: Java heap space?

Posted by wi...@thomsonreuters.com.
Thank you, Thejas! Turning off the combiner let the job run to completion.
Next I can try the two-level approach to see what the performance penalty is.

Kind regards,
Will

William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters



-----Original Message-----
From: Thejas M Nair [mailto:tejas@yahoo-inc.com] 
Sent: Friday, June 10, 2011 2:50 PM
To: user@pig.apache.org; Dowling, William (Professional)
Subject: Re: workaround for java.lang.OutOfMemoryError: Java heap space?

I have seen this happen when there is a very large number of distinct values
for a set of group keys. When the combiner gets used, the input records for
the reduce task already contain partial distinct bags, and this can result in
large records that cause MR to run out of memory trying to load them.

You can modify the query the way it's described in comment #1 of -
https://issues.apache.org/jira/browse/PIG-1846

Or you can add the following to your script to disable the combiner -

set pig.exec.nocombiner true;

Thanks,
Thejas




On 6/10/11 11:15 AM, "william.dowling@thomsonreuters.com"
<wi...@thomsonreuters.com> wrote:

> I have a pig script that is working well for small test data sets but fails on
> a run over realistic-sized data. Logs show
>   INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_201106061024_0331 has failed!
>   …
>   job_201106061024_0331   CitedItemsGrpByDocId,DedupTCPerDocId
> GROUP_BY,COMBINER       Message: Job failed!
>   …
>  attempt_201106061024_0331_m_000198_0  […]   Error:
> java.lang.OutOfMemoryError: Java heap space
>   and similar errors for all attempts at a few of the other (many) map tasks
> for this job.
> 
> I believe this job corresponds to these lines in my pig script:
> 
>  CitedItemsGrpByDocId = group CitedItems by citeddocid;
>  DedupTCPerDocId =
>      foreach CitedItemsGrpByDocId {
>          CitingDocids =  CitedItems.citingdocid;
>          UniqCitingDocids = distinct CitingDocids;
>          generate group, COUNT(UniqCitingDocids) as tc;
>       };
> 
> I tried increasing mapred.child.java.opts but the job failed in a setup stage
> with
>   Error occurred during initialization of VM
>   Could not reserve enough space for object heap
> 
> Are there job configurations/parameters for Hadoop or Pig I can set to get
> around this? Is there a Pig Latin circumlocution, or a better way to express
> what I want, that is not as memory-hungry?
> 
> Thanks in advance,
> 
> Will
> 
> William F Dowling
> Sr Technical Specialist, Software Engineering
> 
> 
> 


-- 



Re: workaround for java.lang.OutOfMemoryError: Java heap space?

Posted by Thejas M Nair <te...@yahoo-inc.com>.
I have seen this happen when there is a very large number of distinct values
for a set of group keys. When the combiner gets used, the input records for
the reduce task already contain partial distinct bags, and this can result in
large records that cause MR to run out of memory trying to load them.

You can modify the query the way it's described in comment #1 of -
https://issues.apache.org/jira/browse/PIG-1846
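
(A minimal sketch of that style of rewrite, for readers who do not follow the
link; alias names here are illustrative and may not match the JIRA comment
exactly. The idea is to pull the DISTINCT out of the nested foreach so that
deduplication runs as its own map-reduce pass instead of building large bags
in memory; COUNT is algebraic, so the combiner then only ships small partial
counts:)

 CitedPairs      = foreach CitedItems generate citeddocid, citingdocid;
 UniqCitedPairs  = distinct CitedPairs;  -- dedup runs as a separate MR pass
 GrpByDocId      = group UniqCitedPairs by citeddocid;
 DedupTCPerDocId = foreach GrpByDocId generate group, COUNT(UniqCitedPairs) as tc;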

Or you can add the following to your script to disable the combiner -

set pig.exec.nocombiner true;

Thanks,
Thejas




On 6/10/11 11:15 AM, "william.dowling@thomsonreuters.com"
<wi...@thomsonreuters.com> wrote:

> I have a pig script that is working well for small test data sets but fails on
> a run over realistic-sized data. Logs show
>   INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_201106061024_0331 has failed!
>   …
>   job_201106061024_0331   CitedItemsGrpByDocId,DedupTCPerDocId
> GROUP_BY,COMBINER       Message: Job failed!
>   …
>  attempt_201106061024_0331_m_000198_0  […]   Error:
> java.lang.OutOfMemoryError: Java heap space
>   and similar errors for all attempts at a few of the other (many) map tasks
> for this job.
> 
> I believe this job corresponds to these lines in my pig script:
> 
>  CitedItemsGrpByDocId = group CitedItems by citeddocid;
>  DedupTCPerDocId =
>      foreach CitedItemsGrpByDocId {
>          CitingDocids =  CitedItems.citingdocid;
>          UniqCitingDocids = distinct CitingDocids;
>          generate group, COUNT(UniqCitingDocids) as tc;
>       };
> 
> I tried increasing mapred.child.java.opts but the job failed in a setup stage
> with
>   Error occurred during initialization of VM
>   Could not reserve enough space for object heap
> 
> Are there job configurations/parameters for Hadoop or Pig I can set to get
> around this? Is there a Pig Latin circumlocution, or a better way to express
> what I want, that is not as memory-hungry?
> 
> Thanks in advance,
> 
> Will
> 
> William F Dowling
> Sr Technical Specialist, Software Engineering
> 
> 
> 


--