Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/06/28 17:05:24 UTC

[jira] [Commented] (PIG-4043) JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks

    [ https://issues.apache.org/jira/browse/PIG-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046878#comment-14046878 ] 

Rohini Palaniswamy commented on PIG-4043:
-----------------------------------------

Patch is good. Just one minor comment: can you make the pig.stats.noTaskReport configuration name all lower case?

But before adding an option to turn off task reports, I would like to suggest trying something else. For Hadoop 2.x, getTaskReports() in HadoopShims.java does the following:

{code}
org.apache.hadoop.mapreduce.TaskReport[] reports = mrJob.getTaskReports(type);
return DowngradeHelper.downgradeTaskReports(reports);
{code}

I think the OOM occurs because two huge arrays exist at the same time, unlike in the Hadoop 1.x HadoopShims. I would suggest that getTaskReports() return an iterator instead of an array and, on every .next() call, do TaskReport.downgrade() for the TaskReport to be returned. That way there is only the initial big array plus one downgraded TaskReport at a time.
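
A minimal sketch of the iterator idea, not the actual patch. To stay self-contained it does not use the Hadoop classes: LazyConvertingIterator is a hypothetical name, S stands in for the mapreduce TaskReport, T for the downgraded mapred TaskReport, and the converter function stands in for the downgrade call. It also assumes Java 8 for java.util.function.Function; on an older JDK a small hand-written interface would play the same role.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical helper, not part of Pig or Hadoop.
// Wraps the one big source array and converts a single element per next() call,
// so a second full-size array of converted objects is never allocated.
final class LazyConvertingIterator<S, T> implements Iterator<T> {
    private final S[] source;                                 // the initial big array
    private final Function<? super S, ? extends T> converter; // stand-in for the downgrade step
    private int next = 0;                                     // index of the next element to convert

    LazyConvertingIterator(S[] source, Function<? super S, ? extends T> converter) {
        this.source = source;
        this.converter = converter;
    }

    @Override
    public boolean hasNext() {
        return next < source.length;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Convert lazily: only the element being returned is converted.
        return converter.apply(source[next++]);
    }
}
```

With something like this, getTaskReports() could return new LazyConvertingIterator<>(reports, r -> /* downgrade r */ ...), and callers would loop with hasNext()/next() instead of indexing an array. Callers that hold on to every returned element would still end up with all converted reports in memory, so the saving only applies when each report is processed and then dropped.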

> JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-4043
>                 URL: https://issues.apache.org/jira/browse/PIG-4043
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4043-1.patch, heapdump.png
>
>
> With Hadoop 2.4, I often see the Pig client fail due to OOM when there are many tasks (~100K) with a 1GB heap size.
> The heap dump (attached) shows that TaskReport[] occupies about 80% of heap space at the time of OOM.
> The problem is that JobClient.getMap/ReduceTaskReports() returns an array of TaskReport objects, which can be huge if the number of tasks is large.


