Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/09/17 06:05:34 UTC
[jira] [Commented] (PIG-4043) JobClient.getMap/ReduceTaskReports()
causes OOM for jobs with a large number of tasks
[ https://issues.apache.org/jira/browse/PIG-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136742#comment-14136742 ]
Rohini Palaniswamy commented on PIG-4043:
-----------------------------------------
bq. In 0.12, there are no two copies of TaskReport arrays
Actually, even 0.12 has two copies. The two arrays are inside JobClient.java in mapreduce, though.
> JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks
> -------------------------------------------------------------------------------------
>
> Key: PIG-4043
> URL: https://issues.apache.org/jira/browse/PIG-4043
> Project: Pig
> Issue Type: Bug
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4043-1.patch, PIG-4043-2.patch, PIG-4043-3.patch, PIG-4043-4.patch, PIG-4043-5.patch, heapdump.png
>
>
> With Hadoop 2.4, I often see the Pig client fail with an OOM error when a job has many tasks (~100K) and the client runs with a 1GB heap.
> The heap dump (attached) shows that TaskReport[] occupies about 80% of heap space at the time of OOM.
> The problem is that JobClient.getMap/ReduceTaskReports() returns an array of TaskReport objects, which can be huge if the number of tasks is large.