Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/01/08 18:36:52 UTC

[jira] [Commented] (PIG-3659) Memory management for each vertex

    [ https://issues.apache.org/jira/browse/PIG-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865666#comment-13865666 ] 

Rohini Palaniswamy commented on PIG-3659:
-----------------------------------------

  The current code simply defaults to 1G for each vertex to get things working. 

We need to:
   1) Classify each vertex as a map or a reduce vertex and set java.opts (mapreduce.map.java.opts or mapreduce.reduce.java.opts), memory.mb (mapreduce.map.memory.mb or mapreduce.reduce.memory.mb) and env (mapreduce.map.env or mapreduce.reduce.env) on it accordingly. A simple approach would be to treat all root vertexes as map vertexes and all intermediate or leaf vertexes as reduce vertexes.
   2) Account for multiple inputs and outputs. Even for a map vertex, more memory is required when there are multiple outputs, because combine and sort happen on each output. Similarly, on a reduce vertex with multiple inputs, shuffle and sort happen on each input, so it needs more memory than a traditional map or reduce task; i.e., the sort buffers (io.sort.mb) and the buffers that hold each record before it is serialized or deserialized all take up memory. For example, with 3 inputs or outputs, three times as much memory is requested for the buffers, leading to OOM. Increasing the memory for a vertex based on its number of inputs or outputs might not solve the problem completely. We will have to talk to the Tez developers to see how effectively this can be solved.
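
The two points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code on the tez-branch: the class VertexMemorySketch, the enum VertexKind and all method names are hypothetical, the vertex is reduced to a plain count of incoming edges, and the buffer estimate counts only the sort buffers (it ignores the per-record serialization buffers mentioned in point 2). Only the mapreduce.* property names and io.sort.mb come from the comment itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Pig or Tez.
class VertexMemorySketch {

    // Hypothetical classification of a vertex as map-like or reduce-like.
    enum VertexKind { MAP, REDUCE }

    // Point 1: treat root vertexes (no incoming edges) as map vertexes and
    // every intermediate or leaf vertex as a reduce vertex.
    static VertexKind classify(int numInputEdges) {
        return numInputEdges == 0 ? VertexKind.MAP : VertexKind.REDUCE;
    }

    // Point 1: pick the MapReduce property names to read java.opts,
    // memory.mb and env from, depending on the vertex kind.
    static Map<String, String> propertyNames(VertexKind kind) {
        String prefix = (kind == VertexKind.MAP) ? "mapreduce.map." : "mapreduce.reduce.";
        Map<String, String> names = new LinkedHashMap<>();
        names.put("java.opts", prefix + "java.opts");
        names.put("memory.mb", prefix + "memory.mb");
        names.put("env", prefix + "env");
        return names;
    }

    // Point 2: rough sort-buffer demand, assuming one io.sort.mb-sized buffer
    // per sorted input or output edge. Deliberately simplistic: it does not
    // count the buffers that hold records during (de)serialization.
    static long sortBufferDemandMb(int sortedEdges, int ioSortMb) {
        return (long) Math.max(1, sortedEdges) * ioSortMb;
    }

    public static void main(String[] args) {
        VertexKind kind = classify(0);
        System.out.println(kind + " -> " + propertyNames(kind));
        // 3 sorted edges with an assumed io.sort.mb of 100
        System.out.println("sort buffers (MB): " + sortBufferDemandMb(3, 100));
    }
}
```

With an assumed io.sort.mb of 100 and 3 sorted edges, the estimate is 300 MB for the sort buffers alone, which illustrates how a fixed 1G per vertex can run out of memory.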

> Memory management for each vertex
> ---------------------------------
>
>                 Key: PIG-3659
>                 URL: https://issues.apache.org/jira/browse/PIG-3659
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: tez-branch
>
>
> We need to configure appropriate memory options for each vertex.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)