Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2016/01/11 03:48:39 UTC

[jira] [Comment Edited] (PIG-4775) Better default values for shuffle bytes per reducer

    [ https://issues.apache.org/jira/browse/PIG-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091353#comment-15091353 ] 

Daniel Dai edited comment on PIG-4775 at 1/11/16 2:47 AM:
----------------------------------------------------------

I mean the default 128MB. Anyway, this is not related to the patch.

+1



> Better default values for shuffle bytes per reducer
> ---------------------------------------------------
>
>                 Key: PIG-4775
>                 URL: https://issues.apache.org/jira/browse/PIG-4775
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4775-1.patch, PIG-4775-2.patch
>
>
> Currently the code does not set TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE if BYTES_PER_REDUCER_PARAM is not set or is equal to DEFAULT_BYTES_PER_REDUCER (1G). As a result it falls back to TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE_DEFAULT = 1024*1024*100L (100MB), which is low and can produce more output files than usual. Simply removing that check and defaulting to 1G would hurt performance: in MapReduce the 1G was applied to the map input size, but in Tez it is applied to the map output size. So this sets 384MB as the default for group by, since group by usually reduces the size of the output data, and keeps 256MB for joins, since joins usually increase the size of the output data.
> Order by and skewed join are not changed, as DEFAULT_BYTES_PER_REDUCER of 1G is honored there. Using 1G for them behaves similarly to MapReduce, because map input and map output sizes are the same in those cases.
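
The defaulting logic described in the issue can be sketched in Java as below. This is a hypothetical illustration, not Pig's actual implementation: the class name, the OpKind enum, and the method names are assumptions made for the example, and order by / skewed join are simplified to "use 1G" even though Pig computes their parallelism through its own path. Only the constant values (100MB Tez fallback, 1G DEFAULT_BYTES_PER_REDUCER, 384MB for group by, 256MB for joins) come from the description. The second helper shows why the 100MB fallback yields more output files: the task count is roughly the shuffled bytes divided by the desired input size per task.

```java
// Hypothetical sketch of the PIG-4775 defaulting rules; not Pig's real API.
// Constant values are taken from the issue description; all names are assumptions.
public class ShuffleInputSizeSketch {

    static final long MB = 1024L * 1024L;
    // Tez fallback when nothing is set: 1024*1024*100L (100MB)
    static final long TEZ_DEFAULT_DESIRED_INPUT_SIZE = 100L * MB;
    // Pig's DEFAULT_BYTES_PER_REDUCER (1G)
    static final long DEFAULT_BYTES_PER_REDUCER = 1024L * MB;

    // Hypothetical classification of the vertex that consumes the shuffle.
    enum OpKind { GROUP_BY, JOIN, ORDER_BY, SKEWED_JOIN }

    /**
     * Desired input bytes per task. userBytesPerReducer is null when the
     * user did not set BYTES_PER_REDUCER_PARAM (assumed representation).
     */
    static long desiredTaskInputSize(OpKind kind, Long userBytesPerReducer) {
        // An explicit, non-default user value is used as-is.
        if (userBytesPerReducer != null
                && userBytesPerReducer.longValue() != DEFAULT_BYTES_PER_REDUCER) {
            return userBytesPerReducer;
        }
        switch (kind) {
            case GROUP_BY:
                return 384L * MB; // group by usually shrinks the map output
            case JOIN:
                return 256L * MB; // joins usually grow the output
            default:
                // Simplification: order by / skewed join keep the 1G default.
                return DEFAULT_BYTES_PER_REDUCER;
        }
    }

    /** Rough task count: ceil(shuffled bytes / desired input size per task). */
    static long estimatedTasks(long shuffledBytes, long desiredInputSize) {
        return (shuffledBytes + desiredInputSize - 1) / desiredInputSize;
    }

    public static void main(String[] args) {
        long mapOutput = 10L * 1024L * MB; // 10G of map output, for illustration only
        for (OpKind k : OpKind.values()) {
            long size = desiredTaskInputSize(k, null);
            System.out.println(k + ": " + (size / MB) + "MB -> "
                    + estimatedTasks(mapOutput, size) + " tasks");
        }
        System.out.println("Tez 100MB fallback -> "
                + estimatedTasks(mapOutput, TEZ_DEFAULT_DESIRED_INPUT_SIZE) + " tasks");
    }
}
```

With 10G of map output, the 100MB fallback gives 103 tasks (and output files), while the 384MB group-by default gives 27 and the 256MB join default gives 40.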



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)