Posted to dev@pig.apache.org by "Jie Li (JIRA)" <ji...@apache.org> on 2012/06/29 19:48:42 UTC

[jira] [Commented] (PIG-2779) Refactoring the code for setting number of reducers

    [ https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404063#comment-13404063 ] 

Jie Li commented on PIG-2779:
-----------------------------

For the order-by, we need to pass its *final* #reducer (not the estimated one) to the sample job that generates the partition file. Otherwise the partition file will be inconsistent with the number of reducers the order-by actually runs with, which causes errors.
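
To illustrate the mismatch, here is a minimal sketch in Python. It is not Pig's actual code: `build_boundaries` and `partition_for` are hypothetical helpers, and the sketch assumes a simple range partitioner whose partition file holds n-1 split points for n reducers.

```python
import bisect

def build_boundaries(sample, num_reducers):
    """Pick num_reducers - 1 split points from the sample (hypothetical helper)."""
    ordered = sorted(sample)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]

def partition_for(key, boundaries):
    """Return the index of the range the key falls into: 0 .. len(boundaries)."""
    return bisect.bisect_right(boundaries, key)

sample = list(range(100))

# Case 1: partition file built for 4 reducers, order-by runs with 2.
boundaries_for_4 = build_boundaries(sample, 4)
actual_reducers = 2
illegal_keys = [k for k in sample
                if partition_for(k, boundaries_for_4) >= actual_reducers]

# Case 2: partition file built for 2 reducers, order-by runs with 4.
boundaries_for_2 = build_boundaries(sample, 2)
used_partitions = {partition_for(k, boundaries_for_2) for k in sample}
idle_reducers = set(range(4)) - used_partitions
```

In case 1 every key in `illegal_keys` is assigned a partition index that no reducer owns, which is the kind of "Illegal partition" failure described in the issue. In case 2 nothing fails, but the reducers in `idle_reducers` receive no data.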

The final #reducer is calculated from the requested one and the estimated one, and the estimated one is calculated from the input data size. Luckily the sample job has the same input data as the order-by, so it can calculate the final #reducer of the order-by in advance.
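
A rough sketch of that calculation in Python, under stated assumptions: the function names, the bytes-per-reducer and maximum values, and the rule "an explicit request wins, otherwise use the estimate" are all chosen for illustration and are not taken from the Pig source.

```python
import math

def estimate_reducers(input_bytes, bytes_per_reducer=10**9, max_reducers=999):
    """Estimate #reducer from input size (illustrative values, not Pig's)."""
    estimated = math.ceil(input_bytes / bytes_per_reducer)
    return min(max(estimated, 1), max_reducers)

def final_parallelism(requested, estimated):
    """ASSUMED rule: a positive requested value wins, else use the estimate."""
    return requested if requested > 0 else estimated

input_bytes = 4 * 10**9   # the sample job and the order-by read the same input
requested = 2             # e.g. PARALLEL 2 on the order-by

# The order-by computes its final #reducer at run time.
order_by_reducers = final_parallelism(requested, estimate_reducers(input_bytes))

# The sample job can compute the same value in advance from the same input size.
sample_job_partitions = final_parallelism(requested, estimate_reducers(input_bytes))
```

Because both jobs derive the value from the same inputs through the same function, `sample_job_partitions` and `order_by_reducers` agree, which is what keeps the partition file consistent.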
                
> Refactoring the code for setting number of reducers
> ---------------------------------------------------
>
>                 Key: PIG-2779
>                 URL: https://issues.apache.org/jira/browse/PIG-2779
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jie Li
>             Fix For: 0.11
>
>
> As PIG-2652 observed, the code for setting the number of reducers is currently a little messy. MapReduceOper.requestedParallelism seems to be misused in some places, and we now support runtime estimation of #reducer, which further complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated #reducer will be used. If we specify parallel 2 while it estimates 4, order-by will fail due to "Illegal partition for Null". If we specify parallel 4 while it estimates 2, then some reducers will have nothing to do. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira