Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/09/20 23:52:09 UTC

[jira] [Created] (PIG-2295) determine number of reducers before MR plan optimizations are done

determine number of reducers before MR plan optimizations are done
------------------------------------------------------------------

                 Key: PIG-2295
                 URL: https://issues.apache.org/jira/browse/PIG-2295
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.9.0
            Reporter: Thejas M Nair


MR plan optimization rules use the requested parallelism, that is, the number of reducers specified by the user. If the user has not specified the number of reducers, it is determined later from the input data size, but the optimization rules never see this final number. As a result, the plans are suboptimal in two cases:
1. If the user has not specified parallelism and the parallelism heuristic sets it to 1, LimitAdjuster ends up introducing an unnecessary extra MR job.

2. If the user has not specified parallelism and the parallelism heuristic sets it higher than pig.files.concatenation.threshold, the extra concatenation job for FRJoin is not added. The check is in MRCompiler.visitFRJoin().
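A minimal Java sketch of the proposed ordering follows. It resolves the reducer count first and only then evaluates the two checks above, so both see the number of reducers that will actually run. All names (ReducerPlanningSketch, effectiveParallelism, needsExtraLimitJob, needsConcatenationJob) are hypothetical and are not Pig's real API. The size heuristic and its defaults are assumptions modeled on pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max, and -1 is assumed to mean "not specified by the user".

```java
/**
 * Hypothetical sketch: compute the effective reducer count BEFORE running
 * plan optimizations, so that rules like the limit adjustment and the FRJoin
 * concatenation check see the value that will be used at runtime.
 * Names and defaults are illustrative assumptions, not Pig's actual code.
 */
public class ReducerPlanningSketch {

    // Assumed defaults, modeled on pig.exec.reducers.bytes.per.reducer
    // and pig.exec.reducers.max.
    static final long BYTES_PER_REDUCER = 1_000_000_000L;
    static final int MAX_REDUCERS = 999;

    /** Size heuristic: ceil(totalInputBytes / BYTES_PER_REDUCER), clamped to [1, MAX_REDUCERS]. */
    static int estimateReducers(long totalInputBytes) {
        long estimate = (totalInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.max(1, Math.min(MAX_REDUCERS, estimate));
    }

    /** User-requested parallelism wins when set (> 0); otherwise use the size heuristic. */
    static int effectiveParallelism(int requested, long totalInputBytes) {
        return requested > 0 ? requested : estimateReducers(totalInputBytes);
    }

    /** Case 1: an extra single-reducer limit job is only needed if the job has more than one reducer. */
    static boolean needsExtraLimitJob(int effectiveParallelism) {
        return effectiveParallelism > 1;
    }

    /** Case 2: concatenate the replicated input only if it has more part files than the threshold. */
    static boolean needsConcatenationJob(int effectiveParallelism, int concatThreshold) {
        return effectiveParallelism > concatThreshold;
    }

    public static void main(String[] args) {
        long smallInput = 200_000_000L;     // 200 MB: heuristic picks 1 reducer
        long largeInput = 500_000_000_000L; // 500 GB: heuristic picks 500 reducers
        int threshold = 100;                // assumed value of pig.files.concatenation.threshold

        int p1 = effectiveParallelism(-1, smallInput); // -1: parallelism not specified
        int p2 = effectiveParallelism(-1, largeInput);
        System.out.println(p1 + " " + needsExtraLimitJob(p1));
        System.out.println(p2 + " " + needsConcatenationJob(p2, threshold));
    }
}
```

With no user-specified parallelism, the 200 MB input resolves to 1 reducer, so no extra limit job is added. The 500 GB input resolves to 500 reducers, which exceeds the assumed threshold of 100 and triggers the concatenation job. Checking the raw value of -1 instead would get both decisions wrong, which is the behavior this issue describes.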
