Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/04/26 01:38:14 UTC

[jira] [Updated] (PIG-3928) Reducer estimator gets wrong configuration for ORDER_BY job

     [ https://issues.apache.org/jira/browse/PIG-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3928:
------------------------------------
    Fix Version/s:     (was: 0.16.0)
                   0.17.0

> Reducer estimator gets wrong configuration for ORDER_BY job
> -----------------------------------------------------------
>
>                 Key: PIG-3928
>                 URL: https://issues.apache.org/jira/browse/PIG-3928
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.1, 0.13.0
>            Reporter: Aniket Mokashi
>             Fix For: 0.17.0
>
>
> The SAMPLER job requires a parameter equal to the number of reducers used by the ORDER_BY job. This value is obtained by taking the successor of the SAMPLER job and estimating its reducers in the following code. However, the job (conf) passed to calculateRuntimeReducers corresponds to the SAMPLER job rather than the ORDER_BY job, which causes problems for custom reducer estimators that depend on the configuration.
> {code}
> // inside org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>     public void adjustNumReducers(MROperPlan plan, MapReduceOper mro,
>             org.apache.hadoop.mapreduce.Job nwJob) throws IOException {
>         int jobParallelism = calculateRuntimeReducers(mro, nwJob);
>         if (mro.isSampler() && plan.getSuccessors(mro) != null) {
>             // We need to calculate the final number of reducers of the next job (order-by or skew-join)
>             // to generate the quantfile.
>             MapReduceOper nextMro = plan.getSuccessors(mro).get(0);
>             // Here we use the same conf and Job to calculate the runtime #reducers of the next job
>             // which is fine as the statistics comes from the nextMro's POLoads
>             int nPartitions = calculateRuntimeReducers(nextMro, nwJob);
>             // set the runtime #reducer of the next job as the #partition
>             ParallelConstantVisitor visitor =
>                     new ParallelConstantVisitor(mro.reducePlan, nPartitions);
>             visitor.visit();
>         }
> {code}
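
To illustrate why the mismatched configuration matters, here is a minimal, self-contained sketch. It does not use Pig or Hadoop classes: a plain Map stands in for the job configuration, and the class name, the property key "example.bytes.per.reducer", and the helper methods are all hypothetical. It only shows that an estimator which reads its settings from the configuration can return a different reducer count when handed the SAMPLER job's conf instead of the ORDER_BY job's conf.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain Map plays the role of a job configuration.
class ReducerEstimateSketch {
    // Hypothetical key, not a real Pig or Hadoop property.
    static final String BYTES_PER_REDUCER = "example.bytes.per.reducer";

    // Toy estimator whose result depends on the configuration it receives.
    static int estimateReducers(Map<String, String> conf, long inputBytes) {
        long bytesPerReducer =
                Long.parseLong(conf.getOrDefault(BYTES_PER_REDUCER, "1000"));
        // Ceiling division, with at least one reducer.
        long estimate = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, estimate);
    }

    // Configuration as the SAMPLER job might see it (assumed values).
    static Map<String, String> samplerConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put(BYTES_PER_REDUCER, "1000");
        return conf;
    }

    // Configuration as the ORDER_BY job might see it (assumed values).
    static Map<String, String> orderByConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put(BYTES_PER_REDUCER, "100");
        return conf;
    }

    public static void main(String[] args) {
        long inputBytes = 1000L;
        // Same input size, different conf: the two estimates disagree.
        int fromSamplerConf = estimateReducers(samplerConf(), inputBytes);
        int fromOrderByConf = estimateReducers(orderByConf(), inputBytes);
        System.out.println("estimate with SAMPLER conf:  " + fromSamplerConf);
        System.out.println("estimate with ORDER_BY conf: " + fromOrderByConf);
    }
}
```

In this sketch, a partition count computed from the SAMPLER conf would not match the number of reducers the ORDER_BY job later estimates from its own conf, which is the kind of disagreement the report describes.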



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)