Posted to user@pig.apache.org by Vincent Barat <vi...@gmail.com> on 2014/03/24 14:40:58 UTC

Could not estimate number of reducers

Hi,

Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation
of the number of reducers no longer works.

My script:

A = load 'data';
B = group A by $0;
store B into 'out';

My data:

grunt> ls
hdfs://computation-master.dev.ubithere.com:9000/user/root/.staging <dir>
hdfs://computation-master.dev.ubithere.com:9000/user/root/data<r 3>    1908911680

When I run my script (see the last line):

Apache Pig version 0.12.1-SNAPSHOT (rexported) compiled Feb 06 2014, 16:57:49
Logging error messages to: /root/pig.log
Default bootup file /root/.pigbootup not found
Connecting to hadoop file system at: hdfs://computation-master.dev.ubithere.com:9000
Connecting to map-reduce job tracker at: computation-master.dev.ubithere.com:9001
Pig features used in the script: GROUP_BY
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
File concatenation threshold: 100 optimistic? false
MR plan size before optimization: 1
MR plan size after optimization: 1
Pig script settings are added to the job
mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
creating jar file Job7470230163933306330.jar
jar file Job7470230163933306330.jar created
Setting up single store job
Reduce phase detected, estimating # of required reducers.
Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
Setting Parallelism to 1
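For a sense of what the estimator should have produced here, the following is a minimal sketch of an input-size heuristic, assuming it divides the total input size by BytesPerReducer, rounds up, and clamps the result to [1, maxReducers]. The class and method names are hypothetical and this is not Pig's InputSizeReducerEstimator source. With the 1908911680-byte input from the listing above it gives 2 reducers; with totalInputFileSize=-1 there is no estimate, and the assumed fallback picks 1 reducer, matching the last lines of the log.

```java
// Hypothetical sketch of an input-size reducer estimate; not Pig's actual source.
class ReducerEstimateSketch {

    /**
     * Returns the estimated reducer count, or -1 when the input size is unknown
     * (assumed here to be signalled by a negative totalInputFileSize, as in the log).
     */
    static int estimate(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        if (totalInputFileSize < 0) {
            return -1; // size unknown: no estimate possible
        }
        // Round up so a partial chunk of input still gets its own reducer.
        long reducers = (totalInputFileSize + bytesPerReducer - 1) / bytesPerReducer;
        // Clamp to the range [1, maxReducers].
        reducers = Math.max(1, Math.min(maxReducers, reducers));
        return (int) reducers;
    }

    /** Assumed fallback when no PARALLEL or default parallelism is set. */
    static int chooseParallelism(int estimated) {
        return estimated > 0 ? estimated : 1;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 1_000_000_000L; // BytesPerReducer from the log
        int maxReducers = 999;                 // maxReducers from the log

        // Size of 'data' from the grunt listing.
        System.out.println(chooseParallelism(estimate(1_908_911_680L, bytesPerReducer, maxReducers))); // 2
        // totalInputFileSize=-1, as reported in the log.
        System.out.println(chooseParallelism(estimate(-1L, bytesPerReducer, maxReducers)));            // 1
    }
}
```

Rounding up rather than truncating is the design choice that matters here: truncation would assign this roughly 1.9 GB input to a single reducer even when the estimate works.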

I tried to debug this; in the source code below,
PlanHelper.getPhysicalOperators always returns an empty list.

    public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
        Configuration conf = job.getConfiguration();

        long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
        int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);

        // Collect the POLoad operators of the map plan: this is the list that always comes back empty.
        List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
        // With no loads to measure, this ends up as totalInputFileSize=-1 in the log.
        long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
        // ... (rest of the method not shown)
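The lookup in the snippet above filters the map plan for operators of one class, and the size helper then has nothing to measure. The sketch below models those two steps with hypothetical stand-in classes (Operator, Load, GroupBy), not Pig's POLoad or PlanHelper, and assumes, consistent with the log, that the size helper returns -1 when no load reports a size. It shows how an empty operator list turns into totalInputFileSize=-1 even though the input file itself is about 1.9 GB.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for physical operators; not Pig's classes.
class Operator {}

class Load extends Operator {
    final long sizeBytes; // -1 if the size is unknown
    Load(long sizeBytes) { this.sizeBytes = sizeBytes; }
}

class GroupBy extends Operator {}

class LoadSizeSketch {

    /** Collects every operator in the plan that is an instance of the given class. */
    static <T extends Operator> List<T> operatorsOfType(List<? extends Operator> plan, Class<T> type) {
        List<T> found = new ArrayList<>();
        for (Operator op : plan) {
            if (type.isInstance(op)) {
                found.add(type.cast(op));
            }
        }
        return found;
    }

    /** Sums known load sizes; returns -1 when no load reports a size (including an empty list). */
    static long totalInputSize(List<Load> loads) {
        long total = 0;
        boolean foundSize = false;
        for (Load ld : loads) {
            if (ld.sizeBytes >= 0) {
                total += ld.sizeBytes;
                foundSize = true;
            }
        }
        return foundSize ? total : -1;
    }

    public static void main(String[] args) {
        List<Operator> planWithLoad = List.of(new Load(1_908_911_680L), new GroupBy());
        // Simulates the reported symptom: the lookup finds no loads.
        List<Operator> planWithoutLoad = List.of(new GroupBy());

        System.out.println(totalInputSize(operatorsOfType(planWithLoad, Load.class)));    // 1908911680
        System.out.println(totalInputSize(operatorsOfType(planWithoutLoad, Load.class))); // -1
    }
}
```

Returning -1 instead of 0 keeps "size unknown" distinct from "empty input", which is what lets the estimator report that it could not estimate rather than silently choosing 1 reducer.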

Any ideas?

Thanks for your help


Re: Could not estimate number of reducers

Posted by Vincent Barat <vi...@gmail.com>.
I hit https://issues.apache.org/jira/browse/PIG-3512
