Posted to user@pig.apache.org by Vincent Barat <vi...@gmail.com> on 2014/03/24 14:40:58 UTC
Could not estimate number of reducers
Hi,
Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation
of the number of reducers no longer works.
My script:
A = load 'data';
B = group A by $0;
store B into 'out';
My data:
grunt> ls
hdfs://computation-master.dev.ubithere.com:9000/user/root/.staging <dir>
hdfs://computation-master.dev.ubithere.com:9000/user/root/data<r 3> 1908911680
When I run my script (see the last line):
Apache Pig version 0.12.1-SNAPSHOT (rexported) compiled Feb 06 2014,
16:57:49
Logging error messages to: /root/pig.log
Default bootup file /root/.pigbootup not found
Connecting to hadoop file system at:
hdfs://computation-master.dev.ubithere.com:9000
Connecting to map-reduce job tracker at:
computation-master.dev.ubithere.com:9001
Pig features used in the script: GROUP_BY
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune,
DuplicateForEachColumnRewrite, GroupByConstParallelSetter,
ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter,
MergeFilter, MergeForEach, NewPartitionFilterOptimizer,
PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter,
SplitFilter, StreamTypeCastInserter],
RULES_DISABLED=[FilterLogicExpressionSimplifier]}
File concatenation threshold: 100 optimistic? false
MR plan size before optimization: 1
MR plan size after optimization: 1
Pig script settings are added to the job
mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
creating jar file Job7470230163933306330.jar
jar file Job7470230163933306330.jar created
Setting up single store job
Reduce phase detected, estimating # of required reducers.
Using reducer estimator:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
Could not estimate number of reducers and no requested or default
parallelism set. Defaulting to 1 reducer.
Setting Parallelism to 1
I tried to debug; in the source code below,
PlanHelper.getPhysicalOperators always returns an empty list.
public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
    Configuration conf = job.getConfiguration();

    long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
    int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);

    List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
    long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
Any ideas?
Thanks for your help
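
For reference, the numbers in the log above can be worked through with a small sketch. It is not Pig's actual code: the class and method names are hypothetical, and it assumes that an input-size-based estimator divides the total input size by the bytes-per-reducer setting, rounds up, and clamps the result to the range 1..maxReducers, treating a negative size as "unknown". Under those assumptions, the 1908911680-byte input listed by ls would give 2 reducers, while totalInputFileSize=-1 gives no estimate at all.

```java
public class ReducerEstimateSketch {

    // Hypothetical helper (an assumption for illustration, not Pig's API).
    // Returns -1 when the input size is unknown, otherwise
    // ceil(totalInputFileSize / bytesPerReducer) clamped to [1, maxReducers].
    public static int estimate(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        if (totalInputFileSize < 0) {
            return -1; // size could not be determined, so no estimate is possible
        }
        // Ceiling division on longs, avoiding floating point.
        long wanted = (totalInputFileSize + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, wanted));
    }

    public static void main(String[] args) {
        // Values taken from the log: BytesPerReducer=1000000000 maxReducers=999
        System.out.println(estimate(-1L, 1_000_000_000L, 999));            // prints -1
        System.out.println(estimate(1_908_911_680L, 1_000_000_000L, 999)); // prints 2
    }
}
```

The -1 case corresponds to the situation in the log, where the estimate cannot be computed and the job falls back to a single reducer.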
Re: Could not estimate number of reducers
Posted by Vincent Barat <vi...@gmail.com>.
I hit https://issues.apache.org/jira/browse/PIG-3512