Posted to dev@pig.apache.org by "Philip (flip) Kromer (JIRA)" <ji...@apache.org> on 2014/06/03 04:53:01 UTC
[jira] [Updated] (PIG-3985) Multiquery execution of RANK with RANK BY causes NPE JobCreationException "ERROR 2017: Internal error creating job configuration"
[ https://issues.apache.org/jira/browse/PIG-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip (flip) Kromer updated PIG-3985:
--------------------------------------
Attachment: us_city_pops.tsv
many_ranks_much_sadness.pig
Added script and sample data.
> Multiquery execution of RANK with RANK BY causes NPE JobCreationException "ERROR 2017: Internal error creating job configuration"
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-3985
> URL: https://issues.apache.org/jira/browse/PIG-3985
> Project: Pig
> Issue Type: Bug
> Reporter: Philip (flip) Kromer
> Labels: nullpointerexception, rank, udf
> Attachments: many_ranks_much_sadness.pig, us_city_pops.tsv
>
>
> A script that uses both RANK and RANK BY crashes with a NullPointerException in JobControlCompiler.java when multiquery execution is enabled.
> The script below works for any combination of the RANK BY operations, and it works with a single plain RANK operation as long as the script contains no other RANK or RANK BY operation. Non-BY RANKs will perish together but succeed alone.
> Disabling multiquery execution makes everything work again.
> I am using Hadoop 2.4.0 with Pig Trunk (d24d06a48, after PIG-3739). The error occurs in local or mapreduce mode.
> {code}
> -- disable multiquery and you can rank all day long
> -- SET opt.multiquery false
> citypops = LOAD 'us_city_pops.tsv' AS (city:chararray, state:chararray, pop_2011:int);
> citypops_o = ORDER citypops BY city;
> --
> -- if you have one non-by RANK you may not have any other RANKs
> --
> citypops_nosort_inplace = RANK citypops;
> citypops_presorted_inplace = RANK citypops_o;
> citypops_ties_cause_skips = RANK citypops BY city;
> citypops_ties_no_skips = RANK citypops BY city DENSE;
> citypops_presorted_ranked = RANK citypops_o BY city;
> STORE citypops_nosort_inplace INTO '/tmp/citypops_nosort_inplace' USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_presorted_inplace INTO '/tmp/citypops_presorted_inplace' USING PigStorage('\t', '--overwrite true');
> STORE citypops_ties_cause_skips INTO '/tmp/citypops_ties_cause_skips' USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_ties_no_skips INTO '/tmp/citypops_ties_no_skips' USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_presorted_ranked INTO '/tmp/citypops_presorted_ranked' USING PigStorage('\t', '--overwrite true');
> {code}
> {code}
> Pig Stack Trace
> ---------------
> ERROR 2017: Internal error creating job configuration.
> org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017: Internal error creating job configuration.
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:946)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:200)
> --- SNIP ----
> Caused by: java.lang.NullPointerException
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:886)
> ... 19 more
> {code}
> The proximate offense seems to be that globalCounters.get(operationID) returns null:
> {code}
> if(mro.isRankOperation()) {
>     Iterator<String> operationIDs = mro.getRankOperationId().iterator();
>     while(operationIDs.hasNext()) {
>         String operationID = operationIDs.next();
>         Iterator<Pair<String, Long>> itPairs = globalCounters.get(operationID).iterator();
>         Pair<String,Long> pair = null;
>         while(itPairs.hasNext()) {
>             pair = itPairs.next();
>             conf.setLong(pair.first, pair.second);
>         }
>     }
> }
> {code}
> PORank.java line 184 seems to need a counter value, so this block cannot simply be skipped when the lookup comes back empty.
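
To make the failure mode above concrete, here is a minimal, self-contained sketch of the lookup pattern. It is not Pig code and not a proposed patch: `RankCounterSketch`, `CounterPair`, and `copyCounters` are hypothetical stand-ins, and a plain `Map<String, Long>` stands in for the job configuration. It only shows how a map lookup for an operation ID with no recorded counters yields null, and how checking for that turns the bare NullPointerException into an error that names the offending operation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the compiler state; none of this is Pig's API.
class RankCounterSketch {

    // Minimal stand-in for a (counter name, counter value) pair.
    static final class CounterPair {
        final String first;
        final long second;

        CounterPair(String first, long second) {
            this.first = first;
            this.second = second;
        }
    }

    // Copies every counter recorded for operationID into conf.
    // If nothing was recorded for that ID, fails with a message that
    // names the ID instead of dereferencing the null lookup result.
    static void copyCounters(Map<String, List<CounterPair>> globalCounters,
                             String operationID,
                             Map<String, Long> conf) {
        List<CounterPair> pairs = globalCounters.get(operationID);
        if (pairs == null) {
            throw new IllegalStateException(
                "No global counters recorded for rank operation '" + operationID + "'");
        }
        for (CounterPair pair : pairs) {
            conf.put(pair.first, pair.second);
        }
    }

    public static void main(String[] args) {
        Map<String, List<CounterPair>> globalCounters = new HashMap<>();
        List<CounterPair> recorded = new ArrayList<>();
        recorded.add(new CounterPair("example.counter.0", 42L)); // made-up name and value
        globalCounters.put("rank-op-a", recorded);               // made-up operation ID

        Map<String, Long> conf = new HashMap<>();
        copyCounters(globalCounters, "rank-op-a", conf);         // counters exist: succeeds
        System.out.println("copied: " + conf);

        try {
            copyCounters(globalCounters, "rank-op-b", conf);     // nothing recorded for this ID
        } catch (IllegalStateException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

A guard like this would only improve the diagnostic. Since the counter value appears to be required downstream, the underlying question of why no counters are registered for the second rank operation under multiquery still needs an answer.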