Posted to dev@pig.apache.org by "Philip (flip) Kromer (JIRA)" <ji...@apache.org> on 2014/06/03 04:53:01 UTC

[jira] [Updated] (PIG-3985) Multiquery execution of RANK with RANK BY causes NPE JobCreationException "ERROR 2017: Internal error creating job configuration"

     [ https://issues.apache.org/jira/browse/PIG-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip (flip) Kromer updated PIG-3985:
--------------------------------------

    Attachment: us_city_pops.tsv
                many_ranks_much_sadness.pig

Added script and sample data.

> Multiquery execution of RANK with RANK BY causes NPE JobCreationException "ERROR 2017: Internal error creating job configuration"
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3985
>                 URL: https://issues.apache.org/jira/browse/PIG-3985
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Philip (flip) Kromer
>              Labels: nullpointerexception, rank, udf
>         Attachments: many_ranks_much_sadness.pig, us_city_pops.tsv
>
>
> A script with both RANK and RANK BY will crash with a NullPointerException in JobControlCompiler.java when multiquery is enabled.
> The script below (toggling the STORE lines) works for any combination of RANK BY operations, or for a single plain RANK on its own (i.e. no other RANK or RANK BY operation). Plain (non-BY) RANKs fail when combined with any other RANK operation, but succeed alone.
> Disabling multiquery execution makes everything work again.
> I am using Hadoop 2.4.0 with Pig trunk (d24d06a48, after PIG-3739). The error occurs in both local and mapreduce mode.
> {code}
> -- disable multiquery and you can rank all day long
> -- SET opt.multiquery false
> citypops = LOAD 'us_city_pops.tsv' AS (city:chararray, state:chararray, pop_2011:int);
> citypops_o = ORDER citypops BY city;
> --
> -- if you have one non-by RANK you may not have any other RANKs
> --
> citypops_nosort_inplace    = RANK citypops;
> citypops_presorted_inplace = RANK citypops_o;
> citypops_ties_cause_skips  = RANK citypops   BY city;
> citypops_ties_no_skips     = RANK citypops   BY city  DENSE;
> citypops_presorted_ranked  = RANK citypops_o BY city;
> STORE citypops_nosort_inplace    INTO '/tmp/citypops_nosort_inplace'    USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_presorted_inplace INTO '/tmp/citypops_presorted_inplace' USING PigStorage('\t', '--overwrite true');
> STORE citypops_ties_cause_skips  INTO '/tmp/citypops_ties_cause_skips'  USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_ties_no_skips     INTO '/tmp/citypops_ties_no_skips'     USING PigStorage('\t', '--overwrite true');
> -- STORE citypops_presorted_ranked  INTO '/tmp/citypops_presorted_ranked'  USING PigStorage('\t', '--overwrite true');
> {code}
> {code}
> Pig Stack Trace
> ---------------
> ERROR 2017: Internal error creating job configuration.
> org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017: Internal error creating job configuration.
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:946)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:322)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:200)
>      --- SNIP ----
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:886)
>         ... 19 more
> {code}
> The proximate offense seems to be that globalCounters.get(operationID) returns null:
> {code}
>             if(mro.isRankOperation()) {
>                 Iterator<String> operationIDs = mro.getRankOperationId().iterator();
>                 while(operationIDs.hasNext()) {
>                     String operationID = operationIDs.next();
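>                     // multiquery bug: globalCounters.get(operationID) comes back null here,
>                     // so the .iterator() call on the next line throws the NPE reported above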
>                     Iterator<Pair<String, Long>> itPairs = globalCounters.get(operationID).iterator();
>                     Pair<String,Long> pair = null;
>                     while(itPairs.hasNext()) {
>                         pair = itPairs.next();
>                         conf.setLong(pair.first, pair.second);
>                     }
>                 }
>             }
> {code}
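> A guard at that spot would at least surface a clearer error than the bare NPE. The following is only a sketch of such a guard (the exception type and message are my own choice, not a proposed patch); the underlying bug is presumably that the multiquery-merged plan never records counters under the plain RANK's operation ID:
> {code}
> if (mro.isRankOperation()) {
>     Iterator<String> operationIDs = mro.getRankOperationId().iterator();
>     while (operationIDs.hasNext()) {
>         String operationID = operationIDs.next();
>         // Sketch only: fail with an explicit message instead of an NPE when the
>         // counters for this rank operation were never collected (the case above).
>         if (globalCounters.get(operationID) == null) {
>             throw new IllegalStateException(
>                 "No global counters collected for rank operation " + operationID);
>         }
>         Iterator<Pair<String, Long>> itPairs = globalCounters.get(operationID).iterator();
>         while (itPairs.hasNext()) {
>             Pair<String, Long> pair = itPairs.next();
>             conf.setLong(pair.first, pair.second);
>         }
>     }
> }
> {code}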
> PORank.java line 184 seems to need this counter value, so this part of the job setup does need to happen; it cannot simply be skipped.


