Posted to dev@pig.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/09/22 03:27:33 UTC

[jira] [Created] (PIG-4188) FindQuantilesTez throwing IndexOutOfBoundsException with small dataset

Rajesh Balamohan created PIG-4188:
-------------------------------------

             Summary: FindQuantilesTez throwing IndexOutOfBoundsException with small dataset
                 Key: PIG-4188
                 URL: https://issues.apache.org/jira/browse/PIG-4188
             Project: Pig
          Issue Type: Bug
          Components: tez
            Reporter: Rajesh Balamohan


java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:635)
        at java.util.ArrayList.get(ArrayList.java:411)
        at org.apache.pig.impl.builtin.FindQuantiles.exec(FindQuantiles.java:217)
        at org.apache.pig.backend.hadoop.executionengine.tez.FindQuantilesTez.exec(FindQuantilesTez.java:96)
        at org.apache.pig.backend.hadoop.executionengine.tez.FindQuantilesTez.exec(FindQuantilesTez.java:35)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:344)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:383)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:355)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:379)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:299)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:301)
        at org.apache.pig.backend.hadoop.executionengine.tez.POValueOutputTez.getNextTuple(POValueOutputTez.java:141)
        at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.runPipeline(PigProcessor.java:319)
        at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:198)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
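
The top frames show ArrayList.get(1) being called on a one-element list inside FindQuantiles.exec. The sketch below is not Pig's code. It is a hypothetical illustration (the class QuantileSketch and both method names are invented here) of one way such an index error could arise when boundary keys for 16 partitions are drawn from a sample holding fewer keys than partitions, followed by a guarded variant. Whether this is the actual cause in FindQuantiles is an assumption that the report does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuantileSketch {
    // Hypothetical helper, NOT Pig's FindQuantiles: picks (numPartitions - 1)
    // boundary keys from a sorted sample by stepping through it at a fixed stride.
    static List<String> naiveBoundaries(List<String> sortedSamples, int numPartitions) {
        List<String> boundaries = new ArrayList<>();
        int step = Math.max(1, sortedSamples.size() / numPartitions);
        for (int i = 1; i < numPartitions; i++) {
            // Throws IndexOutOfBoundsException once i * step >= sortedSamples.size().
            boundaries.add(sortedSamples.get(i * step));
        }
        return boundaries;
    }

    // Guarded variant (also hypothetical): never asks for more partitions
    // than there are sampled keys, so every index stays inside the list.
    static List<String> guardedBoundaries(List<String> sortedSamples, int numPartitions) {
        List<String> boundaries = new ArrayList<>();
        int effective = Math.min(numPartitions, sortedSamples.size());
        if (effective <= 1) {
            return boundaries; // zero or one sample: a single partition, no boundaries
        }
        int step = sortedSamples.size() / effective;
        for (int i = 1; i < effective; i++) {
            boundaries.add(sortedSamples.get(i * step));
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // One sampled key, 16 requested partitions (PARALLEL 16 in the script).
        List<String> samples = new ArrayList<>(Arrays.asList("fred ichabod"));
        System.out.println("guarded: " + guardedBoundaries(samples, 16));
        try {
            naiveBoundaries(samples, 16);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive failed: " + e.getMessage());
        }
    }
}
```

With a single sampled key, the naive variant fails on its first lookup (index 1 of a one-element list), which matches the shape of the exception above. The guarded variant returns no boundaries, which amounts to using a single partition.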

Pig script:
========
set tez.lib.uris '<appropriate_location>';
set tez.runtime.shuffle.fetch.max.task.output.at.once 2;
set mapreduce.map.output.compress true;
set mapreduce.map.output.compress.codec 'org.apache.hadoop.io.compress.SnappyCodec';
set mapred.reduce.child.java.opts '-Xmx1024m';

A = load '/user/data/studenttab10' as (name, age, gpa);
B = filter A by age > 20;
C = group B by name;
D = foreach C generate group, COUNT(B) PARALLEL 16;
E = order D by $0 PARALLEL 16;
F = limit E 10;
store F into '/user/output/';

Dataset:
=======
katie underhill 44      3.49
irene thompson  72      3.42
quinn robinson  50      3.26
david quirinius 76      0.86
nick ichabod    32      2.87
fred ichabod    57      3.95
fred hernandez  18      2.17
sarah nixon     21      3.70
holly ichabod   35      2.91
fred hernandez  42      2.68



