Posted to dev@pig.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/09/22 11:44:34 UTC

[jira] [Resolved] (PIG-4188) FindQuantilesTez throwing IndexOutOfBoundsException with small dataset

     [ https://issues.apache.org/jira/browse/PIG-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan resolved PIG-4188.
-----------------------------------
    Resolution: Duplicate

Thanks, Daniel. PIG-4177 was the underlying issue, so I am marking this as a duplicate and closing it.

> FindQuantilesTez throwing IndexOutOfBoundsException with small dataset
> ----------------------------------------------------------------------
>
>                 Key: PIG-4188
>                 URL: https://issues.apache.org/jira/browse/PIG-4188
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Rajesh Balamohan
>
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>         at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>         at java.util.ArrayList.get(ArrayList.java:411)
>         at org.apache.pig.impl.builtin.FindQuantiles.exec(FindQuantiles.java:217)
>         at org.apache.pig.backend.hadoop.executionengine.tez.FindQuantilesTez.exec(FindQuantilesTez.java:96)
>         at org.apache.pig.backend.hadoop.executionengine.tez.FindQuantilesTez.exec(FindQuantilesTez.java:35)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:344)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:383)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:355)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:379)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:299)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:301)
>         at org.apache.pig.backend.hadoop.executionengine.tez.POValueOutputTez.getNextTuple(POValueOutputTez.java:141)
>         at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.runPipeline(PigProcessor.java:319)
>         at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:198)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> Pig script:
> ===========
> set tez.lib.uris '<appropriate_location>';
> set tez.runtime.shuffle.fetch.max.task.output.at.once 2;
> set mapreduce.map.output.compress true;
> set mapreduce.map.output.compress.codec 'org.apache.hadoop.io.compress.SnappyCodec';
> set mapred.reduce.child.java.opts '-Xmx1024m';
> A = load '/user/data/studenttab10' as (name, age, gpa);
> B = filter A by age > 20;
> C = group B by name;
> D = foreach C generate group, COUNT(B) PARALLEL 16;
> E = order D by $0 PARALLEL 16;
> F = limit E 10;
> store F into '/user/output/';
>
> Dataset:
> ========
> katie underhill 44      3.49
> irene thompson  72      3.42
> quinn robinson  50      3.26
> david quirinius 76      0.86
> nick ichabod    32      2.87
> fred ichabod    57      3.95
> fred hernandez  18      2.17
> sarah nixon     21      3.70
> holly ichabod   35      2.91
> fred hernandez  42      2.68



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)