Posted to dev@pig.apache.org by "Anthony Hsu (JIRA)" <ji...@apache.org> on 2015/02/05 20:34:37 UTC

[jira] [Updated] (PIG-3972) java.lang.IndexOutOfBoundsException when flatten meets an empty row

     [ https://issues.apache.org/jira/browse/PIG-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Hsu updated PIG-3972:
-----------------------------
    Description: 
{code:title=test1.txt}
{(,A1111,A),(,B222,B),(,C333,C)}
{code}
{code:title=test2.txt}
A       Helloworld
B       Pig
C       Hive
{code}
{code:title=tt.pig}
A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)});
A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray);
B = LOAD 'test2.txt' AS (id:chararray, content:chararray);
C = JOIN A1 BY id LEFT OUTER, B BY id;
dump C;
{code}
{code}
$ pig -x local -f tt.pig
....
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:604)
        at java.util.ArrayList.get(ArrayList.java:382)
        at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:115)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:350)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:273)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:425)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:416)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:636)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:396)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:441)
      [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
==============
{code}
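The top frames of the trace show only that a list of size 1 was read at index 1 (DefaultTuple.get delegating to ArrayList.get). As a minimal sketch of that failure mode in isolation, the Java snippet below reads index 1 from a one-element list and from a three-element list. The class and method names (TupleIndexSketch, get, failsRangeCheck) are hypothetical and are not Pig code; the sketch does not establish what produced the short tuple in POPackage.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleIndexSketch {
    // Hypothetical stand-in for a tuple backed by an ArrayList; NOT Pig's DefaultTuple.
    static Object get(List<Object> fields, int index) {
        // ArrayList.get throws IndexOutOfBoundsException when index >= size.
        return fields.get(index);
    }

    // Returns true when reading `index` from `fields` fails the range check.
    static boolean failsRangeCheck(List<Object> fields, int index) {
        try {
            get(fields, index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption for illustration: a tuple holding a single null field (size 1).
        List<Object> oneField = new ArrayList<>();
        oneField.add(null);

        // A tuple shaped like the rows in test1.txt: (title, name, id).
        List<Object> threeFields = new ArrayList<>();
        threeFields.add(null);
        threeFields.add("A1111");
        threeFields.add("A");

        System.out.println(failsRangeCheck(oneField, 1));    // index 1, size 1
        System.out.println(failsRangeCheck(threeFields, 1)); // index 1, size 3
    }
}
```

The exact exception message depends on the JDK version, so the sketch checks only the exception type.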
If we change the script as follows, adding an explicit projection after the flatten:
{code:title=tt.pig}
A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)});
A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray);
A1 = FOREACH A1 generate title,name,id;
B = LOAD 'test2.txt' AS (id:chararray, content:chararray);
C = JOIN A1 BY id LEFT OUTER, B BY id;
dump C;
{code}
The job succeeds, and this is the output:
{code}
========
(,A1111,A,A,Helloworld)
(,B222,B,B,Pig)
(,C333,C,C,Hive)
(,,,,)
========
{code}

  was:
$ cat test1.txt

{(,A1111,A),(,B222,B),(,C333,C)}
$ cat test2.txt 
A       Helloworld
B       Pig
C       Hive
$ cat tt.pig
A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)});
A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray);
B = LOAD 'test2.txt' AS (id:chararray, content:chararray);
C = JOIN A1 BY id LEFT OUTER, B BY id;
dump C;

$ pig -x local -f tt.pig
....
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:604)
        at java.util.ArrayList.get(ArrayList.java:382)
        at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:115)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:350)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:273)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:425)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:416)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:636)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:396)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:441)
      [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
==============

If we change the script as follows, adding an explicit projection after the flatten:
$ cat tt.pig
A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)});
A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray);
A1 = FOREACH A1 generate title,name,id;
B = LOAD 'test2.txt' AS (id:chararray, content:chararray);
C = JOIN A1 BY id LEFT OUTER, B BY id;
dump C;

The job succeeds, and this is the output:
========
(,A1111,A,A,Helloworld)
(,B222,B,B,Pig)
(,C333,C,C,Hive)
(,,,,)
========

> java.lang.IndexOutOfBoundsException when flatten meets an empty row
> -------------------------------------------------------------------
>
>                 Key: PIG-3972
>                 URL: https://issues.apache.org/jira/browse/PIG-3972
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Bing Jiang
>            Assignee: Mona Chitnis
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)