You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Viraj Bhat (JIRA)" <ji...@apache.org> on 2009/12/08 01:28:18 UTC

[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines

     [ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-1131:
----------------------------

    Attachment: simplejoinscript.pig
                junk2.txt
                junk1.txt

Dummy datasets and pig script

> Pig simple join does not work when it contains empty lines
> ----------------------------------------------------------
>
>                 Key: PIG-1131
>                 URL: https://issues.apache.org/jira/browse/PIG-1131
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Viraj Bhat
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: junk1.txt, junk2.txt, simplejoinscript.pig
>
>
> I have a simple script, which does a JOIN.
> {code}
> input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
> describe input1;
> input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
> describe input2;
> joineddata = JOIN input1 by $0, input2 by $0;
> describe joineddata;
> store joineddata into 'result';
> {code}
> The input data contains empty lines.  
> The join fails in the Map phase with the following error in the PRLocalRearrange.java
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> 	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> 	at java.util.ArrayList.get(ArrayList.java:322)
> 	at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:159)
> I am surprised that the test cases did not detect this error. Could we add this data which contains empty lines to the testcases?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.