Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/06/27 22:55:47 UTC

[jira] [Commented] (PIG-2144) ClassCastException when using IsEmpty(DIFF())

    [ https://issues.apache.org/jira/browse/PIG-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055752#comment-13055752 ] 

Thejas M Nair commented on PIG-2144:
------------------------------------

The bug is in the LogicalExpressionSimplifier optimization rule: it performs an incorrect transformation and adds a NOT operator to the output of DIFF.
The resulting logical plan:
{code}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
B: (Name: LOStore Schema: null)
|
|---B: (Name: LOFilter Schema: null)
    |   |
    |   (Name: Not Type: boolean Uid: 12)
    |   |
    |   |---(Name: UserFunc(org.apache.pig.builtin.IsEmpty) Type: boolean Uid: 9)
    |       |
    |       |---(Name: Not Type: boolean Uid: 11)
    |           |
    |           |---(Name: UserFunc(org.apache.pig.builtin.DIFF) Type: bag Uid: 8)
    |               |
    |               |---(Name: Project Type: bytearray Uid: 6 Input: 0 Column: 0)
    |               |
    |               |---(Name: Project Type: bytearray Uid: 7 Input: 0 Column: 1)
    |
    |---A: (Name: LOLoad Schema: null)RequiredFields:null

{code}
Note: In the explain command, the logical optimizer is called twice, which seems to eliminate the Not from the logical plan that is printed. I made local code changes to capture the logical plan that is actually used to generate the physical plan (pasted above). As part of this patch, I will also make changes to ensure that the logical optimizer runs only once in the explain command.
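
To illustrate why the extra Not fails at runtime, here is a minimal Java sketch. It does not use Pig's classes: `Expr`, `diff`, `isEmpty`, `not`, `intendedFilter` and `rewrittenFilter` are hypothetical stand-ins, and a `java.util.List` stands in for a bag. It only shows, under those assumptions, that a Not operator placed directly over a bag-valued expression fails with a ClassCastException when it casts its input to Boolean, which is the failure reported in the stack trace for PONot.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanSketch {
    /** Hypothetical stand-in for an expression operator; not a Pig class. */
    interface Expr {
        Object eval();
    }

    /** Stand-in for DIFF on two scalars: empty bag if equal, otherwise both values. */
    static Expr diff(Object a, Object b) {
        return () -> {
            List<Object> bag = new ArrayList<>();
            if (!a.equals(b)) {
                bag.add(a);
                bag.add(b);
            }
            return bag;
        };
    }

    /** Stand-in for IsEmpty: expects its input to evaluate to a bag. */
    static Expr isEmpty(Expr child) {
        return () -> ((List<?>) child.eval()).isEmpty();
    }

    /** Stand-in for Not: expects its input to evaluate to a Boolean. */
    static Expr not(Expr child) {
        return () -> !((Boolean) child.eval());
    }

    /** The filter as written in the script: NOT IsEmpty(DIFF(a, b)). */
    public static boolean intendedFilter(String a, String b) {
        return (Boolean) not(isEmpty(diff(a, b))).eval();
    }

    /** The plan shape shown above: Not(IsEmpty(Not(DIFF(a, b)))). */
    public static boolean rewrittenFilter(String a, String b) {
        return (Boolean) not(isEmpty(not(diff(a, b)))).eval();
    }

    public static void main(String[] args) {
        System.out.println("intended, Reenu/Anshu:   " + intendedFilter("Reenu", "Anshu"));
        System.out.println("intended, Bharat/Bharat: " + intendedFilter("Bharat", "Bharat"));
        try {
            rewrittenFilter("Reenu", "Anshu");
        } catch (ClassCastException e) {
            // The inner Not received a bag (List) instead of a Boolean.
            System.out.println("rewritten plan failed: " + e.getClass().getName());
        }
    }
}
```

In this sketch the intended expression keeps only the rows whose two fields differ, while the rewritten expression fails on the first row because the inner Not is applied to the bag before IsEmpty ever sees it.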



> ClassCastException when using IsEmpty(DIFF()) 
> ----------------------------------------------
>
>                 Key: PIG-2144
>                 URL: https://issues.apache.org/jira/browse/PIG-2144
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mitesh Singh Jat
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>
> I have the following input in <name>:<nickname> format, from which I want to find the records where the name is different from the nickname.
> {code:title=input/name_nickname.txt}
> Bharat:Bharat
> Amita:Amita
> Mitesh:Mitesh
> Reenu:Anshu
> Shikha:Shikhu
> Shilpa:Shilpi
> {code}
> I have the following script to find the records where the name is different from the nickname.
> {code:title=isEmpty_diff.pig}
> A = LOAD 'input/name_nickname.txt' using PigStorage(':');
> B = FILTER A BY NOT IsEmpty(DIFF($0, $1));
> DUMP B;
> {code}
> The above Pig script works with older Pig versions (e.g. 0.8.0 (r1043805)) and gives the following output:
> {code:title=output of isEmpty_diff.pig}
> (Reenu,Anshu)
> (Shikha,Shikhu)
> (Shilpa,Shilpi)
> {code}
> However, the above Pig script (isEmpty_diff.pig) fails on Pig 0.9 (e.g. 0.9.0.1105251322 (r1127671)) and on newer versions of Pig 0.8 (e.g. version 0.8.0.1105131316 (r1102885)), with a ClassCastException:
> {code:title=ClassCastException}
> java.lang.ClassCastException: org.apache.pig.data.DefaultDataBag cannot be cast to java.lang.Boolean
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:75)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:269)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:676)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
> {code}
> As a workaround, I used the following pig script.
> {code:title=isEmpty_diff2.pig}
> A = LOAD 'input/name_nickname.txt' using PigStorage(':');
> --B = FILTER A BY NOT IsEmpty(DIFF($0, $1));
> B1 = FOREACH A GENERATE $0, $1, DIFF($0, $1);
> B2 = FILTER B1 BY NOT IsEmpty($2);
> B = FOREACH B2 GENERATE $0, $1;
> DUMP B;
> {code}
