Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/07/09 10:18:05 UTC

[jira] [Commented] (TAJO-926) Join condition including column references of a row-preserving table in left outer join causes incorrect result

    [ https://issues.apache.org/jira/browse/TAJO-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055985#comment-14055985 ] 

ASF GitHub Bot commented on TAJO-926:
-------------------------------------

GitHub user hyunsik opened a pull request:

    https://github.com/apache/tajo/pull/62

    TAJO-926: Join condition including column references of a row-preserving table in left outer join causes incorrect result.

    See the jira issue. 
    https://issues.apache.org/jira/browse/TAJO-926
    
    This patch fixes the bug and adds a unit test that reproduces it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hyunsik/tajo TAJO-926

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #62
    
----
commit 25ad6d84de992c124651c7e8d3fda4d5ffc6ac8c
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-09T06:26:46Z

    Fixed bugs of hash outer join operator and wrong PPD against in predicate.

commit 737896d3baaae217d85df152bed89a4bb37bcbd9
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-09T06:27:19Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into HASHOUTER_INPRED_BUG

commit ab8c2ddb8ab26f3c00bc0002e082a611565a48e2
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-09T08:14:44Z

    TAJO-926: Join condition including column references of a row-preserving table in left outer join causes incorrect result. (hyunsik)

----


> Join condition including column references of a row-preserving table in left outer join causes incorrect result
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: TAJO-926
>                 URL: https://issues.apache.org/jira/browse/TAJO-926
>             Project: Tajo
>          Issue Type: Bug
>          Components: physical operator, planner/optimizer
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.9.0
>
>
> This patch fixes two bugs.
> The first is an incorrect projection push down (PPD). The following query reproduces it:
> {noformat}
> select
>   r_name,
>   r_regionkey,
>   n_name,
>   n_regionkey
> from
>   region left outer join nation on n_regionkey = r_regionkey and r_name in ('AMERICA', 'ASIA')
> order by r_name;
> {noformat}
> The above query includes one left outer join (LOJ) and one join filter. Since the join filter {{R_NAME in ('AMERICA', 'ASIA')}} includes column references to the row-preserving table {{region}}, the join filter is placed on the LOJ operator. This results in the RowConstantEval sub-expression being pushed down and the right-hand expression of the IN predicate being replaced by a FieldEval. However, InEval assumes that its RHS is always a RowConstantEval. This is the main cause of the bug.
> {noformat}
> 2014-07-09 16:39:37,527 ERROR: org.apache.tajo.worker.Task (run(395)) - org.apache.tajo.engine.eval.FieldEval cannot be cast to org.apache.tajo.engine.eval.RowConstantEval
> java.lang.ClassCastException: org.apache.tajo.engine.eval.FieldEval cannot be cast to org.apache.tajo.engine.eval.RowConstantEval
> 	at org.apache.tajo.engine.eval.InEval.eval(InEval.java:62)
> 	at org.apache.tajo.engine.eval.BinaryEval.eval(BinaryEval.java:104)
> 	at org.apache.tajo.engine.planner.physical.NLLeftOuterJoinExec.next(NLLeftOuterJoinExec.java:109)
> 	at org.apache.tajo.engine.planner.physical.ExternalSortExec.sortAndStoreAllChunks(ExternalSortExec.java:201)
> 	at org.apache.tajo.engine.planner.physical.ExternalSortExec.next(ExternalSortExec.java:278)
> 	at org.apache.tajo.engine.planner.physical.RangeShuffleFileWriteExec.next(RangeShuffleFileWriteExec.java:99)
> 	at org.apache.tajo.worker.Task.run(Task.java:388)
> 	at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:406)
> 	at java.lang.Thread.run(Thread.java:744)
> 2014-07-09 16:39:37,528 INFO: org.apache.tajo.worker.TaskAttemptContext (setState(115)) - Query status of ta_1404891573341_0004_000003_000000_02 is changed to TA_FAILED
> 2014-07-09 16:39:37,529 INFO: org.apache.tajo.worker.Task (run(452)) - Worker's task counter - total:3, succeeded: 0, killed: 3, failed: 3
> {noformat}
> The second bug is that HashLeftOuterJoin produces an incorrect result when it has a join filter on the row-preserving table, as in the example query above.
> To fix this bug, we have to skip the right iterator of the hash table when the joined tuple is filtered out.
> Expected result:
> {noformat}
> r_name,r_regionkey,n_name,n_regionkey
> -------------------------------
> AFRICA,0,null,null
> AMERICA,1,ARGENTINA,1
> AMERICA,1,BRAZIL,1
> AMERICA,1,CANADA,1
> AMERICA,1,PERU,1
> AMERICA,1,UNITED STATES,1
> ASIA,2,INDIA,2
> ASIA,2,INDONESIA,2
> ASIA,2,JAPAN,2
> ASIA,2,CHINA,2
> ASIA,2,VIETNAM,2
> EUROPE,3,null,null
> MIDDLE EAST,4,null,null
> {noformat}
> Actual result:
> {noformat}
> r_name,r_regionkey,n_name,n_regionkey
> -------------------------------
> AFRICA,0,null,null
> AFRICA,0,null,null
> AFRICA,0,null,null
> AFRICA,0,null,null
> AFRICA,0,null,null
> AMERICA,1,ARGENTINA,1
> AMERICA,1,BRAZIL,1
> AMERICA,1,CANADA,1
> AMERICA,1,PERU,1
> AMERICA,1,UNITED STATES,1
> ASIA,2,INDIA,2
> ASIA,2,INDONESIA,2
> ASIA,2,JAPAN,2
> ASIA,2,CHINA,2
> ASIA,2,VIETNAM,2
> EUROPE,3,null,null
> EUROPE,3,null,null
> EUROPE,3,null,null
> EUROPE,3,null,null
> EUROPE,3,null,null
> MIDDLE EAST,4,null,null
> MIDDLE EAST,4,null,null
> MIDDLE EAST,4,null,null
> MIDDLE EAST,4,null,null
> MIDDLE EAST,4,null,null
> {noformat}
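
The difference between the expected and actual results above can be illustrated with a small standalone sketch. It is not Tajo's hash left outer join operator: the class name, the String-array row layout, and the sample rows are all assumptions made for illustration. It only shows the intended semantics, in which a joined pair rejected by the join filter is skipped and a left row with no surviving match yields exactly one null-padded row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; this is not Tajo's hash left outer join operator.
final class LeftOuterHashJoinSketch {

    // Rows are plain String arrays (an assumption for this sketch):
    // left = {r_name, r_regionkey}, right = {n_name, n_regionkey}.
    static List<String[]> join(List<String[]> left, List<String[]> right,
                               BiPredicate<String[], String[]> joinFilter) {
        // Build phase: hash the right (null-supplying) side on its join key.
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] r : right) {
            table.computeIfAbsent(r[1], k -> new ArrayList<>()).add(r);
        }

        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            boolean matched = false;
            List<String[]> bucket =
                table.getOrDefault(l[1], Collections.<String[]>emptyList());
            for (String[] r : bucket) {
                // A filtered pair is skipped; it must not emit a null-padded row itself.
                if (!joinFilter.test(l, r)) {
                    continue;
                }
                out.add(new String[] {l[0], l[1], r[0], r[1]});
                matched = true;
            }
            // The row-preserving side appears exactly once when nothing matched.
            if (!matched) {
                out.add(new String[] {l[0], l[1], null, null});
            }
        }
        return out;
    }

    // Illustrative sample data only; not the full region and nation tables.
    static List<String[]> demo() {
        List<String[]> region = Arrays.asList(
            new String[] {"AFRICA", "0"},
            new String[] {"AMERICA", "1"},
            new String[] {"ASIA", "2"},
            new String[] {"EUROPE", "3"});
        List<String[]> nation = Arrays.asList(
            new String[] {"ALGERIA", "0"},
            new String[] {"KENYA", "0"},
            new String[] {"BRAZIL", "1"},
            new String[] {"CANADA", "1"},
            new String[] {"JAPAN", "2"},
            new String[] {"FRANCE", "3"});
        // Join filter on the left table: r_name in ('AMERICA', 'ASIA').
        List<String> names = Arrays.asList("AMERICA", "ASIA");
        return join(region, nation, (l, r) -> names.contains(l[0]));
    }

    public static void main(String[] args) {
        for (String[] row : demo()) {
            System.out.println(String.join(",", row));
        }
    }
}
```

With this sample data, AFRICA has two nations in its hash bucket but still appears only once, padded with nulls. Emitting a null-padded row per filtered pair instead would reproduce the duplicated rows shown in the actual result.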

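The ClassCastException in the first bug follows the same pattern as the sketch below. The types here are hypothetical stand-ins, not Tajo's FieldEval, RowConstantEval, or InEval, and the column name in the failing case is made up. The sketch only shows how an evaluator that casts its right-hand side unconditionally fails once a rewrite replaces the constant row with a field reference.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for expression nodes; these are not Tajo's classes.
interface Expr {}

final class FieldRef implements Expr {
    final String column;
    FieldRef(String column) { this.column = column; }
}

final class RowConstant implements Expr {
    final List<String> values;
    RowConstant(List<String> values) { this.values = values; }
}

final class InPredicateSketch {
    final Expr left;
    final Expr right;

    InPredicateSketch(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    // Assumes the right-hand side is always a constant row, as the issue describes.
    boolean eval(String leftValue) {
        // Throws ClassCastException when right is a FieldRef.
        RowConstant row = (RowConstant) right;
        return row.values.contains(leftValue);
    }

    public static void main(String[] args) {
        InPredicateSketch original = new InPredicateSketch(
            new FieldRef("r_name"), new RowConstant(Arrays.asList("AMERICA", "ASIA")));
        System.out.println(original.eval("ASIA"));

        // After a rewrite swaps the constant row for a field reference
        // (the column name is hypothetical), the cast fails.
        InPredicateSketch rewritten = new InPredicateSketch(
            new FieldRef("r_name"), new FieldRef("pushed_down_column"));
        try {
            rewritten.eval("ASIA");
        } catch (ClassCastException e) {
            System.out.println("cast failed");
        }
    }
}
```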


--
This message was sent by Atlassian JIRA
(v6.2#6252)