Posted to issues@hive.apache.org by "Jun Di (Jira)" <ji...@apache.org> on 2022/01/12 07:29:00 UTC
[jira] [Comment Edited] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan
[ https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474311#comment-17474311 ]
Jun Di edited comment on HIVE-25861 at 1/12/22, 7:28 AM:
---------------------------------------------------------
The unoptimized ExprNode for the CASE WHEN comparison is:
{code:sql}
GenericUDFOPEqual(
  GenericUDFWhen(
    GenericUDFIn(Column[_col20], Const int 310000, Const int 320000, Const int 330000, Const int 340000),
    Const int 310000,
    Column[_col20]),
  GenericUDFWhen(
    GenericUDFIn(Column[_col46], Const int 310000, Const int 320000, Const int 330000, Const int 340000),
    Const int 310000,
    Column[_col46])
)
{code}
After the first ConstantPropagate pass, the ExprNode becomes:
{code:sql}
GenericUDFWhen(
  GenericUDFIn(Column[_col20], Const int 310000, Const int 320000, Const int 330000, Const int 340000),
  GenericUDFOPEqual(
    Const int 310000,
    GenericUDFWhen(
      GenericUDFIn(Column[_col46], Const int 310000, Const int 320000, Const int 330000, Const int 340000),
      Const int 310000,
      Column[_col46])),
  GenericUDFOPEqual(
    Column[_col20],
    GenericUDFWhen(
      GenericUDFIn(Column[_col46], Const int 310000, Const int 320000, Const int 330000, Const int 340000),
      Const int 310000,
      Column[_col46])))
{code}
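The first pass applies the rewrite `(CASE WHEN c THEN a ELSE b END) = x` to `CASE WHEN c THEN (a = x) ELSE (b = x) END`, as the ExprNode above shows. A minimal sketch of that rewrite on a hypothetical toy tree (`Expr`, `Col`, `Const`, `Call` and `PushEqualIntoWhen` are illustrative names, not Hive classes) shows that reusing `x` without copying leaves one object referenced from both branches:

```java
import java.util.ArrayList;
import java.util.List;

// Runnable with `java CaseWhenAliasing.java` (Java 11+): the class with main() comes first.
final class CaseWhenAliasing {
    // Builds WHEN(IN(col, 310000, 320000, 330000, 340000), 310000, col).
    static Call caseWhen(String col) {
        return new Call("WHEN",
                new Call("IN", new Col(col), new Const(310000), new Const(320000),
                        new Const(330000), new Const(340000)),
                new Const(310000),
                new Col(col));
    }

    public static void main(String[] args) {
        Call original = new Call("EQUAL", caseWhen("_col20"), caseWhen("_col46"));
        Call rewritten = PushEqualIntoWhen.rewrite(original);
        Expr thenRight = ((Call) rewritten.children.get(1)).children.get(1);
        Expr elseRight = ((Call) rewritten.children.get(2)).children.get(1);
        System.out.println(rewritten);
        // Reference comparison: true, both EQUAL nodes hold the same WHEN object.
        System.out.println(thenRight == elseRight);
    }
}

// Hypothetical toy expression tree; NOT Hive's ExprNodeDesc hierarchy.
abstract class Expr {}

final class Col extends Expr {
    final String name;
    Col(String name) { this.name = name; }
    @Override public String toString() { return "Column[" + name + "]"; }
}

final class Const extends Expr {
    final int value;
    Const(int value) { this.value = value; }
    @Override public String toString() { return "Const int " + value; }
}

final class Call extends Expr {
    final String fn;           // "WHEN", "EQUAL" or "IN"
    final List<Expr> children; // mutable, as an optimizer might rewrite it in place
    Call(String fn, Expr... args) {
        this.fn = fn;
        this.children = new ArrayList<>(List.of(args));
    }
    @Override public String toString() { return fn + children; }
}

final class PushEqualIntoWhen {
    // EQUAL(WHEN(c, a, b), x)  ->  WHEN(c, EQUAL(a, x), EQUAL(b, x))
    // 'x' is reused by reference in both new EQUAL nodes; nothing is copied.
    static Call rewrite(Call equal) {
        Call when = (Call) equal.children.get(0);
        Expr x = equal.children.get(1);
        return new Call("WHEN",
                when.children.get(0),
                new Call("EQUAL", when.children.get(1), x),
                new Call("EQUAL", when.children.get(2), x));
    }
}
```

Sharing `x` is harmless as long as every later pass treats nodes as immutable; it becomes a problem only when a pass edits a node in place.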
However, the two GenericUDFWhen nodes inside the two GenericUDFOPEqual calls are the same object, so the second ConstantPropagate pass produced a wrong result.
!2.png!
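One mechanism consistent with the faulty plan: the second pass rewrites the THEN-side comparison `310000 = CASE ...` by editing the shared GenericUDFWhen's children in place, and because the ELSE-side comparison points at the same object, it changes too. The sketch below uses a hypothetical `Node` class and a hypothetical in-place rewrite `pushInPlace` (neither is Hive code). It shows the leak, and that giving each comparison its own deep copy of the CASE node keeps the ELSE branch intact. In Hive a copy could presumably be made with `ExprNodeDesc.clone()`; whether the actual fix for this issue does that is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Runnable with `java SharedNodeMutation.java` (Java 11+).
final class SharedNodeMutation {
    // Hypothetical in-place rewrite of EQUAL(k, WHEN(c, a, b)):
    // it overwrites the WHEN node's own THEN/ELSE children with EQUAL(k, a) and EQUAL(k, b).
    // This is the unsafe pattern: every other parent of that WHEN node sees the change.
    static void pushInPlace(Node equal) {
        Node k = equal.children.get(0);
        Node when = equal.children.get(1);
        when.children.set(1, new Node("EQUAL", k, when.children.get(1)));
        when.children.set(2, new Node("EQUAL", k, when.children.get(2)));
    }

    // Stand-in for CASE WHEN _col46 IN (...) THEN 310000 ELSE _col46 END.
    static Node caseWhen46() {
        return new Node("WHEN",
                new Node("_col46 IN (310000, 320000, 330000, 340000)"),
                new Node("310000"),
                new Node("_col46"));
    }

    // Rewrites only the THEN-side comparison and reports whether the ELSE-side one changed.
    static boolean elseBranchChanged(boolean copyFirst) {
        Node shared = caseWhen46();
        Node thenEq = new Node("EQUAL", new Node("310000"), shared);
        Node elseEq = new Node("EQUAL", new Node("_col20"),
                copyFirst ? shared.deepCopy() : shared);
        String before = elseEq.toString();
        pushInPlace(thenEq);
        return !before.equals(elseEq.toString());
    }

    public static void main(String[] args) {
        System.out.println(elseBranchChanged(false)); // true: shared node, ELSE branch corrupted
        System.out.println(elseBranchChanged(true));  // false: each branch owns its copy
    }
}

// Hypothetical expression node; NOT Hive's ExprNodeDesc.
final class Node {
    final String label;
    final List<Node> children;

    Node(String label, Node... kids) {
        this.label = label;
        this.children = new ArrayList<>(List.of(kids));
    }

    // Recursively copies the subtree so that no node is shared with the original.
    Node deepCopy() {
        Node copy = new Node(label);
        for (Node child : children) {
            copy.children.add(child.deepCopy());
        }
        return copy;
    }

    @Override public String toString() {
        return children.isEmpty() ? label : label + children;
    }
}
```

Copying costs an extra subtree per branch, but it keeps each rewrite local to the branch being optimized.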
> When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan
> -------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-25861
> URL: https://issues.apache.org/jira/browse/HIVE-25861
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Reporter: Jun Di
> Assignee: Jun Di
> Priority: Critical
> Attachments: 1.png, 2.png
>
>
> When running the following SQL:
> {code:sql}
> select
>   t1.column_1,
>   t2.column_1,
>   t1.column_2,
>   t1.column_3,
>   case
>     when (
>       case
>         when t1.column_1 in (310000, 320000, 330000, 340000)
>         then 310000
>         else t1.column_1
>       end
>     ) = (
>       case
>         when t2.column_1 in (310000, 320000, 330000, 340000)
>         then 310000
>         else t2.column_1
>       end
>     )
>     then t1.column_2
>     else t1.column_3
>   end as result
> from
>   dim.dim_xmf_center t1
>   left join dim.dim_xmf_center t2
> where
>   t1.mt = '202201';
> {code}
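> Both t1.column_1 and t2.column_1 are 440000. Neither value is in the IN list, so both CASE expressions return 440000, the comparison is true, and the result should be t1.column_2; instead the query returns t1.column_3.
> See the attached 1.png for the result.
> The CASE WHEN part of the execution plan is wrong: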
> {code:sql}
> CASE WHEN (
>   CASE WHEN ((_col20) IN (310000, 320000, 330000, 340000))
>   THEN (CASE WHEN ((_col46) IN (310000, 320000, 330000, 340000)) THEN ((true = _col20)) ELSE (((_col46 = 310000) = _col20)) END)
>   ELSE (CASE WHEN ((_col46) IN (310000, 320000, 330000, 340000)) THEN ((true = _col20)) ELSE (((_col46 = 310000) = _col20)) END)
>   END)
> THEN (_col12) ELSE (_col15) END
> {code}
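To make the discrepancy concrete, the sketch below evaluates in plain Java both the condition the SQL intends and the condition in the faulty plan, for the reported values. The mapping `_col20` = t1.column_1, `_col46` = t2.column_1, `_col12` = t1.column_2, `_col15` = t1.column_3 is read off the plan's structure. The BOOLEAN = INT comparisons in the plan are modeled with `Objects.equals` (never equal), which is an assumption about Hive's coercion; for 440000 the branch actually taken compares `false` with 440000, which should be unequal whichever way the two types are coerced.

```java
import java.util.Objects;
import java.util.Set;

// Runnable with `java PlanEvaluation.java` (Java 11+).
final class PlanEvaluation {
    static final Set<Integer> CODES = Set.of(310000, 320000, 330000, 340000);

    // CASE WHEN code IN (310000, 320000, 330000, 340000) THEN 310000 ELSE code END
    static int normalize(int code) {
        return CODES.contains(code) ? 310000 : code;
    }

    // Condition the SQL asks for: the two normalized codes are equal.
    static boolean intended(int col20, int col46) {
        return normalize(col20) == normalize(col46);
    }

    // Condition in the faulty plan, transcribed from the EXPLAIN output above.
    // ASSUMPTION: BOOLEAN = INT is modeled with Objects.equals (Boolean vs Integer, never equal).
    static boolean faultyPlan(int col20, int col46) {
        boolean inner = CODES.contains(col46)
                ? Objects.equals(true, col20)              // (true = _col20)
                : Objects.equals(col46 == 310000, col20);  // ((_col46 = 310000) = _col20)
        // The plan's THEN and ELSE for `_col20 IN (...)` are identical, so both yield `inner`.
        return inner;
    }

    static String pick(boolean condition) {
        return condition ? "t1.column_2 (_col12)" : "t1.column_3 (_col15)";
    }

    public static void main(String[] args) {
        int code = 440000; // value of t1.column_1 and t2.column_1 in the report
        System.out.println("intended:    " + pick(intended(code, code)));   // t1.column_2 (_col12)
        System.out.println("faulty plan: " + pick(faultyPlan(code, code))); // t1.column_3 (_col15)
    }
}
```

Under this model the intended condition selects t1.column_2 while the plan selects t1.column_3, which matches the output reported in 1.png.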