Posted to issues@hive.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2016/09/05 16:58:20 UTC

[jira] [Commented] (HIVE-14705) Hive outer queries is not picking up the right column from subqueries

    [ https://issues.apache.org/jira/browse/HIVE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465410#comment-15465410 ] 

Sahil Takiar commented on HIVE-14705:
-------------------------------------

Details about the fix and the bug are below:

*Background:*

Queries:

Query 1: {{select * from (select a-1 as a from test where a=7) z}}
Query 2: {{select a-1 as a from test where a=7}}
Query 3: {{select * from (select a-1 as a1 from test where a=7) z}}

Constant Propagation:

* Constant Propagation in MySQL: https://dev.mysql.com/doc/internals/en/optimizer-constant-propagation.html
* Expression Folding in MySQL: https://dev.mysql.com/doc/internals/en/optimizer-folding-constants.html
* Constants can only be propagated from a parent operator to a child operator; if an operator has no constants inside it and is passed no constants, then its child operator will see no constants

Code:

* The bug and the fix are both contained within the Constant Propagate Rule in the RBO (rule-based optimizer), specifically the {{ConstantPropagateProcCtx}} and {{ConstantPropagateProcFactory}} classes
* The {{ConstantPropagate}} rule walks through the operator tree and invokes the corresponding method in the {{ConstantPropagateProcFactory}} class for each operator
** For example, if the walker hits a {{FilterOperator}} it invokes the {{ConstantPropagateFilterProc.process}} method - this method is responsible for doing any constant propagation for the given operator
* Each invocation of the {{process}} method is passed a shared context called {{ConstantPropagateProcCtx}}, which contains a map called {{opToConstantExprs}}; this map is important because it tracks a column-to-constant mapping, and it is updated as constants are propagated
* {{ConstantPropagateProcFactory.propagate}} propagates constants inside assignment operators; only {{=}} and {{is null}} are supported
* {{ConstantPropagateProcFactory.foldExprFull}} folds expressions; essentially, it evaluates any deterministic UDF operator whose parameters are constants
* {{ConstantPropagateProcFactory.foldOperator}} looks through the list of propagated constants and tries to replace any columns with their constant equivalent
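
The walker and shared-context arrangement described above can be sketched with a toy model. This is not Hive's code: {{ToyConstantPropagateWalk}}, {{ToyCtx}} and {{ToyProc}} are hypothetical names, operators are plain strings, and keying the map by operator and then by column is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are hypothetical stand-ins, not Hive's classes.
final class ToyConstantPropagateWalk {

    // Stand-in for the shared context. Keying by operator and then by
    // column is an assumption made for this sketch.
    static final class ToyCtx {
        final Map<String, Map<String, Integer>> opToConstants = new HashMap<>();
    }

    // Stand-in for a per-operator processor exposing a process method.
    interface ToyProc {
        void process(String operator, ToyCtx ctx);
    }

    // Visits the operators in order and invokes the processor registered
    // for each one, passing the same context object every time.
    static List<String> walk(List<String> operators, Map<String, ToyProc> procs, ToyCtx ctx) {
        List<String> visited = new ArrayList<>();
        for (String operator : operators) {
            ToyProc proc = procs.get(operator);
            if (proc != null) {
                proc.process(operator, ctx);
                visited.add(operator);
            }
        }
        return visited;
    }

    // Walks the three operators of `select a-1 as a from test where a=7`.
    static ToyCtx demo() {
        Map<String, ToyProc> procs = new HashMap<>();
        procs.put("TableScan", (o, c) -> { /* nothing to propagate */ });
        procs.put("Filter", (o, c) ->
            // Record the constant implied by `a = 7`.
            c.opToConstants.computeIfAbsent(o, k -> new HashMap<>()).put("a", 7));
        procs.put("Select", (o, c) -> { /* folding would happen here */ });

        ToyCtx shared = new ToyCtx();
        walk(Arrays.asList("TableScan", "Filter", "Select"), procs, shared);
        return shared;
    }
}
```

Because every processor receives the same context object, whatever one operator records (or fails to record) is what later operators see.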

*Stepping Through Query 1:*

* The {{TableScanOperator}} is processed first, but no constant propagation occurs here
* The {{FilterOperator}} will be "folded" via the {{propagate}} method; basically, this means that the expression {{a = 7}} is added to the map {{opToConstantExprs}}
* The {{SelectOperator}} from the sub-query is processed next
** {{ConstantPropagateProcCtx.getPropagatedConstants}} is invoked; this method is responsible for getting all the constants that should be propagated from the parent operators to the current operator; in this case it fetches all the constants from the {{FilterOperator}}
** {{foldOperator}} is invoked; this method will take the constants from the previous step and replace any columns with their new constant values, so in this case {{a}} is replaced with a value of 7
** {{foldExprFull}} is invoked; this method will take the {{a - 1}} clause in the select statement and fold it to a value of 6; it can do this because it knows that {{a}} is now a constant with a value of 7
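
The substitute-then-fold sequence in the steps above can be illustrated with a small expression tree. This is only a sketch: {{ToyFold}} and its nested types are hypothetical, and the two methods merely imitate the roles the text gives to {{foldOperator}} and {{foldExprFull}}; they are not Hive's implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Toy expression folding; hypothetical types, not Hive's ExprNode classes.
final class ToyFold {
    interface Expr {}

    static final class Col implements Expr {
        final String name;
        Col(String name) { this.name = name; }
    }

    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class Minus implements Expr {
        final Expr left;
        final Expr right;
        Minus(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Replaces every column that has a propagated constant with that constant.
    static Expr substitute(Expr e, Map<String, Integer> constants) {
        if (e instanceof Col) {
            Integer v = constants.get(((Col) e).name);
            return v == null ? e : new Const(v);
        }
        if (e instanceof Minus) {
            Minus m = (Minus) e;
            return new Minus(substitute(m.left, constants), substitute(m.right, constants));
        }
        return e;
    }

    // Evaluates a subtraction once both of its operands are constants.
    static Expr fold(Expr e) {
        if (e instanceof Minus) {
            Minus m = (Minus) e;
            Expr l = fold(m.left);
            Expr r = fold(m.right);
            if (l instanceof Const && r instanceof Const) {
                return new Const(((Const) l).value - ((Const) r).value);
            }
            return new Minus(l, r);
        }
        return e;
    }

    // Folds `a - 1` given the constant recorded from `where a = 7`.
    static int foldQuery1() {
        Map<String, Integer> fromFilter = new HashMap<>();
        fromFilter.put("a", 7);
        Expr folded = fold(substitute(new Minus(new Col("a"), new Const(1)), fromFilter));
        return ((Const) folded).value;
    }
}
```

Without the substitution step, {{fold}} leaves {{a - 1}} untouched, since one operand is still a column.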

*The Bug:*

* The bug occurs after expression folding is done in the {{ConstantPropagateSelectProc.process}} method: the code doesn't update the {{opToConstantExprs}} map with the new value of {{a}} (it should update it to 6, but it doesn't, so the value remains 7)
* When the walker hits the next {{SelectOperator}} (the one in the outer query), it propagates the value of {{z.a}} as 7, rather than 6
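
The stale-entry behavior can be reproduced in a few lines. This is a toy illustration under simplifying assumptions: a single column-to-constant map stands in for {{opToConstantExprs}}, and the {{updateMapAfterFolding}} flag is a hypothetical switch between the buggy and corrected behavior; the actual change in Hive may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the stale constant; not Hive's code.
final class ToyStaleConstant {

    // Returns the constant that the outer select would substitute for z.a.
    static int outerValue(boolean updateMapAfterFolding) {
        Map<String, Integer> constants = new HashMap<>();

        // Filter: `where a = 7` records a -> 7.
        constants.put("a", 7);

        // Inner select: `a - 1 as a` is folded using the recorded constant.
        int folded = constants.get("a") - 1;

        // The output column is also named `a`, so the entry must be
        // refreshed; skipping this step leaves the old value in place.
        if (updateMapAfterFolding) {
            constants.put("a", folded);
        }

        // Outer select: `select *` replaces z.a with whatever is recorded.
        return constants.get("a");
    }
}
```

In this model, {{outerValue(false)}} gives 7 (the wrong result from the report) and {{outerValue(true)}} gives 6.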

*Why Query 2 Succeeds:*

* The same bug occurs in query 2, but it has no impact because there is only one select operator
* After {{foldExprFull}} completes, the new column value {{6}} is returned and the operator is updated so that its schema reflects the new value, but the map {{opToConstantExprs}} is not updated; since this is the last relevant operator walked by the Constant Propagate Rule, it doesn't matter whether the map is up to date

*Why Query 3 Succeeds:*

* Query 3 works due to an unrelated bug inside the {{ConstantPropagateProcCtx.resolve}} method
* The bug causes {{ConstantPropagateProcCtx.getPropagatedConstants}} to return an empty list when processing the {{SelectOperator}} in the sub-query
* Since the list returned is empty, the {{opToConstantExprs}} map has no entries for the {{SelectOperator}}, so a failure to update the {{opToConstantExprs}} map doesn't cause any issues
* This bug prevents constant propagation from the inner {{SelectOperator}} to the outer {{SelectOperator}}
** Hive is evaluating the inner query first and then selecting all of its results
** It should realize that the {{select *}} clause will always return a value of 6
** This bug will not cause the query to return incorrect results, but it will have a performance impact
** The bug is fixed by HIVE-13602, but those changes are non-trivial and the original approach needs to be revised, so for now I am leaving it as is

> Hive outer queries is not picking up the right column from subqueries
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14705
>                 URL: https://issues.apache.org/jira/browse/HIVE-14705
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Sahil Takiar
>             Fix For: 2.0.1
>
>
> The following queries show the bug:
> *Setup:*
> {code}
> create table test (a int);
> insert into test values (7);
> {code}
> *Produces Wrong Results:*
> {code}
> select * from (select a-1 as a from test where a=7) z;
> +------+--+
> | z.a  |
> +------+--+
> | 7    |
> +------+--+
> {code}
> *Produces Correct Results:*
> {code}
> select * from (select a-1 as a1 from test where a=7) z;
> +-------+--+
> | z.a1  |
> +-------+--+
> | 6     |
> +-------+--+
> {code}
> Note this only happens with subqueries, as the following query returns the correct value of 6 {{select a-1 as a from test where a=7}}
> This affects version 1.1.0 but has been fixed in version 2.1.0 by HIVE-13602


