You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/10/29 20:05:00 UTC

[jira] [Assigned] (IMPALA-7753) Rewrite engine ignores top-level expressions in ORDER BY clause

     [ https://issues.apache.org/jira/browse/IMPALA-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-7753:
-----------------------------------

       Assignee:     (was: Paul Rogers)
    Description: 
The select statement represents the ORDER BY clause in two distinct ways. First, there is a list of the "original" ordering expressions, {{orderByElements_}}. Second, there is an analyzed list in {{sortInfo_}}. The explanation is:

{code:java}
      // create copies, we don't want to modify the original parse node, in case
      // we need to print it
{code}

Later, we apply rewrite rules to the {{ORDER BY}} expression, but we do so using the original version, not the copy:

{code:java}
      for (OrderByElement orderByElem: orderByElements_) {
        orderByElem.setExpr(rewriteCheckOrdinalResult(rewriter, orderByElem.getExpr()));
      }
{code}

The result is that we apply rewrite rules to expressions which have not been analyzed, triggering the assertion mentioned above. This assertion is telling us something: we skipped a step. Here, it turns out we are rewriting the wrong set of expressions. Modifying the code to rewrite those in {{sortInfo_}} solves the problem. The current behavior is a bug as the rewrites currently do nothing, and the expressions we thought we were rewriting are never touched.

The correct code would rewrite the expressions which are actually used when analyzing the query:

{code}    if (orderByElements_ != null) {
      List<Expr> sortExprs = sortInfo_.getSortExprs();
      for (int i = 0; i < sortExprs.size(); i++) {
        sortExprs.set(i, rewriteCheckOrdinalResult(rewriter, sortExprs.get(i)));
      }
{code}

We can, in addition, ask a more basic question: do we even need to do rewrites for {{ORDER BY}} expressions? The only valid expressions are column references, aren't they? Or, does Impala allow expressions in the {{ORDER BY}} clause?

Here is the result of a {{PlannerTest}} run that shows how the bug affects the conditional function rewrite:

{noformat}
07:AGGREGATE [FINALIZE]
|  output: avg(sum(t1.id)), sum(avg(g)), count(id)
|  group by: if(TupleIsNull(), NULL, CASE WHEN int_col IS NOT NULL THEN int_col ELSE 20 END)
|
06:ANALYTIC
|  functions: avg(if(TupleIsNull(), NULL, CASE WHEN id + bigint_col IS NOT NULL THEN id + bigint_col ELSE 40 END))
|  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN bigint_col ELSE 30 END) ASC
|  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
05:SORT
|  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN bigint_col ELSE 30 END) ASC
{noformat}

Note that the top-level ifs are not rewritten, but the nested coalesce’s are rewritten to CASE.

Because this code rewrites the wrong list, the rewrite rules contain if-statements to check for un-analyzed nodes. In actuality, no node should be un-analyzed when passing through the rewrite engine.

  was:
The select statement represents the ORDER BY clause in two distinct ways. First, there is a list of the "original" ordering expressions, {{orderByElements_}}. Second, there is an analyzed list in {{sortInfo_}}. The explanation is:

{code:java}
      // create copies, we don't want to modify the original parse node, in case
      // we need to print it
{code}

Later, we apply rewrite rules to the {{ORDER BY}} expression, but we do so using the original version, not the copy:

{code:java}
      for (OrderByElement orderByElem: orderByElements_) {
        orderByElem.setExpr(rewriteCheckOrdinalResult(rewriter, orderByElem.getExpr()));
      }
{code}

The result is that we apply rewrite rules to expressions which have not been analyzed, triggering the assertion mentioned above. This assertion is telling us something: we skipped a step. Here, it turns out we are rewriting the wrong set of expressions. Modifying the code to rewrite those in {{sortInfo_}} solves the problem. The current behavior is a bug as the rewrites currently do nothing, and the expressions we thought we were rewriting are never touched.

The correct code would rewrite the expressions which are actually used when analyzing the query:

{code}    if (orderByElements_ != null) {
      List<Expr> sortExprs = sortInfo_.getSortExprs();
      for (int i = 0; i < sortExprs.size(); i++) {
        sortExprs.set(i, rewriteCheckOrdinalResult(rewriter, sortExprs.get(i)));
      }
{code}

We can, in addition, ask a more basic question: do we even need to do rewrites for {{ORDER BY}} expressions? The only valid expressions are column references, aren't they? Or, does Impala allow expressions in the {{ORDER BY}} clause?

        Summary: Rewrite engine ignores top-level expressions in ORDER BY clause  (was: Invalid logic when rewriting ORDER BY clause expressions)

> Rewrite engine ignores top-level expressions in ORDER BY clause
> ---------------------------------------------------------------
>
>                 Key: IMPALA-7753
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7753
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The select statement represents the ORDER BY clause in two distinct ways. First, there is a list of the "original" ordering expressions, {{orderByElements_}}. Second, there is an analyzed list in {{sortInfo_}}. The explanation is:
> {code:java}
>       // create copies, we don't want to modify the original parse node, in case
>       // we need to print it
> {code}
> Later, we apply rewrite rules to the {{ORDER BY}} expression, but we do so using the original version, not the copy:
> {code:java}
>       for (OrderByElement orderByElem: orderByElements_) {
>         orderByElem.setExpr(rewriteCheckOrdinalResult(rewriter, orderByElem.getExpr()));
>       }
> {code}
> The result is that we apply rewrite rules to expressions which have not been analyzed, triggering the assertion mentioned above. This assertion is telling us something: we skipped a step. Here, it turns out we are rewriting the wrong set of expressions. Modifying the code to rewrite those in {{sortInfo_}} solves the problem. The current behavior is a bug as the rewrites currently do nothing, and the expressions we thought we were rewriting are never touched.
> The correct code would rewrite the expressions which are actually used when analyzing the query:
> {code}    if (orderByElements_ != null) {
>       List<Expr> sortExprs = sortInfo_.getSortExprs();
>       for (int i = 0; i < sortExprs.size(); i++) {
>         sortExprs.set(i, rewriteCheckOrdinalResult(rewriter, sortExprs.get(i)));
>       }
> {code}
> We can, in addition, ask a more basic question: do we even need to do rewrites for {{ORDER BY}} expressions? The only valid expressions are column references, aren't they? Or, does Impala allow expressions in the {{ORDER BY}} clause?
> Here is the result of a {{PlannerTest}} run that shows how the bug affects the conditional function rewrite:
> {noformat}
> 07:AGGREGATE [FINALIZE]
> |  output: avg(sum(t1.id)), sum(avg(g)), count(id)
> |  group by: if(TupleIsNull(), NULL, CASE WHEN int_col IS NOT NULL THEN int_col ELSE 20 END)
> |
> 06:ANALYTIC
> |  functions: avg(if(TupleIsNull(), NULL, CASE WHEN id + bigint_col IS NOT NULL THEN id + bigint_col ELSE 40 END))
> |  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN bigint_col ELSE 30 END) ASC
> |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |
> 05:SORT
> |  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN bigint_col ELSE 30 END) ASC
> {noformat}
> Note that the top-level ifs are not rewritten, but the nested coalesce’s are rewritten to CASE.
> Because this code rewrites the wrong list, the rewrite rules contain if-statements to check for un-analyzed nodes. In actuality, no node should be un-analyzed when passing through the rewrite engine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org