You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "gruuya (via GitHub)" <gi...@apache.org> on 2023/11/20 13:25:08 UTC

[I] `rebase_expr` should probably use `transform_down` and not `transform_up` [arrow-datafusion]

gruuya opened a new issue, #8280:
URL: https://github.com/apache/arrow-datafusion/issues/8280

   ### Describe the bug
   
   While working with `first_value` aggregation function I noticed what seems to be a minute, albeit general, deficiency in the `rebase_expr` function, which in the case below leads to an error.
   
   Namely, when re-basing an expression against a list of provided base expressions transformations are done in a post-order fashion:
   https://github.com/apache/arrow-datafusion/blob/2156dde54623d26635d4388d161d94ac79918cdc/datafusion/sql/src/utils.rs#L64-L69
   
   The issue below arises when this list of base expressions contains overlapping elements, e.g. the first base expression (group by clause) is a sub-expression of the second base expression (the `first_value` aggregation). Since the transformation is done inside-out, once the first sub-expression is columnized, the original expression gets changed and so it can't be re-based anymore (even when the original root expression is also an available base expression as is the case below).
   
   However, even regardless of that error, I think it could be argued that `rebase_expr` should perform pre-order traversal in general:
   1. `rebase_expr` should strive to re-base on the outermost available base expression in order to minimize (however negligible) the computation that needs to be done on top of it. The only reservation I have here is if somehow there is some logic that relies on repeated inside-out columnization which can only be achieved via `transform_up` (this doesn't seem to be the case judging by the tests).
   2. While `transform_up` transforms the current sub-expression only after visiting it (and it's children), `transform_down` transforms it before traversing any child expressions. Since the transformation involved is columnization, which effectively prunes that branch of the expression tree, this leads to strictly less-or-equal number of node visits than `transform_up`.
   
   
   ### To Reproduce
   
   ```sql
   ❯ create table t as values ('a', 1, 'a1'), ('a', 2, 'a2'), ('b', 1, 'b1'), ('b', 2, 'b2');
   0 rows in set. Query took 0.023 seconds.
   
   ❯ select first_value(column3 order by ascii(column1) * (column2 % 2)) from t group by ascii(column1);
   Error during planning: Projection references non-aggregate values: Expression t.column3 could not be resolved from available columns: ascii(t.column1), FIRST_VALUE(t.column3) ORDER BY [ascii(t.column1) * t.column2 % Int64(2) ASC NULLS LAST]
   ```
   
   ### Expected behavior
   
   ```sql
   ❯ create table t as values ('a', 1, 'a1'), ('a', 2, 'a2'), ('b', 1, 'b1'), ('b', 2, 'b2');
   0 rows in set. Query took 0.019 seconds.
   
   ❯ select first_value(column3 order by ascii(column1) * (column2 % 2)) from t group by ascii(column1);
   +------------------------------------------------------------------------------------------+
   | FIRST_VALUE(t.column3) ORDER BY [ascii(t.column1) * t.column2 % Int64(2) ASC NULLS LAST] |
   +------------------------------------------------------------------------------------------+
   | a2                                                                                       |
   | b2                                                                                       |
   +------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.034 seconds.
   ```
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] `rebase_expr` should probably use `transform_down` and not `transform_up` [arrow-datafusion]

Posted by "wizardxz (via GitHub)" <gi...@apache.org>.
wizardxz commented on issue #8280:
URL: https://github.com/apache/arrow-datafusion/issues/8280#issuecomment-1895127320

   I found similar issue,
   ❯ SELECT a + 1 AS d, a + 1 + b AS c FROM (SELECT 1 AS a, 2 AS b) GROUP BY a + 1, a + 1 + b;
   Error during planning: Projection references non-aggregate values: Expression b could not be resolved from available columns: a + Int64(1), a + Int64(1) + b
   
   because we are transform_up in rebase_expr()
   a+1+b is split to a+1 and b, then we found b is not in group list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] `rebase_expr` should probably use `transform_down` and not `transform_up` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #8280: `rebase_expr` should probably use `transform_down` and not `transform_up`
URL: https://github.com/apache/arrow-datafusion/issues/8280


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org