You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/03 15:10:53 UTC

[GitHub] [arrow-datafusion] comphead opened a new issue, #2430: Group by incorrectly works with alias.

comphead opened a new issue, #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430

   **Describe the bug**
   Group by incorrectly works with alias. The query fails with "Projection references non-aggregate values"
   `
   
   **To Reproduce**
   ```
   async fn csv_query_group_by_substr() -> Result<()> {
       let ctx = SessionContext::new();
       register_aggregate_csv(&ctx).await?;
       let sql = "SELECT substr(c1, 1, 1) c1 \
           FROM aggregate_test_100 \
           GROUP BY substr(c1, 1, 1) \
           ";
       let actual = execute_to_batches(&ctx, sql).await;
       let expected = vec![
           "+----+-----------------------------+",
           "| c1 | AVG(aggregate_test_100.c12) |",
           "+----+-----------------------------+",
           "| a  | 0.48754517466109415         |",
           "| b  | 0.41040709263815384         |",
           "| c  | 0.6600456536439784          |",
           "| d  | 0.48855379387549824         |",
           "| e  | 0.48600669271341534         |",
           "+----+-----------------------------+",
       ];
       assert_batches_sorted_eq!(expected, &actual);
       Ok(())
   }
   ```
   **Expected behavior**
   Test should pass
   
   **Additional context**
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430#issuecomment-1117138434

   2 bugs here.
   - `resolve_aliases_to_exprs(&group_by_expr, &alias_map)?;` which does wrong resolving and breaks expr to `substr(substr(aggregate_test_100.c1,Int64(1),Int64(1)),Int64(1),Int64(1))` instead of `substr(aggregate_test_100.c1,Int64(1),Int64(1))`
   - `check_columns_satisfy_exprs` that gets confused because of alias the same as col name in agg expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove closed issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
andygrove closed issue #2430: Group by incorrectly works with alias. 
URL: https://github.com/apache/arrow-datafusion/issues/2430


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430#issuecomment-1119374656

   Thank you, I was trying to solve it in planner but your proposed solution looks more robust! Looking forward to test it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430#issuecomment-1117140190

   > @comphead I have assigned this to you. I have spent some time working on this and keep running into problems with the way we try and compute names and determine type information based on an expression and a schema. I will be interested to see what you find.
   > 
   > I am now going to start working on #1468 and see where that takes me. If the expressions carry their names with them then perhaps it helps with some of these issues? I am still learning and exploring ...
   
   @andygrove I'm not sure this will solve all the problems. Currently the test like `select count(1), count(1) from t` will fail because the name gets generated based on colname + function + types. You may want include some increment like position, or reference to current query block hashcode, something like that


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430#issuecomment-1118946092

   @comphead I have the test passing in the WIP PR https://github.com/apache/arrow-datafusion/pull/2457
   
   I ran into the solution whilst working on ROLLUP support


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430#issuecomment-1116214112

   @andygrove can this be assigned to me?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on issue #2430: Group by incorrectly works with alias.

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2430:
URL: https://github.com/apache/arrow-datafusion/issues/2430#issuecomment-1116276369

   @comphead I have assigned this to you. I have spent some time working on this and keep running into problems with the way we try and compute names and determine type information based on an expression and a schema. I will be interested to see what you find.
   
   I am now going to start working on https://github.com/apache/arrow-datafusion/issues/1468 and see where that takes me. If the expressions carry their names with them then perhaps it helps with some of these issues? I am still learning and exploring ...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org