Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/27 16:24:49 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2360: Cannot group by `substr` expression

andygrove opened a new issue, #2360:
URL: https://github.com/apache/arrow-datafusion/issues/2360

   **Describe the bug**
   See repro case below.
   
   **To Reproduce**
   
   Add this test to `datafusion/core/tests/sql/group_by.rs`.
   
   ``` rust
   #[tokio::test]
   async fn csv_query_group_by_substr() -> Result<()> {
       let ctx = SessionContext::new();
       register_aggregate_csv(&ctx).await?;
       let sql = "SELECT substr(c1, 0, 1), avg(c12) \
           FROM aggregate_test_100 \
           GROUP BY substr(c1, 0, 1) \
           ORDER BY substr(c1, 0, 1)";
       let actual = execute_to_batches(&ctx, sql).await;
       let expected = vec![
           "+----+-----------------------------+",
           "| c1 | AVG(aggregate_test_100.c12) |",
           "+----+-----------------------------+",
           "| a  | 0.48754517466109415         |",
           "| b  | 0.41040709263815384         |",
           "| c  | 0.6600456536439784          |",
           "| d  | 0.48855379387549824         |",
           "| e  | 0.48600669271341534         |",
           "+----+-----------------------------+",
       ];
       assert_batches_sorted_eq!(expected, &actual);
       Ok(())
   }
   ```
   
   Fails with:
   
   ```
   No field named 'aggregate_test_100.c1'. Valid fields are 'substr(aggregate_test_100.c1,Int64(0),Int64(1))', 'AVG(aggregate_test_100.c12)'.
   ```
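   
   A rough illustration of what the error message suggests, not DataFusion's actual code: once the aggregate has run, its output appears to expose the grouping expression as a single field named `substr(aggregate_test_100.c1,Int64(0),Int64(1))`, so an `ORDER BY` expression that is still resolved through its inner column `c1` finds nothing. The sketch below models that with a toy `Expr` enum and two helpers, `resolve` and `rebase`. All three names are hypothetical stand-ins invented for this example.
   
   ```rust
   use std::fmt;
   
   /// Toy expression tree. A hypothetical stand-in, not DataFusion's real `Expr`.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Literal(i64),
       Function { name: String, args: Vec<Expr> },
   }
   
   impl fmt::Display for Expr {
       /// Renders the expression in the same shape as the field names in the error message.
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           match self {
               Expr::Column(c) => write!(f, "{}", c),
               Expr::Literal(v) => write!(f, "Int64({})", v),
               Expr::Function { name, args } => {
                   let rendered: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                   write!(f, "{}({})", name, rendered.join(","))
               }
           }
       }
   }
   
   /// Hypothetical helper: checks that every column referenced by `expr`
   /// exists among the field names of the input schema.
   fn resolve(expr: &Expr, fields: &[String]) -> Result<(), String> {
       match expr {
           Expr::Column(c) if fields.contains(c) => Ok(()),
           Expr::Column(c) => Err(format!("No field named '{}'", c)),
           Expr::Literal(_) => Ok(()),
           Expr::Function { args, .. } => args.iter().try_for_each(|a| resolve(a, fields)),
       }
   }
   
   /// Hypothetical helper: replaces any sub-expression whose rendered name
   /// matches an existing field with a plain column reference to that field.
   fn rebase(expr: &Expr, fields: &[String]) -> Expr {
       let rendered = expr.to_string();
       if fields.contains(&rendered) {
           return Expr::Column(rendered);
       }
       match expr {
           Expr::Function { name, args } => Expr::Function {
               name: name.clone(),
               args: args.iter().map(|a| rebase(a, fields)).collect(),
           },
           other => other.clone(),
       }
   }
   ```
   
   With the two field names from the error message as the schema, `resolve` on the original `substr(...)` expression returns `No field named 'aggregate_test_100.c1'`, while `resolve` on the `rebase`d expression succeeds because it has become a plain column reference to the aggregate output field.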
   
   **Expected behavior**
   This should not fail.
   
   **Additional context**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on issue #2360: Cannot have `order by` expression that references complex `group by` expression

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2360:
URL: https://github.com/apache/arrow-datafusion/issues/2360#issuecomment-1119923765

   I updated this issue. Some of the original bugs are now fixed, but this still fails, and I am looking into it now.




[GitHub] [arrow-datafusion] andygrove closed issue #2360: Cannot have `order by` expression that references complex `group by` expression

Posted by GitBox <gi...@apache.org>.
andygrove closed issue #2360: Cannot have `order by` expression that references complex `group by` expression
URL: https://github.com/apache/arrow-datafusion/issues/2360




[GitHub] [arrow-datafusion] andygrove commented on issue #2360: Cannot group by `substr` expression

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2360:
URL: https://github.com/apache/arrow-datafusion/issues/2360#issuecomment-1111385381

   I experimented a bit and was able to work around it with this ugly hack in `to_arrays`, only to see the same error occur later during physical planning. I will come back to this later, review how we generally deal with this type of logic, and find a better solution.
   
   ```
   -            let data_type = e.get_type(input.schema())?;
   +            // ugly hack for https://github.com/apache/arrow-datafusion/issues/2360
   +            let input_schema = input.schema();
   +            let data_type = match e {
   +                Expr::Sort { expr, .. } => {
   +                    let name = expr.name(input_schema.as_ref()).unwrap();
   +                    match input_schema.field_with_name(None, &name) {
   +                        Ok(x) => x.data_type().clone(),
   +                        _ => e.get_type(input_schema)?,
   +                    }
   +                }
   +                _ => e.get_type(input_schema)?
   +            };
   ```

