You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "comphead (via GitHub)" <gi...@apache.org> on 2023/04/27 15:03:01 UTC

[GitHub] [arrow-datafusion] comphead opened a new issue, #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

comphead opened a new issue, #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139

   ### Describe the bug
   
   UNION ALL sets the wrong column alias if union arm has aggregate expression, see query below.
   
   ### To Reproduce
   
   ```
   #[tokio::test]
   async fn union_with_aggr_first() -> Result<()> {
       let ctx = create_union_context()?;
       let sql = "select count(1) x from (select 1 a) group by a union all select 1 b;";
       let df = ctx.sql(sql).await.unwrap().repartition(Partitioning::RoundRobinBatch(1)).unwrap();
       let actual = df.collect().await.unwrap();
   
       let expected = vec![
           "+---+",
           "| x |",
           "+---+",
           "| 1 |",
           "| 1 |",
           "+---+",
       ];
       assert_batches_eq!(expected, &actual);
       Ok(())
   }
   ```
   Without group by the query works fine
   ```select count(1) x from (select 1 a)  a union all select 1 b;```
   
   ### Expected behavior
   
   The test should pass
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1537220392

   > `Union` is `Union All + Dedup`, so in most DB, Union is combination of `Agg` + `Union`
   
   Thanks, I was expecting DISTINCT in plan, not a aggregation. But DISTINCT may be can considered as partial case of aggregation 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1538604267

   > I agree that Distinct would be clearer in the LogicalPlan (though it gets rewritten to Aggregate pretty early on as I recall). Maybe it would be clearer to remove LogicalPlan::Distinct entirely and only use LogicalPlan::Aggregate 🤔
   
   Yes, it's replaced in rule `SingleDistinctToGroupBy`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1536917297

   @alamb @jackwener may I ask your help?  I have found why projection is not correct. push_down_projection never called for query like that 
   ```
   select 1 a group by a union all select 2 b
   ```
   
   However its called  for 
   ```
   select 1 a group by a union select 2 b
   ```
   
   The reason for that  UNION has another logical plan
   UNION
   ```
   | logical_plan  | Aggregate: groupBy=[[a]], aggr=[[]]                                                                                  |
   |               |   Union                                                                                                              |
   |               |     Projection: Int64(1) AS a                                                                                        |
   |               |       Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                                                     |
   |               |         EmptyRelation                                                                                                |
   |               |     Projection: Int64(2) AS a                                                                                        |
   |               |       EmptyRelation   
   ```
   
   UNION ALL
   ```
   | logical_plan  | Union                                                                                                        |
   |               |   Projection: Int64(1) AS a                                                                                  |
   |               |     Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                                               |
   |               |       EmptyRelation                                                                                          |
   |               |   Projection: Int64(2) AS b                                                                                  |
   |               |     EmptyRelation      
   ```
   
   UNION have a weird top Aggregate node, which looks strange for me. as Aggregation happens twice in that case, but because of that UNION push_down_projection works.
   
   I'm trying to understand the real rootcause if it in bad plan generated in the first place or bug in push_down_projection
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression
URL: https://github.com/apache/arrow-datafusion/issues/6139


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1538573402

   > Thanks, I was expecting DISTINCT in plan, not a aggregation. But DISTINCT may be can considered as partial case of aggregation
   
   I agree that Distinct would be clearer in the LogicalPlan (though it gets rewritten to Aggregate pretty early on as I recall). Maybe it would be clearer to remove `LogicalPlan::Distinct` entirely and only use `LogicalPlan::Aggregate` 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1536931121

   the effect of that is another bug with simple union all table creation(without aggr)
   
   ```
   ❯ create table t as select 1 a union all select 1 b;
   Error during planning: Mismatch between schema and batches
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1525860329

   Working on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on issue #6139: UNION ALL sets the wrong column alias if union arm has aggregate expression

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #6139:
URL: https://github.com/apache/arrow-datafusion/issues/6139#issuecomment-1537066510

   `Union` is `Union All + Dedup`, so in most DB, Union is combination of `Agg` + `Union`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org