Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/02/11 11:30:44 UTC

[GitHub] [arrow-datafusion] Jefffrey opened a new issue, #5251: SQL GROUP BY doesn't do ambiguity check

Jefffrey opened a new issue, #5251:
URL: https://github.com/apache/arrow-datafusion/issues/5251

   **Describe the bug**
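   In SQL, a GROUP BY on an ambiguous column (one that exists in more than one input table) doesn't return an error. The column is silently resolved to one of the tables (`t1.c` in the plan below).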
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   ```sql
   ❯ select * from test1 t1 join test2 t2 using (a);
   +---+---+---+---+---+
   | a | b | c | b | c |
   +---+---+---+---+---+
   | 1 | 2 | 3 | 2 | 3 |
   | 4 | 5 | 6 | 5 | 6 |
   +---+---+---+---+---+
   2 rows in set. Query took 0.013 seconds.
   ❯ select max(a) from test1 t1 join test2 t2 using (a) group by c;
   +-----------+
   | MAX(t1.a) |
   +-----------+
   | 1         |
   | 4         |
   +-----------+
   2 rows in set. Query took 0.014 seconds.
   ❯ explain select max(a) from test1 t1 join test2 t2 using (a) group by c;
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                         |
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: MAX(t1.a)                                                                                                                                        |
   |               |   Aggregate: groupBy=[[t1.c]], aggr=[[MAX(t1.a)]]                                                                                                            |
   |               |     Inner Join: Using t1.a = t2.a                                                                                                                            |
   |               |       SubqueryAlias: t1                                                                                                                                      |
   |               |         TableScan: test1 projection=[a, c]                                                                                                                   |
   |               |       SubqueryAlias: t2                                                                                                                                      |
   |               |         TableScan: test2 projection=[a]                                                                                                                      |
   | physical_plan | ProjectionExec: expr=[MAX(t1.a)@1 as MAX(t1.a)]                                                                                                              |
   |               |   AggregateExec: mode=FinalPartitioned, gby=[c@0 as c], aggr=[MAX(t1.a)]                                                                                     |
   |               |     CoalesceBatchesExec: target_batch_size=8192                                                                                                              |
   |               |       RepartitionExec: partitioning=Hash([Column { name: "c", index: 0 }], 12), input_partitions=12                                                          |
   |               |         AggregateExec: mode=Partial, gby=[c@1 as c], aggr=[MAX(t1.a)]                                                                                        |
   |               |           CoalesceBatchesExec: target_batch_size=8192                                                                                                        |
   |               |             HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: "a", index: 0 }, Column { name: "a", index: 0 })]                           |
   |               |               CoalesceBatchesExec: target_batch_size=8192                                                                                                    |
   |               |                 RepartitionExec: partitioning=Hash([Column { name: "a", index: 0 }], 12), input_partitions=12                                                |
   |               |                   RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                      |
   |               |                     CsvExec: files={1 group: [[home/jeffrey/Code/arrow-datafusion/datafusion-cli/test.csv]]}, has_header=true, limit=None, projection=[a, c] |
   |               |               CoalesceBatchesExec: target_batch_size=8192                                                                                                    |
   |               |                 RepartitionExec: partitioning=Hash([Column { name: "a", index: 0 }], 12), input_partitions=12                                                |
   |               |                   RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                      |
   |               |                     CsvExec: files={1 group: [[home/jeffrey/Code/arrow-datafusion/datafusion-cli/test.csv]]}, has_header=true, limit=None, projection=[a]    |
   |               |                                                                                                                                                              |
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.007 seconds.
   ```
   
   **Expected behavior**
   The query should return an error, because `c` exists in both `t1` and `t2` and the GROUP BY reference is unqualified.
   
   **Additional context**
   The same query on the latest Postgres raises an error about the ambiguous column.
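
For illustration, the kind of check being asked for could look roughly like the sketch below. It is plain Python, not DataFusion's planner code: `resolve_column`, `AmbiguousColumnError` and the `schemas` dictionary are hypothetical names, and the table schemas are taken from the `select *` output in the reproduction. It assumes that a column named in the USING clause may be referenced unqualified, while any other name found in more than one input is rejected.

```python
from typing import AbstractSet, Dict, List


class AmbiguousColumnError(Exception):
    """Raised when an unqualified column matches more than one relation."""


def resolve_column(
    name: str,
    schemas: Dict[str, List[str]],
    using_columns: AbstractSet[str] = frozenset(),
) -> str:
    """Resolve an unqualified column name to `relation.column`.

    `schemas` maps a relation alias to its column names (hypothetical shape).
    `using_columns` holds the join keys from a USING clause; the join merges
    them, so an unqualified reference to one of them is not ambiguous.
    """
    matches = [rel for rel, cols in schemas.items() if name in cols]
    if not matches:
        raise KeyError(f"column {name!r} not found")
    if len(matches) > 1 and name not in using_columns:
        raise AmbiguousColumnError(
            f"column reference {name!r} is ambiguous: found in {matches}"
        )
    # Unique match or USING column: take the first relation in join order.
    return f"{matches[0]}.{name}"


# Both tables in the reproduction expose columns a, b and c.
schemas = {"t1": ["a", "b", "c"], "t2": ["a", "b", "c"]}

# `a` is the USING column, so MAX(a) resolves without complaint.
print(resolve_column("a", schemas, using_columns={"a"}))

# `c` is in both inputs and is not a USING column, so it is rejected.
try:
    resolve_column("c", schemas, using_columns={"a"})
except AmbiguousColumnError as err:
    print(err)
```

Under this rule, `group by c` fails while `group by t1.c` (a qualified reference, which would bypass the lookup) stays valid.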
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb closed issue #5251: SQL GROUP BY doesn't do ambiguity check

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb closed issue #5251: SQL GROUP BY doesn't do ambiguity check
URL: https://github.com/apache/arrow-datafusion/issues/5251



[GitHub] [arrow-datafusion] Jefffrey commented on issue #5251: SQL GROUP BY doesn't do ambiguity check

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.

Jefffrey commented on issue #5251:
URL: https://github.com/apache/arrow-datafusion/issues/5251#issuecomment-1455053176

   I plan to resolve this as part of https://github.com/apache/arrow-datafusion/issues/5211

