You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/25 10:51:23 UTC

[GitHub] [arrow-datafusion] gandronchik opened a new pull request, #2334: fix: union schema

gandronchik opened a new pull request, #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334

   fix UNION schema
   
   example:
   SELECT 1 a UNION ALL SELECT 2;


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
jackwener commented on code in PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#discussion_r857510834


##########
datafusion/core/src/execution/context.rs:
##########
@@ -3243,7 +3243,7 @@ mod tests {
             "SELECT cat, SUM(i) AS total FROM (
                     SELECT i, 'a' AS cat FROM catalog_a.schema_a.table_a
                     UNION ALL
-                    SELECT i, 'b' AS cat FROM catalog_b.schema_b.table_b

Review Comment:
   Got it. Thanksā¤.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#discussion_r864394949


##########
datafusion/core/src/execution/context.rs:
##########
@@ -3243,7 +3243,7 @@ mod tests {
             "SELECT cat, SUM(i) AS total FROM (
                     SELECT i, 'a' AS cat FROM catalog_a.schema_a.table_a
                     UNION ALL
-                    SELECT i, 'b' AS cat FROM catalog_b.schema_b.table_b

Review Comment:
   @gandronchik  -- What do you think about changing how this code is tested?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#discussion_r858034480


##########
datafusion/core/src/execution/context.rs:
##########
@@ -3243,7 +3243,7 @@ mod tests {
             "SELECT cat, SUM(i) AS total FROM (
                     SELECT i, 'a' AS cat FROM catalog_a.schema_a.table_a
                     UNION ALL
-                    SELECT i, 'b' AS cat FROM catalog_b.schema_b.table_b

Review Comment:
   I would recommend leaving this test alone and instead adding a new test, perhaps in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sql/union.rs using the reproducer from https://github.com/apache/arrow-datafusion/issues/2083
   
   That way it is clear we haven't changed the coverage of this test and there is a more focused tests for just this particular change



##########
datafusion/core/src/logical_plan/builder.rs:
##########
@@ -1055,19 +1055,23 @@ pub fn union_with_alias(
     right_plan: LogicalPlan,
     alias: Option<String>,
 ) -> Result<LogicalPlan> {
-    let inputs = vec![left_plan, right_plan]
+    let union_schema = left_plan.schema().clone();
+    let inputs_iter = vec![left_plan, right_plan]

Review Comment:
   I wonder if there is a reason you needed to make changes below this line (other than to use `union_schema` rather than `schema` when creating the `Projection`
   
   In particular, I wonder if we can avoid the extra call to `clone()` by going back to the previous setup?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#issuecomment-1114965577

   Hi @gandronchik Do you have time to address the PR feedback here so that we can get your change approved + merged?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove merged pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
andygrove merged PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
jackwener commented on PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#issuecomment-1108423115

   It's better to link the related issue if it exists.
   
   If it doesn't exist, I think it would be better to create it and link it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] gandronchik commented on a diff in pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
gandronchik commented on code in PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#discussion_r857509283


##########
datafusion/core/src/execution/context.rs:
##########
@@ -3243,7 +3243,7 @@ mod tests {
             "SELECT cat, SUM(i) AS total FROM (
                     SELECT i, 'a' AS cat FROM catalog_a.schema_a.table_a
                     UNION ALL
-                    SELECT i, 'b' AS cat FROM catalog_b.schema_b.table_b

Review Comment:
   to cover case when columns names in union inputs are not the same.
   
   example:
   SELECT 1 a UNION ALL SELECT 2;



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
jackwener commented on code in PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#discussion_r857507492


##########
datafusion/core/src/execution/context.rs:
##########
@@ -3243,7 +3243,7 @@ mod tests {
             "SELECT cat, SUM(i) AS total FROM (
                     SELECT i, 'a' AS cat FROM catalog_a.schema_a.table_a
                     UNION ALL
-                    SELECT i, 'b' AS cat FROM catalog_b.schema_b.table_b

Review Comment:
   Why we need to change it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] gandronchik commented on a diff in pull request #2334: fix: union schema

Posted by GitBox <gi...@apache.org>.
gandronchik commented on code in PR #2334:
URL: https://github.com/apache/arrow-datafusion/pull/2334#discussion_r863629908


##########
datafusion/core/src/logical_plan/builder.rs:
##########
@@ -1055,19 +1055,23 @@ pub fn union_with_alias(
     right_plan: LogicalPlan,
     alias: Option<String>,
 ) -> Result<LogicalPlan> {
-    let inputs = vec![left_plan, right_plan]
+    let union_schema = left_plan.schema().clone();
+    let inputs_iter = vec![left_plan, right_plan]

Review Comment:
   Fixed. We still have to clone it to check schema compatibility, however, now the scope is in iteration only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org