Posted to github@arrow.apache.org by "mustafasrepo (via GitHub)" <gi...@apache.org> on 2023/10/24 13:13:42 UTC

[PR] Cleanup logical optimizer rules. [arrow-datafusion]

mustafasrepo opened a new pull request, #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919

   ## Which issue does this PR close?
   
   
   Closes #.
   
   ## Rationale for this change
   While working on another PR, I noticed that some of the logical plan rules don't use the appropriate constructors when re-creating a `LogicalPlan` node. As a result, rules can produce schema mismatches that go unnoticed. This PR solves this problem.
   
   To prevent schema mismatches, I use the `try_new` API instead of the `try_new_with_schema` API. Because of this change, we now receive schema mismatch errors during `LogicalPlan` optimization (these were existing bugs that went unnoticed because the wrong schema was being used). This PR also fixes the bugs that caused those schema mismatches.
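
As a rough illustration of why the choice of constructor matters, here is a minimal, self-contained sketch. The `Schema` and `Projection` types and both constructors below are hypothetical stand-ins written for this example; they are not DataFusion's actual types or signatures. The sketch only assumes the general pattern described above: one constructor derives the schema from its inputs, the other trusts a schema supplied by the caller.

```rust
// Toy model only: `Schema`, `Projection`, and both constructors are
// hypothetical stand-ins, not DataFusion's actual types or signatures.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

#[derive(Debug)]
struct Projection {
    exprs: Vec<String>,
    schema: Schema,
}

impl Projection {
    /// Derives the output schema from the expressions, so the two cannot disagree.
    fn try_new(exprs: Vec<String>, input: &Schema) -> Result<Self, String> {
        for e in &exprs {
            if !input.fields.contains(e) {
                return Err(format!("column `{}` not found in input schema", e));
            }
        }
        let schema = Schema { fields: exprs.clone() };
        Ok(Self { exprs, schema })
    }

    /// Trusts a caller-supplied schema. Only a length check is performed,
    /// so a stale schema of the right size passes silently.
    fn try_new_with_schema(exprs: Vec<String>, schema: Schema) -> Result<Self, String> {
        if exprs.len() != schema.fields.len() {
            return Err("expression count does not match schema".to_string());
        }
        Ok(Self { exprs, schema })
    }
}

fn main() {
    let input = Schema { fields: vec!["a".to_string(), "b".to_string()] };

    // A rule rewrote the projection from [a] to [b] but reused the old schema.
    let stale = Schema { fields: vec!["a".to_string()] };
    let p = Projection::try_new_with_schema(vec!["b".to_string()], stale).unwrap();
    assert_ne!(p.schema.fields, p.exprs); // the mismatch goes unnoticed

    // Recomputing the schema keeps it consistent with the expressions.
    let q = Projection::try_new(vec!["b".to_string()], &input).unwrap();
    assert_eq!(q.schema.fields, q.exprs);

    // An expression that is invalid against the input now surfaces as an error.
    assert!(Projection::try_new(vec!["c".to_string()], &input).is_err());
}
```

In this toy model, switching a rule from the second constructor to the first turns a silent inconsistency into either a correct schema or an explicit error, which mirrors the effect described above.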
   
   ## What changes are included in this PR?
   
   
   ## Are these changes tested?
   
   Yes, by the existing tests, which should continue to pass.
   
   ## Are there any user-facing changes?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1372132078


##########
datafusion/common/src/dfschema.rs:
##########
@@ -444,6 +444,14 @@ impl DFSchema {
                         .zip(iter2)
                         .all(|((t1, f1), (t2, f2))| t1 == t2 && Self::field_is_semantically_equal(f1, f2))
             }
+            (

Review Comment:
   FYI @viirya this may be of interest to you





Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak merged PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919




Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1370363265


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -2051,21 +1991,4 @@ mod tests {
 
         Ok(())
     }
-
-    #[test]
-    fn test_get_updated_id_keys() {

Review Comment:
   This test was moved to the `functional_dependencies.rs` file.







Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1370360262


##########
datafusion/common/src/dfschema.rs:
##########
@@ -444,6 +444,14 @@ impl DFSchema {
                         .zip(iter2)
                         .all(|((t1, f1), (t2, f2))| t1 == t2 && Self::field_is_semantically_equal(f1, f2))
             }
+            (

Review Comment:
   During the schema check, we were failing to treat these cases as equal.





Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1370239945


##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -315,6 +320,72 @@ fn function_to_name() -> &'static HashMap<BuiltinScalarFunction, &'static str> {
     })
 }
 
+/// Returns the wider type among lhs and rhs.
+/// Wider type is the type that can safely represent the other type without information loss.
+/// Returns Error if types are incompatible.
+fn get_wider_type(lhs: &DataType, rhs: &DataType) -> Result<DataType> {

Review Comment:
   Would it make sense to move this function into the `expr::type_coercion` module, where we have similar functions, such as `get_wider_decimal_type`?
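
For readers unfamiliar with the idea, the "wider type" contract in the doc comment above can be sketched roughly as follows. This is a toy model: the `DataType` enum and the specific widening rules here are simplified assumptions made for illustration, not arrow's `DataType` and not the code in this PR.

```rust
// Toy model: this `DataType` enum and the widening rules are simplified
// assumptions for illustration; they are not arrow's `DataType` or the PR's code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Null,
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Returns the type that can represent both inputs without information loss,
/// or an error if no such type exists in this toy model.
fn get_wider_type(lhs: &DataType, rhs: &DataType) -> Result<DataType, String> {
    use DataType::*;
    Ok(match (*lhs, *rhs) {
        // Identical types need no widening.
        (l, r) if l == r => l,
        // Null can be represented by any other type.
        (Null, other) | (other, Null) => other,
        // The wider integer represents the narrower one exactly.
        (Int32, Int64) | (Int64, Int32) => Int64,
        // f64 represents every i32 exactly (but not every i64, so that pair is rejected).
        (Int32, Float64) | (Float64, Int32) => Float64,
        (l, r) => return Err(format!("no wider type for {:?} and {:?}", l, r)),
    })
}

fn main() {
    assert_eq!(get_wider_type(&DataType::Int32, &DataType::Int64), Ok(DataType::Int64));
    assert_eq!(get_wider_type(&DataType::Null, &DataType::Utf8), Ok(DataType::Utf8));
    assert!(get_wider_type(&DataType::Int64, &DataType::Float64).is_err());
    assert!(get_wider_type(&DataType::Utf8, &DataType::Int32).is_err());
}
```

The function is symmetric in its arguments by construction, which is one property a real implementation would presumably also want to preserve.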





Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#issuecomment-1779308412

   Everyone seems to be busy, so I will go ahead and merge this. If we somehow overlooked any issues, we will quickly address them in a follow-on PR.




Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1370366359


##########
datafusion/expr/src/tree_node/expr.rs:
##########
@@ -47,8 +48,19 @@ impl TreeNode for Expr {
             | Expr::TryCast(TryCast { expr, .. })
             | Expr::Sort(Sort { expr, .. })
             | Expr::InSubquery(InSubquery{ expr, .. }) => vec![expr.as_ref().clone()],
-            Expr::GetIndexedField(GetIndexedField { expr, .. }) => {
-                vec![expr.as_ref().clone()]
+            Expr::GetIndexedField(GetIndexedField { expr, field }) => {
+                let expr = expr.as_ref().clone();
+                match field {
+                    GetFieldAccess::ListIndex {key} => {
+                        vec![key.as_ref().clone(), expr]
+                    },
+                    GetFieldAccess::ListRange {start, stop} => {
+                        vec![start.as_ref().clone(), stop.as_ref().clone(), expr]
+                    }
+                    GetFieldAccess::NamedStructField {name: _name} => {
+                        vec![expr]
+                    }

Review Comment:
   The `to_columns` method of `Expr` was missing some columns for `Expr::GetIndexedField`; this change fixes the problem.
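
To show why omitting a child from the traversal loses columns, here is a small self-contained sketch. The `Expr` enum, its variants, and the `children`/`to_columns` methods below are simplified, hypothetical stand-ins written for this example; they are not DataFusion's `Expr` or `TreeNode` API.

```rust
use std::collections::BTreeSet;

// Toy model: a simplified `Expr` with hypothetical variants; not DataFusion's types.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    /// `expr[key]`, where the key is itself an expression.
    ListIndex { expr: Box<Expr>, key: Box<Expr> },
}

impl Expr {
    /// Direct children. If `key` were left out here, every traversal built on
    /// `children` would silently skip it.
    fn children(&self) -> Vec<&Expr> {
        match self {
            Expr::Column(_) | Expr::Literal(_) => vec![],
            Expr::ListIndex { expr, key } => vec![key.as_ref(), expr.as_ref()],
        }
    }

    /// Recursive helper: records this node if it is a column, then visits children.
    fn collect(&self, out: &mut BTreeSet<String>) {
        if let Expr::Column(name) = self {
            out.insert(name.clone());
        }
        for child in self.children() {
            child.collect(out);
        }
    }

    /// Collects the names of all columns referenced anywhere in the expression.
    fn to_columns(&self) -> BTreeSet<String> {
        let mut out = BTreeSet::new();
        self.collect(&mut out);
        out
    }
}

fn main() {
    // arr[idx]: both `arr` and `idx` are column references.
    let e = Expr::ListIndex {
        expr: Box::new(Expr::Column("arr".to_string())),
        key: Box::new(Expr::Column("idx".to_string())),
    };
    let cols = e.to_columns();
    assert!(cols.contains("arr"));
    assert!(cols.contains("idx")); // would be missing if `children` ignored `key`
}
```

Because column collection is built on top of `children` in this sketch, the fix belongs in `children` rather than in each individual traversal.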





Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1370355550


##########
datafusion/expr/src/built_in_function.rs:
##########
@@ -315,6 +320,72 @@ fn function_to_name() -> &'static HashMap<BuiltinScalarFunction, &'static str> {
     })
 }
 
+/// Returns the wider type among lhs and rhs.
+/// Wider type is the type that can safely represent the other type without information loss.
+/// Returns Error if types are incompatible.
+fn get_wider_type(lhs: &DataType, rhs: &DataType) -> Result<DataType> {

Review Comment:
   This makes sense. I moved the function into the `type_coercion` module. Thanks!





Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#discussion_r1370250433


##########
datafusion/core/tests/sql/group_by.rs:
##########
@@ -231,13 +231,13 @@ async fn group_by_dictionary() {
         .expect("ran plan correctly");
 
         let expected = [
-            "+-------+------------------------+",
-            "| t.val | COUNT(DISTINCT t.dict) |",
-            "+-------+------------------------+",
-            "| 1     | 2                      |",
-            "| 2     | 2                      |",
-            "| 4     | 1                      |",
-            "+-------+------------------------+",
+            "+-----+------------------------+",
+            "| val | COUNT(DISTINCT t.dict) |",

Review Comment:
   :+1: 





Re: [PR] Cleanup logical optimizer rules. [arrow-datafusion]

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on PR #7919:
URL: https://github.com/apache/arrow-datafusion/pull/7919#issuecomment-1778037260

   I will wait a little bit before merging this; I'd appreciate it if we could get some more eyes on it.

