You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/13 16:37:32 UTC

[GitHub] [arrow-datafusion] retikulum opened a new pull request, #3824: Add/Remove Division Rules

retikulum opened a new pull request, #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #3615.
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   As it is discussed in issue, I believe that it will improve simplification rules
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   Added A / 0 division rule and corresponding test
   Added 0 / 0 division rule and corresponding test
   Removed A / A division rule and corresponding test
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   I don't think so
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] retikulum commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
retikulum commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r996403018


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   We can follow-up it on #3849.
   
   I did the necessary changes and control it with `cargo fmt`. Thanks for the help.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] retikulum commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
retikulum commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r996326507


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   > I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.
   > 
   > One way to check the existing behavior is to use `datafusion-cli`. In this case the result of `0/0` is null:
   > 
   > ```shell
   > $ datafusion-cli 
   > DataFusion CLI v13.0.0
   > ❯ select 0 / 0
   > ;
   > +---------------------+
   > | Int64(0) / Int64(0) |
   > +---------------------+
   > |                     |
   > +---------------------+
   > ```
   
   Thank you for your valuable feedback. If my understanding is wrong please correct me, we should return `null` for `0 / 0` to provide the same behavior with the current expression evaluator. So no changes are needed for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
ozankabak commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r995837975


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   If we are returning an error for the A / 0 case, shouldn't we also return an error in this case? It seems like they are both divide-by-zero instances. If you remove this block entirely, the next block will handle both cases in the said way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
ozankabak commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r995837975


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   If we are returning an error for the A / 0 case, shouldn't we also return an error in this case? It seems like they are both divide-by-zero instances.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] retikulum commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
retikulum commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r996327137


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null
             BinaryExpr {
                 left,
                 op: Divide,
                 right,
-            } if !info.nullable(&left)? && left == right => lit(1),
-
+            } if is_zero(&left) && is_zero(&right) => Expr::Literal(ScalarValue::Int32(None)),
+            // A / 0 --> DivideByZeroError 
+            BinaryExpr {
+                left: _,
+                op: Divide,
+                right,
+            } if is_zero(&right) => {

Review Comment:
   Thanks. I will add `!info.nullable(&left)? && is_zero(&right)` check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r996289974


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null
             BinaryExpr {
                 left,
                 op: Divide,
                 right,
-            } if !info.nullable(&left)? && left == right => lit(1),
-
+            } if is_zero(&left) && is_zero(&right) => Expr::Literal(ScalarValue::Int32(None)),
+            // A / 0 --> DivideByZeroError 
+            BinaryExpr {
+                left: _,
+                op: Divide,
+                right,
+            } if is_zero(&right) => {

Review Comment:
   ```sql
   ❯ create table foo as select * from (values (1), (2), (NULL)) as sql;
   ❯ select * from foo;
   +---------+
   | column1 |
   +---------+
   | 1       |
   | 2       |
   |         |
   +---------+
   3 rows in set. Query took 0.002 seconds.
   ❯ select column1 / 0 from foo;
   ArrowError(DivideByZero)
   ```
   
   👍 
   
   I agree on the should be checking for nullability



##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   I think the expected behavior of a rewrite rule is the same behavior as the current expression evaluator.
   
   One way to check the existing behavior is to use `datafusion-cli`. In this case the result of `0/0` is null: 
   
   ```shell
   $ datafusion-cli 
   DataFusion CLI v13.0.0
   ❯ select 0 / 0
   ;
   +---------------------+
   | Int64(0) / Int64(0) |
   +---------------------+
   |                     |
   +---------------------+
   ```



##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
ozankabak commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r995832991


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null
             BinaryExpr {
                 left,
                 op: Divide,
                 right,
-            } if !info.nullable(&left)? && left == right => lit(1),
-
+            } if is_zero(&left) && is_zero(&right) => Expr::Literal(ScalarValue::Int32(None)),
+            // A / 0 --> DivideByZeroError 
+            BinaryExpr {
+                left: _,
+                op: Divide,
+                right,
+            } if is_zero(&right) => {

Review Comment:
   Should this be `!info.nullable(&left)? && is_zero(&right)` like the modulo case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#issuecomment-1281540462

   It looks like this PR has accumulated some conflicts -- can you please merge up from master to resolve them and then I will merge
   
   ![Screen Shot 2022-10-17 at 5 52 39 PM](https://user-images.githubusercontent.com/490673/196290838-51590b79-9c9f-448c-9db0-a74cb1b2ba0e.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#issuecomment-1282341263

   Thanks again @retikulum 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] retikulum commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
retikulum commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r996327063


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   > BTW if the behavior (error or NULL) is inconsistent in DataFusion I think we should file that as a separate ticket / PR rather than try to make it consistent in this one
   
   I am a newbie rust developer and it is my first try to contribute to an open-source project. I am trying to understand the code base so I can't make appropriate comments on this one. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] retikulum commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
retikulum commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r995885123


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null
             BinaryExpr {
                 left,
                 op: Divide,
                 right,
-            } if !info.nullable(&left)? && left == right => lit(1),
-
+            } if is_zero(&left) && is_zero(&right) => Expr::Literal(ScalarValue::Int32(None)),
+            // A / 0 --> DivideByZeroError 
+            BinaryExpr {
+                left: _,
+                op: Divide,
+                right,
+            } if is_zero(&right) => {

Review Comment:
   You are right! I should have checked if it is nullable or not just like the modulo case. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] retikulum commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
retikulum commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r995911930


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   I approached it a little bit complex than I should have. A / 0 is undefined however 0/0 is indeterminate so I represent undefined as error and indeterminate as null. Also, expected behavior in the #3615 pointed that `col / col` should return `null` when `col = 0`. I checked other languages and see that python implemented both (A/0 and 0/0) to return an error. Moreover, I like the offered solution. If it is OK for others, I am ready to change necessary parts. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
alamb merged PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #3824: Add/Remove Division Rules

Posted by GitBox <gi...@apache.org>.
ozankabak commented on code in PR #3824:
URL: https://github.com/apache/arrow-datafusion/pull/3824#discussion_r996391068


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -794,13 +794,20 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {
                 op: Divide,
                 right,
             } if is_null(&right) => *right,
-            // A / A --> 1 (if a is not nullable)
+            // 0 / 0 -> null

Review Comment:
   It would be great if you can open an issue stating this discrepancy (A/0 -> Error vs. 0/0 -> NULL) so that the community can discuss/keep track of/fix it.
   
   My understanding is that if you fix the lint issues and make the null-check (as in modulo), this could be good to go.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org