Posted to github@arrow.apache.org by "izveigor (via GitHub)" <gi...@apache.org> on 2023/03/31 19:30:37 UTC

[GitHub] [arrow-datafusion] izveigor opened a new pull request, #5816: feat: add optimization support to LOG and POWER functions

izveigor opened a new pull request, #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #5815
   
   # Rationale for this change
   
   # What changes are included in this PR?
   
   # Are these changes tested?
   Yes
   
   # Are there any user-facing changes?
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#discussion_r1154823149


##########
datafusion/optimizer/src/simplify_expressions/utils.rs:
##########
@@ -350,6 +351,73 @@ pub fn distribute_negation(expr: Expr) -> Expr {
     }
 }
 
+/// Simplify the `log` function by the relevant rules:
+/// 1. Log(a, 1) ===> 0
+/// 2. Log(a, a) ===> 1
+/// 3. Log(a, Power(a, b)) ===> b
+pub fn simpl_log(current_args: Vec<Expr>, info: &dyn SimplifyInfo) -> Result<Expr> {
+    let base = &current_args[0];
+    let number = &current_args[1];
+
+    match number {
+        Expr::Literal(value)
+            if value == &ScalarValue::new_one(&info.get_data_type(number)?)? =>
+        {
+            Ok(Expr::Literal(ScalarValue::new_zero(
+                &info.get_data_type(base)?,
+            )?))
+        }
+        Expr::ScalarFunction {
+            fun: BuiltinScalarFunction::Power,
+            args,
+        } if base == &args[0] => Ok(args[1].clone()),
+        _ => {
+            if number == base {
+                Ok(Expr::Literal(ScalarValue::new_one(
+                    &info.get_data_type(number)?,
+                )?))
+            } else {
+                Ok(Expr::ScalarFunction {
+                    fun: BuiltinScalarFunction::Log,
+                    args: current_args,
+                })
+            }
+        }
+    }
+}
+
+/// Simplify the `power` function by the relevant rules:
+/// 1. Power(a, 0) ===> 0
+/// 2. Power(a, 1) ===> a
+/// 3. Power(a, Log(a, b)) ===> b

Review Comment:
   Should we use the law: Power(a1, a2 * a3 * a4 * Log(a1, a5)) ===> a2 * a3 * a4 * a5? 🤔
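
A quick numeric sanity check of this proposal, as a minimal sketch that is not part of the PR. By the standard identity a^(k * log_a(b)) = b^k, the left-hand side would reduce to Power(a5, a2 * a3 * a4) rather than to the product a2 * a3 * a4 * a5. The helper names below are hypothetical and use plain `f64` arithmetic, not DataFusion expressions.

```rust
/// Left-hand side of the proposed law: Power(a, k * Log(a, b)),
/// where `k` stands for the product a2 * a3 * a4.
/// Hypothetical helper for illustration only.
fn power_of_scaled_log(a: f64, k: f64, b: f64) -> f64 {
    // `f64::log(self, base)` computes the logarithm of `self` in `base`.
    a.powf(k * b.log(a))
}

/// What the identity a^(k * log_a(b)) = b^k predicts for the same inputs.
fn expected_power(k: f64, b: f64) -> f64 {
    b.powf(k)
}
```

With a = 2, k = 3 and b = 5 the two helpers agree up to floating-point error, while the product k * b does not match, which suggests the rewrite target would need to be a power rather than a product.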





[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#discussion_r1155114869


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1072,6 +1072,18 @@ impl<'a, S: SimplifyInfo> TreeNodeRewriter for Simplifier<'a, S> {
                 out_expr.rewrite(self)?
             }
 
+            // log

Review Comment:
   I think this file is too large. Should we split it into several files by expression kind (for example: scalar_function.rs, bitwise_operations.rs, ...)?





[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#discussion_r1154823428


##########
datafusion/optimizer/src/simplify_expressions/utils.rs:
##########
@@ -350,6 +351,73 @@ pub fn distribute_negation(expr: Expr) -> Expr {
     }
 }
 
+/// Simplify the `log` function by the relevant rules:
+/// 1. Log(a, 1) ===> 0
+/// 2. Log(a, a) ===> 1
+/// 3. Log(a, Power(a, b)) ===> b

Review Comment:
   Should we use the law: Log(a1, a2 * a3 * a4 * Power(a1, a5)) ===> a2 * a3 * a4 * a5? 🤔
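
A quick numeric sanity check of this proposal, as a minimal sketch that is not part of the PR. By the standard identity log_a(k * a^b) = log_a(k) + b, the left-hand side would reduce to Log(a1, a2 * a3 * a4) + a5 rather than to the product a2 * a3 * a4 * a5. The helper names below are hypothetical and use plain `f64` arithmetic, not DataFusion expressions.

```rust
/// Left-hand side of the proposed law: Log(a, k * Power(a, b)),
/// where `k` stands for the product a2 * a3 * a4.
/// Hypothetical helper for illustration only.
fn log_of_scaled_power(a: f64, k: f64, b: f64) -> f64 {
    // `f64::log(self, base)` computes the logarithm of `self` in `base`.
    (k * a.powf(b)).log(a)
}

/// What the identity log_a(k * a^b) = log_a(k) + b predicts.
fn expected_log(a: f64, k: f64, b: f64) -> f64 {
    k.log(a) + b
}
```

With a = 2, k = 8 and b = 3 the two helpers agree up to floating-point error, while the product k * b does not match, so a rule of this shape would have to emit a sum of a logarithm and the exponent.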





[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#discussion_r1155932650


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -2210,6 +2222,68 @@ mod tests {
         assert_eq!(simplify(expr_eq), lit(true));
     }
 
+    #[test]
+    fn test_simplify_log() {
+        // Log(c3, 1) ===> 0
+        {
+            let expr = log(col("c3_non_null"), lit(1));
+            let expected = lit(0i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, c3) ===> 1
+        {
+            let expr = log(col("c3_non_null"), col("c3_non_null"));
+            let expected = lit(1i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, Power(c3, c4)) ===> c4
+        {
+            let expr = log(
+                col("c3_non_null"),
+                power(col("c3_non_null"), col("c4_non_null")),
+            );
+            let expected = col("c4_non_null");
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, c4) ===> c4

Review Comment:
   ```suggestion
           // Log(c3, c4) ===> Log(c3, c4) 
   ```



##########
datafusion/optimizer/src/simplify_expressions/utils.rs:
##########
@@ -350,6 +351,73 @@ pub fn distribute_negation(expr: Expr) -> Expr {
     }
 }
 
+/// Simplify the `log` function by the relevant rules:
+/// 1. Log(a, 1) ===> 0
+/// 2. Log(a, a) ===> 1
+/// 3. Log(a, Power(a, b)) ===> b

Review Comment:
   I think this is a good set of changes for now. 



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1072,6 +1072,18 @@ impl<'a, S: SimplifyInfo> TreeNodeRewriter for Simplifier<'a, S> {
                 out_expr.rewrite(self)?
             }
 
+            // log

Review Comment:
   I agree that splitting it into smaller modules would be a great idea for a follow-on PR



##########
datafusion/common/src/scalar.rs:
##########
@@ -1723,6 +1745,27 @@ impl ScalarValue {
         })
     }
 
+    pub fn new_ten(datatype: &DataType) -> Result<ScalarValue> {

Review Comment:
   It sounds like a reasonable idea. Thanks @izveigor 
   
   It might be possible to use the traits defined in arrow-rs for this: https://docs.rs/arrow/latest/arrow/array/trait.ArrowPrimitiveType.html
   
   





[GitHub] [arrow-datafusion] alamb merged pull request #5816: feat: Simplify LOG and POWER functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816




[GitHub] [arrow-datafusion] alamb commented on pull request #5816: feat: Simplify LOG and POWER functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#issuecomment-1494288238

   I fixed a typo and merged up from main -- I plan to merge this PR once CI passes. Thanks again @izveigor 




[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#discussion_r1155114663


##########
datafusion/common/src/scalar.rs:
##########
@@ -1723,6 +1745,27 @@ impl ScalarValue {
         })
     }
 
+    pub fn new_ten(datatype: &DataType) -> Result<ScalarValue> {

Review Comment:
   Should we create a function `new_int` that takes an integer (I think `i8`) and a datatype, and returns that number as a value of the given datatype?
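
One way such a helper could look, as a minimal sketch only. `ToyType` and `ToyScalar` are hypothetical stand-ins for arrow's `DataType` and DataFusion's `ScalarValue`, and the `String` error is an assumption; the real types have many more variants and their own error type.

```rust
/// Hypothetical stand-in for a small subset of arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int8,
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical stand-in for a small subset of `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum ToyScalar {
    Int8(Option<i8>),
    Int32(Option<i32>),
    Int64(Option<i64>),
    Float64(Option<f64>),
}

/// Build the small integer `value` as a scalar of the requested type.
/// Every `i8` converts losslessly into the wider numeric types used here.
fn new_int(value: i8, datatype: ToyType) -> Result<ToyScalar, String> {
    match datatype {
        ToyType::Int8 => Ok(ToyScalar::Int8(Some(value))),
        ToyType::Int32 => Ok(ToyScalar::Int32(Some(i32::from(value)))),
        ToyType::Int64 => Ok(ToyScalar::Int64(Some(i64::from(value)))),
        ToyType::Float64 => Ok(ToyScalar::Float64(Some(f64::from(value)))),
        // Non-numeric types are rejected instead of being coerced.
        other => Err(format!(
            "cannot create integer {} for non-numeric type {:?}",
            value, other
        )),
    }
}
```

Under this sketch, `new_int(0, t)`, `new_int(1, t)` and `new_int(10, t)` would cover what `new_zero`, `new_one` and `new_ten` do for numeric types.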





[GitHub] [arrow-datafusion] izveigor commented on pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#issuecomment-1492978926

   Hello, @alamb!
   Thanks for the review.
   I have left some comments about possible improvements. I think they will be of interest for further PRs and issues.
   




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#discussion_r1154833265


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -2210,6 +2222,68 @@ mod tests {
         assert_eq!(simplify(expr_eq), lit(true));
     }
 
+    #[test]
+    fn test_simplify_log() {
+        // Log(c3, 1) ===> 0
+        {
+            let expr = log(col("c3_non_null"), lit(1));
+            let expected = lit(0i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, c3) ===> 1
+        {
+            let expr = log(col("c3_non_null"), col("c3_non_null"));
+            let expected = lit(1i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, Power(c3, c4)) ===> c4
+        {
+            let expr = log(
+                col("c3_non_null"),
+                power(col("c3_non_null"), col("c4_non_null")),
+            );
+            let expected = col("c4_non_null");
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, c4) ===> c4
+        {
+            let expr = log(col("c3_non_null"), col("c4_non_null"));
+            let expected = log(col("c3_non_null"), col("c4_non_null"));
+            assert_eq!(simplify(expr), expected);
+        }
+    }
+
+    #[test]
+    fn test_simplify_power() {
+        // Power(c3, 0) ===> 0
+        {
+            let expr = power(col("c3_non_null"), lit(0));
+            let expected = lit(0i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Power(c3, 1) ===> a

Review Comment:
   ```suggestion
           // Power(c3, 1) ===> c3
   ```



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -2210,6 +2222,68 @@ mod tests {
         assert_eq!(simplify(expr_eq), lit(true));
     }
 
+    #[test]
+    fn test_simplify_log() {
+        // Log(c3, 1) ===> 0
+        {
+            let expr = log(col("c3_non_null"), lit(1));
+            let expected = lit(0i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, c3) ===> 1
+        {
+            let expr = log(col("c3_non_null"), col("c3_non_null"));
+            let expected = lit(1i64);
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, Power(c3, c4)) ===> c4
+        {
+            let expr = log(
+                col("c3_non_null"),
+                power(col("c3_non_null"), col("c4_non_null")),
+            );
+            let expected = col("c4_non_null");
+            assert_eq!(simplify(expr), expected);
+        }
+        // Log(c3, c4) ===> c4
+        {
+            let expr = log(col("c3_non_null"), col("c4_non_null"));
+            let expected = log(col("c3_non_null"), col("c4_non_null"));
+            assert_eq!(simplify(expr), expected);
+        }
+    }
+
+    #[test]
+    fn test_simplify_power() {
+        // Power(c3, 0) ===> 0

Review Comment:
   Shouldn't `c^0` be `1` (not `0`)?
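
For reference, the three `power` rules with the corrected first rule (Power(a, 0) ===> 1) can be sketched on a toy expression type. `ToyExpr` and `simplify_power` are hypothetical names for illustration; they are not DataFusion's `Expr` or the PR's implementation, and the sketch ignores data types and NULL handling.

```rust
/// Hypothetical, much-reduced stand-in for an expression tree.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Int(i64),
    Log(Box<ToyExpr>, Box<ToyExpr>),
    Power(Box<ToyExpr>, Box<ToyExpr>),
}

/// Apply the rules:
/// 1. Power(a, 0) ===> 1
/// 2. Power(a, 1) ===> a
/// 3. Power(a, Log(a, b)) ===> b
/// Anything else is rebuilt unchanged.
fn simplify_power(base: ToyExpr, exponent: ToyExpr) -> ToyExpr {
    match exponent {
        ToyExpr::Int(0) => ToyExpr::Int(1),
        ToyExpr::Int(1) => base,
        // Only fires when the logarithm's base equals the power's base.
        ToyExpr::Log(log_base, number) if *log_base == base => *number,
        other => ToyExpr::Power(Box::new(base), Box::new(other)),
    }
}
```

In this sketch, `simplify_power(c3, Int(0))` yields `Int(1)`, matching the expectation that `c^0` is `1`.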



##########
datafusion/common/src/scalar.rs:
##########
@@ -1705,6 +1705,29 @@ impl ScalarValue {
         })
     }
 
+    /// Create an one value in the given type.
+    pub fn new_one(datatype: &DataType) -> Result<ScalarValue> {
+        assert!(datatype.is_primitive());
+        Ok(match datatype {
+            DataType::Boolean => ScalarValue::Boolean(Some(true)),

Review Comment:
   Boolean doesn't make sense to me as it isn't numeric. I think Boolean should also return an error





[GitHub] [arrow-datafusion] alamb commented on pull request #5816: feat: add optimization support to LOG and POWER functions

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5816:
URL: https://github.com/apache/arrow-datafusion/pull/5816#issuecomment-1492940371

   CI appears to be failing now on this PR

