You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/07 19:15:53 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #3758: Change public simplify API and the Public coerce API

alamb opened a new pull request, #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes https://github.com/apache/arrow-datafusion/issues/3708
   
   
    # Rationale for this change
   See https://github.com/apache/arrow-datafusion/issues/3708 
   
   # What changes are included in this PR?
   1. Create a `ExprSimplifier` struct
   2. Simplify and make public `SimplfyContext`
   3. move the Expr simplification API to `ExprSimplifier`
   4.  Add a `ExprSimplifier::coerce` function and tests
   
   Note I plan to add better documentation / examples via https://github.com/apache/arrow-datafusion/issues/3740 / https://github.com/apache/arrow-datafusion/pull/3741
   
   
   # Are there any user-facing changes?
   New API, changes to existing simplify API (I will higlight the changes)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r990602856


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {

Review Comment:
   TODO ticekt, typo?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r990602856


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {

Review Comment:
   ticekt, typo?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#issuecomment-1275982011

   Benchmark runs are scheduled for baseline = fb39d5d9fcab5cdab69e35505b877048a73a2e2e and contender = 61c38b7114e802f9f289bf5364a031395f5799a6. 61c38b7114e802f9f289bf5364a031395f5799a6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7228d5c0f059428da11ffc863a8d536c...14aca0ad9a404613814132b8976f21a4/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/fb6dbbe047934e67a2ddfc0bd84db71d...6eb8b1f73ad148689f87a0f4359aa870/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bbedffab261d4aa5a453790e5585ec60...a8f6c0559199464db31bfeccdf20b19a/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2cbefedba463423cab380457f6ba28f6...d74ee8367b7d4620b7cabcf9c87d1804/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#issuecomment-1274449216

   > I plan on reviewing this one tomorrow @alamb
   
   
   Thank you @andygrove  - I will have it ready and I'll also help try and review the outstanding DataFusion PRs as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r990601461


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -950,30 +909,12 @@ macro_rules! assert_contains {
     };
 }
 
-/// Apply simplification and constant propagation to ([Expr]).

Review Comment:
   Looks great. It is more flexible, since user can define their `SimplifyInfo` other than inner `SimplifyContext` now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r992120262


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {

Review Comment:
   Good call -- will file ticket.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r990601461


##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -950,30 +909,12 @@ macro_rules! assert_contains {
     };
 }
 
-/// Apply simplification and constant propagation to ([Expr]).

Review Comment:
   Looks great. It is more flexible to use, since user can define their `SimplifyInfo` other than inner `SimplifyContext` now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r992147134


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {

Review Comment:
   https://github.com/apache/arrow-datafusion/issues/3793



##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {
+        let mut expr_rewrite = TypeCoercionRewriter { schema };
+
+        expr.rewrite(&mut expr_rewrite)
+    }
+}
+
+/// Provides simplification information based on DFSchema and
+/// [`ExecutionProps`]. This is the default implementation used by DataFusion
+///
+/// For example:
+/// ```
+/// use arrow::datatypes::{Schema, Field, DataType};
+/// use datafusion_expr::{col, lit};
+/// use datafusion_common::{DataFusionError, ToDFSchema};
+/// use datafusion_physical_expr::execution_props::ExecutionProps;
+/// use datafusion_optimizer::expr_simplifier::{SimplifyContext, ExprSimplifier};
+///
+/// // Create the schema
+/// let schema = Schema::new(vec![
+///     Field::new("i", DataType::Int64, false),
+///   ])
+///   .to_dfschema_ref().unwrap();
+///
+/// // Create the simplifier
+/// let props = ExecutionProps::new();
+/// let context = SimplifyContext::new(&props)
+///    .with_schema(schema);
+/// let simplifier = ExprSimplifier::new(context);
+///
+/// // Use the simplifier
+///
+/// // b < 2 or (1 > 3)
+/// let expr = col("b").lt(lit(2)).or(lit(1).gt(lit(3)));
+///
+/// // b < 2
+/// let simplified = simplifier.simplify(expr).unwrap();
+/// assert_eq!(simplified, col("b").lt(lit(2)));
+/// ```
+pub struct SimplifyContext<'a> {
+    schemas: Vec<DFSchemaRef>,
+    props: &'a ExecutionProps,
+}
+
+impl<'a> SimplifyContext<'a> {
+    /// Create a new SimplifyContext
+    pub fn new(props: &'a ExecutionProps) -> Self {
+        Self {
+            schemas: vec![],
+            props,
+        }
+    }
+
+    /// Register a [`DFSchemaRef`] with this context
+    pub fn with_schema(mut self, schema: DFSchemaRef) -> Self {
+        self.schemas.push(schema);
+        self
+    }
+}
+
+impl<'a> SimplifyInfo for SimplifyContext<'a> {
+    /// returns true if this Expr has boolean type
+    fn is_boolean_type(&self, expr: &Expr) -> Result<bool> {
+        for schema in &self.schemas {
+            if let Ok(DataType::Boolean) = expr.get_type(schema) {
+                return Ok(true);
+            }
+        }
+
+        Ok(false)
+    }
+    /// Returns true if expr is nullable
+    fn nullable(&self, expr: &Expr) -> Result<bool> {
+        self.schemas
+            .iter()
+            .find_map(|schema| {
+                // expr may be from another input, so ignore errors
+                // by converting to None to keep trying
+                expr.nullable(schema.as_ref()).ok()
+            })
+            .ok_or_else(|| {
+                // This means we weren't able to compute `Expr::nullable` with
+                // *any* input schemas, signalling a problem
+                DataFusionError::Internal(format!(
+                    "Could not find columns in '{}' during simplify",
+                    expr
+                ))
+            })
+    }
+
+    fn execution_props(&self) -> &ExecutionProps {
+        self.props
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use arrow::datatypes::{Field, Schema};
+    use datafusion_common::ToDFSchema;
+    use datafusion_expr::{col, lit};
+
+    #[test]
+    fn api_basic() {
+        let props = ExecutionProps::new();
+        let simplifier =
+            ExprSimplifier::new(SimplifyContext::new(&props).with_schema(test_schema()));
+
+        let expr = lit(1) + lit(2);
+        let expected = lit(3);
+        assert_eq!(expected, simplifier.simplify(expr).unwrap());
+    }
+
+    #[test]
+    fn basic_coercion() {
+        let schema = test_schema();
+        let props = ExecutionProps::new();
+        let simplifier =
+            ExprSimplifier::new(SimplifyContext::new(&props).with_schema(schema.clone()));
+
+        // Note expr type is int32 (not int64)
+        // (1i64 + 2i32) < i
+        let expr = (lit(1i64) + lit(2i32)).lt(col("i"));
+        // should fully simplify to 3 < i (though i has been coerced to i64)
+        let expected = lit(3i64).lt(col("i"));
+
+        // Would be nice if this API could use the SimplifyInfo
+        // rather than creating an DFSchemaRef coerces rather than doing
+        // it manually.
+        // TODO ticekt

Review Comment:
   ```suggestion
           // https://github.com/apache/arrow-datafusion/issues/3793
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r990440142


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo

Review Comment:
   If this PR is approved, I will file a ticket to remove the `schema` argument in this function call.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#issuecomment-1273949747

   I plan on reviewing this one tomorrow @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb merged PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r992146901


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo

Review Comment:
   https://github.com/apache/arrow-datafusion/issues/3793



##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt

Review Comment:
   ```suggestion
       // https://github.com/apache/arrow-datafusion/issues/3793
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#issuecomment-1274949433

   This is blocking me upgrading DataFusion in my project (IOx). I know @andygrove  plans to review today. Unless I hear otherwise I'll plan to merge PR tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3758: Change public simplify API and add a public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#issuecomment-1275972375

   I am happy to respond to comments / make changes as a follow on PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3758: Change public simplify API and the Public coerce API

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3758:
URL: https://github.com/apache/arrow-datafusion/pull/3758#discussion_r990434857


##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -15,17 +15,24 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Expression simplifier
+//! Expression simplification API

Review Comment:
   This PR looks bigger than it really is -- it moves a bunch of code around and there are a significant number of docstrings



##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {
+        let mut expr_rewrite = TypeCoercionRewriter { schema };
+
+        expr.rewrite(&mut expr_rewrite)
+    }
+}
+
+/// Provides simplification information based on DFSchema and
+/// [`ExecutionProps`]. This is the default implementation used by DataFusion
+///
+/// For example:
+/// ```
+/// use arrow::datatypes::{Schema, Field, DataType};
+/// use datafusion_expr::{col, lit};
+/// use datafusion_common::{DataFusionError, ToDFSchema};
+/// use datafusion_physical_expr::execution_props::ExecutionProps;
+/// use datafusion_optimizer::expr_simplifier::{SimplifyContext, ExprSimplifier};
+///
+/// // Create the schema
+/// let schema = Schema::new(vec![
+///     Field::new("i", DataType::Int64, false),
+///   ])
+///   .to_dfschema_ref().unwrap();
+///
+/// // Create the simplifier
+/// let props = ExecutionProps::new();
+/// let context = SimplifyContext::new(&props)
+///    .with_schema(schema);
+/// let simplifier = ExprSimplifier::new(context);
+///
+/// // Use the simplifier
+///
+/// // b < 2 or (1 > 3)
+/// let expr = col("b").lt(lit(2)).or(lit(1).gt(lit(3)));
+///
+/// // b < 2
+/// let simplified = simplifier.simplify(expr).unwrap();
+/// assert_eq!(simplified, col("b").lt(lit(2)));
+/// ```
+pub struct SimplifyContext<'a> {

Review Comment:
   This was moved into the public API and I added some docs and a builder interface `with_schema`



##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -15,73 +15,25 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Simplify expressions optimizer rule
+//! Simplify expressions optimizer rule and implementation
 
-use crate::expr_simplifier::ExprSimplifiable;
+use crate::expr_simplifier::{ExprSimplifier, SimplifyContext};
 use crate::{expr_simplifier::SimplifyInfo, OptimizerConfig, OptimizerRule};
 use arrow::array::new_null_array;
 use arrow::datatypes::{DataType, Field, Schema};
 use arrow::error::ArrowError;
 use arrow::record_batch::RecordBatch;
-use datafusion_common::{DFSchema, DFSchemaRef, DataFusionError, Result, ScalarValue};
+use datafusion_common::{DFSchema, DataFusionError, Result, ScalarValue};
 use datafusion_expr::{
     expr_fn::{and, or},
     expr_rewriter::{ExprRewritable, ExprRewriter, RewriteRecursion},
     lit,
     logical_plan::LogicalPlan,
     utils::from_plan,
-    BuiltinScalarFunction, ColumnarValue, Expr, ExprSchemable, Operator, Volatility,
+    BuiltinScalarFunction, ColumnarValue, Expr, Operator, Volatility,
 };
 use datafusion_physical_expr::{create_physical_expr, execution_props::ExecutionProps};
 
-/// Provides simplification information based on schema and properties

Review Comment:
   moved



##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -76,24 +96,181 @@ impl ExprSimplifiable for Expr {
     ///   }
     /// }
     ///
+    /// // Create the simplifier
+    /// let simplifier = ExprSimplifier::new(Info::default());
+    ///
     /// // b < 2
     /// let b_lt_2 = col("b").gt(lit(2));
     ///
     /// // (b < 2) OR (b < 2)
     /// let expr = b_lt_2.clone().or(b_lt_2.clone());
     ///
     /// // (b < 2) OR (b < 2) --> (b < 2)
-    /// let expr = expr.simplify(&Info::default()).unwrap();
+    /// let expr = simplifier.simplify(expr).unwrap();
     /// assert_eq!(expr, b_lt_2);
     /// ```
-    fn simplify<S: SimplifyInfo>(self, info: &S) -> Result<Self> {
-        let mut rewriter = Simplifier::new(info);
-        let mut const_evaluator = ConstEvaluator::try_new(info.execution_props())?;
+    pub fn simplify(&self, expr: Expr) -> Result<Expr> {
+        let mut rewriter = Simplifier::new(&self.info);
+        let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
 
         // TODO iterate until no changes are made during rewrite
         // (evaluating constants can enable new simplifications and
         // simplifications can enable new constant evaluation)
         // https://github.com/apache/arrow-datafusion/issues/1160
-        self.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+        expr.rewrite(&mut const_evaluator)?.rewrite(&mut rewriter)
+    }
+
+    /// Apply type coercion to an [`Expr`] so that it can be
+    /// evaluated as a [`PhysicalExpr`](datafusion_physical_expr::PhysicalExpr).
+    ///
+    /// See the [type coercion module](datafusion_expr::type_coercion)
+    /// documentation for more details on type coercion
+    ///
+    // Would be nice if this API could use the SimplifyInfo
+    // rather than creating an DFSchemaRef coerces rather than doing
+    // it manually.
+    // TODO ticekt
+    pub fn coerce(&self, expr: Expr, schema: DFSchemaRef) -> Result<Expr> {

Review Comment:
   Here is the new API for coercion



##########
datafusion/optimizer/src/expr_simplifier.rs:
##########
@@ -37,28 +44,41 @@ pub trait SimplifyInfo {
     fn execution_props(&self) -> &ExecutionProps;
 }
 
-/// trait for types that can be simplified

Review Comment:
   Removed in favor of calling `ExprSimplifier::simplify` directly



##########
datafusion/optimizer/src/simplify_expressions.rs:
##########
@@ -950,30 +909,12 @@ macro_rules! assert_contains {
     };
 }
 
-/// Apply simplification and constant propagation to ([Expr]).

Review Comment:
   Instead of adding a `simplify` method on to `Expr` via this trait, I propose to have an `ExprSimplifier` struct that has a simplify method.
   
   
   cc @ygf11
   I found it made the examples in `expr_api.rs` less awkward to write because the schema wasn't needed 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org