You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/02 19:45:06 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3912: Expression boundary analysis framework

alamb commented on code in PR #3912:
URL: https://github.com/apache/arrow-datafusion/pull/3912#discussion_r1012217058


##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -665,87 +656,81 @@ impl PhysicalExpr for BinaryExpr {
             children[1].clone(),
         )))
     }
-}
-
-impl PartialEq<dyn Any> for BinaryExpr {
-    fn eq(&self, other: &dyn Any) -> bool {
-        down_cast_any_ref(other)
-            .downcast_ref::<Self>()
-            .map(|x| self.left.eq(&x.left) && self.op == x.op && self.right.eq(&x.right))
-            .unwrap_or(false)
-    }
-}
-
-struct BinaryExprStats {
-    op: Operator,
-    left: Arc<dyn PhysicalExpr>,
-    right: Arc<dyn PhysicalExpr>,
-}
 
-impl PhysicalExprStats for BinaryExprStats {
-    fn boundaries(&self, columns: &[ColumnStatistics]) -> Option<ExprBoundaries> {
+    /// Return the boundaries of this binary expression's result. If the expression itself
+    /// is a comparison which changes the boundaries of one of its inputs (a = 20 would pin
+    /// a to 20), then it might update the input's boundaries directly on the context.
+    fn analyze(&self, context: &mut AnalysisContext) -> Option<ExprBoundaries> {

Review Comment:
   Since AnalysisContext is in terms of the "current schema" it will likely almost always end up being modified if the execution plan modifies the schema (which most do).
   
   I find that the mix of "input" and "output" of a `mut &AnalysisContext` to be confusing to explain.
   
   
   Since an `AnalysisContext` has `ExprBoundaries` in it anyways, I recommend the signature be
   
   
   ```rust
   fn analyze(&self) -> AnalysisContext {
   ...
   }
   ```
   
   And then pass each input can produce the context / boundary analysis for itself and the parent operators can combine them
   
   Given that the inputs will produced an owned AnalysisContext it will be clear what is input and what is output and I think it can be made quite fast in Rust



##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -665,87 +656,81 @@ impl PhysicalExpr for BinaryExpr {
             children[1].clone(),
         )))
     }
-}
-
-impl PartialEq<dyn Any> for BinaryExpr {
-    fn eq(&self, other: &dyn Any) -> bool {
-        down_cast_any_ref(other)
-            .downcast_ref::<Self>()
-            .map(|x| self.left.eq(&x.left) && self.op == x.op && self.right.eq(&x.right))
-            .unwrap_or(false)
-    }
-}
-
-struct BinaryExprStats {
-    op: Operator,
-    left: Arc<dyn PhysicalExpr>,
-    right: Arc<dyn PhysicalExpr>,
-}
 
-impl PhysicalExprStats for BinaryExprStats {
-    fn boundaries(&self, columns: &[ColumnStatistics]) -> Option<ExprBoundaries> {
+    /// Return the boundaries of this binary expression's result. If the expression itself
+    /// is a comparison which changes the boundaries of one of its inputs (a = 20 would pin
+    /// a to 20), then it might update the input's boundaries directly on the context.
+    fn analyze(&self, context: &mut AnalysisContext) -> Option<ExprBoundaries> {

Review Comment:
   Maybe AnalysisContext could be expanded to be:
   
   ```rust
   /// A context for collecting and aggregating known boundaries of an expression
   /// tree.
   #[derive(Clone, Debug, PartialEq)]
   pub struct AnalysisContext {
       /// the boundaries of this particular PhysicalExpr's output
       pub boundaries: ExprBoundaries,
   
       /// A list of known column boundaries, ordered by column index
       /// in the current schema.
       pub column_boundaries: Vec<Option<ExprBoundaries>>,
   }
   ```
   
   As I have mentioned before, I don't fully understand specially how `column_boundaries` can be used, but I am interested in seeing an example



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org