You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/20 18:51:09 UTC

[GitHub] [arrow] alamb opened a new pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encod

alamb opened a new pull request #9278:
URL: https://github.com/apache/arrow/pull/9278


   ## Problem:
   * There are several places in the DataFusion codebase where a walk of an Expression tree is needed
   * The logic of how to walk the tree is replicated
   * Adding new expression types often require many mechanically different but semantically the same changes in many places where no special treatment of such types is needed
   
   This PR introduces a `ExpressionVisitor` trait and the `Expr::accept` function to consolidate this walking of the expression tree. It does not intend to change any functionality. 
   
   If folks like this pattern, I have ideas for a similar type of trait `ExpressionRewriter` which can be used to rewrite expressions (much like `clone_with_replacement`) as a subsquent PR. I think this was mentioned by @Dandandan  in the [Rust roadmap](https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit#heading=h.kstb571j5g5j)
   
   
   cc @jorgecarleitao @Dandandan and @andygrove 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#issuecomment-764011355


   Thanks @jorgecarleitao  and @Dandandan . I am personally quite excited at using this same idea for expression rewriting. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#discussion_r561229060



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -422,6 +422,136 @@ impl Expr {
             nulls_first,
         }
     }
+
+    /// Performs a depth first depth first walk of an expression and

Review comment:
       ```suggestion
       /// Performs a depth first walk of an expression and
   ```

##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -422,6 +422,136 @@ impl Expr {
             nulls_first,
         }
     }
+
+    /// Performs a depth first depth first walk of an expression and
+    /// its children, calling `visitor.pre_visit` and

Review comment:
       ```suggestion
       /// its children, calling [`ExpressionVisitor::pre_visit`] and
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#issuecomment-763885202


   I took a quick look and this looks good to me. I will try and review in more detail tonight.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#discussion_r561197743



##########
File path: rust/datafusion/src/optimizer/utils.rs
##########
@@ -46,75 +50,48 @@ pub fn exprlist_to_column_names(
 
 /// Recursively walk an expression tree, collecting the unique set of column names
 /// referenced in the expression
-pub fn expr_to_column_names(expr: &Expr, accum: &mut HashSet<String>) -> Result<()> {

Review comment:
       This is a pretty good example of the kind of repetition that can be removed using this visitor pattern.
   Note that I still left all expr types enumerated so that anyone who adds a new Expr type need to update this code, and (hopefully) think if they need to add special handling for that new expr types
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#discussion_r561229060



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -422,6 +422,136 @@ impl Expr {
             nulls_first,
         }
     }
+
+    /// Performs a depth first depth first walk of an expression and

Review comment:
       ```suggestion
       /// Performs a depth first walk of an expression and
   ```

##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -422,6 +422,136 @@ impl Expr {
             nulls_first,
         }
     }
+
+    /// Performs a depth first depth first walk of an expression and
+    /// its children, calling `visitor.pre_visit` and

Review comment:
       ```suggestion
       /// its children, calling [`ExpressionVisitor::pre_visit`] and
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9278:
URL: https://github.com/apache/arrow/pull/9278


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#issuecomment-764011355


   Thanks @jorgecarleitao  and @Dandandan . I am personally quite excited at using this same idea for expression rewriting. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#discussion_r561366255



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -422,6 +422,136 @@ impl Expr {
             nulls_first,
         }
     }
+
+    /// Performs a depth first depth first walk of an expression and
+    /// its children, calling `visitor.pre_visit` and
+    /// `visitor.post_visit`.
+    ///
+    /// Implements the [visitor pattern](https://en.wikipedia.org/wiki/Visitor_pattern) to
+    /// separate expression algorithms from the structure of the
+    /// `Expr` tree and make it easier to add new types of expressions
+    /// and algorithms that walk the tree.
+    ///
+    /// For an expression tree such as
+    /// BinaryExpr (GT)
+    ///    left: Column("foo")
+    ///    right: Column("bar")
+    ///
+    /// The nodes are visited using the following order
+    /// ```text
+    /// pre_visit(BinaryExpr(GT))
+    /// pre_visit(Column("foo"))
+    /// pre_visit(Column("bar"))
+    /// post_visit(Column("bar"))
+    /// post_visit(Column("bar"))
+    /// post_visit(BinaryExpr(GT))
+    /// ```
+    ///
+    /// If an Err result is returned, recursion is stopped immediately
+    ///
+    /// If `Recursion::Stop` is returned on a call to pre_visit, no
+    /// children of that expression are visited, nor is post_visit
+    /// called on that expression
+    ///
+    pub fn accept<V: ExpressionVisitor>(&self, visitor: V) -> Result<V> {
+        let visitor = match visitor.pre_visit(self)? {
+            Recursion::Continue(visitor) => visitor,
+            // If the recursion should stop, do not visit children
+            Recursion::Stop(visitor) => return Ok(visitor),
+        };
+
+        // recurse (and cover all expression types)
+        let visitor = match self {
+            Expr::Alias(expr, _) => expr.accept(visitor),
+            Expr::Column(..) => Ok(visitor),
+            Expr::ScalarVariable(..) => Ok(visitor),
+            Expr::Literal(..) => Ok(visitor),
+            Expr::BinaryExpr { left, right, .. } => {
+                let visitor = left.accept(visitor)?;
+                right.accept(visitor)
+            }
+            Expr::Not(expr) => expr.accept(visitor),
+            Expr::IsNotNull(expr) => expr.accept(visitor),
+            Expr::IsNull(expr) => expr.accept(visitor),
+            Expr::Negative(expr) => expr.accept(visitor),
+            Expr::Between {
+                expr, low, high, ..
+            } => {
+                let visitor = expr.accept(visitor)?;
+                let visitor = low.accept(visitor)?;
+                high.accept(visitor)
+            }
+            Expr::Case {
+                expr,
+                when_then_expr,
+                else_expr,
+            } => {
+                let visitor = if let Some(expr) = expr.as_ref() {
+                    expr.accept(visitor)
+                } else {
+                    Ok(visitor)
+                }?;
+                let visitor = when_then_expr.iter().try_fold(
+                    visitor,
+                    |visitor, (when, then)| {
+                        let visitor = when.accept(visitor)?;
+                        then.accept(visitor)
+                    },
+                )?;
+                if let Some(else_expr) = else_expr.as_ref() {
+                    else_expr.accept(visitor)
+                } else {
+                    Ok(visitor)
+                }
+            }
+            Expr::Cast { expr, .. } => expr.accept(visitor),
+            Expr::Sort { expr, .. } => expr.accept(visitor),
+            Expr::ScalarFunction { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::ScalarUDF { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::AggregateFunction { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::AggregateUDF { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::InList { expr, list, .. } => {
+                let visitor = expr.accept(visitor)?;
+                list.iter()
+                    .try_fold(visitor, |visitor, arg| arg.accept(visitor))
+            }
+            Expr::Wildcard => Ok(visitor),
+        }?;
+
+        visitor.post_visit(self)
+    }
+}
+
+/// Controls how the visitor recursion should proceed.
+pub enum Recursion<V: ExpressionVisitor> {
+    /// Attempt to visit all the children, recursively, of this expression.
+    Continue(V),
+    /// Do not visit the children of this expression, though the walk
+    /// of parents of this expression will not be affected
+    Stop(V),
+}
+
+/// Encode the traversal of an expression tree. When passed to
+/// `visit_expression`, `ExpressionVisitor::visit` is invoked

Review comment:
       ```suggestion
   /// `Expr::accept`, `ExpressionVisitor::visit` is invoked
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#issuecomment-763902393


   I just wanted to add that there is no need to wait for me to review before merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#issuecomment-763902393


   I just wanted to add that there is no need to wait for me to review before merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#discussion_r561366255



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -422,6 +422,136 @@ impl Expr {
             nulls_first,
         }
     }
+
+    /// Performs a depth first depth first walk of an expression and
+    /// its children, calling `visitor.pre_visit` and
+    /// `visitor.post_visit`.
+    ///
+    /// Implements the [visitor pattern](https://en.wikipedia.org/wiki/Visitor_pattern) to
+    /// separate expression algorithms from the structure of the
+    /// `Expr` tree and make it easier to add new types of expressions
+    /// and algorithms that walk the tree.
+    ///
+    /// For an expression tree such as
+    /// BinaryExpr (GT)
+    ///    left: Column("foo")
+    ///    right: Column("bar")
+    ///
+    /// The nodes are visited using the following order
+    /// ```text
+    /// pre_visit(BinaryExpr(GT))
+    /// pre_visit(Column("foo"))
+    /// pre_visit(Column("bar"))
+    /// post_visit(Column("bar"))
+    /// post_visit(Column("bar"))
+    /// post_visit(BinaryExpr(GT))
+    /// ```
+    ///
+    /// If an Err result is returned, recursion is stopped immediately
+    ///
+    /// If `Recursion::Stop` is returned on a call to pre_visit, no
+    /// children of that expression are visited, nor is post_visit
+    /// called on that expression
+    ///
+    pub fn accept<V: ExpressionVisitor>(&self, visitor: V) -> Result<V> {
+        let visitor = match visitor.pre_visit(self)? {
+            Recursion::Continue(visitor) => visitor,
+            // If the recursion should stop, do not visit children
+            Recursion::Stop(visitor) => return Ok(visitor),
+        };
+
+        // recurse (and cover all expression types)
+        let visitor = match self {
+            Expr::Alias(expr, _) => expr.accept(visitor),
+            Expr::Column(..) => Ok(visitor),
+            Expr::ScalarVariable(..) => Ok(visitor),
+            Expr::Literal(..) => Ok(visitor),
+            Expr::BinaryExpr { left, right, .. } => {
+                let visitor = left.accept(visitor)?;
+                right.accept(visitor)
+            }
+            Expr::Not(expr) => expr.accept(visitor),
+            Expr::IsNotNull(expr) => expr.accept(visitor),
+            Expr::IsNull(expr) => expr.accept(visitor),
+            Expr::Negative(expr) => expr.accept(visitor),
+            Expr::Between {
+                expr, low, high, ..
+            } => {
+                let visitor = expr.accept(visitor)?;
+                let visitor = low.accept(visitor)?;
+                high.accept(visitor)
+            }
+            Expr::Case {
+                expr,
+                when_then_expr,
+                else_expr,
+            } => {
+                let visitor = if let Some(expr) = expr.as_ref() {
+                    expr.accept(visitor)
+                } else {
+                    Ok(visitor)
+                }?;
+                let visitor = when_then_expr.iter().try_fold(
+                    visitor,
+                    |visitor, (when, then)| {
+                        let visitor = when.accept(visitor)?;
+                        then.accept(visitor)
+                    },
+                )?;
+                if let Some(else_expr) = else_expr.as_ref() {
+                    else_expr.accept(visitor)
+                } else {
+                    Ok(visitor)
+                }
+            }
+            Expr::Cast { expr, .. } => expr.accept(visitor),
+            Expr::Sort { expr, .. } => expr.accept(visitor),
+            Expr::ScalarFunction { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::ScalarUDF { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::AggregateFunction { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::AggregateUDF { args, .. } => args
+                .iter()
+                .try_fold(visitor, |visitor, arg| arg.accept(visitor)),
+            Expr::InList { expr, list, .. } => {
+                let visitor = expr.accept(visitor)?;
+                list.iter()
+                    .try_fold(visitor, |visitor, arg| arg.accept(visitor))
+            }
+            Expr::Wildcard => Ok(visitor),
+        }?;
+
+        visitor.post_visit(self)
+    }
+}
+
+/// Controls how the visitor recursion should proceed.
+pub enum Recursion<V: ExpressionVisitor> {
+    /// Attempt to visit all the children, recursively, of this expression.
+    Continue(V),
+    /// Do not visit the children of this expression, though the walk
+    /// of parents of this expression will not be affected
+    Stop(V),
+}
+
+/// Encode the traversal of an expression tree. When passed to
+/// `visit_expression`, `ExpressionVisitor::visit` is invoked

Review comment:
       ```suggestion
   /// `Expr::accept`, `ExpressionVisitor::visit` is invoked
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9278:
URL: https://github.com/apache/arrow/pull/9278#issuecomment-763867008


   https://issues.apache.org/jira/browse/ARROW-11330


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9278:
URL: https://github.com/apache/arrow/pull/9278


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org