You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/18 11:38:28 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5630: Introduce a common trait TreeNode for ExecutionPlan, PhysicalExpr, LogicalExpr, LogicalPlan

alamb commented on code in PR #5630:
URL: https://github.com/apache/arrow-datafusion/pull/5630#discussion_r1140998574


##########
datafusion/common/src/tree_node.rs:
##########


Review Comment:
   This API looks very nice 👌 



##########
datafusion/core/src/datasource/listing/helpers.rs:
##########
@@ -36,48 +36,31 @@ use crate::{
 
 use super::PartitionedFile;
 use crate::datasource::listing::ListingTableUrl;
+use datafusion_common::tree_node::{Recursion, TreeNode};
 use datafusion_common::{
     cast::{as_date64_array, as_string_array, as_uint64_array},
     Column, DataFusionError,
 };
-use datafusion_expr::{
-    expr_visitor::{ExprVisitable, ExpressionVisitor, Recursion},
-    Expr, Volatility,
-};
+use datafusion_expr::{Expr, Volatility};
 use object_store::path::Path;
 use object_store::{ObjectMeta, ObjectStore};
 
 const FILE_SIZE_COLUMN_NAME: &str = "_df_part_file_size_";
 const FILE_PATH_COLUMN_NAME: &str = "_df_part_file_path_";
 const FILE_MODIFIED_COLUMN_NAME: &str = "_df_part_file_modified_";
 
-/// The `ExpressionVisitor` for `expr_applicable_for_cols`. Walks the tree to
-/// validate that the given expression is applicable with only the `col_names`
-/// set of columns.
-struct ApplicabilityVisitor<'a> {
-    col_names: &'a [String],
-    is_applicable: &'a mut bool,
-}
-
-impl ApplicabilityVisitor<'_> {
-    fn visit_volatility(self, volatility: Volatility) -> Recursion<Self> {
-        match volatility {
-            Volatility::Immutable => Recursion::Continue(self),
-            // TODO: Stable functions could be `applicable`, but that would require access to the context
-            Volatility::Stable | Volatility::Volatile => {
-                *self.is_applicable = false;
-                Recursion::Stop(self)
-            }
-        }
-    }
-}
-
-impl ExpressionVisitor for ApplicabilityVisitor<'_> {
-    fn pre_visit(self, expr: &Expr) -> Result<Recursion<Self>> {
-        let rec = match expr {
+/// Check whether the given expression can be resolved using only the columns `col_names`.
+/// This means that if this function returns true:
+/// - the table provider can filter the table partition values with this expression
+/// - the expression can be marked as `TableProviderFilterPushDown::Exact` once this filtering
+/// was performed
+pub fn expr_applicable_for_cols(col_names: &[String], expr: &Expr) -> bool {
+    let mut is_applicable = true;
+    expr.collect(&mut |expr| {

Review Comment:
   👍 



##########
datafusion/physical-expr/src/tree_node/mod.rs:
##########
@@ -0,0 +1,266 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module provides common traits for visiting or rewriting tree nodes easily.
+//!
+//! It's a duplication of the one in the crate `datafusion-common`.
+//! In the future, if the Orphan rule is relaxed for Arc<dyn T>, these duplicated codes can be removed.

Review Comment:
   It would be great if we could figure out how to avoid this duplication and the one in physical-expr?
   
   Maybe we could create a blanket impl in `datafusion-common` like
   
   ```rust
   impl<T: TreeNode> TreeNode for Arc<T> {
   ...
   }
   ```
   ?  



##########
datafusion/core/src/physical_optimizer/dist_enforcement.rs:
##########
@@ -925,7 +925,11 @@ impl PlanWithKeyRequirements {
     }
 }
 
-impl TreeNodeRewritable for PlanWithKeyRequirements {
+impl TreeNode for PlanWithKeyRequirements {
+    fn get_children(&self) -> Vec<Self> {
+        unimplemented!()

Review Comment:
   Should this be implemented?



##########
datafusion/common/src/tree_node.rs:
##########
@@ -0,0 +1,261 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module provides common traits for visiting or rewriting tree nodes easily.
+
+use crate::{DataFusionError, Result};
+
+/// Trait for tree node. It can be [`ExecutionPlan`], [`PhysicalExpr`], [`LogicalExpr`], etc.
+pub trait TreeNode: Clone {
+    /// Return the children of this tree node
+    fn get_children(&self) -> Vec<Self>;
+
+    /// Use preorder to iterate the node on the tree so that we can stop fast for some cases.
+    ///
+    /// `op` can be used to collect some info from the tree node.
+    fn collect<F>(&self, op: &mut F) -> Result<()>
+    where
+        F: FnMut(&Self) -> Result<Recursion>,
+    {
+        match op(self)? {
+            Recursion::Continue => {}
+            // If the recursion should stop, do not visit children
+            Recursion::Stop => return Ok(()),
+            r => {
+                return Err(DataFusionError::Execution(format!(
+                    "Recursion {r:?} is not supported for collect"
+                )))
+            }
+        };
+
+        for child in self.get_children() {
+            child.collect(op)?;
+        }
+
+        Ok(())
+    }
+
+    /// Visit the tree node using the given [TreeNodeVisitor]
+    /// It performs a depth first walk of an node and its children.
+    ///
+    /// For an node tree such as
+    /// ```text
+    /// ParentNode
+    ///    left: ChildNode1
+    ///    right: ChildNode2
+    /// ```
+    ///
+    /// The nodes are visited using the following order
+    /// ```text
+    /// pre_visit(ParentNode)
+    /// pre_visit(ChildNode1)
+    /// post_visit(ChildNode1)
+    /// pre_visit(ChildNode2)
+    /// post_visit(ChildNode2)
+    /// post_visit(ParentNode)
+    /// ```
+    ///
+    /// If an Err result is returned, recursion is stopped immediately
+    ///
+    /// If [`Recursion::Stop`] is returned on a call to pre_visit, no
+    /// children of that node will be visited, nor is post_visit
+    /// called on that node
+    ///
+    /// If using the default [`post_visit`] with nothing to do, the [`collect`] should be preferred
+    fn visit<V: TreeNodeVisitor<N = Self>>(&self, visitor: &mut V) -> Result<Recursion> {
+        match visitor.pre_visit(self)? {
+            Recursion::Continue => {}
+            // If the recursion should stop, do not visit children
+            Recursion::Stop => return Ok(Recursion::Stop),
+            r => {
+                return Err(DataFusionError::Execution(format!(
+                    "Recursion {r:?} is not supported for collect_using"
+                )))
+            }
+        };
+
+        for child in self.get_children() {
+            match child.visit(visitor)? {
+                Recursion::Continue => {}
+                // If the recursion should stop, do not visit children
+                Recursion::Stop => return Ok(Recursion::Stop),
+                r => {
+                    return Err(DataFusionError::Execution(format!(
+                        "Recursion {r:?} is not supported for collect_using"
+                    )))
+                }
+            }
+        }
+
+        visitor.post_visit(self)
+    }
+
+    /// Convenience utils for writing optimizers rule: recursively apply the given `op` to the node tree.
+    /// When `op` does not apply to a given node, it is left unchanged.
+    /// The default tree traversal direction is transform_up(Postorder Traversal).
+    fn transform<F>(self, op: &F) -> Result<Self>
+    where
+        F: Fn(Self) -> Result<Option<Self>>,
+    {
+        self.transform_up(op)
+    }
+
+    /// Convenience utils for writing optimizers rule: recursively apply the given 'op' to the node and all of its
+    /// children(Preorder Traversal).
+    /// When the `op` does not apply to a given node, it is left unchanged.
+    fn transform_down<F>(self, op: &F) -> Result<Self>
+    where
+        F: Fn(Self) -> Result<Option<Self>>,
+    {
+        let node_cloned = self.clone();
+        let after_op = match op(node_cloned)? {
+            Some(value) => value,
+            None => self,
+        };
+        after_op.map_children(|node| node.transform_down(op))
+    }
+
+    /// Convenience utils for writing optimizers rule: recursively apply the given 'op' first to all of its
+    /// children and then itself(Postorder Traversal).
+    /// When the `op` does not apply to a given node, it is left unchanged.
+    fn transform_up<F>(self, op: &F) -> Result<Self>
+    where
+        F: Fn(Self) -> Result<Option<Self>>,
+    {
+        let after_op_children = self.map_children(|node| node.transform_up(op))?;
+
+        let after_op_children_clone = after_op_children.clone();
+        let new_node = match op(after_op_children)? {
+            Some(value) => value,
+            None => after_op_children_clone,
+        };
+        Ok(new_node)
+    }
+
+    /// Transform the tree node using the given [TreeNodeRewriter]
+    /// It performs a depth first walk of an node and its children.
+    ///
+    /// For an node tree such as
+    /// ```text
+    /// ParentNode
+    ///    left: ChildNode1
+    ///    right: ChildNode2
+    /// ```
+    ///
+    /// The nodes are visited using the following order
+    /// ```text
+    /// pre_visit(ParentNode)
+    /// pre_visit(ChildNode1)
+    /// mutate(ChildNode1)
+    /// pre_visit(ChildNode2)
+    /// mutate(ChildNode2)
+    /// mutate(ParentNode)
+    /// ```
+    ///
+    /// If an Err result is returned, recursion is stopped immediately
+    ///
+    /// If [`false`] is returned on a call to pre_visit, no
+    /// children of that node will be visited, nor is mutate
+    /// called on that node
+    ///
+    /// If using the default [`pre_visit`] with [`true`] returned, the [`transform`] should be preferred
+    fn rewrite<R: TreeNodeRewriter<N = Self>>(self, rewriter: &mut R) -> Result<Self> {
+        let need_mutate = match rewriter.pre_visit(&self)? {
+            Recursion::Mutate => return rewriter.mutate(self),
+            Recursion::Stop => return Ok(self),
+            Recursion::Continue => true,
+            Recursion::Skip => false,
+        };
+
+        let after_op_children = self.map_children(|node| node.rewrite(rewriter))?;
+
+        // now rewrite this node itself
+        if need_mutate {
+            rewriter.mutate(after_op_children)
+        } else {
+            Ok(after_op_children)
+        }
+    }
+
+    /// Apply transform `F` to the node's children, the transform `F` might have a direction(Preorder or Postorder)
+    fn map_children<F>(self, transform: F) -> Result<Self>
+    where
+        F: FnMut(Self) -> Result<Self>;
+}
+
+/// Implements the [visitor
+/// pattern](https://en.wikipedia.org/wiki/Visitor_pattern) for recursively walking [`TreeNode`]s.
+///
+/// [`TreeNodeVisitor`] allows keeping the algorithms
+/// separate from the code to traverse the structure of the `TreeNode`
+/// tree and makes it easier to add new types of tree node and
+/// algorithms by.

Review Comment:
   ```suggestion
   /// algorithms.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org