You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/03 05:17:59 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8097: ARROW-9821: [Rust][DataFusion] Support for User Defined ExtensionNodes in the LogicalPlan

jorgecarleitao commented on a change in pull request #8097:
URL: https://github.com/apache/arrow/pull/8097#discussion_r482703011



##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
                 let schema_ref = Arc::new(schema.as_ref().clone());
                 Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
             }
+            LogicalPlan::Extension { node } => {
+                let inputs = node
+                    .inputs()
+                    .into_iter()
+                    .map(|input_plan| self.create_physical_plan(input_plan, ctx_state))
+                    .collect::<Result<Vec<_>>>()?;
+
+                self.extension_planner

Review comment:
       If I understand correctly, this implies that a single `ExtensionPlanner` needs to be able to handle all custom nodes.
   
   Thus, if two extensions are shared, e.g. in cargo crates, consumers of those extensions can't just add them, as they need to re-write their extension planner to incorporate each of them.
   
   An alternative:
   
   * make `extension_planner` 's interface returns `Result<Option<Arc<dyn ExecutionPlan>>>`, and `None` represent that the planner does not know how to plan that node
   * change the logic here to loop over extensions until a non-None is found
   * make extensions return `None` for any node/nodes that they do not support
   * make ` self.extension_planner` be `self.extension_planners`
   * make `with_extension_planner` be `with_extension_planners(Vec)` (or `add_extension_planner()`)
   
   This would allow extensions to be shared along with their respective planner (or planners if they support multiple execution architectures), and users can just add it to their list of extensions and pass them to `with_extension_planners`.

##########
File path: rust/datafusion/tests/user_defined_plan.rs
##########
@@ -0,0 +1,500 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       Savage! 💯💯

##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
                 let schema_ref = Arc::new(schema.as_ref().clone());
                 Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
             }
+            LogicalPlan::Extension { node } => {
+                let inputs = node
+                    .inputs()
+                    .into_iter()
+                    .map(|input_plan| self.create_physical_plan(input_plan, ctx_state))
+                    .collect::<Result<Vec<_>>>()?;
+
+                self.extension_planner
+                    .plan_extension(node.as_ref(), inputs, ctx_state)

Review comment:
       I think that we should have a check around here to verify that the logical and physical node has the same output `schema`.

##########
File path: rust/datafusion/src/logical_plan/mod.rs
##########
@@ -755,8 +756,72 @@ impl fmt::Debug for Expr {
     }
 }
 
-/// The LogicalPlan represents different types of relations (such as Projection,
-/// Filter, etc) and can be created by the SQL query planner and the DataFrame API.
+/// This defines the interface for `LogicalPlan` nodes that can be
+/// used to extend DataFusion with custom relational operators.
+///
+/// See the example in
+/// [user_defined_plan.rs](../../tests/user_defined_plan.rs) for an
+/// example of how to use this extenison API
+pub trait ExtensionPlanNode: Debug {

Review comment:
       Another idea: `UserDefinedNode` (UDN), to be aligned with user defined function (UDF) that most people are familiar with and already know what it means.

##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -417,7 +445,16 @@ impl ExecutionConfig {
         mut self,
         physical_planner: Arc<dyn PhysicalPlanner>,
     ) -> Self {
-        self.physical_planner = Some(physical_planner);
+        self.physical_planner = physical_planner;
+        self
+    }
+
+    /// Optional source of additional optimization passes
+    pub fn with_optimizer_rule_source(

Review comment:
       Going in the same lines as my comment about the planning, one idea is to make `optimizer_rule_source` be a vector of rules, and instantiate `ExecutionConfig` with a `vec` of those rules, instead of having an extra `OptimizerRuleSource` to hold those rules.
   
   The rules themselves are `Arc<dyn OptimizerRule>`, and users write them by the trait.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org