You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/09 08:28:06 UTC

[GitHub] [arrow-datafusion] mingmwang opened a new issue #1965: Fast path of with_new_children() in ExecutionPlan

mingmwang opened a new issue #1965:
URL: https://github.com/apache/arrow-datafusion/issues/1965


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   The current with_new_children() method in ExecutionPlan trait is an abstract method and each physical plan implements the method with their specific versions. But a common optimization is omitted by almost all the implementations is if the new children provided in the params effectively share the same references with the existing children, there is no need to re-create the struct. And recreate new struct might cause its parent plan node to recreate itself which will cause many unnecessary plan node creation during plan/optimization time. Besides performance impact, it will make the plan debugging very hard for example If I want to
   check why the a new plan node(like shuffle writer) was added to the plan tree, and I add a debug breakpoint to the shuffle writer node's constructor, I will see mostly it was invoked by  with_new_children().
    
   ```
    pub trait ExecutionPlan: Debug + Send + Sync {
   
       /// Returns a new plan where all children were replaced by new plans.
       /// The size of `children` must be equal to the size of `ExecutionPlan::children()`.
       fn with_new_children(
           &self,
           children: Vec<Arc<dyn ExecutionPlan>>,
       ) -> Result<Arc<dyn ExecutionPlan>>;
   
   ```
   
   A new approach is proposed here is to have a default implement in ExecutionPlan to include some common logic and optimizations like children size check the fast path check and then delegate to the specific version:
   
   ```
    pub trait ExecutionPlan: Debug + Send + Sync {
   
       /// Returns a new plan where all children were replaced by new plans.
       /// The size of `children` must be equal to the size of `ExecutionPlan::children()`.
       fn with_new_children(
           self: Arc<Self>
           children: Vec<Arc<dyn ExecutionPlan>>,
       ) -> Result<Arc<dyn ExecutionPlan>> {
        /// children size check here
       
        /// children reference check, if the input children's Arc share the same raw point with current existing children, return self.
   
        with_new_children_internal()
   
       }
   
      fn with_new_children_internal(
           self: Arc<Self>
           children: Vec<Arc<dyn ExecutionPlan>>,
       ) -> Result<Arc<dyn ExecutionPlan>>;
   
   
   ```
   
   We can leverage the Arc::ptr_eq() to do the reference check.
   
   https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.ptr_eq
   
   
   **Describe the solution you'd like**
   A clear and concise description of what you want to happen.
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features you've considered.
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org