You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/07 07:22:23 UTC

[GitHub] [arrow-datafusion] mingmwang opened a new issue, #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

mingmwang opened a new issue, #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

In current DataFusion code base, `Expr`, `LogicalPlan` and `ExecutionPlan` are represented in different ways. Actually they are all tree based structures. `Expr` and `LogicalPlan` are Enums, `ExecutionPlan` is a Trait.
The LogicalPlan enum wrapped the different logical operator structs like Projection and Filter, etc.

````
pub enum LogicalPlan {
Projection(Projection),
Filter(Filter),
............
}
````

But the Expr enum doesn't wrap the expression structs and define the different expressions directly in the enum.

````
pub enum Expr {
Alias(Box<Expr>, String),
Column(Column),
............
AggregateFunction {
fun: aggregate_function::AggregateFunction,
args: Vec<Expr>,
distinct: bool,
},
WindowFunction {
fun: window_function::WindowFunction,
args: Vec<Expr>,
partition_by: Vec<Expr>,
order_by: Vec<Expr>,
window_frame: Option<window_frame::WindowFrame>,
},
............
}
````

I think we should unify the coding style, at least the Expr and LogicalPlan representations should follow the same style. And for physical ExecutionPlan, it is Trait/Trait Objects, I would prefer to use Enum also. Want to hear thoughts from other members.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] andygrove commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

andygrove commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1264379554

   I have a proposal in https://github.com/apache/arrow-datafusion/pull/3677 to add a `PhysicalPlan` enum that avoids making any breaking API changes. 
   
   I agree that the `Expr` enum should be made more consistent with the `LogicalPlan` enum by creating structs that each enum variant would wrap.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262606590

   @tustvold 
   One example here to compare two expressions are equal or not. If the expressions are purely Enum/Struct, I can easily
   use '==' to do the comparing, just like any other OO programming languages. 
   
   ````
        let expr1 = Expr::BinaryExpr {
               left: Box::new(lit(1)),
               op: Operator::Plus,
               right: Box::new(lit(2)),
         };
   
         let expr2 =  let expr2 = Expr::BinaryExpr {
               left: Box::new(lit(2)),
               op: Operator::Minus,
               right: Box::new(lit(1)),
         };
   
         if expr1 == expr2 {
           // do something
         }
   
   ````
   
   But if the expressions are Trait, because the type is unknown, how to write code to compare the expressions?
   The expressions are also tree like structures and we have to downcast and destructuring and to do the comparing recursively.
   
   fn traverse_expression_and_compare(expr1: dyn PhysicalExpr,  expr2: dyn PhysicalExpr) -> bool {
     if expr1.as_any().downcast_ref<BinaryExpr>.is_some() 
       &&  expr2.as_any().downcast_ref<BinaryExpr>.is_some() {
         // compare the op
         traverse_expression_and_compare()
       }
   }
   
   as_any().downcast_ref<RepartitionExec>.is_some() 
   
   pub struct CaseExpr {
       /// Optional base expression that can be compared to literal values in the "when" expressions
       expr: Option<Arc<dyn PhysicalExpr>>,
       /// One or more when/then expressions
       when_then_expr: Vec<WhenThen>,
       /// Optional "else" expression
       else_expr: Option<Arc<dyn PhysicalExpr>>,
   }
   
   
   
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262554350

   > A major problem is because it is trait, we do not have a way to compare or check whether two 'PhysicalExpr'
   equal or not.
   
   `as_any().downcast_ref<RepartitionExec>.is_some()` ?
   
   I do have a couple of concerns:
   
   * This will be a monumental amount of effort and downstream churn
   * Static typing is a double-edged sword, whilst it makes some code that needs to handle a specific set of things, significantly simpler, it makes other things that need to handle arbitrary things, significantly more complicated
   * My 2 cents from arrow-rs is that there are use-cases for static typing, but it becomes unwieldy incredibly fast and has a somewhat frustrating habit of blowing up compile times
   
   I think what would help convince me is a use-case that cannot be done with the current approach, a massive rework of the entire execution stack in the interest of ergonomics is a bit of a tough sell imo... That being said if people are happy to undertake this work, I'm not going to stand in the way of that :sweat_smile: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] xudong963 commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

xudong963 commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1093974086

   Though they are all tree based structures, each module has some qualities of each module itself. The code style, although important, but not the most important :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262630212

   > I wonder if we could extend PhysicalExpr to require `PartialEq`
   > 
   > So like change
   > 
   > https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/physical-expr/src/physical_expr.rs#L37
   > 
   > to be
   > 
   > ```rust
   >  pub trait PhysicalExpr: Send + Sync + Display + Debug + PartialEq { 
   > ```
   > 
   > This would certainly still require code churn downstream (to implement PartialEq) but it wouldn't be nearly as large as changing to an `enum`
   
   Thanks for the suggestion. I can try this way first. 
   But since Logical Expressions are already enums,  why we have the Physical Exprs stick to Trait. In the current code base, there are only a few physical optimization rules,  I think when people
   try to add more and more optimization rules, we will see more inconvenience to the Trait.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] thinkharderdev commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

thinkharderdev commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262936765

   One thing to consider is that by making `ExecutionPlan` an enum we still end up having to have a trait for extension plans (something that is quite important for us) so you end up having many of the same problems because you end up with many of the problems of both approaches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1260738813

   If I was going to change the style of Expr, LogicalPlan and ExecutionPlan I would make ExectionPlan an enum for the reasons that @mingmwang  explain, and have one variant that allowed for extension:
   
   ```rust
   enumExecutionPlan {
     ...
     Extension {
       extension: Arc<dyn ExecutionPlanExtension>,
     }
     ... 
   ```
   
   Similar to `LogicalPlan::Extension`: https://github.com/apache/arrow-datafusion/blob/81b5794/datafusion/expr/src/logical_plan/plan.rs#L101-L102
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1094950365

   > with_new_children_if_necessary
   
   FWIW you could use an associated trait function (or a free function as the PR does)
   
   ```
   fn with_new_children_if_necessary(s: Arc<dyn ExecutionPlan>, children: Vec<Arc<dyn ExecutionPlan>>) -> Result<Arc<dyn ExecutionPlan>>;
   ```
   
   Which can be invoked as `ExecutionPlan::with_new_children_if_necessary(s, children)`. Whilst perhaps not as ergonomic as `s.with_new_children_if_necessary(children)`, its effectively a constructor so... :shrug:  
   
   > There is other pitfall to use Trait Objects, for example:
   
   I think this is a tad disingenuous, this is a pitfall of using pointer equality on fat pointers, of which trait objects are just one type. If you wish to perform pointer equality, as opposed to actually implementing PartialEq, you should thin the pointers first. I don't think it is fair to say this is an issue with trait objects, but rather one of the many pitfalls of using raw pointers.
   
   > That means the original type which implements the ExecutionPlan Trait can not hold any non-static references
   
   This limitation would still exist in the enumeration approach, unless a lifetime parameter were introduced into the enum type definition, which I think would be a very tough sell.
   
   > And we have to implement the as_any() method for all the Trait implementations, so that in the places where the trait
   is consumed, it can be downcast to the original type.
   
   In most cases I suspect you are downcasting to a single concrete type in which case
   
   ```
   if let Some(foo) = plan.as_any().downcast_ref::<Sum>() {
   
   }
   ```
   
   vs
   
   ```
   if let Sum(foo) = &plan {
   
   }
   ```
   
   Is not a major difference imo... The major pain comes when matching multiple types, in many cases I think this would be better avoided by lifting into a member function on the trait, but lets say there is some use-case where this is not possible. Perhaps we could introduce an enum like
   
   ```
   enum ConcreteExecutionPlan<'a> {
     Sum(&'a Sum),
     Max(&'a Max),
     ...
   }
   ```
   
   And then add something like
   
   ```
   fn downcast(&dyn ExecutionPlan) -> Option<ConcreteExecutionPlan<'_>> {}
   ```
   
   I dunno, I'm not really sure I understand the use-case for enumerating the concrete ExecutionPlan types...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1096398500

   Aah yes, I missed off the `where Self: Sized` that marks it as [non-dispatchable](https://doc.rust-lang.org/reference/items/traits.html#object-safety). [Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=2d882594817563f2f2c94ae62c38cea2).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] andygrove commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

andygrove commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1276195254

   I created a new epic issue for making `Expr` consistent with `LogicalPlan`: https://github.com/apache/arrow-datafusion/issues/3807


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] yahoNanJing commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

yahoNanJing commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1261832504

   Hi @alamb, @thinkharderdev, @mingmwang, I just created another issue #3651 to deal with this. And I will work on this in next few days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262662676

   Let me do some tests tomorrow. Anyway, I like the discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262625793

   > I wonder if we could extend PhysicalExpr to require PartialEq
   
   PartialEq is not object-safe, it takes Self, so this isn't possible (I don't think)
   
   > But if the expressions are Trait, because the type is unknown, how to write code to compare the expressions?
   
   `Expr` is a good example of where Rust's support for structured matching is fantastic, and I would definitely not propose switching `BinaryExpr` to be a trait object. 
   
   Is `ExecutionPlan` a similar use-case though? Most places it gets used actively don't care what variant is used? Is it possible that the fact they are represented differently is actually a strength, and represents the differing requirements of the different representations?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1091202273

   @alamb @yjshen @andygrove @thinkharderdev  @liukun4515 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] thinkharderdev commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

thinkharderdev commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1092720791

   I agree with @alamb that if we want to go for consistency then it would be better to make `LogicalPlan` a trait rather than to make `ExecutionPlan` and enum. Since this is a potential extension point it seems a little awkward to model it as a sum type where we have add an additional layer of indirection for defining user-defined physical/logical plan nodes. In our project, we do create a custom `UserDefinedLogicalNode` and an associated custom `ExecutionPlan` and it feels much more ergonomic to create the custom `ExecutionPlan` than it does to create the `UserDefinedLogicalNode`. 
   
   That said, in things like serde logic where we need to handle all possible variants it is nicer (and will be even nicer still once `deref patterns` lands :)) to be able to pattern match all the variants rather than do the long chain of "if downcast_ref" conditions, so I don't feel strongly either way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262691588

   Not saying it is a good idea, I'm not sure of why you would be doing this on an plan node, but you can do something like this and implement `PartialEq<&dyn Any>`
   
   https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1907b2dd3e335e2395487e0a5cbc63e4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262615820

   I wonder if we could extend PhysicalExpr to require `PartialEq`
   
   So like change
   
   https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/physical-expr/src/physical_expr.rs#L37
   
   to be 
   
   ```rust
    pub trait PhysicalExpr: Send + Sync + Display + Debug + PartialEq { 
   ```
   
   This would certainly still require code churn downstream (to implement PartialEq) but it wouldn't be nearly as large as changing to an `enum`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] jackwener commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

jackwener commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1091208594

   > But the Expr enum doesn't wrap the expression structs and define the different expressions directly in the enum.
   
   I think that it may be due to keep similar with the `sqlparser-rs`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] jackwener commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

jackwener commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1091202907

   > And for physical ExecutionPlan, it is Trait/Trait Objects, I would prefer to use Enum also
   
   I also agree this point. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1094883368

   Besides the code style,  the reason that I prefer to use Enum than Trait/Trait Objects is because Trait is simple but Trait Object is a quite complex concept in Rust and not all the Traits can be used as Trait Objects: the return type of the Trait can not be the 'Self' and there should be no generic types parameters.
   
   For example in this PR [https://github.com/apache/arrow-datafusion/pull/2168](https://github.com/apache/arrow-datafusion/pull/2168),  the new method `with_new_children_if_necessary` can not be implemented as a Trait method
   
   ````
   /// Can not compile
   fn with_new_children_if_necessary(
       self: Arc<Self>,
       children: Vec<Arc<dyn ExecutionPlan>>,
   ) -> Result<Arc<dyn ExecutionPlan>> {
       if children.len() != self.children().len() {
           Err(DataFusionError::Internal(
               "Wrong number of children".to_string(),
           ))
       } else if children.is_empty()
           || children
               .iter()
               .zip(self.children().iter())
               .any(|(c1, c2)| !Arc::ptr_eq(c1, c2))
       {
           self.with_new_children(children)
       } else {
           Ok(self)
       }
   }
   
   ````
   
   ````
   /// Can not compile 
   fn with_new_children_if_necessary(
       &self,
       children: Vec<Arc<dyn ExecutionPlan>>,
   ) -> Result<Arc<dyn ExecutionPlan>> {
       if children.len() != self.children().len() {
           Err(DataFusionError::Internal(
               "Wrong number of children".to_string(),
           ))
       } else if children.is_empty()
           || children
               .iter()
               .zip(self.children().iter())
               .any(|(c1, c2)| !Arc::ptr_eq(c1, c2))
       {
           self.with_new_children(children)
       } else {
           Ok(Arc::new(self.clone()))
       }
   }
   
   ````
   
   ````
   /// Can not compile 
   fn with_new_children_if_necessary(
       self: Arc<Self>,
       children: Vec<Arc<dyn ExecutionPlan>>,
   ) -> Result<Arc<Self>> {
       if children.len() != self.children().len() {
           Err(DataFusionError::Internal(
               "Wrong number of children".to_string(),
           ))
       } else if children.is_empty()
           || children
               .iter()
               .zip(self.children().iter())
               .any(|(c1, c2)| !Arc::ptr_eq(c1, c2))
       {
           self.with_new_children(children)
       } else {
           Ok(self)
       }
   }
   
   ````
   
   And we have to implement the as_any() method for all the Trait implementations, so that in the places where the trait 
   is consumed, it can be downcast to the original type.  And the Any Trait is qualified with 'static lifecycle. That means the original type which implements the ExecutionPlan Trait can not hold any non-static references, the Struct that implements the Trait must be owned type. 
   
   ````
    fn as_any(&self) -> &dyn Any;
   
   ````
   
   There is other pitfall to use Trait Objects, for example:
   [https://stackoverflow.com/questions/67109860/how-to-compare-trait-objects-within-an-arc](https://stackoverflow.com/questions/67109860/how-to-compare-trait-objects-within-an-arc)
   
   I searched the internet and find this blog,  I think the points are quite valid.
   https://bennetthardwick.com/dont-use-boxed-trait-objects-for-struct-internals/
   
   From extension point of view, using Enum is not that straightforward compared with Trait, but I think we can follow the same way as the LogicalPlan, just like the `UserDefinedLogicalNode` we already have for the LogicalPlan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] thinkharderdev commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

thinkharderdev commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262945356

   > @tustvold One example here to compare two expressions are equal or not. If the expressions are purely Enum/Struct, I can easily use '==' to do the comparing, just like any other OO programming languages.
   > 
   > ```
   >      let expr1 = Expr::BinaryExpr {
   >             left: Box::new(lit(1)),
   >             op: Operator::Plus,
   >             right: Box::new(lit(2)),
   >       };
   > 
   >       let expr2 = Expr::BinaryExpr {
   >             left: Box::new(lit(2)),
   >             op: Operator::Minus,
   >             right: Box::new(lit(1)),
   >       };
   > 
   >       if expr1 == expr2 {
   >         // do something
   >       }
   > ```
   > 
   > But if the expressions are Trait, because the type is unknown, how to write code to compare the expressions? The expressions are also tree like structures and we have to downcast and destructuring and to do the comparing recursively.
   > 
   > ```
   > fn traverse_expression_and_compare(expr1: dyn PhysicalExpr,  expr2: dyn PhysicalExpr) -> bool {
   >   if expr1.as_any().downcast_ref<BinaryExpr>.is_some() 
   >     &&  expr2.as_any().downcast_ref<BinaryExpr>.is_some() {
   >       let binary_expr1 =  expr1.as_any().downcast<BinaryExpr>;
   >       let binary_expr2 =  expr2.as_any().downcast<BinaryExpr>;
   >       return binary_expr1.op == binary_expr2.op && traverse_expression_and_compare(binary_expr1.left, binary_expr2.left)
   >       && traverse_expression_and_compare(binary_expr1.right, binary_expr2.right)
   >     }
   >  if expr1.as_any().downcast_ref<CaseExpr>.is_some() &&  expr2.as_any().downcast_ref<CaseExpr>.is_some() {
   >      ..............................
   >   }
   >      ...........................................
   >      ..............
   > }
   > 
   > pub struct CaseExpr {
   >     /// Optional base expression that can be compared to literal values in the "when" expressions
   >     expr: Option<Arc<dyn PhysicalExpr>>,
   >     /// One or more when/then expressions
   >     when_then_expr: Vec<WhenThen>,
   >     /// Optional "else" expression
   >     else_expr: Option<Arc<dyn PhysicalExpr>>,
   > }
   > ```
   > 
   > Maybe thousands of lines of code just to compare the equality.
   
   Couldn't we just add a method to `PhysicalExpr`?
   
   ```
   trait PhysicalExpr {
    fn eq(other: Arc<dyn PhysicalExpr>) -> bool;
   }
   ```
   
   if `PartialEq` isn't object-safe?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262540503

   @alamb @thinkharderdev @yahoNanJing 
   
   I'm afraid the 'PhysicalExpr' Trait need to be changed to Enum also. 
   A major problem is because it is trait, we do not have a way to compare or check whether two 'PhysicalExpr'
   equal or not.  
   In the physical plan phase, we need to check a plan's required children distribution and the current children's output partition, if the distribution is not satisfied, we should add the RepartitionExec, if it is satisfied, RepartitionExec can be
   skipped.  Currently, because the PhysicalExpr is a Trait, it blocks some important optimizations.
   
   ````
   pub trait PhysicalExpr: Send + Sync + Display + Debug {
       /// Returns the physical expression as [`Any`](std::any::Any) so that it can be
       /// downcast to a specific implementation.
       fn as_any(&self) -> &dyn Any;
       /// Get the data type of this expression, given the schema of the input
       fn data_type(&self, input_schema: &Schema) -> Result<DataType>;
       /// Determine whether this expression is nullable, given the schema of the input
       fn nullable(&self, input_schema: &Schema) -> Result<bool>;
       /// Evaluate an expression against a RecordBatch
       fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>;
       /// Evaluate an expression against a RecordBatch after first applying a
       /// validity array
   ................
   ````
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262598954

   I agree this would cause a lot of churn. Perhaps we can add additional methods to traits to unblock whatever optimizations are needed? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262641211

   > But since Logical Expressions are already enums, why we have the Physical Exprs stick to Trait.
   
   Perhaps, because they are representations optimised for different use-cases? I'd personally argue LogicalPlan is a bit of a toss up, just take a look at all the various methods we have for walking the tree in different ways `LogicalPlan::expressions` or `LogicalPlan::inputs`, `LogicalPlan::accept`, etc... by comparison we have `ExecutionPlan::children` and we're done. Definitely not saying we should change `LogicalPlan` to be a trait object, but perhaps different approaches is fine?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1260747597

   @thinkharderdev @yahoNanJing @Dandandan @andygrove 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] thinkharderdev commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

thinkharderdev commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1260927705

   > If I was going to change the style of Expr, LogicalPlan and ExecutionPlan I would make ExectionPlan an enum for the reasons that @mingmwang explain, and have one variant that allowed for extension:
   > 
   > ```rust
   > enumExecutionPlan {
   >   ...
   >   Extension {
   >     extension: Arc<dyn ExecutionPlanExtension>,
   >   }
   >   ... 
   > ```
   > 
   > Similar to `LogicalPlan::Extension`: https://github.com/apache/arrow-datafusion/blob/81b5794/datafusion/expr/src/logical_plan/plan.rs#L101-L102
   
   I think that makes sense


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1092102331

   Thank you for raising this, its really good to have these discussions. I would definitely support making the enumeration style consistent, and the expression struct pattern has the non-trivial advantage of being able to name the variant types :+1: 
   
   With respect to making `ExecutionPlan` an enumeration:
   
   * `ExecutionPlan` has a fairly large number of member functions, and having a massive `match` block for each of these seems somewhat sub-optimal
   * We would still need to have a `dyn Trait` in the topology for extensions, much like we have `UserDefinedLogicalNode` for `LogicalPlan`
   * The major advantage of an enumeration is being able to enumerate all the possible states, I'm struggling to devise a use-case that wouldn't be equally well served by a member function
   * Traits provide encapsulation by object, whereas enumerations provide encapsulation by function. Taking the example of a serialization routine, with trait objects each operator would define its serialization routine alongside its other functionality, with an enumeration you would instead define serialization for all operators in one function. It's possibly subjective but I prefer the former
   * I'm currently working on scheduling of plan execution, with an eventual hope to implement something akin to [morsel-driven parallelism](https://db.in.tum.de/~leis/papers/morsels.pdf). Currently this fudges around the interface of ExecutionPlan, but may eventually warrant non-trivial changes to it (e.g. making it less async), churning this interface multiple times in quick succession is probably something we want to avoid if possible
   
   That being said, I don't really feel strongly about any particular design, just putting the information out there :smile: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1092157017

   > And for physical ExecutionPlan, it is Trait/Trait Objects, I would prefer to use Enum also
   
   I agree this would be nice for code consistency. The fact it is a trait has always bothered me (mostly as I can't explain any good reason isn't the same as `LogicalPlan`. 
   
   I remember discussion in the past (maybe from @jorgecarleitao  and @andygrove?)  that using trait objects would allow for people to plug in their own physical plan / physical expr implementations. However, I haven't seen anyone try to do that recently (though I also haven't looked).
   
   > I think we should unify the coding style, at least the Expr and LogicalPlan representations should follow the same style.
   
   I think the rationale for extracting variants from `LogicalPlan` was that many of the enum variants had several fields and it was convenient in some places to pass a known type of LogicalPlan.
   
   I am not as convinced that adding an extra level of wrappers to `Expr` would make the code easier to work with
   
   For example, when I did out an example of what I think this proposal would lead to for `Expr::Not` and `Expr::Column` I got something like:
   
   ```rust
   struct Not(Expr);
   
   struct Column(schema::Column);
   
   enum Expr {
     ...
     Column(Column),
     Not(Not)
     ...
   }
   
   ```
   
   And I think it makes the code significantly *harder* to read and work with (as now `.0` would get sprinkled around)
   
   If we feel a deep need for consistency, I would personally prefer to expand `LogicalPlan` back to direct variants


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] yjshen commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

yjshen commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1091237149

   The LogicalPlan moved to its current style in https://github.com/apache/arrow-datafusion/issues/1228 thanks to @xudong963 and @liukun4515. I agree we could apply this style to expressions as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1095793355

   > > with_new_children_if_necessary
   > 
   > FWIW you could use an associated trait function (or a free function as the PR does)
   > 
   > ```
   > fn with_new_children_if_necessary(s: Arc<dyn ExecutionPlan>, children: Vec<Arc<dyn ExecutionPlan>>) -> Result<Arc<dyn ExecutionPlan>>;
   > ```
   > 
   > Which can be invoked as `ExecutionPlan::with_new_children_if_necessary(s, children)`. Whilst perhaps not as ergonomic as `s.with_new_children_if_necessary(children)`, its effectively a constructor so... 🤷
   > 
   
   I think this can not compile either. For Trait Objects, it requires the first params of the trait method must be the Self type or can be derefed to self
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

tustvold commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262674648

   > BTW, do you mean, it is impossible to declare a Trait and let it implements the PartialEq ?
   
   You can't do dynamic dispatch to a function that is parameterized on the concrete type - https://doc.rust-lang.org/reference/items/traits.html#object-safety
   
   > Here is a sample of how to make visit ExecutionPlan more generic and we
   can get ride of the visit pattern and tree traversing from the rule implementations
   
   I'm confused, if you're saying trait objects allow you to eliminate lots of dispatch logic, I 100% agree. Why then do you want to switch away from them?
   
   > I'm not sure why visiting a logical plan is a toss up
   
   Consider some function that takes a `LogicalPlan` and wants to perform some operation on `op()` on a set of variants `{var1...varn}`.
   
   If the set is small, both approaches are basically equivalent
   
   ```
   fn do_fn(plan: LogicalPlan) {
       match plan {
          v1 => {...}
          v2 => {...}
           _ => 
       }
   }
   ```
   
   ```
   fn do_foo(plan: dyn LogicalPlan) {
       if let Some(x) = plan.downcast_ref<v1> {
           ...
       } else if let Some(y) = plan.downcast_ref<v2> {
           ...
       }
       else {
           ...
       }
   }
   ```
   
   However, as the set of variants grows the size of the enum dispatch logic grows linearly. Meanwhile with the `dyn LogicalPlan` we can "lift" op onto the trait as a method. 
   
   ```
   pub trait LogicalPlan {
       fn foo() {
           ...
       }
   }
   ```
   
   We get the dispatch logic for free :tada:
   
   What I'm trying to suggest is that whilst structured matching is great, you pay for it by having to implement the dispatch logic yourself. You can hide it using visitor patterns, or using macros, but it never reaches the ease of trait objects where it is generated automatically. This means a one-size fits-all approach, may not make sense
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1260323719

   There are other downsides with Trait/Trait Objects, we can not use pattern-matching and destructuring with Trait Objects. Pattern-matching and destructuring are the most important feature of FP and they are valuable for optimizer rules implementation.
   
   https://stackoverflow.com/questions/26126683/how-to-match-trait-implementors
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Posted by GitBox <gi...@apache.org>.

mingmwang commented on issue #2175:
URL: https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262648962

   > > I wonder if we could extend PhysicalExpr to require PartialEq
   > 
   > PartialEq is not object-safe, it takes Self, so this isn't possible (I don't think)
   > 
   > > But if the expressions are Trait, because the type is unknown, how to write code to compare the expressions?
   > 
   > `Expr` is a good example of where Rust's support for structured matching is fantastic, and I would definitely not propose switching `BinaryExpr` to be a trait object 👍
   > 
   > Is `ExecutionPlan` a similar use-case though? Most places it gets used currently don't care what variant is used, and I don't see any that are performing nested matching? This could be because it would currently be obnoxious, but is it possible that the fact they are represented differently is actually a strength, and represents the differing requirements of the different representations?
   > 
   > > The expressions are also tree like structures and we have to downcast and destructuring and to do the comparing recursively.
   > 
   > This is actually one of the weaknesses of the enum approach, type erasure has to be implemented via a visitor which makes walking the tree significantly more complicated. If you want to match a specific expression, it works well, if you want to do anything more generic it becomes incredibly painful.
   > 
   > You are therefore left with a tradeoff based on the common use-case, for BinaryExpr the enumeration is clearly superior, LogicalPlan I'd argue its a bit of a toss up, I have found the lack of type-erasure rather obnoxious at times, for ExecutionPlan, eh...
   
   I'm not sure why visiting a logical plan is a toss up. Here is a sample of how to make visit ExecutionPlan more generic and we
   can get ride of the visit pattern and tree traversing from the rule implementations
   
   ```` 
   plan_utils.rs
   
   pub fn map_children<F>(
       plan: Arc<dyn ExecutionPlan>,
       transform: F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Result<Arc<dyn ExecutionPlan>>,
   {
       if !plan.children().is_empty() {
           let new_children: Result<Vec<_>> =
               plan.children().into_iter().map(transform).collect();
           with_new_children_if_necessary(plan, new_children?)
       } else {
           Ok(plan)
       }
   }
   
   pub fn transform<F>(
       plan: Arc<dyn ExecutionPlan>,
       op: &F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
   {
       transform_down(plan, op)
   }
   
   pub fn transform_down<F>(
       plan: Arc<dyn ExecutionPlan>,
       op: &F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
   {
       let plan_cloned = plan.clone();
       let after_op = match op(plan_cloned) {
           Some(value) => value,
           None => plan,
       };
       map_children(after_op.clone(), |plan: Arc<dyn ExecutionPlan>| {
           transform_down(plan, op)
       })
   }
   
   pub fn transform_up<F>(
       plan: Arc<dyn ExecutionPlan>,
       op: &F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
   {
       let plan_cloned = plan.clone();
       let after_op_children = map_children(plan_cloned, |plan: Arc<dyn ExecutionPlan>| {
           transform_down(plan, op)
       });
       let new_plan = match op(after_op_children?) {
           Some(value) => value,
           None => plan,
       };
       Ok(new_plan)
   }
   
   --------------------------------------------------
   impl PhysicalOptimizerRule for CoalesceBatches2 {
       fn optimize(
           &self,
           plan: Arc<dyn crate::physical_plan::ExecutionPlan>,
           _config: &crate::execution::context::SessionConfig,
       ) -> Result<Arc<dyn crate::physical_plan::ExecutionPlan>> {
           transform(plan, &apply_patterns(self.target_batch_size))
       }
   
       fn name(&self) -> &str {
           "coalesce_batches2"
       }
   }
   
   fn apply_patterns(
       target_batch_size: usize,
   ) -> impl Fn(
       Arc<dyn crate::physical_plan::ExecutionPlan>,
   ) -> Option<Arc<dyn crate::physical_plan::ExecutionPlan>> {
       move |plan| {
           let plan_any = plan.as_any();
           let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some()
               || plan_any.downcast_ref::<HashJoinExec>().is_some()
               || plan_any.downcast_ref::<RepartitionExec>().is_some();
           if wrap_in_coalesce {
               Some(Arc::new(CoalesceBatchesExec::new(
                   plan.clone(),
                   target_batch_size,
               )))
           } else {
               None
           }
       }
   }
   
   ```` 
   
   The same utils can be implemented for Enums/LogicalPlans I think.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org