Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/16 13:29:43 UTC

[GitHub] [arrow-datafusion] jackwener opened a new issue #2022: Filter operator support multi expr

jackwener opened a new issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I find that the current `Filter` operator can include only one `expr`.
   
   I understand this could be because predicates are applied one by one instead of being applied together to one tuple/batch.
   
   But in my opinion, we should allow the `Filter` operator to include multiple exprs, because it is clearer and it doesn't affect execution.
   
   After I implement this change, I will work on merging adjacent filters. In the future, we could also implement filter reordering in physical execution based on this.
   
   What do you think about it? @alamb @Dandandan 
   
   **Describe the solution you'd like**
   Replace the `expr` with a `Vec<Expr>`, and change the related code.
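
A minimal sketch of what this could look like, assuming a simplified plan representation. The `Expr` and `Plan` types and the `merge_adjacent_filters` function below are hypothetical names made up for illustration; they are not DataFusion's actual definitions. The sketch shows a filter node holding several implicitly ANDed predicates, and how directly nested filters could then be collapsed into one.

```rust
// Hypothetical, simplified types for illustration only (not DataFusion's API).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // A predicate kept as opaque text for the sake of the sketch.
    Predicate(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    // A filter whose predicates are implicitly ANDed together.
    Filter { predicates: Vec<Expr>, input: Box<Plan> },
}

// Collapse directly nested Filter nodes into a single Filter node.
// Outer predicates come first, followed by the inner ones.
fn merge_adjacent_filters(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { mut predicates, input } => match merge_adjacent_filters(*input) {
            Plan::Filter { predicates: inner, input: inner_input } => {
                predicates.extend(inner);
                Plan::Filter { predicates, input: inner_input }
            }
            other => Plan::Filter { predicates, input: Box::new(other) },
        },
        other => other,
    }
}
```

With a single ANDed expression the same merge is still possible, but it has to build a new `AND` node instead of extending a list.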
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener closed issue #2022: Filter operator support multi expr

Posted by GitBox <gi...@apache.org>.
jackwener closed issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022


   





[GitHub] [arrow-datafusion] jackwener edited a comment on issue #2022: Filter operator support multi expr

Posted by GitBox <gi...@apache.org>.
jackwener edited a comment on issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022#issuecomment-1073217627


   Yes, it's logically equivalent. But depending on the scenario, a database will prefer one representation over the other.
   
   For vectorized execution and columnar storage, it's common to use the `Vec<Expr>` approach, because the execution layer generally executes expressions one by one.
   
   As for row-oriented storage, we can apply the whole conjunction expression to a row at once.
   
   In conclusion, this may not be very important; maybe we can consider it in the future.
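
To illustrate the columnar case, here is a rough sketch under stated assumptions: `Batch`, `Predicate`, and `filter_mask` are hypothetical names for a toy two-column batch, not Arrow or DataFusion types. Each predicate is evaluated over the whole batch in turn, and the results are ANDed into one selection mask.

```rust
// A toy columnar batch with two i64 columns (hypothetical, not Arrow's RecordBatch).
struct Batch {
    a: Vec<i64>,
    b: Vec<i64>,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.a.len()
    }
}

// A predicate evaluates a whole batch at once and returns one bool per row.
type Predicate = Box<dyn Fn(&Batch) -> Vec<bool>>;

// Evaluate the predicates one by one, ANDing each result into the mask.
fn filter_mask(batch: &Batch, predicates: &[Predicate]) -> Vec<bool> {
    let mut mask = vec![true; batch.num_rows()];
    for predicate in predicates {
        let hits = predicate(batch);
        for (keep, hit) in mask.iter_mut().zip(hits) {
            *keep = *keep && hit;
        }
    }
    mask
}
```

A row-oriented engine would instead loop over rows and evaluate the full conjunction for each row before moving on to the next.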









[GitHub] [arrow-datafusion] jackwener commented on issue #2022: Filter operator support multi expr

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022#issuecomment-1069144070


   What's more, `Filter` can hold a `BinaryExpr`, which can be used as `Expr AND Expr`. It can provide the same capability as a `Vec<Expr>`.
   
   But I don't think it is a good way, because `BinaryExpr` is usually used for logical operations.
   
   Putting multiple exprs into it is easy, but it isn't easy to get the individual exprs back out of it.





[GitHub] [arrow-datafusion] alamb commented on issue #2022: Filter operator support multi expr

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022#issuecomment-1073215060


   FWIW I do think it is common practice, for convenience, in optimizers / databases to represent filters as a `Vec<Expr>` that logically represents exprs `AND`ed together. However, I agree that logically there is no difference from using an `AND`ed list.





[GitHub] [arrow-datafusion] Dandandan commented on issue #2022: Filter operator support multi expr

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022#issuecomment-1069164492


   I think the filter already supports multiple predicates using `And`, `Or`, etc.
   
   The planner / optimizers could be changed to use this fact better.





[GitHub] [arrow-datafusion] alamb commented on issue #2022: Filter operator support multi expr

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2022:
URL: https://github.com/apache/arrow-datafusion/issues/2022#issuecomment-1073221610


   > Yes, It's equivalent logically. But, Depending on the scenario, the choice of database will have a preference.
   
   Indeed -- in fact in the filter pushdown logic, there is a function that takes apart an `AND` expression into a `Vec<Expr>` (and then there is code to put it back together again). 
   
   https://github.com/apache/arrow-datafusion/blob/74bf7ab4f578edda8dcb4fe70ec43560992f5bab/datafusion/src/optimizer/filter_push_down.rs#L162-L177
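
As a rough illustration of that round trip (not the code at the link above), the sketch below uses a hypothetical toy `Expr` enum and hypothetical helper names `split_conjunction` and `conjunction`. It flattens nested `AND`s into a list and then folds the list back into a left-deep `AND` tree.

```rust
// Hypothetical toy expression type for illustration (not DataFusion's Expr).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

// Take an AND expression apart, pushing each non-AND member into `out`.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(left, right) => {
            split_conjunction(left, out);
            split_conjunction(right, out);
        }
        other => out.push(other.clone()),
    }
}

// Put the members back together as a left-deep AND tree.
// Returns None when there are no members to combine.
fn conjunction(exprs: Vec<Expr>) -> Option<Expr> {
    exprs
        .into_iter()
        .reduce(|acc, e| Expr::And(Box::new(acc), Box::new(e)))
}
```

Splitting requires a recursive walk over the expression tree, whereas a filter that stored a `Vec<Expr>` directly would expose the members without that step.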
   
   

