Posted to github@arrow.apache.org by "ianmcook (via GitHub)" <gi...@apache.org> on 2023/02/01 15:59:32 UTC

[GitHub] [arrow] ianmcook opened a new issue, #33985: [C++] Support specifying filters and projections with Substrait expressions

ianmcook opened a new issue, #33985:
URL: https://github.com/apache/arrow/issues/33985

   ### Describe the enhancement requested
   
   In addition to representing full plans, Substrait can also be used to represent expressions (see https://github.com/substrait-io/substrait/pull/405/). It would be nice if Acero could consume Substrait expressions and use them to specify filters and projections. 
   
   I would love to see us expose functions that:
   - Receive a Boolean-valued Substrait scalar expression and use it to add a Filter node to the ExecPlan
   - Receive a list of Substrait scalar expressions and use it to add a Project node to the ExecPlan
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #33985: [C++] Support specifying filters and projections with Substrait expressions

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #33985:
URL: https://github.com/apache/arrow/issues/33985#issuecomment-1435356988

   You would need to use [this variant](https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset14ScannerBuilder7ProjectENSt6vectorIN7compute10ExpressionEEENSt6vectorINSt6stringEEE) of Project() but otherwise yes.




[GitHub] [arrow] ianmcook commented on issue #33985: [C++] Support specifying filters and projections with Substrait expressions

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on issue #33985:
URL: https://github.com/apache/arrow/issues/33985#issuecomment-1435319522

   @westonpace with the implementation you envision, could this also give us the ability to pass Substrait expressions to [`arrow::dataset::ScannerBuilder::Filter()`](https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset14ScannerBuilder6FilterERKN7compute10ExpressionE) and [`arrow::dataset::ScannerBuilder::Project()`](https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset14ScannerBuilder7ProjectENSt6vectorINSt6stringEEE)? That would be very, very cool indeed.




[GitHub] [arrow] westonpace commented on issue #33985: [C++] Support specifying filters and projections with Substrait expressions

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #33985:
URL: https://github.com/apache/arrow/issues/33985#issuecomment-1412409926

   The Arrow equivalent of Substrait's expression is `arrow::compute::Expression`.  So I think an extended expression proto file would roughly translate to `std::vector<std::pair<std::string, arrow::compute::Expression>>` (not actually suggesting we use this API, just describing).
   
   If we added an API for that then those compute expressions could be used when building Acero filters & projects.
   
   Note: this sort of implies the user is not using Substrait to express their actual queries.  This is fine, we have non-Substrait APIs for filter & project in pyarrow (e.g. Array.filter) and R (dplyr) so there is certainly room for it.
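
To make the shape described in this comment concrete, here is a minimal C++ sketch. It does not use Arrow or Substrait: `Row`, `Expression`, `NamedExpressions`, `Filter`, `Project`, and `RunExample` are hypothetical stand-ins invented for illustration, and `Expression` is modeled as a plain function over a row rather than `arrow::compute::Expression`. It only shows how a list of (name, expression) pairs could drive a filter step and a projection step.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in types, NOT Arrow's API: a row maps column names to values, and an
// Expression is any function that computes one value from a row.
using Row = std::map<std::string, double>;
using Expression = std::function<double(const Row&)>;

// The shape discussed above: a list of (output name, expression) pairs.
using NamedExpressions = std::vector<std::pair<std::string, Expression>>;

// Filter: keep the rows for which the Boolean-valued expression is non-zero.
std::vector<Row> Filter(const std::vector<Row>& rows, const Expression& predicate) {
  std::vector<Row> kept;
  for (const Row& row : rows) {
    if (predicate(row) != 0.0) kept.push_back(row);
  }
  return kept;
}

// Project: evaluate every named expression against each row and store the
// result under the expression's output name.
std::vector<Row> Project(const std::vector<Row>& rows, const NamedExpressions& exprs) {
  std::vector<Row> out;
  out.reserve(rows.size());
  for (const Row& row : rows) {
    Row projected;
    for (const auto& named : exprs) {
      projected[named.first] = named.second(row);
    }
    out.push_back(std::move(projected));
  }
  return out;
}

// Hypothetical usage: keep rows where x > 1, then compute "sum" = x + y.
std::vector<Row> RunExample() {
  std::vector<Row> rows;
  rows.push_back(Row{{"x", 1.0}, {"y", 10.0}});
  rows.push_back(Row{{"x", 2.0}, {"y", 20.0}});
  rows.push_back(Row{{"x", 3.0}, {"y", 30.0}});

  Expression predicate = [](const Row& r) { return r.at("x") > 1.0 ? 1.0 : 0.0; };

  NamedExpressions projections;
  projections.emplace_back("sum", [](const Row& r) { return r.at("x") + r.at("y"); });

  return Project(Filter(rows, predicate), projections);
}
```

In a real implementation the stand-in `Expression` would be replaced by expressions deserialized from a Substrait message, and the filter and projection would be Acero nodes or `ScannerBuilder` calls rather than loops over rows.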




[GitHub] [arrow] westonpace closed issue #33985: [C++] Support specifying filters and projections with Substrait expressions

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace closed issue #33985: [C++] Support specifying filters and projections with Substrait expressions
URL: https://github.com/apache/arrow/issues/33985




[GitHub] [arrow] ianmcook commented on issue #33985: [C++] Support specifying filters and projections with Substrait expressions

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on issue #33985:
URL: https://github.com/apache/arrow/issues/33985#issuecomment-1412377054

   This plus https://github.com/substrait-io/substrait-java/issues/128 would allow users to specify filters and projections as SQL expressions and execute them with Acero.

