Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/30 13:59:46 UTC

[GitHub] [arrow] NoahFournier opened a new issue, #14782: Push-down filtering in Java

NoahFournier opened a new issue, #14782:
URL: https://github.com/apache/arrow/issues/14782

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Hey all,
   
   I'm exploring adding push-down filtering to the Java Datasets API, as it currently only supports push-down projection.
   Just had a couple of questions before I start this work:
   
   - Is this planned on any current roadmap? I'd like to avoid duplicating work that's already begun
   - Does anyone have any smart ideas as to how to pass the filter predicate into ScanOptions? As far as I can tell, the only complication to make this happen would be figuring out some way to call into the Expression object from Java without creating future maintenance effort.
   
   Thanks!   
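
To make the question above concrete, here is a minimal sketch of what "push-down" means for a scan: the filter and the column list travel with the scan options, so rejected rows and unwanted columns are dropped inside the scanner instead of after the data has been returned. This is a toy, in-memory illustration only. `ToyScanOptions`, `ToyScanner`, and the sample rows are hypothetical names invented for this example and are not part of the Arrow Java Datasets API, which crosses a JNI boundary and would need a serializable predicate rather than a Java lambda.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Toy illustration only: none of these types exist in Arrow.
final class PushDownSketch {

    // Hypothetical scan options: an optional column list plus an optional row filter.
    static final class ToyScanOptions {
        final Optional<List<String>> columns;
        final Optional<Predicate<Map<String, Object>>> filter;

        ToyScanOptions(Optional<List<String>> columns,
                       Optional<Predicate<Map<String, Object>>> filter) {
            this.columns = columns;
            this.filter = filter;
        }
    }

    // Hypothetical scanner over in-memory rows; it applies the options while scanning.
    static final class ToyScanner {
        private final List<Map<String, Object>> rows;

        ToyScanner(List<Map<String, Object>> rows) {
            this.rows = rows;
        }

        List<Map<String, Object>> scan(ToyScanOptions options) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                // Pushed-down filter: rejected rows are never copied or returned.
                if (options.filter.isPresent() && !options.filter.get().test(row)) {
                    continue;
                }
                Map<String, Object> projected = new LinkedHashMap<>(row);
                // Pushed-down projection: keep only the requested columns.
                options.columns.ifPresent(cols -> projected.keySet().retainAll(cols));
                out.add(projected);
            }
            return out;
        }
    }

    // Builds one sample row with two columns.
    static Map<String, Object> row(int id, String city) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("city", city);
        return r;
    }

    // Scans three sample rows, filtering on "city" and projecting only "id".
    static List<Map<String, Object>> demo() {
        ToyScanner scanner = new ToyScanner(
            List.of(row(1, "Oslo"), row(2, "Lima"), row(3, "Oslo")));
        Predicate<Map<String, Object>> isOslo = r -> "Oslo".equals(r.get("city"));
        ToyScanOptions options = new ToyScanOptions(
            Optional.of(List.of("id")), Optional.of(isOslo));
        return scanner.scan(options);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that in this sketch the filter is evaluated against the full row before projection, so it can reference a column (`city`) that is not in the projected output. Whether a real scanner behaves the same way depends on its implementation.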
   
   ### Component(s)
   
   C++, Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] NoahFournier commented on issue #14782: [Java] Push-down filtering in Java

Posted by "NoahFournier (via GitHub)" <gi...@apache.org>.
NoahFournier commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1436739862

   Agreed @lidavidm - Something like https://github.com/substrait-io/substrait-java/issues/128 would provide the necessary functionality here. 
   



[GitHub] [arrow] ianmcook commented on issue #14782: [Java] Push-down filtering in Java

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1730033866

   https://github.com/apache/arrow/issues/34252 is now complete, and the Arrow Java Datasets API now supports pushdown projection and filtering using Substrait expressions.
   
   More details here: https://github.com/apache/arrow/blob/main/docs/source/java/dataset.rst#projection-produce-new-columns-and-filters
   
   Example here: https://github.com/apache/arrow/blob/main/docs/source/java/substrait.rst#executing-projections-and-filters-using-extended-expressions
   
   However: this capability is not really ready for practical applications yet, because we do not yet have any user-friendly tools to create Substrait expressions in Java. I hope we can achieve that in https://github.com/substrait-io/substrait-java/issues/128.



[GitHub] [arrow] ianmcook commented on issue #14782: [Java] Push-down filtering in Java

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1435687366

   Ok, thanks. To clarify: my question is not about Substrait _plans_, it's about Substrait _expressions_ which we now have a way to represent independent of plans. I opened a separate issue to request this feature: https://github.com/apache/arrow/issues/34252



[GitHub] [arrow] NoahFournier commented on issue #14782: Push-down filtering in Java

Posted by GitBox <gi...@apache.org>.
NoahFournier commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1333885994

   Thanks for your help @lidavidm - that parsing PR looks like exactly what I wanted!



[GitHub] [arrow] lidavidm commented on issue #14782: [Java] Push-down filtering in Java

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1435369372

   I think this is just about having a convenient user-facing API to the existing Dataset functionality; a Substrait API to Acero in Java would be a separate project.



[GitHub] [arrow] lidavidm commented on issue #14782: Push-down filtering in Java

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1332341917

   CC @lwhite1 @davisusanibar 
   
   An alternative might be to accept Substrait plans instead, binding to Acero as a whole instead of just Dataset. This would give you the full power of the query engine and avoid having to create bindings to individual C++ components. But of course this is more complex and Substrait is a bit of a moving target.
   
   Another alternative is to track the discussion about a text format for expressions. See https://lists.apache.org/thread/7vch27t3gfz1hmv7d8w69n50gfc1nswf and #14287.



[GitHub] [arrow] lidavidm commented on issue #14782: [Java] Push-down filtering in Java

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1435716382

   I think without a convenient API to build Substrait expressions in Java, it still wouldn't quite meet the goals here, right?



[GitHub] [arrow] ianmcook commented on issue #14782: Push-down filtering in Java

Posted by "ianmcook (via GitHub)" <gi...@apache.org>.
ianmcook commented on issue #14782:
URL: https://github.com/apache/arrow/issues/14782#issuecomment-1435327638

   There is a discussion here about passing Substrait expressions to the Dataset Project and Filter methods: https://github.com/apache/arrow/issues/33985#issuecomment-1435319522
   
   If this gets implemented in C++, can it be exposed to Java through the JNI bindings?

