Posted to dev@arrow.apache.org by SHI BEI <sh...@foxmail.com> on 2023/01/11 02:34:02 UTC

Predicate Pushdown/Arrow-rs Usage Question

Hi Arrow community,

I'm new to the Arrow project and am trying to use Arrow and Parquet in a C/C++ project. To improve query performance, I plan to take advantage of Parquet row-group-level and page-level statistics when querying data, but the GLib/C++ SDK lacks an implementation of Parquet predicate pushdown. I have noticed that some work is in progress to support Parquet predicate pushdown, but it will take some time. So I would like to know whether it's possible to use arrow-rs instead, and whether anyone has experience with the same scenario. Any help would be appreciated!



SHI BEI
shibei.lh@foxmail.com

Re: Predicate Pushdown/Arrow-rs Usage Question

Posted by Adam Lippai <ad...@rigo.sk>.
Row group level predicate pushdowns should be supported in both C++ and
Rust. What’s the use case / query you want to speed up?

Page index and bloom filters are brand new and low level in arrow-rs, but
there is support for them. AFAIK C++ doesn’t have full standard coverage
for either.
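
To illustrate what row-group-level pushdown means, here is a minimal conceptual sketch in Rust. It does not use the arrow-rs or Arrow C++ APIs: `RowGroupStats`, `RangePredicate` and `prune_row_groups` are hypothetical names invented for this example, and the statistics are hard-coded rather than read from a Parquet footer. The idea is only that a row group can be skipped when its min/max range cannot overlap the predicate.

```rust
/// Hypothetical min/max statistics for one column in one row group.
/// A real reader would take these from the Parquet file metadata.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// A simple inclusive range predicate: lo <= value <= hi.
#[derive(Debug, Clone, Copy)]
struct RangePredicate {
    lo: i64,
    hi: i64,
}

/// Returns the indices of the row groups whose [min, max] range overlaps
/// the predicate. All other row groups cannot contain a matching row and
/// can be skipped without being decoded.
fn prune_row_groups(stats: &[RowGroupStats], pred: RangePredicate) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max >= pred.lo && s.min <= pred.hi)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let stats = [
        RowGroupStats { min: 0, max: 99 },
        RowGroupStats { min: 100, max: 199 },
        RowGroupStats { min: 200, max: 299 },
    ];
    let pred = RangePredicate { lo: 150, hi: 210 };
    // Only row groups 1 and 2 can contain values in [150, 210].
    println!("row groups to read: {:?}", prune_row_groups(&stats, pred));
}
```

Page-level pruning with the page index follows the same overlap test, applied to per-page min/max values instead of per-row-group values.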

Best regards,
Adam Lippai


Re: Predicate Pushdown/Arrow-rs Usage Question

Posted by Raphael Taylor-Davies <r....@googlemail.com.INVALID>.
Hi Shi

arrow-rs has full support for predicate pushdown and late materialisation. You can find more information about it here [1].

You may also be able to use DataFusion for inspiration [2].
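
As a rough illustration of late materialisation, here is a conceptual sketch in plain Rust. It is not the arrow-rs API: `select_rows` and `materialize` are hypothetical helpers, and in-memory slices stand in for encoded Parquet columns. The point is the ordering: the predicate is evaluated on the filter column first, and the other columns are decoded only for the rows that passed.

```rust
/// Evaluates the predicate on the filter column only and returns the
/// indices of the rows that satisfy it (the "row selection").
fn select_rows(filter_col: &[i64], pred: impl Fn(i64) -> bool) -> Vec<usize> {
    filter_col
        .iter()
        .enumerate()
        .filter(|(_, v)| pred(**v))
        .map(|(i, _)| i)
        .collect()
}

/// Materialises only the selected rows of another column. In a real
/// reader this is where decoding work for unselected rows is avoided.
fn materialize<T: Clone>(col: &[T], selection: &[usize]) -> Vec<T> {
    selection.iter().map(|&i| col[i].clone()).collect()
}

fn main() {
    // Two columns of the same hypothetical table.
    let ids = [1i64, 7, 3, 9, 4];
    let names = ["a", "b", "c", "d", "e"];

    // Step 1: evaluate `id > 5` on the cheap filter column.
    let selection = select_rows(&ids, |v| v > 5);

    // Step 2: fetch the other column only for the selected rows.
    let out = materialize(&names, &selection);
    println!("{:?}", out);
}
```

The benefit grows with the selectivity of the predicate: the fewer rows survive step 1, the less of the remaining columns has to be decoded in step 2.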

Feel free to get in touch should you run into any issues.

Kind Regards,

Raphael

[1]: https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
[2]: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/file_format/parquet.rs
