Posted to dev@arrow.apache.org by "Shen, Xiangxiang" <xi...@intel.com> on 2021/11/26 05:57:11 UTC

About support ORC Filter pushdown

Hi all,

In the Arrow dataset API, filter pushdown can greatly improve file-reading performance. We noticed that the Parquet format already implements it: https://github.com/apache/arrow/blob/35b3567e73423420a99dbe6116f000e3c77d2a4c/cpp/src/arrow/dataset/file_parquet.cc#L465-L484
However, the ORC file format does not support filter pushdown yet; it currently ignores the "filter" field of ScanOptions.
Are there any plans to support filter pushdown for ORC?
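
For readers unfamiliar with the term, the sketch below illustrates the general idea behind filter pushdown: per-chunk min/max statistics let a reader skip whole chunks that cannot match the filter, instead of decoding every row and filtering afterwards. This is not the Arrow or ORC implementation. `Stripe`, `may_contain_greater_than`, and `scan` are hypothetical names, and the single integer column with min/max values is an assumed simplification of the statistics that ORC stripes and Parquet row groups carry.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe (or a Parquet row group)."""
    min_value: int   # column minimum, assumed to come from file metadata
    max_value: int   # column maximum, assumed to come from file metadata
    rows: List[int]  # the column values, only decoded when needed


def may_contain_greater_than(stripe: Stripe, threshold: int) -> bool:
    """Statistics check: could any row satisfy `value > threshold`?"""
    return stripe.max_value > threshold


def scan(stripes: List[Stripe], threshold: int, pushdown: bool) -> Tuple[List[int], int]:
    """Return (matching rows, number of stripes actually decoded)."""
    decoded = 0
    result: List[int] = []
    for stripe in stripes:
        if pushdown and not may_contain_greater_than(stripe, threshold):
            continue  # skipped using metadata alone; no rows are read
        decoded += 1
        result.extend(v for v in stripe.rows if v > threshold)
    return result, decoded


stripes = [
    Stripe(0, 9, list(range(0, 10))),
    Stripe(10, 19, list(range(10, 20))),
    Stripe(20, 29, list(range(20, 30))),
]

# Same filter (value > 24), with and without pushdown.
rows_without, decoded_without = scan(stripes, threshold=24, pushdown=False)
rows_with, decoded_with = scan(stripes, threshold=24, pushdown=True)
```

Both scans return the same rows, but the pushdown scan decodes only the last stripe. A format that ignores the filter behaves like the `pushdown=False` case: every stripe is read and the filter is applied afterwards.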


Thanks!

Re: About support ORC Filter pushdown

Posted by Joris Van den Bossche <jo...@gmail.com>.
For reference, the issue opened for this is
https://issues.apache.org/jira/browse/ARROW-14890

As mentioned on the JIRA, I am not aware of direct plans to implement
this, but it would certainly be nice to have this functionality for
ORC and contributions are welcome.


On Fri, 26 Nov 2021 at 21:37, Shen, Xiangxiang
<xi...@intel.com> wrote:
>
> Hi all,
>
>                In arrow dataset, Filter pushdown can improve reading files performance greatly. We notice parquet has implemented, https://github.com/apache/arrow/blob/35b3567e73423420a99dbe6116f000e3c77d2a4c/cpp/src/arrow/dataset/file_parquet.cc#L465-L484.
>                But ORC fileformat has not supported Filter pushdown. It ignores the "filter" of  ScanOptions now.
>                Would you have plans to support ORC Filter pushdown?
>
>
> Thanks!