Posted to issues@orc.apache.org by "Richard Zhang (Jira)" <ji...@apache.org> on 2019/12/14 21:50:00 UTC

[jira] [Assigned] (ORC-577) Allow row-level filtering

     [ https://issues.apache.org/jira/browse/ORC-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Zhang reassigned ORC-577:
---------------------------------

    Assignee: Richard Zhang

> Allow row-level filtering
> -------------------------
>
>                 Key: ORC-577
>                 URL: https://issues.apache.org/jira/browse/ORC-577
>             Project: ORC
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Richard Zhang
>            Priority: Major
>
> Currently, ORC filters at three levels:
>  * File level
>  * Stripe (64 to 256 MB) level
>  * Row group (10k row) level
> The filters are specified as Sargs (Search Arguments), which have a relatively small vocabulary. Furthermore, a filter only eliminates a set of rows when it can guarantee that none of the rows in that set can pass.
> There are some use cases where the user needs to read a subset of the columns and apply more detailed row-level filters. I'd suggest that we add a new method in Reader.Options:
> {{setFilter(String columnNames, Predicate<VectorizedRowBatch> filter)}}
> where the columns named in columnNames are read and expanded first, then the filter is run, and the rest of the data is read only if the predicate returns true.
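
The two-phase read described above can be illustrated with a minimal, self-contained Java sketch. It does not use the ORC library: Batch and ColumnSource are hypothetical stand-ins for VectorizedRowBatch and the underlying file, read(...) is an invented helper rather than an ORC API, and the filter columns are passed as a List<String> instead of the single String in the proposed signature. It only shows the intended order of operations: load the filter columns, run the predicate, and load the remaining columns only when the predicate returns true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RowFilterSketch {

    /** Hypothetical stand-in for VectorizedRowBatch: column name -> values. */
    static final class Batch {
        final Map<String, long[]> columns = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for the file: serves one column of one batch and counts reads. */
    static final class ColumnSource {
        private final Map<String, long[][]> data = new LinkedHashMap<>();
        int columnReads = 0;

        void put(String column, long[][] batches) {
            data.put(column, batches);
        }

        long[] readColumn(String column, int batchIndex) {
            columnReads++; // each call models one column being read and expanded
            return data.get(column)[batchIndex];
        }
    }

    /** Reads filterColumns first; reads the other columns only if the filter accepts the batch. */
    static List<Batch> read(ColumnSource source, int batchCount, List<String> allColumns,
                            List<String> filterColumns, Predicate<Batch> filter) {
        List<Batch> result = new ArrayList<>();
        for (int b = 0; b < batchCount; b++) {
            Batch batch = new Batch();
            // Phase 1: load only the columns the predicate needs.
            for (String name : filterColumns) {
                batch.columns.put(name, source.readColumn(name, b));
            }
            if (!filter.test(batch)) {
                continue; // the remaining columns of this batch are never read
            }
            // Phase 2: load the columns that were not needed by the predicate.
            for (String name : allColumns) {
                if (!batch.columns.containsKey(name)) {
                    batch.columns.put(name, source.readColumn(name, b));
                }
            }
            result.add(batch);
        }
        return result;
    }

    /** Two batches of three rows each, with columns "x" and "y" (made-up sample data). */
    static ColumnSource sampleSource() {
        ColumnSource source = new ColumnSource();
        source.put("x", new long[][] {{1, 2, 3}, {10, 20, 30}});
        source.put("y", new long[][] {{100, 200, 300}, {1000, 2000, 3000}});
        return source;
    }

    public static void main(String[] args) {
        ColumnSource source = sampleSource();
        // Keep a batch only if some value of "x" exceeds 5.
        Predicate<Batch> filter =
            batch -> Arrays.stream(batch.columns.get("x")).anyMatch(v -> v > 5);
        List<Batch> kept = read(source, 2, Arrays.asList("x", "y"), Arrays.asList("x"), filter);
        // Prints: batches returned: 1, column reads: 3
        System.out.println("batches returned: " + kept.size()
            + ", column reads: " + source.columnReads);
    }
}
```

In this toy data the first batch is rejected after reading only "x", so three column reads are performed instead of four. In a real reader the saving would come from skipping the decompression and decoding of the unread columns.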



--
This message was sent by Atlassian Jira
(v8.3.4#803005)