Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/10/12 23:23:00 UTC

[jira] [Commented] (ORC-1027) Filter processing to allow filter injections that cannot be represented via SArgs

    [ https://issues.apache.org/jira/browse/ORC-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427955#comment-17427955 ] 

Dongjoon Hyun commented on ORC-1027:
------------------------------------

Thank you, [~planka].

> Filter processing to allow filter injections that cannot be represented via SArgs
> ---------------------------------------------------------------------------------
>
>                 Key: ORC-1027
>                 URL: https://issues.apache.org/jira/browse/ORC-1027
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 1.7.0, 1.8.0
>            Reporter: Pavan Lanka
>            Assignee: Pavan Lanka
>            Priority: Major
>
> Currently, in the ORCRecordReader, the filter logic that performs LazyIO receives the following inputs:
>  * SearchArgument, as passed by the client using `Reader.Options.getSearchArgument`
>  * Input filter, as passed by the client using `Reader.Options.getFilterCallback`
> The SearchArgument is particularly convenient because it allows easy integration with existing engines such as Spark without requiring any code changes in the engine. However, this pushdown is limited to what can be represented via SearchArguments. For example, any predicate that uses a function cannot be pushed down.
> {quote}SELECT * FROM table WHERE lower(f1) IN ... OR f2 IN ... OR f3 IN ...
> {quote}
> For the above query, none of the filters are pushed down to ORC from the engine, because we have no means of representing functions and the predicates are combined with OR.
> An additional input mechanism is requested for supplying filters, one that is pluggable and does not require a direct change in the clients. We are proposing the use of Java **ServiceLoader** to dynamically determine the desired filters for a given fully qualified file path.
> This filter, if one is determined, is ANDed with the other available filters. It is understood that the plugin filter cannot differentiate between multiple aliases for the same table.
> This generic capability will allow us to represent complex filters that currently cannot be pushed down to the storage layer from the existing engines, allowing us to reap the benefits of LazyIO in many cases.
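
The ServiceLoader proposal above can be illustrated with a minimal, self-contained sketch. It is not the ORC implementation: the `FilterProvider` interface, the `PluginFilters` class, the row-as-`Map` representation, and the `/warehouse/t1/` path are all hypothetical names chosen for illustration. Only `java.util.ServiceLoader` and `java.util.function.Predicate` are real JDK APIs. The sketch shows the two ideas in the description: providers are looked up per file path, and any filter they return is ANDed with the filter already in place.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.function.Predicate;

public class PluginFilters {

    /** Hypothetical service interface (an assumption, not the ORC API). */
    public interface FilterProvider {
        /** Returns a row filter for the given fully qualified file path, if any. */
        Optional<Predicate<Map<String, Object>>> filterFor(String filePath);
    }

    /** ANDs every filter the providers return for filePath onto the base filter. */
    public static Predicate<Map<String, Object>> combine(
            Iterable<FilterProvider> providers,
            String filePath,
            Predicate<Map<String, Object>> base) {
        Predicate<Map<String, Object>> result = base;
        for (FilterProvider provider : providers) {
            Optional<Predicate<Map<String, Object>>> extra = provider.filterFor(filePath);
            if (extra.isPresent()) {
                result = result.and(extra.get());
            }
        }
        return result;
    }

    /**
     * Discovers providers with ServiceLoader. Implementations would be listed in a
     * META-INF/services file named after the binary name of FilterProvider;
     * with no registration the base filter is returned unchanged.
     */
    public static Predicate<Map<String, Object>> load(
            String filePath, Predicate<Map<String, Object>> base) {
        return combine(ServiceLoader.load(FilterProvider.class), filePath, base);
    }

    /** Example provider for: lower(f1) IN ('abc') OR f2 IN (1, 2). Values are made up. */
    public static final class ExampleProvider implements FilterProvider {
        private static final Set<String> F1_VALUES = Set.of("abc");
        private static final Set<Integer> F2_VALUES = Set.of(1, 2);

        @Override
        public Optional<Predicate<Map<String, Object>>> filterFor(String filePath) {
            // Hypothetical table location; other paths get no plugin filter.
            if (!filePath.contains("/warehouse/t1/")) {
                return Optional.empty();
            }
            Predicate<Map<String, Object>> filter = row -> {
                Object f1 = row.get("f1");
                Object f2 = row.get("f2");
                boolean f1Match = f1 instanceof String
                        && F1_VALUES.contains(((String) f1).toLowerCase(Locale.ROOT));
                boolean f2Match = f2 instanceof Integer && F2_VALUES.contains(f2);
                return f1Match || f2Match;
            };
            return Optional.of(filter);
        }
    }
}
```

Because the lookup is keyed only by file path, two aliases of the same table in one query would receive the same filter, which matches the limitation noted in the description.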



--
This message was sent by Atlassian Jira
(v8.3.4#803005)