Posted to issues@orc.apache.org by "Panagiotis Garefalakis (Jira)" <ji...@apache.org> on 2021/06/04 20:54:00 UTC

[jira] [Comment Edited] (ORC-743) Conversion of SArg into Filters, to take advantage of LazyIO

    [ https://issues.apache.org/jira/browse/ORC-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357616#comment-17357616 ] 

Panagiotis Garefalakis edited comment on ORC-743 at 6/4/21, 8:53 PM:
---------------------------------------------------------------------

Sounds great [~planka]! Let's make sure we push the doc as well with the last impl ticket.
Will keep an eye out to get these pushed ;)

bq. Panagiotis Garefalakis and Owen O'Malley, any suggestions on how to initiate that? For now, the PR will assume that HIVE-24458 is missing.

Oh wow, we will need a new storage-api release for this one.
Do you mind sending an email to the hive-dev list? I can +1 it, but we will need a PMC member to initiate the release.



> Conversion of SArg into Filters, to take advantage of LazyIO
> ------------------------------------------------------------
>
>                 Key: ORC-743
>                 URL: https://issues.apache.org/jira/browse/ORC-743
>             Project: ORC
>          Issue Type: Sub-task
>          Components: Reader
>            Reporter: Pavan Lanka
>            Assignee: Pavan Lanka
>            Priority: Major
>
> ORC-742 introduces lazy evaluation of the non-filter columns in the presence of filters. This builds on that work by converting SArgs into filters.
> h3. SArg to Filter
> SArg to Filter converts the passed SArg into a filter. This makes the feature work automatically with both Spark and Hive, since both already push Search Arguments down to ORC.
> The SArg is automatically converted into a Vector Filter, which is applied during the read process.
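The conversion described above can be sketched in a self-contained Java example. This is a minimal illustration under stated assumptions. The predicate tree (`LessThan`, `And`, `Or`) and the `VectorFilter` interface are hypothetical stand-ins, not the Hive `SearchArgument` API or ORC's actual filter classes. The sketch shows the general idea: each SArg node becomes a filter that narrows a selection of row ids in a batch. An AND passes only the surviving rows to each successive child. An OR evaluates each child only on rows that no earlier child has accepted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: NOT the Hive SearchArgument or ORC filter API.
public class SargToFilterSketch {

  /** A tiny SArg-like predicate tree node. */
  interface Expr {}

  /** Leaf predicate: columns[column][row] < literal. */
  static final class LessThan implements Expr {
    final int column;
    final long literal;
    LessThan(int column, long literal) { this.column = column; this.literal = literal; }
  }

  static final class And implements Expr {
    final List<Expr> children;
    And(Expr... children) { this.children = Arrays.asList(children); }
  }

  static final class Or implements Expr {
    final List<Expr> children;
    Or(Expr... children) { this.children = Arrays.asList(children); }
  }

  /** Narrows an ascending selection of row ids within a column batch. */
  interface VectorFilter {
    int[] apply(long[][] columns, int[] selected);
  }

  static List<VectorFilter> toFilters(List<Expr> children) {
    List<VectorFilter> filters = new ArrayList<>();
    for (Expr child : children) filters.add(toFilter(child));
    return filters;
  }

  /** Converts the predicate tree into a composed vector filter. */
  static VectorFilter toFilter(Expr e) {
    if (e instanceof LessThan) {
      LessThan lt = (LessThan) e;
      return (cols, sel) -> Arrays.stream(sel)
          .filter(row -> cols[lt.column][row] < lt.literal)
          .toArray();
    }
    if (e instanceof And) {
      List<VectorFilter> filters = toFilters(((And) e).children);
      // AND: each child only sees the rows that survived the previous children.
      return (cols, sel) -> {
        int[] current = sel;
        for (VectorFilter f : filters) current = f.apply(cols, current);
        return current;
      };
    }
    if (e instanceof Or) {
      List<VectorFilter> filters = toFilters(((Or) e).children);
      // OR: each child only sees rows not yet accepted by an earlier child.
      return (cols, sel) -> {
        BitSet accepted = new BitSet();
        int[] remaining = sel;
        for (VectorFilter f : filters) {
          for (int row : f.apply(cols, remaining)) accepted.set(row);
          remaining = Arrays.stream(remaining).filter(r -> !accepted.get(r)).toArray();
        }
        // Keep the original row order of the incoming selection.
        return Arrays.stream(sel).filter(accepted::get).toArray();
      };
    }
    throw new IllegalArgumentException("Unsupported node: " + e);
  }

  public static void main(String[] args) {
    // Two columns (c0, c1) of a five-row batch.
    long[][] columns = { {1, 5, 2, 8, 3}, {9, 1, 7, 0, 4} };
    // (c0 < 3 OR c1 < 2) AND c1 < 8
    Expr sarg = new And(new Or(new LessThan(0, 3), new LessThan(1, 2)), new LessThan(1, 8));
    int[] selected = toFilter(sarg).apply(columns, new int[] {0, 1, 2, 3, 4});
    System.out.println(Arrays.toString(selected));
  }
}
```

For the sample batch in `main`, the predicate `(c0 < 3 OR c1 < 2) AND c1 < 8` keeps rows 1, 2 and 3. Rows rejected by an earlier child are never passed to later children. This is the property that lets a reader skip work on the remaining columns.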
> The search argument builder should allow skipping normalization during the [build|https://github.com/apache/hive/blob/storage-branch-2.7/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java#L491]. This has already been proposed as part of HIVE-24458.
> Normalization performs very poorly in the presence of multilevel predicates.
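The following toy Java example illustrates why multilevel predicates can make normalization expensive. It assumes that normalization includes a conversion to conjunctive normal form (CNF), which distributes OR over AND. The exact shape of the benchmark predicate is not given here, so treat this as an illustration of the general blow-up, not as a model of that benchmark. The class is hypothetical and is not Hive's `SearchArgumentImpl`. It counts how an OR of n small conjunctions grows once it is rewritten as an AND of ORs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: NOT Hive's SearchArgumentImpl normalizer.
// A CNF formula is a list of clauses; each clause is a list of leaf names that are ORed.
public class CnfBlowupSketch {

  /** Distributes OR over AND: CNF(L) OR CNF(R) = AND over every pair (l OR r). */
  static List<List<String>> orOfCnf(List<List<String>> left, List<List<String>> right) {
    List<List<String>> result = new ArrayList<>();
    for (List<String> l : left) {
      for (List<String> r : right) {
        List<String> clause = new ArrayList<>(l);
        clause.addAll(r);
        result.add(clause);
      }
    }
    return result;
  }

  /** CNF of (x0_0 AND ... AND x0_{k-1}) OR ... OR (x{n-1}_0 AND ...); assumes n >= 1. */
  static List<List<String>> cnfOfOrOfAnds(int n, int k) {
    List<List<String>> acc = null;
    for (int i = 0; i < n; i++) {
      List<List<String>> conjunction = new ArrayList<>();
      for (int j = 0; j < k; j++) {
        // Each conjunct is a single-leaf clause in CNF.
        conjunction.add(Collections.singletonList("x" + i + "_" + j));
      }
      acc = (acc == null) ? conjunction : orOfCnf(acc, conjunction);
    }
    return acc;
  }

  static int leafCount(List<List<String>> cnf) {
    int total = 0;
    for (List<String> clause : cnf) total += clause.size();
    return total;
  }

  public static void main(String[] args) {
    for (int n : new int[] {2, 4, 8}) {
      List<List<String>> cnf = cnfOfOrOfAnds(n, 2);
      System.out.printf("n=%d: %d leaves before, %d clauses / %d leaves after%n",
          n, 2 * n, cnf.size(), leafCount(cnf));
    }
  }
}
```

With k = 2, an OR of n two-leaf ANDs grows from 2n leaves to 2^n clauses (n * 2^n leaves). For n = 8, for example, 16 leaves become 256 clauses with 2048 leaves. Growth of this kind is why skipping normalization can pay off for deeply nested predicates.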
> ||Benchmark||(fSize)||(fType)||(normalize)||Mode||Cnt||Score||Error||Units||
> |ComplexFilterBench.filter|2|vector|true|avgt|20|74.321|± 0.156|us/op|
> |ComplexFilterBench.filter|2|vector|false|avgt|20|78.119|± 0.351|us/op|
> |ComplexFilterBench.filter|4|vector|true|avgt|20|267.405|± 1.202|us/op|
> |ComplexFilterBench.filter|4|vector|false|avgt|20|136.284|± 0.637|us/op|
> |ComplexFilterBench.filter|8|vector|true|avgt|20|9907.765|± 49.208|us/op|
> |ComplexFilterBench.filter|8|vector|false|avgt|20|247.714|± 0.651|us/op|
> Explanation:
>  * *fSize* identifies the size of the OR clause that will be normalized.
>  * *normalize* identifies whether normalization was carried out on the Search Argument.
> Observations:
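>  * Normalizing the search argument results in a significant performance penalty as the OR clause grows (fSize 4 and 8), given the explosion of the operator tree; at fSize 2 the normalized version is slightly faster.
>  ** In the case where an AND includes 8 ORs, the unnormalized version takes about *97.5%* less time (247.714 vs. 9907.765 us/op, roughly 40x faster).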
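The relative differences in the observations can be recomputed directly from the Score column of the table. This small Java helper does that arithmetic; the class and method names are illustrative. The results are as follows:
 * fSize=8: skipping normalization cuts the time by about 97.5% (roughly 40x faster).
 * fSize=4: the cut is about 49% (about 1.96x faster).
 * fSize=2: the unnormalized run is about 5% slower.

```java
// Recomputes relative differences from the ComplexFilterBench table (avgt, us/op).
public class BenchmarkDelta {

  /** Percent of time saved by skipping normalization, relative to the normalized run. */
  static double reductionPercent(double normalizedUs, double unnormalizedUs) {
    return 100.0 * (normalizedUs - unnormalizedUs) / normalizedUs;
  }

  /** How many times faster the unnormalized run is than the normalized one. */
  static double speedup(double normalizedUs, double unnormalizedUs) {
    return normalizedUs / unnormalizedUs;
  }

  public static void main(String[] args) {
    int[] fSize = {2, 4, 8};
    double[] normalized = {74.321, 267.405, 9907.765};   // normalize=true
    double[] unnormalized = {78.119, 136.284, 247.714};  // normalize=false
    for (int i = 0; i < fSize.length; i++) {
      System.out.printf("fSize=%d: %.2f%% less time, %.2fx speedup%n", fSize[i],
          reductionPercent(normalized[i], unnormalized[i]),
          speedup(normalized[i], unnormalized[i]));
    }
  }
}
```

A negative reduction (fSize=2) means the unnormalized run was slower. The ± error column is ignored here.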



--
This message was sent by Atlassian Jira
(v8.3.4#803005)