Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/12/22 04:25:00 UTC

[jira] [Assigned] (SPARK-33700) Try to push down filters for parquet and orc should add extra `filters.nonEmpty` condition

     [ https://issues.apache.org/jira/browse/SPARK-33700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33700:
-------------------------------------

    Assignee: Yang Jie

> Try to push down filters for parquet and orc should add extra `filters.nonEmpty` condition
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33700
>                 URL: https://issues.apache.org/jira/browse/SPARK-33700
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Minor
>
>  
> {code:java}
> lazy val footerFileMetaData =
>   ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData
> // Try to push down filters when filter push-down is enabled.
> val pushed = if (enableParquetFilterPushDown) {
>   val parquetSchema = footerFileMetaData.getSchema
>   val parquetFilters = new ParquetFilters(parquetSchema, pushDownDate, pushDownTimestamp,
>     pushDownDecimal, pushDownStringStartWith, pushDownInFilterThreshold, isCaseSensitive)
>   filters
>     // Collects all converted Parquet filter predicates. Notice that not all predicates can be
>     // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
>     // is used here.
>     .flatMap(parquetFilters.createFilter)
>     .reduceOption(FilterApi.and)
> } else {
>   None
> }
> {code}
>  
>  
> An extra `filters.nonEmpty` condition should be added when trying to push down filters for Parquet, to avoid an unnecessary file read (of the Parquet footer) when there are no filters to push down. ORC has a similar problem.
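
A minimal sketch of how the proposed guard might behave, not the actual Spark patch. All names below (`PushDownGuardSketch`, `readFooterSchema`, `createFilter`, `pushed`) are hypothetical stand-ins: the footer read is simulated with a counter, and filters are plain column names rather than real Parquet predicates.

```scala
// Hypothetical stand-ins only; none of these are Spark or Parquet APIs.
object PushDownGuardSketch {
  // Counts how many times the simulated footer is read.
  var footerReads = 0

  // Simulates the expensive footer read and returns a dummy schema.
  def readFooterSchema(): Set[String] = {
    footerReads += 1
    Set("id", "name")
  }

  // Simulates a conversion that can fail: only filters on known columns convert.
  def createFilter(schema: Set[String], column: String): Option[String] =
    if (schema.contains(column)) Some(s"$column IS NOT NULL") else None

  def pushed(enablePushDown: Boolean, filters: Seq[String]): Option[String] = {
    lazy val footerSchema = readFooterSchema()
    // The extra `filters.nonEmpty` check keeps the lazy footer read from being
    // forced when there is nothing to push down.
    if (enablePushDown && filters.nonEmpty) {
      val schema = footerSchema // forces the simulated footer read
      filters
        .flatMap(c => createFilter(schema, c))
        .reduceOption((a, b) => s"($a AND $b)")
    } else {
      None
    }
  }
}
```

With this guard, `PushDownGuardSketch.pushed(true, Seq.empty)` returns `None` without touching the simulated footer, while a non-empty filter list still triggers exactly one read.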



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
