Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/12/22 04:25:00 UTC
[jira] [Assigned] (SPARK-33700) Try to push down filters for
parquet and orc should add extra `filters.nonEmpty` condition
[ https://issues.apache.org/jira/browse/SPARK-33700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-33700:
-------------------------------------
Assignee: Yang Jie
> Try to push down filters for parquet and orc should add extra `filters.nonEmpty` condition
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-33700
> URL: https://issues.apache.org/jira/browse/SPARK-33700
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
>
>
> {code:java}
> lazy val footerFileMetaData =
>   ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData
> // Try to push down filters when filter push-down is enabled.
> val pushed = if (enableParquetFilterPushDown) {
>   val parquetSchema = footerFileMetaData.getSchema
>   val parquetFilters = new ParquetFilters(parquetSchema, pushDownDate, pushDownTimestamp,
>     pushDownDecimal, pushDownStringStartWith, pushDownInFilterThreshold, isCaseSensitive)
>   filters
>     // Collects all converted Parquet filter predicates. Notice that not all predicates can be
>     // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
>     // is used here.
>     .flatMap(parquetFilters.createFilter)
>     .reduceOption(FilterApi.and)
> } else {
>   None
> }
> {code}
>
>
> An extra `filters.nonEmpty` condition should be added when attempting to push down filters for Parquet, so that the Parquet footer is not read unnecessarily when there are no filters. ORC has a similar problem.
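> The effect of the proposed guard can be sketched with a self-contained toy (names like `PushDownGuard`, `readFooter`, and the string-based filters are hypothetical stand-ins for Spark's `ParquetFileReader.readFooter` and `ParquetFilters`, used here only to show how `filters.nonEmpty` short-circuits the expensive footer read):
>
> {code:java}
> object PushDownGuard {
>   var footerReads = 0 // counts simulated (expensive) footer reads
>
>   // Stand-in for the costly file-footer read we want to avoid
>   def readFooter(): String = { footerReads += 1; "footer-metadata" }
>
>   // Guarded push-down: the extra `filters.nonEmpty` check fails fast,
>   // so an empty filter list never triggers a footer read.
>   def pushDown(enabled: Boolean, filters: Seq[String]): Option[String] = {
>     if (enabled && filters.nonEmpty) {
>       val footer = readFooter() // only reached when filters exist
>       // mirrors flatMap(createFilter).reduceOption(FilterApi.and)
>       filters.reduceOption(_ + " AND " + _)
>     } else None
>   }
> }
> {code}
>
> With the guard in place, `pushDown(enabled = true, Seq.empty)` returns `None` without touching the footer, while a non-empty filter list still reads it exactly once.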
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org