Posted to issues@spark.apache.org by "Gengliang Wang (JIRA)" <ji...@apache.org> on 2019/05/14 08:21:00 UTC

[jira] [Updated] (SPARK-27698) Add new method for getting pushed down filters in Parquet file reader

     [ https://issues.apache.org/jira/browse/SPARK-27698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-27698:
-----------------------------------
    Description: 
To return accurate pushed filters in Parquet file scans (https://github.com/apache/spark/pull/24327#pullrequestreview-234775673), we can process the original data source filters as follows:
1. For "And" operators, split the conjunctive predicates and try converting each of them. After that:
1.1 if partial predicate push-down is allowed, return the convertible results;
1.2 otherwise, return the whole predicate if it is convertible, or an empty result if it is not.

2. Other operators cannot be partially pushed down:
2.1 if the entire predicate is convertible, return it as-is;
2.2 otherwise, return an empty result.
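The two cases above can be sketched in Scala. This is a hypothetical, simplified mock, not the actual `ParquetFilters` code: the `Filter` case classes stand in for Spark's data source filter classes, and `convert` stands in for a `createFilter`-style conversion (here it arbitrarily treats only `EqualTo` as convertible).

```scala
// Simplified stand-ins for Spark's data source Filter hierarchy.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter
case class And(left: Filter, right: Filter) extends Filter

object PushDownSketch {
  // Mock converter: pretend only EqualTo (and conjunctions of EqualTo)
  // map to a Parquet predicate; everything else is not convertible.
  def convert(f: Filter): Option[Filter] = f match {
    case e: EqualTo => Some(e)
    case And(l, r)  => for (cl <- convert(l); cr <- convert(r)) yield And(cl, cr)
    case _          => None
  }

  // Step 1: split "And" into conjuncts and keep the convertible ones
  //         when partial push-down is allowed.
  // Step 2: any other operator (or "And" when partial push-down is not
  //         allowed) is all-or-nothing: whole predicate or empty result.
  def pushedFilters(f: Filter, partialAllowed: Boolean): Seq[Filter] = f match {
    case And(left, right) if partialAllowed =>
      pushedFilters(left, partialAllowed) ++ pushedFilters(right, partialAllowed)
    case other =>
      convert(other).toSeq
  }
}
```

With `And(EqualTo("a", 1), GreaterThan("b", 2))`, partial push-down keeps just the `EqualTo` conjunct, while the all-or-nothing path returns an empty result because the whole conjunction is not convertible.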

This PR also contains a code refactoring. Currently `ParquetFilters.createFilter` accepts the parameter `schema: MessageType` and creates a field mapping for every input filter. We can make the mapping a class member and avoid rebuilding the `nameToParquetField` mapping for every input filter.
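The shape of that refactoring can be sketched as follows. This is a hypothetical illustration, not the real `ParquetFilters` class: `FieldInfo` and the string-typed fields are simplified stand-ins for Parquet's `MessageType` machinery.

```scala
// Simplified stand-in for a Parquet schema field.
case class FieldInfo(name: String, parquetType: String)

// The schema is taken in the constructor, so the name -> field mapping is
// built once per schema instead of once per createFilter call.
class ParquetFiltersSketch(schemaFields: Seq[FieldInfo]) {
  private val nameToParquetField: Map[String, FieldInfo] =
    schemaFields.map(f => f.name -> f).toMap

  // Each call now reuses the precomputed mapping.
  def createFilter(attribute: String): Option[FieldInfo] =
    nameToParquetField.get(attribute)
}
```

The design point is simply hoisting the per-call mapping into per-instance state: one instance is created per schema, and every filter conversion against that schema shares the same lookup table.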

  was:
To return accurate pushed filters in Parquet file scans (https://github.com/apache/spark/pull/24327#pullrequestreview-234775673), we can process the original data source filters as follows:
1. For "And" operators, split the conjunctive predicates and try converting each of them. After that:
1.1 if partial predicate push-down is allowed, return the convertible results;
1.2 otherwise, return the whole predicate if it is convertible, or an empty result if it is not.

2. Other operators are either entirely pushed down or not pushed down at all; in the current push-down strategy, non-"And" operators cannot be partially pushed down.


> Add new method for getting pushed down filters in Parquet file reader
> ---------------------------------------------------------------------
>
>                 Key: SPARK-27698
>                 URL: https://issues.apache.org/jira/browse/SPARK-27698
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org