
Posted to issues@spark.apache.org by "Anoop Johnson (Jira)" <ji...@apache.org> on 2020/12/11 17:09:00 UTC

[jira] [Created] (SPARK-33760) Extend Dynamic Partition Pruning Support to DataSources

Anoop Johnson created SPARK-33760:
-------------------------------------

             Summary: Extend Dynamic Partition Pruning Support to DataSources
                 Key: SPARK-33760
                 URL: https://issues.apache.org/jira/browse/SPARK-33760
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Anoop Johnson


The implementation of Dynamic Partition Pruning (DPP) in Spark is [specific|https://github.com/apache/spark/blob/fb2e3af4b5d92398d57e61b766466cc7efd9d7cb/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L59-L64] to HadoopFsRelation. As a result, DPP is not triggered for queries that read from other data sources, such as those implemented with the DataSource v2 API.

The DataSource v2 readers can expose the partition metadata. Can we use this metadata and extend DPP to work on data sources as well?
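
To make the request concrete, here is a minimal, Spark-free sketch in Python of the idea behind DPP: evaluate the filter on the dimension side first, collect the surviving join keys, and use the partition metadata reported by the source to skip partitions that cannot match. The names Partition, dim_join_keys and pruned_scan are hypothetical and are not Spark APIs; a real implementation would live in the optimizer and the DataSource v2 scan, not in user code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Partition:
    """One fact-table partition, as a hypothetical source might report it."""
    key: str                # value of the partition column, e.g. a date
    rows: Tuple[int, ...]   # amounts stored in this partition


def dim_join_keys(dim_rows: Iterable[Tuple[str, str]],
                  predicate: Callable[[str], bool]) -> Set[str]:
    """Run the dimension-side filter and collect the join keys that survive."""
    return {key for key, attr in dim_rows if predicate(attr)}


def pruned_scan(partitions: Iterable[Partition],
                keep_keys: Set[str]) -> Tuple[List[Tuple[str, int]], int]:
    """Read only partitions whose key is in keep_keys.

    Returns the rows read and the number of partitions actually scanned.
    """
    out: List[Tuple[str, int]] = []
    scanned = 0
    for part in partitions:
        if part.key not in keep_keys:
            continue  # pruned using partition metadata alone, no data is read
        scanned += 1
        out.extend((part.key, amount) for amount in part.rows)
    return out, scanned


# Illustrative data (made up for this sketch): a fact table partitioned by
# date, and a small dimension table describing each date.
fact = [
    Partition("2020-12-24", (10, 20)),
    Partition("2020-12-25", (5,)),
    Partition("2020-12-26", (7, 8, 9)),
]
dim = [
    ("2020-12-24", "workday"),
    ("2020-12-25", "holiday"),
    ("2020-12-26", "workday"),
]

keys = dim_join_keys(dim, lambda kind: kind == "holiday")
rows, scanned = pruned_scan(fact, keys)
print(rows, scanned)
```

In this sketch only one of the three partitions is read. The question in this ticket is whether Spark can do the equivalent when the partition metadata comes from a DataSource v2 reader instead of from HadoopFsRelation.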

I would appreciate any thoughts on this, including corner cases we need to handle.


