
Posted to issues@spark.apache.org by "Haifeng Chen (Jira)" <ji...@apache.org> on 2020/01/06 03:07:00 UTC

[jira] [Created] (SPARK-30425) FileScan of Data Source V2 doesn't implement Partition Pruning

Haifeng Chen created SPARK-30425:
------------------------------------

             Summary: FileScan of Data Source V2 doesn't implement Partition Pruning
                 Key: SPARK-30425
                 URL: https://issues.apache.org/jira/browse/SPARK-30425
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Haifeng Chen


I was trying to understand how Data Source V2 handles partition pruning, but I couldn't find any code in the current Data Source V2 implementation that filters out unnecessary files. For file data sources, the base class FileScan would presumably handle this in its "partitions" method, but the current implementation looks like this:

  protected def partitions: Seq[FilePartition] = {
    val selectedPartitions = fileIndex.listFiles(Seq.empty, Seq.empty)
    // ... (rest of the method omitted)
  }

listFiles is called with two empty sequences (the partition filters and the data filters), so no files are ever filtered out by a partition filter.
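To illustrate what is being skipped, here is a minimal sketch of partition pruning. It uses a simplified, hypothetical model: PartitionDir, PartitionFilter and this listFiles are stand-ins invented for illustration, not Spark's FileIndex or PartitionDirectory API. The real FileIndex.listFiles receives Catalyst expressions, not Scala predicates. The point is that a directory is kept only if its partition values satisfy every filter, so an empty filter list keeps every directory.

```scala
object PartitionPruningSketch {
  // Hypothetical stand-in for a partition directory: the partition column
  // values (e.g. date=2020-01-01) plus the data files it contains.
  final case class PartitionDir(values: Map[String, String], files: Seq[String])

  // Hypothetical stand-in for a partition filter: a predicate evaluated only
  // against a directory's partition values (no file contents are read).
  type PartitionFilter = Map[String, String] => Boolean

  // Keeps only directories that satisfy every filter. With an empty filter
  // list, `forall` is vacuously true, so every directory is returned,
  // which mirrors the "no pruning" behavior described in this issue.
  def listFiles(all: Seq[PartitionDir], filters: Seq[PartitionFilter]): Seq[PartitionDir] =
    all.filter(dir => filters.forall(f => f(dir.values)))

  val table: Seq[PartitionDir] = Seq(
    PartitionDir(Map("date" -> "2020-01-01"), Seq("part-0.parquet")),
    PartitionDir(Map("date" -> "2020-01-02"), Seq("part-1.parquet", "part-2.parquet")),
    PartitionDir(Map("date" -> "2020-01-03"), Seq("part-3.parquet"))
  )

  def main(args: Array[String]): Unit = {
    // Analogous to listFiles(Seq.empty, Seq.empty): nothing is pruned.
    val unpruned = listFiles(table, Seq.empty)
    // Analogous to pushing down a predicate such as date = '2020-01-02'.
    val pruned = listFiles(
      table,
      Seq[PartitionFilter](v => v.get("date").contains("2020-01-02"))
    )
    println(s"unpruned: ${unpruned.flatMap(_.files).size} files")
    println(s"pruned:   ${pruned.flatMap(_.files).size} files")
  }
}
```

Running main prints 4 files for the unpruned listing and 2 for the pruned one: with a partition filter, only the matching directory's files would need to be planned into FilePartitions.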



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
