Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/30 06:54:51 UTC

[GitHub] [spark] maropu edited a comment on pull request #29831: [SPARK-32351][SQL] Show partially pushed down partition filters in explain()

maropu edited a comment on pull request #29831:
URL: https://github.com/apache/spark/pull/29831#issuecomment-701197040


   It would probably be better to describe this in a bit more detail in the PR description. For example:
   Currently, the actual partition pruning is executed in the optimizer phase (`PruneFileSourcePartitions`) if an input relation has a catalog file index. The current code assumes that the same partition filters are generated again in `FileSourceStrategy` and passed into `FileSourceScanExec`. `FileSourceScanExec` uses the partition filters when listing files, but [the filters do nothing](https://github.com/apache/spark/blob/cc06266ade5a4eb35089501a3b32736624208d4c/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L211-L213) because unnecessary partitions have already been pruned in advance, so in this case the filters are mainly used for the explain output. If a `WHERE` clause has DNF-ed predicates, `FileSourceStrategy` cannot extract the same filters as `PruneFileSourcePartitions`, and then `PartitionFilters` is not shown in the explain output. In this PR, blah blah blah....
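
   To make the DNF case concrete, here is a rough, self-contained sketch in plain Python. It is a toy model, not Spark's actual code: the tuple-based expression tree, the helper names (`leaf`, `references`, `split_conjuncts`, `to_cnf`, `partition_filters`, `show`), and the columns `p` (partition column) and `a` (data column) are all made up for illustration. It assumes that one extraction path only splits top-level `AND`s, while the other first rewrites the predicate into CNF, which is one way the two paths could end up with different partition filters.

```python
# Toy expression tree (hypothetical, not Spark's Expression classes):
#   leaf:  ("leaf", sql_text, frozenset_of_referenced_columns)
#   inner: ("and", left, right) or ("or", left, right)

def leaf(text, *cols):
    return ("leaf", text, frozenset(cols))

def references(e):
    """Return the set of column names referenced anywhere in e."""
    if e[0] == "leaf":
        return e[2]
    return references(e[1]) | references(e[2])

def split_conjuncts(e):
    """Flatten nested ANDs into a list of top-level conjuncts."""
    if e[0] == "and":
        return split_conjuncts(e[1]) + split_conjuncts(e[2])
    return [e]

def to_cnf(e):
    """Rewrite e into conjunctive normal form by distributing OR over AND."""
    if e[0] == "leaf":
        return e
    l, r = to_cnf(e[1]), to_cnf(e[2])
    if e[0] == "and":
        return ("and", l, r)
    if l[0] == "and":   # (x AND y) OR r  ->  (x OR r) AND (y OR r)
        return ("and", to_cnf(("or", l[1], r)), to_cnf(("or", l[2], r)))
    if r[0] == "and":   # l OR (x AND y)  ->  (l OR x) AND (l OR y)
        return ("and", to_cnf(("or", l, r[1])), to_cnf(("or", l, r[2])))
    return ("or", l, r)

def partition_filters(e, part_cols):
    """Keep only the top-level conjuncts that touch partition columns alone."""
    return [c for c in split_conjuncts(e) if references(c) <= part_cols]

def show(e):
    """Render the toy tree back into SQL-like text."""
    if e[0] == "leaf":
        return e[1]
    return f"({show(e[1])} {e[0].upper()} {show(e[2])})"

PART_COLS = {"p"}

# WHERE (p = 1 AND a > 0) OR (p = 2 AND a < 5)
where = ("or",
         ("and", leaf("p = 1", "p"), leaf("a > 0", "a")),
         ("and", leaf("p = 2", "p"), leaf("a < 5", "a")))

# Splitting top-level ANDs only: the whole OR references both p and a.
print([show(c) for c in partition_filters(where, PART_COLS)])
# prints []

# After CNF conversion, one conjunct references the partition column only.
print([show(c) for c in partition_filters(to_cnf(where), PART_COLS)])
# prints ['(p = 1 OR p = 2)']
```

   In this toy model, a path that skips the CNF step finds no partition filters for the DNF predicate, so it would have nothing to show, even though the partitions themselves could still be pruned by `p = 1 OR p = 2`.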




