Posted to issues@spark.apache.org by "Chao Sun (Jira)" <ji...@apache.org> on 2021/07/14 04:25:00 UTC

[jira] [Comment Edited] (SPARK-36128) CatalogFileIndex.filterPartitions should respect spark.sql.hive.metastorePartitionPruning

    [ https://issues.apache.org/jira/browse/SPARK-36128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380299#comment-17380299 ] 

Chao Sun edited comment on SPARK-36128 at 7/14/21, 4:24 AM:
------------------------------------------------------------

[~hyukjin.kwon] you are right. I didn't know this config was designed to be used only by Hive table scans, but that raises a few issues:
1. By default, data source tables also manage their partitions through HMS, via the config {{spark.sql.hive.manageFilesourcePartitions}}. Its description also says "When partition management is enabled, datasource tables store partition in the Hive metastore, and use the metastore to prune partitions during query planning", so it sounds like they should have the same partition pruning mechanism as Hive tables, including the flag.
2. There is effectively only one implementation of {{ExternalCatalog}}, which is HMS, so I'm not sure why we treat Hive table scans differently from data source table scans when both read partition metadata from HMS.



> CatalogFileIndex.filterPartitions should respect spark.sql.hive.metastorePartitionPruning
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-36128
>                 URL: https://issues.apache.org/jira/browse/SPARK-36128
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently the config {{spark.sql.hive.metastorePartitionPruning}} is used only in {{PruneHiveTablePartitions}}, not in {{PruneFileSourcePartitions}}. The latter calls {{CatalogFileIndex.filterPartitions}}, which calls {{listPartitionsByFilter}} regardless of whether that config is set.
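
The call path described above can be illustrated with a toy model. This is not Spark code: ToyCatalog, filter_partitions and metastore_pruning are hypothetical names, and the fallback (list every partition, then filter on the client side when the flag is off) is an assumption about what "respecting the flag" could mean, not something the issue specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Partition = Dict[str, str]
Predicate = Callable[[Partition], bool]


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for an HMS-backed catalog; records which call was made."""
    partitions: List[Partition]
    calls: List[str] = field(default_factory=list)

    def list_partitions(self) -> List[Partition]:
        # Models fetching every partition from the metastore.
        self.calls.append("listPartitions")
        return list(self.partitions)

    def list_partitions_by_filter(self, pred: Predicate) -> List[Partition]:
        # Models pushing the predicate down to the metastore.
        self.calls.append("listPartitionsByFilter")
        return [p for p in self.partitions if pred(p)]


def filter_partitions(catalog: ToyCatalog, pred: Predicate,
                      metastore_pruning: bool) -> List[Partition]:
    """Assumed behavior: push the filter down only when the flag is on."""
    if metastore_pruning:
        return catalog.list_partitions_by_filter(pred)
    # Flag off: fetch all partitions and prune on the client side.
    return [p for p in catalog.list_partitions() if pred(p)]


catalog = ToyCatalog([{"dt": "2021-07-13"}, {"dt": "2021-07-14"}])
wanted = lambda p: p["dt"] == "2021-07-14"

pushed_down = filter_partitions(catalog, wanted, metastore_pruning=True)
client_side = filter_partitions(catalog, wanted, metastore_pruning=False)
```

In this model both paths return the same partitions; only the catalog call differs. Per the description above, the file source path always takes the first branch today.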


