Posted to reviews@spark.apache.org by "wecharyu (via GitHub)" <gi...@apache.org> on 2023/12/01 17:09:23 UTC

[PR] [SPARK-46203][SQL][HIVE] Support get partitions by names in partition filter [spark]

wecharyu opened a new pull request, #44111:
URL: https://github.com/apache/spark/pull/44111

   ### What changes were proposed in this pull request?
   In this patch, we introduce a new switch that enables filtering partitions on the Spark side and then fetching the target partitions through the high-performance API `Hive#getPartitionsByNames`.
   1. Add a switch `spark.sql.hive.getPartitionByName.enabled` that enables partition filtering in Spark and fetches the matching partitions by name through HMS.
   2. Unify the `listPartitionsByFilter` calls through `ExternalCatalogUtils` so that most partition-pruning code paths can use the new switch.
   3. Implement the `listPartitionsByNames` API in the different catalogs.
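The two-step flow described above (filter on the Spark side, then fetch by name) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PR's code: it assumes the partition names are first obtained with a cheap name-only listing call (the PR text does not say how), that the pruning predicate can be evaluated against a parsed `column -> value` spec, and that matches are fetched by name in fixed-size batches. `MetastoreClient`, its methods, `list_partitions`, `batch_size`, and `InMemoryClient` are all hypothetical names, not Spark's or Hive's actual API.

```python
from typing import Callable, Dict, List, Protocol


class MetastoreClient(Protocol):
    """Hypothetical metastore client; these method names are illustrative, not a real API."""

    def list_partition_names(self, table: str) -> List[str]: ...

    def get_partitions_by_filter(self, table: str, filter_expr: str) -> List[dict]: ...

    def get_partitions_by_names(self, table: str, names: List[str]) -> List[dict]: ...


def parse_partition_name(name: str) -> Dict[str, str]:
    """Split an HMS-style name such as 'dt=2023-12-01/hr=01' into a column -> value dict.

    Assumes values contain no escaped characters (Hive percent-escapes some characters).
    """
    spec: Dict[str, str] = {}
    for part in name.split("/"):
        key, _, value = part.partition("=")
        spec[key] = value
    return spec


def list_partitions(
    client: MetastoreClient,
    table: str,
    filter_expr: str,
    predicate: Callable[[Dict[str, str]], bool],
    by_name_enabled: bool,
    batch_size: int = 100,
) -> List[dict]:
    """Single (hypothetical) entry point that routes between the two strategies via a switch."""
    if not by_name_enabled:
        # Existing path: the metastore evaluates the filter against its backing database.
        return client.get_partitions_by_filter(table, filter_expr)
    # Proposed path, step 1: fetch only partition names and filter them on the client side.
    matching = [n for n in client.list_partition_names(table) if predicate(parse_partition_name(n))]
    # Step 2: fetch the full partition objects by name, in fixed-size batches.
    result: List[dict] = []
    for start in range(0, len(matching), batch_size):
        result.extend(client.get_partitions_by_names(table, matching[start:start + batch_size]))
    return result


class InMemoryClient:
    """Toy client used only to exercise the sketch."""

    def __init__(self, names: List[str]) -> None:
        self.names = names
        self.by_names_calls = 0

    def list_partition_names(self, table: str) -> List[str]:
        return list(self.names)

    def get_partitions_by_filter(self, table: str, filter_expr: str) -> List[dict]:
        raise NotImplementedError("server-side filtering is not modeled here")

    def get_partitions_by_names(self, table: str, names: List[str]) -> List[dict]:
        self.by_names_calls += 1
        return [{"name": n, **parse_partition_name(n)} for n in names]


client = InMemoryClient(["dt=2023-12-01/hr=00", "dt=2023-12-01/hr=01", "dt=2023-12-02/hr=00"])
parts = list_partitions(
    client, "sales", "dt = '2023-12-01'",
    predicate=lambda spec: spec["dt"] == "2023-12-01",
    by_name_enabled=True, batch_size=1,
)
print([p["name"] for p in parts])  # ['dt=2023-12-01/hr=00', 'dt=2023-12-01/hr=01']
```

The trade-off this sketch makes visible: predicate evaluation moves out of the metastore's backing database and into the client, at the cost of transferring every partition name of the table before filtering.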
   
   ### Why are the changes needed?
   The `Hive#getPartitionsByFilter` API performs poorly and can put a heavy load on the Hive MetaStore backend database when there are many calls against tables with many partitions. This patch has two main advantages:
   1. Improve the performance of `listPartitionsByFilter` when querying tables with a large number of partitions.
   2. Reduce the load on the database behind the Hive MetaStore and keep HMS healthy.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, users can turn on `spark.sql.hive.getPartitionByName.enabled` if a Spark application needs to filter partitions on tables with a large number of partitions.
   
   
   ### How was this patch tested?
   Added a unit test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46203][SQL][HIVE] Support get partitions by names in partition filter [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #44111: [SPARK-46203][SQL][HIVE] Support get partitions by names in partition filter
URL: https://github.com/apache/spark/pull/44111




Re: [PR] [SPARK-46203][SQL][HIVE] Support get partitions by names in partition filter [spark]

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on PR #44111:
URL: https://github.com/apache/spark/pull/44111#issuecomment-1842613788

   cc: @pan3793 @LuciferYang Could you help take a look?




Re: [PR] [SPARK-46203][SQL][HIVE] Support get partitions by names in partition filter [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #44111:
URL: https://github.com/apache/spark/pull/44111#issuecomment-2000748923

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

