Posted to dev@hive.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2023/02/02 09:01:00 UTC
[jira] [Created] (HIVE-27014) Iceberg: getSplits/planTasks should filter out relevant folders instead of scanning entire table
Rajesh Balamohan created HIVE-27014:
---------------------------------------
Summary: Iceberg: getSplits/planTasks should filter out relevant folders instead of scanning entire table
Key: HIVE-27014
URL: https://issues.apache.org/jira/browse/HIVE-27014
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Rajesh Balamohan
With dynamic partition pruning, only relevant folders in fact tables are scanned.
In Tez, DynamicPartitionPruner sets the relevant filters. In Iceberg, these filters are applied only after "Table:planTasks()" is invoked. This forces the entire table metadata to be scanned first, and only then are the unwanted partitions thrown away.
This makes split computation expensive (e.g., for store_sales, it has to look at all 1800+ partitions and then discard the unwanted ones).
For short-running queries, split computation alone takes 3-5+ seconds. Creating this ticket as a placeholder for making use of the relevant DPP filters during split planning.
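To illustrate the difference in planning cost, here is a minimal toy sketch in Python. It is not Hive or Iceberg code: ToyTable, DataFile, plan_then_filter and filter_then_plan are hypothetical names made up for this illustration, and the model only counts how many partitions planning has to visit. It assumes the DPP filter is available as a set of wanted partition values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    partition: int  # hypothetical partition value, e.g. a date key
    path: str


class ToyTable:
    """Toy stand-in for a partitioned table (not the Iceberg API).

    It records how many partitions the last planning call visited.
    """

    def __init__(self, files):
        self._by_partition = {}
        for f in files:
            self._by_partition.setdefault(f.partition, []).append(f)
        self.partitions_visited = 0

    def plan_tasks(self, wanted=None):
        """Return the files to read.

        With wanted=None every partition is visited; otherwise only the
        partitions named in `wanted` are looked up.
        """
        self.partitions_visited = 0
        if wanted is None:
            keys = list(self._by_partition)
        else:
            keys = [p for p in sorted(wanted) if p in self._by_partition]
        tasks = []
        for key in keys:
            self.partitions_visited += 1
            tasks.extend(self._by_partition[key])
        return tasks


def plan_then_filter(table, wanted):
    """Behaviour described in this ticket: plan everything, then discard."""
    tasks = table.plan_tasks()
    return [t for t in tasks if t.partition in wanted]


def filter_then_plan(table, wanted):
    """Proposed behaviour: hand the filter to planning up front."""
    return table.plan_tasks(wanted)
```

In this toy model both functions return the same files, but for a table with 1800 partitions and a filter that keeps two of them, plan_then_filter visits all 1800 partitions while filter_then_plan visits only the two that match. How the DPP filters would actually be passed into Iceberg planning is left open here.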
--
This message was sent by Atlassian Jira
(v8.20.10#820010)