Posted to dev@hive.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2023/02/02 09:01:00 UTC

[jira] [Created] (HIVE-27014) Iceberg: getSplits/planTasks should filter out relevant folders instead of scanning entire table

Rajesh Balamohan created HIVE-27014:
---------------------------------------

             Summary: Iceberg: getSplits/planTasks should filter out relevant folders instead of scanning entire table
                 Key: HIVE-27014
                 URL: https://issues.apache.org/jira/browse/HIVE-27014
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Rajesh Balamohan


With dynamic partition pruning, only relevant folders in fact tables are scanned.

In Tez, DynamicPartitionPruner sets the relevant filters. In Iceberg, these filters are applied only after "Table:planTasks()" is invoked. This forces the entire table metadata to be scanned, and the unwanted partitions are discarded afterwards.

This makes split computation expensive (e.g., for store_sales, it has to look at all 1800+ partitions and then discard the unwanted ones).

For short-running queries, split computation takes 3-5+ seconds. Creating this ticket as a placeholder to make use of the relevant filters from DPP during split planning.
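
To illustrate the difference in planning cost, here is a minimal, self-contained sketch. It does not use Hive or Iceberg APIs: the class SplitPlanningSketch, its methods, the "part=" path layout and the partitionsVisited counter are all hypothetical names made up for this example, and the in-memory map is only a stand-in for table metadata. It assumes DPP produces a set of wanted partition keys, and compares planning over every partition and pruning afterwards against handing the keys to planning up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SplitPlanningSketch {
    // Hypothetical stand-in for table metadata: partition key -> data file paths.
    private final Map<Integer, List<String>> partitions = new TreeMap<>();

    // Number of partition entries read during planning (a rough cost proxy).
    public int partitionsVisited = 0;

    public SplitPlanningSketch(int partitionCount) {
        for (int key = 0; key < partitionCount; key++) {
            partitions.put(key, List.of("part=" + key + "/data-0.parquet"));
        }
    }

    // Behaviour described in the ticket: walk all partition metadata,
    // and only then drop what the DPP filter does not want.
    public List<String> planThenFilter(Set<Integer> wantedKeys) {
        List<String> tasks = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : partitions.entrySet()) {
            partitionsVisited++;
            if (wantedKeys.contains(entry.getKey())) {
                tasks.addAll(entry.getValue());
            }
        }
        return tasks;
    }

    // Proposed direction (assumption): give planning the DPP keys first,
    // so that only matching partition entries are read at all.
    public List<String> filterThenPlan(Set<Integer> wantedKeys) {
        List<String> tasks = new ArrayList<>();
        for (Integer key : wantedKeys) {
            List<String> files = partitions.get(key);
            if (files != null) {
                partitionsVisited++;
                tasks.addAll(files);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Sorted set so both strategies emit tasks in the same order.
        Set<Integer> wanted = new TreeSet<>(List.of(10, 20, 30));

        SplitPlanningSketch late = new SplitPlanningSketch(1800);
        List<String> lateTasks = late.planThenFilter(wanted);

        SplitPlanningSketch early = new SplitPlanningSketch(1800);
        List<String> earlyTasks = early.filterThenPlan(wanted);

        System.out.println("plan then filter visited: " + late.partitionsVisited);
        System.out.println("filter then plan visited: " + early.partitionsVisited);
        System.out.println("same tasks: " + lateTasks.equals(earlyTasks));
    }
}
```

Both strategies return the same tasks; only the number of partition entries touched differs. How the DPP filters would actually reach Iceberg planning in Hive is not specified in this ticket and is left open here.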



--
This message was sent by Atlassian Jira
(v8.20.10#820010)