Posted to issues@iceberg.apache.org by "ZachDischner (via GitHub)" <gi...@apache.org> on 2023/06/23 14:52:20 UTC

[GitHub] [iceberg] ZachDischner opened a new issue, #7892: [Feature Request] Inspect partitions Metadata for Tables with Many Partitions

ZachDischner opened a new issue, #7892:
URL: https://github.com/apache/iceberg/issues/7892

   ### Feature Request / Improvement
   
   I want to inspect the `catalog.db.table.partitions` metadata table, but cannot do so for large tables: even on extremely large clusters the query fails with out-of-memory errors. This happens for tables with as few as 4 million partitions.
   
   **Using `partitions` table directly**
   
   The query times out or fails with out-of-memory errors. Spark shows only one task allocated, so the `partitions` scan does not appear to be distributed the way a big-data problem should be.
   ```
   spark.read.format("iceberg").load("catalog.db.table.partitions").count
   ```
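   
   A minimal sketch of how the single-task observation could be checked from `spark-shell`, assuming the same placeholder table name as above. It uses only the standard `rdd.getNumPartitions` and `explain()` calls; whether planning this scan is itself cheap on a table of this size has not been verified here.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Reuses the active session when run inside spark-shell.
   val spark = SparkSession.builder().getOrCreate()
   
   // Placeholder table name taken from the example above.
   val partitionsDf = spark.read.format("iceberg").load("catalog.db.table.partitions")
   
   // Number of input splits Spark plans for the scan; 1 means a single task.
   println(s"Scan splits: ${partitionsDf.rdd.getNumPartitions}")
   
   // Print the physical plan to see how the metadata table is read.
   partitionsDf.explain()
   ```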
   
   **Indirectly obtaining `partitions` information via `files` metadata table**
   
   Inspecting the `files` metadata table is a sufficient workaround. The `files` scan is treated as a big-data problem and split across tasks, so it parallelizes well enough to complete.
   ```
   import org.apache.spark.sql.functions.{col, count, count_distinct}
   spark.read.format("iceberg").load("catalog.db.table.files").agg(count("*").as("FileCount"), count_distinct(col("partition")).as("PartitionCount")).show
   +---------+--------------+
   |FileCount|PartitionCount|
   +---------+--------------+
   |  4773395|       4302859|
   +---------+--------------+
   ```
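   
   The same workaround can be extended to approximate what the `partitions` table would report, namely a file count and record count per partition. This is only a sketch under assumptions: the table name is the placeholder used above, the output column names `file_count` and `record_count` are chosen here for illustration, and it relies on the `partition` and `record_count` columns of the `files` metadata table. It has not been run against a table of this size.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{col, count, sum}
   
   // Reuses the active session when run inside spark-shell.
   val spark = SparkSession.builder().getOrCreate()
   
   // The files metadata table has one row per file, so the scan is distributed.
   val files = spark.read.format("iceberg").load("catalog.db.table.files")
   
   // Group by the partition struct to get one row per partition.
   val partitionSummary = files
     .groupBy(col("partition"))
     .agg(
       count("*").as("file_count"),               // files in this partition
       sum(col("record_count")).as("record_count") // rows across those files
     )
   
   // Show the partitions holding the most files.
   partitionSummary.orderBy(col("file_count").desc).show(20, truncate = false)
   ```
   
   Because the aggregation runs as an ordinary distributed group-by, it should scale with the cluster in the same way the count above does.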
   
   ### Query engine
   
   Spark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org