Posted to issues@iceberg.apache.org by "szehon-ho (via GitHub)" <gi...@apache.org> on 2023/05/19 18:08:43 UTC

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7589: Docs: RewritePositionDeleteFiles procedure

szehon-ho commented on code in PR #7589:
URL: https://github.com/apache/iceberg/pull/7589#discussion_r1199239910


##########
docs/spark-procedures.md:
##########
@@ -364,6 +364,53 @@ Rewrite the manifests in table `db.sample` and disable the use of Spark caching.
 CALL catalog_name.system.rewrite_manifests('db.sample', false)
 ```
 
+### `rewrite_position_delete_files`
+
+Iceberg can rewrite position delete files, which serves two purposes:
+* Minor Compaction: Compact small position delete files into larger ones.  This reduces the size of metadata stored in manifest files and the overhead of opening small delete files.
+* Remove Dangling Deletes: Filter out position delete records that refer to data files that are no longer live.  After rewrite_data_files, position delete records pointing to the rewritten data files are not immediately marked for removal and remain tracked by the table's live snapshot metadata.  This is known as the 'dangling delete' problem. It occurs because a single position delete file can apply to more than one data file, and not all applicable data files are removed during a rewrite.
+
+Iceberg can rewrite position delete files in parallel using Spark with the `rewritePositionDeletes` action.

Review Comment:
   Sure, I'll remove this.  I copied it from the rewrite data files procedure.
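
For readers following the quoted doc text, the 'Remove Dangling Deletes' purpose can be illustrated with a minimal conceptual sketch in Python. This is not Iceberg's implementation: `PositionDelete`, `drop_dangling`, and the file paths are hypothetical names chosen for illustration, and a position delete record is assumed to be just a (data file path, row position) pair.

```python
from typing import Iterable, List, NamedTuple


class PositionDelete(NamedTuple):
    """Hypothetical stand-in for one position delete record."""
    data_file: str  # path of the data file the delete applies to
    pos: int        # row position within that data file


def drop_dangling(deletes: Iterable[PositionDelete],
                  live_data_files: Iterable[str]) -> List[PositionDelete]:
    """Keep only delete records whose target data file is still live."""
    live = set(live_data_files)
    return [d for d in deletes if d.data_file in live]


# One delete file may hold records for several data files.
deletes = [
    PositionDelete("data/a.parquet", 3),
    PositionDelete("data/b.parquet", 7),
    PositionDelete("data/a.parquet", 9),
]

# Assume data/a.parquet was rewritten and is no longer live.
live_files = ["data/b.parquet", "data/c.parquet"]

kept = drop_dangling(deletes, live_files)
```

Here the two records pointing at `data/a.parquet` are the dangling ones. Because the same delete file also covers `data/b.parquet`, it cannot simply be dropped as a whole, which is why a rewrite is needed.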



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org