Posted to commits@iceberg.apache.org by bl...@apache.org on 2020/04/03 17:24:49 UTC

[incubator-iceberg] branch master updated: Spec: Add file and position delete file (#887)

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git


The following commit(s) were added to refs/heads/master by this push:
     new 4ae5b30  Spec: Add file and position delete file (#887)
4ae5b30 is described below

commit 4ae5b30269e2638c0268722fe2e59d603517f0eb
Author: Chen, Junjie <ch...@gmail.com>
AuthorDate: Sat Apr 4 01:24:41 2020 +0800

    Spec: Add file and position delete file (#887)
    
    Co-authored-by: Ryan Blue <rd...@users.noreply.github.com>
---
 site/docs/spec.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/site/docs/spec.md b/site/docs/spec.md
index df1c0c4..921c100 100644
--- a/site/docs/spec.md
+++ b/site/docs/spec.md
@@ -333,6 +333,27 @@ Table metadata is stored as JSON. Each table metadata change creates a new table
 
 The atomic operation used to commit metadata depends on how tables are tracked and is not standardized by this spec. See the sections below for examples.
 
+### Delete Format
+
+This section details how to encode row-level deletes in Iceberg metadata. Row-level deletes are not supported in the current format version, version 1. This part of the spec is not yet complete and will be finalized as part of format version 2.
+
+#### Position-based Delete Files
+
+Position-based delete files identify rows in one or more data files that have been deleted.
+
+Position-based delete files store `file_position_delete`, a struct with the following fields:
+
+| Field id, name     | Type              | Description                                                                                                                                 |
+|--------------------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| **`1  file_path`** | `required string` | The full URI of a data file, including the file system scheme. This must match the `file_path` of the target data file in a manifest entry. |
+| **`2  position`**  | `required long`   | The ordinal position of a deleted row in the target data file identified by `file_path`, starting at `0`.                                   |
+
+The rows in the delete file must be sorted by `file_path` then `position` to optimize filtering rows while scanning. 
+
+*  Sorting by `file_path` allows filter pushdown by file in columnar storage formats.
+*  Sorting by `position` allows filtering rows while scanning, to avoid keeping deletes in memory.
+ 
+Although delete files can be written using any data file format supported by Iceberg, it is recommended to write them in the same file format as the table's file format.
 
 #### Commit Conflict Resolution and Retry
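
Note on the position-based delete files added in this commit: the sort requirement (`file_path`, then `position`) is what lets a reader apply deletes as a streaming merge instead of loading them into memory. The following is a minimal Python sketch of that idea, not Iceberg's implementation. The function name `apply_position_deletes`, the tuple layout, and the example paths are hypothetical, and it assumes the writer's `file_path` ordering matches Python's default string comparison.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Row = TypeVar("Row")


def apply_position_deletes(
    file_path: str,
    rows: Iterable[Row],
    deletes: Iterable[Tuple[str, int]],
) -> Iterator[Row]:
    """Yield the rows of one data file that are not marked as deleted.

    `deletes` holds (file_path, position) pairs and is assumed to be sorted
    by file_path, then position, as the spec text requires. Only one delete
    entry is held in memory at a time.
    """
    delete_iter = iter(deletes)
    current = next(delete_iter, None)

    # Skip delete entries for files that sort before the one being scanned.
    # (Assumption: string comparison here matches the writer's sort order.)
    while current is not None and current[0] < file_path:
        current = next(delete_iter, None)

    for position, row in enumerate(rows):  # positions start at 0
        # Advance past entries for earlier positions, including duplicates.
        while (
            current is not None
            and current[0] == file_path
            and current[1] < position
        ):
            current = next(delete_iter, None)

        if current is not None and current == (file_path, position):
            continue  # this row is deleted
        yield row


# Hypothetical example data: deletes spanning three data files.
deletes = [
    ("s3://bucket/a.parquet", 0),
    ("s3://bucket/b.parquet", 1),
    ("s3://bucket/b.parquet", 3),
    ("s3://bucket/c.parquet", 0),
]
rows = ["r0", "r1", "r2", "r3", "r4"]
live = list(apply_position_deletes("s3://bucket/b.parquet", rows, deletes))
# live == ["r0", "r2", "r4"]
```

In a columnar delete file, the first loop would typically be replaced by a filter on `file_path` pushed down to the file reader, which is the benefit the first bullet in the spec text describes.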