Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/29 20:57:53 UTC

[GitHub] [iceberg] edgarRd commented on a change in pull request #1499: Update the Iceberg spec for row-level deletes

edgarRd commented on a change in pull request #1499:
URL: https://github.com/apache/iceberg/pull/1499#discussion_r497051730



##########
File path: site/docs/spec.md
##########
@@ -416,25 +530,91 @@ Notes:
 
 ### Delete Formats
 
-This section details how to encode row-level deletes in Iceberg metadata. Row-level deletes are not supported in the current format version 1. This part of the spec is not yet complete and will be completed as format version 2.
+This section details how to encode row-level deletes in Iceberg delete files. Row-level deletes are not supported in v1.
+
+Row-level delete files are valid Iceberg data files: files must use valid Iceberg formats, schemas, and column projection. It is recommended that delete files are written using the table's default file format.
+
+Row-level delete files are tracked by manifests, like data files. A separate set of manifests is used for delete files, but the manifest schemas are identical.
 
-#### Position-based Delete Files
+Both position and equality deletes allow encoding deleted row values with a delete. This can be used to reconstruct a stream of changes to a table.
 
-Position-based delete files identify rows in one or more data files that have been deleted.
+
+#### Position Delete Files
+
+Position-based delete files identify deleted rows by file and position in one or more data files, and may optionally contain the deleted row.
+
+A data row is deleted if there is an entry in a position delete file for the row's file and position in the data file, starting at 0.
 
 Position-based delete files store `file_position_delete`, a struct with the following fields:
 
-| Field id, name          | Type                            | Description                                                                                                              |
-|-------------------------|---------------------------------|--------------------------------------------------------------------------------------------------------------------------|
-| **`1  file_path`**     | `required string`               | The full URI of a data file with FS scheme. This must match the `file_path` of the target data file in a manifest entry.   |
-| **`2  position`**      | `required long`                 | The ordinal position of a deleted row in the target data file identified by `file_path`, starting at `0`.                    |
+| Field id, name              | Type                   | Description |
+|-----------------------------|------------------------|-------------|
+| **`2147483546  file_path`** | `string`               | Full URI or a data file with FS scheme. This must match the `file_path` of the target data file in a manifest entry |
+| **`2147483545  pos`**       | `long`                 | Ordinal position of a deleted row in the target data file identified by `file_path`, starting at `0` |
+| **`2147483544  row`**       | `required struct<...>` | Deleted row values. Omit the column when not storing deleted rows. |

Review comment:
       I think this is a bit confusing: it's a `required struct`, yet it can be omitted?
   > Omit the column when not storing deleted rows.
   
   Also, the text above it mentions:
   > and may optionally contain the deleted row.
   
   Is this column `required` or am I missing something?
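
To make the question concrete, here is a minimal sketch of how a reader might apply position deletes as the diff describes them. It assumes a hypothetical in-memory representation: `PositionDelete`, `apply_position_deletes`, and the `s3://bucket/...` paths are illustrative names, not Iceberg APIs or real locations. The `row` field is modeled as optional because that is how the prose reads, whatever the table ends up saying:

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical in-memory form of one `file_position_delete` entry."""
    file_path: str                        # must match the data file's `file_path`
    pos: int                              # 0-based ordinal of the deleted row
    row: Optional[Dict[str, Any]] = None  # deleted row values; None when not stored


def apply_position_deletes(
    file_path: str,
    rows: List[Dict[str, Any]],
    deletes: Iterable[PositionDelete],
) -> List[Dict[str, Any]]:
    """Return the rows of one data file that survive the given position deletes."""
    # Only entries whose file_path matches this data file apply to it.
    deleted_positions = {d.pos for d in deletes if d.file_path == file_path}
    return [r for i, r in enumerate(rows) if i not in deleted_positions]


# Illustrative data: three rows in one file, deletes spread over two files.
rows = [{"id": 10}, {"id": 11}, {"id": 12}]
deletes = [
    PositionDelete("s3://bucket/data/a.parquet", 1),                  # no row stored
    PositionDelete("s3://bucket/data/a.parquet", 2, row={"id": 12}),  # row stored
    PositionDelete("s3://bucket/data/b.parquet", 0),                  # other file
]
live = apply_position_deletes("s3://bucket/data/a.parquet", rows, deletes)
```

In this sketch the delete is decided by `file_path` and `pos` alone; `row` is only carried along for change reconstruction, which is why marking it `required` looks surprising.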



