Posted to commits@iceberg.apache.org by bl...@apache.org on 2021/09/10 15:22:15 UTC

[iceberg] branch master updated: Spec: Update spec to show that version 2 is adopted (#3037)

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git


The following commit(s) were added to refs/heads/master by this push:
     new 09584aa  Spec: Update spec to show that version 2 is adopted (#3037)
09584aa is described below

commit 09584aa788292fcfe09d297e1400a3ed08e4eff2
Author: Ryan Blue <bl...@apache.org>
AuthorDate: Fri Sep 10 08:22:05 2021 -0700

    Spec: Update spec to show that version 2 is adopted (#3037)
---
 site/docs/spec.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/site/docs/spec.md b/site/docs/spec.md
index 1fcd389..e5ff2e7 100644
--- a/site/docs/spec.md
+++ b/site/docs/spec.md
@@ -19,15 +19,25 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg spec are complete and adopted by the community.
+
+The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.
+
 #### Version 1: Analytic Data Tables
 
-**Iceberg format version 1 is the current version**. It defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.
+Version 1 of the Iceberg spec defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.
+
+All version 1 data and metadata files are valid after upgrading a table to version 2. [Appendix E](#version-2) documents how to default version 2 fields when reading version 1 metadata.
 
 #### Version 2: Row-level Deletes
 
-The Iceberg community is currently working on version 2 of the Iceberg format that supports encoding row-level deletes. **The v2 specification is incomplete and may change until it is finished and adopted.** This document includes tentative v2 format requirements, but there are currently no compatibility guarantees with the unfinished v2 spec.
+Version 2 of the Iceberg spec adds row-level updates and deletes for analytic tables with immutable files.
+
+The primary change in version 2 adds delete files to encode rows that are deleted in existing data files. This version can be used to delete or replace individual rows in immutable data files without rewriting the files.
 
-The primary goal of version 2 is to provide a way to encode row-level deletes. This update can be used to delete or replace individual rows in an immutable data file without rewriting the file.
+In addition to row-level deletes, version 2 makes some requirements stricter for writers. The full set of changes is listed in [Appendix E](#version-2).
 
 
 ## Goals
@@ -1002,7 +1012,7 @@ This serialization scheme is for storing single values as individual binary valu
 | **`map`**                    | Not supported                                                                                                |
 
 
-## Format version changes
+## Appendix E: Format version changes
 
 ### Version 2