Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/09 23:43:13 UTC

[GitHub] [iceberg] danielcweeks commented on a change in pull request #3037: Spec: Update spec to show that version 2 is adopted

danielcweeks commented on a change in pull request #3037:
URL: https://github.com/apache/iceberg/pull/3037#discussion_r705788319



##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.
+
+The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.

Review comment:
       I think the statement that we guarantee compatibility across all versions is a little strong. We wouldn't want to have to walk that back later if there is a good reason for a breaking change.

##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.

Review comment:
       I agree with adding "specification", but we might just want to simplify this statement to read:
   
   Versions 1 and 2 of the Iceberg format specification are complete and adopted by the community.

##########
File path: site/docs/spec.md
##########
@@ -19,15 +19,23 @@
 
 This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.
 
+## Format Versioning
+
+Versions 1 and 2 of the Iceberg format are finished and supported by the community.
+
+The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.
+
 #### Version 1: Analytic Data Tables
 
-**Iceberg format version 1 is the current version**. It defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.
+Iceberg format version 1 defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.
 
 #### Version 2: Row-level Deletes
 
-The Iceberg community is currently working on version 2 of the Iceberg format that supports encoding row-level deletes. **The v2 specification is incomplete and may change until it is finished and adopted.** This document includes tentative v2 format requirements, but there are currently no compatibility guarantees with the unfinished v2 spec.
+Iceberg format version 2 adds row-level deletes for analytic tables with immutable files.
+
+The primary change in version 2 adds delete files to encode that rows that are deleted in existing data files. This version can be used to delete or replace individual rows in immutable data files without rewriting the files.
 
-The primary goal of version 2 is to provide a way to encode row-level deletes. This update can be used to delete or replace individual rows in an immutable data file without rewriting the file.
+In addition to row-level deletes, version 2 makes some requirements stricter for writers. For example, multiple schemas can be tracked in v1 metadata using an optional `schemas` list. In v2, the schemas list is required. The full set of changes are listed in [Appendix D: Format version changes, Version 2](#version-2).

Review comment:
       You might want to remove the example here and just state that write requirements are stricter, then refer to the appendix. The example feels odd and doesn't add much value in this context.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


