Posted to commits@iceberg.apache.org by ja...@apache.org on 2022/02/04 18:52:43 UTC

[iceberg-docs] branch main updated: Add all docs PR's merged before the split out to this iceberg-docs repo (#31)

This is an automated email from the ASF dual-hosted git repository.

jackye pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-docs.git


The following commit(s) were added to refs/heads/main by this push:
     new a508bf2  Add all docs PR's merged before the split out to this iceberg-docs repo (#31)
a508bf2 is described below

commit a508bf2c256319e0dbe97b9c27dc18c65f892997
Author: Samuel Redai <43...@users.noreply.github.com>
AuthorDate: Fri Feb 4 13:50:16 2022 -0500

    Add all docs PR's merged before the split out to this iceberg-docs repo (#31)
    
    * PR #3375: Spark: CALL procedure for rewrite_data_files
    
    * PR #3422: Doc: Update Files metadata table
    
    * PR #3425: Spec: Add snapshot tagging and branching
    
    * PR #3469: Spark: Add test cases for SparkActions and remove deprecated Actions and its test cases
    
    * PR #3479: Add a How To Verify A Release page
    
    * PR #3544: Docs: Add Iceberg indexing blog post
    
    * PR #3556: Spec: Document NameMapping
    
    * PR #3602: Docs: Add ververica blog - Using Flink CDC to synchronize data from MySQL sharding tables and build real-time data lake
    
    * PR #3680: Site Docs - Fix Flink batch job incorrectly labeled as streaming
    
    * PR #3683: Spark: clarify alter column of complex types
    
    * PR #3694: Doc : Modify the attribute description
    
    * PR #3706: Docs: update Nessie version to 0.17.0
    
    * PR #3712: Docs: add spark create clause CTAS/RTAS example
---
 docs/config.toml                                   |   2 +-
 docs/content/docs/flink/flink-getting-started.md   |   2 +-
 docs/content/docs/spark/spark-ddl.md               |  29 ++++
 docs/content/docs/spark/spark-procedures.md        |  53 +++++++
 docs/content/docs/spark/spark-queries.md           |   5 +
 docs/content/docs/tables/configuration.md          |   5 +-
 docs/content/docs/tables/maintenance.md            |  36 +++--
 landing-page/content/common/community/blogs.md     |   8 ++
 landing-page/content/common/format/spec.md         |  76 +++++++++-
 .../common/releases/how-to-verify-a-release.md     | 154 +++++++++++++++++++++
 10 files changed, 350 insertions(+), 20 deletions(-)

diff --git a/docs/config.toml b/docs/config.toml
index 5fb300e..2155c43 100644
--- a/docs/config.toml
+++ b/docs/config.toml
@@ -7,6 +7,6 @@ theme= "hugo-book"
   BookTheme = 'auto'
   BookLogo = "img/iceberg-logo-icon.png"
   versions.iceberg = "" # This is populated by the github deploy workflow and is equal to the branch name
-  versions.nessie = "0.15.1"
+  versions.nessie = "0.17.0"
   latestVersions.iceberg = "0.12.1"  # This is used for the version badge on the "latest" site version
   BookSection='docs' # This determines which directory will inform the left navigation menu
diff --git a/docs/content/docs/flink/flink-getting-started.md b/docs/content/docs/flink/flink-getting-started.md
index 3ec92c0..b4d5518 100644
--- a/docs/content/docs/flink/flink-getting-started.md
+++ b/docs/content/docs/flink/flink-getting-started.md
@@ -478,7 +478,7 @@ DataStream<RowData> stream = FlinkSource.forRowData()
 stream.print();
 
 // Submit and execute this streaming read job.
-env.execute("Test Iceberg Batch Read");
+env.execute("Test Iceberg Streaming Read");
 ```
 
 There are other options that can be set via the Java API; see the [FlinkSource#Builder](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/flink/source/FlinkSource.html).
diff --git a/docs/content/docs/spark/spark-ddl.md b/docs/content/docs/spark/spark-ddl.md
index 1422444..55e7bfe 100644
--- a/docs/content/docs/spark/spark-ddl.md
+++ b/docs/content/docs/spark/spark-ddl.md
@@ -111,6 +111,13 @@ USING iceberg
 AS SELECT ...
 ```
 ```sql
+REPLACE TABLE prod.db.sample
+USING iceberg
+PARTITIONED BY (part)
+TBLPROPERTIES ('key'='value')
+AS SELECT ...
+```
+```sql
 CREATE OR REPLACE TABLE prod.db.sample
 USING iceberg
 AS SELECT ...
@@ -189,6 +196,28 @@ ALTER TABLE prod.db.sample
 ADD COLUMN point.z double
 ```
 
+```sql
+-- create a nested array column of struct
+ALTER TABLE prod.db.sample
+ADD COLUMN points array<struct<x: double, y: double>>;
+
+-- add a field to the struct within an array. Use the keyword 'element' to access the array's element column.
+ALTER TABLE prod.db.sample
+ADD COLUMN points.element.z double
+```
+
+```sql
+-- create a map column of struct key and struct value
+ALTER TABLE prod.db.sample
+ADD COLUMN points map<struct<x: int>, struct<a: int>>;
+
+-- add a field to the value struct in a map. Use the keyword 'value' to access the map's value column.
+ALTER TABLE prod.db.sample
+ADD COLUMN points.value.b int
+```
+
+Note: Altering a map 'key' column by adding columns is not allowed. Only map values can be updated.
+
 In Spark 2.4.4 and later, you can add columns in any position by adding `FIRST` or `AFTER` clauses:
 
 ```sql
diff --git a/docs/content/docs/spark/spark-procedures.md b/docs/content/docs/spark/spark-procedures.md
index 36ab02a..d4649a2 100644
--- a/docs/content/docs/spark/spark-procedures.md
+++ b/docs/content/docs/spark/spark-procedures.md
@@ -246,6 +246,59 @@ Remove any files in the `tablelocation/data` folder which are not known to the t
 CALL catalog_name.system.remove_orphan_files(table => 'db.sample', location => 'tablelocation/data')
 ```
 
+### `rewrite_data_files`
+
+Iceberg tracks each data file in a table. More data files lead to more metadata stored in manifest files, and small data files cause an unnecessary amount of metadata and less efficient queries from file open costs.
+
+Iceberg can compact data files in parallel using Spark with the `rewriteDataFiles` action. This will combine small files into larger files to reduce metadata overhead and runtime file open cost.
+
+#### Usage
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|------|-------------|
+| `table`       | ✔️  | string | Name of the table to update |
+| `strategy`    |    | string | Name of the strategy - `binpack` or `sort`. Defaults to `binpack` |
+| `sort_order`  |    | string | Comma-separated list of sort orders, where each entry is space-separated column info in the form `ColumnName SortDirection NullOrder`. <br/> `SortDirection` can be `ASC` or `DESC`. `NullOrder` can be `NULLS FIRST` or `NULLS LAST` |
+| `options`     |    | map<string, string> | Options to be used for the action |
+| `where`       |    | string | Predicate as a string used for filtering the files. Note that all files that may contain data matching the filter will be selected for rewriting |
+
+
+See the [`RewriteDataFiles` Javadoc](./javadoc/{{ versions.iceberg }}/org/apache/iceberg/actions/RewriteDataFiles.html#field.summary),
+<br/>  [`BinPackStrategy` Javadoc](./javadoc/{{ versions.iceberg }}/org/apache/iceberg/actions/BinPackStrategy.html#field.summary)
+and <br/> [`SortStrategy` Javadoc](./javadoc/{{ versions.iceberg }}/org/apache/iceberg/actions/SortStrategy.html#field.summary)
+for a list of all the supported options for this action.
+
+#### Output
+
+| Output Name | Type | Description |
+| ------------|------|-------------|
+| `rewritten_data_files_count` | int | Number of data files which were rewritten by this command |
+| `added_data_files_count`     | int | Number of new data files which were written by this command |
+
+#### Examples
+
+Rewrite the data files in table `db.sample` using the default rewrite algorithm of bin-packing to combine small files 
+and also split large files according to the default write size of the table.
+```sql
+CALL catalog_name.system.rewrite_data_files('db.sample')
+```
+
+Rewrite the data files in table `db.sample` by sorting all the data on id and name 
+using the same defaults as bin-pack to determine which files to rewrite.
+```sql
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id DESC NULLS LAST,name ASC NULLS FIRST')
+```
+
+Rewrite the data files in table `db.sample` using the bin-pack strategy in any partition where 2 or more files need to be rewritten.
+```sql
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', options => map('min-input-files','2'))
+```
+
+Rewrite the data files in table `db.sample` and select the files that may contain data matching the filter (id = 3 and name = "foo") to be rewritten.
+```sql
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', where => 'id = 3 and name = "foo"')
+```
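+
+The arguments above can also be combined. As an illustrative sketch (reusing the `id` column from the earlier examples), the following sorts only those files that may contain data matching the filter:
+```sql
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id ASC NULLS LAST', where => 'id = 3')
+```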
+
 ### `rewrite_manifests`
 
 Rewrite manifests for a table to optimize scan planning.
diff --git a/docs/content/docs/spark/spark-queries.md b/docs/content/docs/spark/spark-queries.md
index 4e2f634..5b5017d 100644
--- a/docs/content/docs/spark/spark-queries.md
+++ b/docs/content/docs/spark/spark-queries.md
@@ -53,6 +53,11 @@ For example, to read from the `files` metadata table for `prod.db.table`, run:
 ```
 SELECT * FROM prod.db.table.files
 ```
+|content|file_path                                                                                                                                   |file_format|spec_id|partition|record_count|file_size_in_bytes|column_sizes      |value_counts    |null_value_counts|nan_value_counts|lower_bounds           |upper_bounds           |key_metadata|split_offsets|equality_ids|sort_order_id|
+| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
+| 0 | s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET   | 0  | {1999-01-01, 01} | 1            | 597                | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0]  | []               | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null         | [4]           | null | null |
+| 0 | s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET   | 0  | {1999-01-01, 02} | 1            | 597                | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0]  | []               | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null         | [4]           | null | null |
+| 0 | s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET   | 0  | {1999-01-01, 03} | 1            | 597                | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0]  | []               | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null         | [4]           | null | null |
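+
+As an illustrative sketch of how this output can be used (column names taken from the table above), the metadata table can be filtered like any other table, for example to find small data files:
+```sql
+SELECT file_path, file_size_in_bytes
+FROM prod.db.table.files
+WHERE file_size_in_bytes < 10 * 1024 * 1024 -- data files smaller than 10 MB
+ORDER BY file_size_in_bytes
+```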
 
 ## Querying with DataFrames
 
diff --git a/docs/content/docs/tables/configuration.md b/docs/content/docs/tables/configuration.md
index 8c6c32f..78752e8 100644
--- a/docs/content/docs/tables/configuration.md
+++ b/docs/content/docs/tables/configuration.md
@@ -65,16 +65,17 @@ Iceberg tables support table properties to configure table behavior, like the de
 | commit.retry.num-retries           | 4                | Number of times to retry a commit before failing              |
 | commit.retry.min-wait-ms           | 100              | Minimum time in milliseconds to wait before retrying a commit |
 | commit.retry.max-wait-ms           | 60000 (1 min)    | Maximum time in milliseconds to wait before retrying a commit |
-| commit.retry.total-timeout-ms      | 1800000 (30 min) | Maximum time in milliseconds to wait before retrying a commit |
+| commit.retry.total-timeout-ms      | 1800000 (30 min) | Total retry timeout period in milliseconds for a commit |
 | commit.status-check.num-retries    | 3                | Number of times to check whether a commit succeeded after a connection is lost before failing due to an unknown commit state |
 | commit.status-check.min-wait-ms    | 1000 (1s)        | Minimum time in milliseconds to wait before retrying a status-check |
 | commit.status-check.max-wait-ms    | 60000 (1 min)    | Maximum time in milliseconds to wait before retrying a status-check |
-| commit.status-check.total-timeout-ms| 1800000 (30 min) | Maximum time in milliseconds to wait before retrying a status-check |
+| commit.status-check.total-timeout-ms| 1800000 (30 min) | Total timeout period in which the commit status-check must succeed, in milliseconds |
 | commit.manifest.target-size-bytes  | 8388608 (8 MB)   | Target size when merging manifest files                       |
 | commit.manifest.min-count-to-merge | 100              | Minimum number of manifests to accumulate before merging      |
 | commit.manifest-merge.enabled      | true             | Controls whether to automatically merge manifests on writes   |
 | history.expire.max-snapshot-age-ms | 432000000 (5 days) | Default max age of snapshots to keep while expiring snapshots    |
 | history.expire.min-snapshots-to-keep | 1                | Default min number of snapshots to keep while expiring snapshots |
+| history.expire.max-ref-age-ms      | `Long.MAX_VALUE` (forever) | For snapshot references except the `main` branch, default max age of snapshot references to keep while expiring snapshots. The `main` branch never expires. |
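+
+As a minimal sketch (the table name and values here are only illustrative), these properties can be set or changed with a standard `ALTER TABLE ... SET TBLPROPERTIES` statement in Spark SQL:
+```sql
+ALTER TABLE prod.db.sample SET TBLPROPERTIES (
+    'commit.retry.num-retries' = '10',
+    'history.expire.max-snapshot-age-ms' = '604800000' -- 7 days
+)
+```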
 
 ### Compatibility flags
 
diff --git a/docs/content/docs/tables/maintenance.md b/docs/content/docs/tables/maintenance.md
index 7f7e62f..26c4bdb 100644
--- a/docs/content/docs/tables/maintenance.md
+++ b/docs/content/docs/tables/maintenance.md
@@ -48,8 +48,10 @@ See the [`ExpireSnapshots` Javadoc](../../../javadoc/{{% icebergVersion %}}/org/
 There is also a Spark action that can run table expiration in parallel for large tables:
 
 ```java
-Actions.forTable(table)
-    .expireSnapshots()
+Table table = ...
+SparkActions
+    .get()
+    .expireSnapshots(table)
     .expireOlderThan(tsToExpire)
     .execute();
 ```
@@ -76,20 +78,21 @@ To automatically clean metadata files, set `write.metadata.delete-after-commit.e
 
 See [table write properties](../configuration/#write-properties) for more details.
 
-### Remove orphan files
+### Delete orphan files
 
 In Spark and other distributed processing engines, task or job failures can leave files that are not referenced by table metadata, and in some cases normal snapshot expiration may not be able to determine a file is no longer needed and delete it.
 
-To clean up these "orphan" files under a table location, use the `removeOrphanFiles` action.
+To clean up these "orphan" files under a table location, use the `deleteOrphanFiles` action.
 
 ```java
 Table table = ...
-Actions.forTable(table)
-    .removeOrphanFiles()
+SparkActions
+    .get()
+    .deleteOrphanFiles(table)
     .execute();
 ```
 
-See the [RemoveOrphanFilesAction Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/RemoveOrphanFilesAction.html) to see more configuration options.
+See the [DeleteOrphanFiles Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/DeleteOrphanFiles.html) to see more configuration options.
 
 This action may take a long time to finish if you have lots of files in data and metadata directories. It is recommended to execute this periodically, but you may not need to execute this often.
 
@@ -118,15 +121,17 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
 
 ```java
 Table table = ...
-Actions.forTable(table).rewriteDataFiles()
+SparkActions
+    .get()
+    .rewriteDataFiles(table)
     .filter(Expressions.equal("date", "2020-08-18"))
-    .targetSizeInBytes(500 * 1024 * 1024) // 500 MB
+    .option("target-file-size-bytes", Long.toString(500 * 1024 * 1024)) // 500 MB
     .execute();
 ```
 
-The `files` metadata table is useful for inspecting data file sizes and determining when to compact partitons.
+The `files` metadata table is useful for inspecting data file sizes and determining when to compact partitions.
 
-See the [`RewriteDataFilesAction` Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/RewriteDataFilesAction.html) to see more configuration options.
+See the [`RewriteDataFiles` Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/RewriteDataFiles.html) to see more configuration options.
 
 ### Rewrite manifests
 
@@ -140,10 +145,11 @@ This example rewrites small manifests and groups data files by the first partiti
 
 ```java
 Table table = ...
-table.rewriteManifests()
+SparkActions
+    .get()
+    .rewriteManifests(table)
     .rewriteIf(file -> file.length() < 10 * 1024 * 1024) // 10 MB
-    .clusterBy(file -> file.partition().get(0, Integer.class))
-    .commit();
+    .execute();
 ```
 
-See the [`RewriteManifestsAction` Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/RewriteManifestsAction.html) to see more configuration options.
+See the [`RewriteManifests` Javadoc](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/actions/RewriteManifests.html) to see more configuration options.
diff --git a/landing-page/content/common/community/blogs.md b/landing-page/content/common/community/blogs.md
index 462894e..287dc34 100644
--- a/landing-page/content/common/community/blogs.md
+++ b/landing-page/content/common/community/blogs.md
@@ -23,6 +23,14 @@ weight: 200
 
 Here is a list of company blogs that talk about Iceberg. The blogs are ordered from most recent to oldest.
 
+### [Using Flink CDC to synchronize data from MySQL sharding tables and build real-time data lake](https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/build-real-time-data-lake-tutorial.html)
+**Date**: 11 November 2021, **Company**: Ververica, Alibaba Cloud
+**Author**: [Yuxia Luo](https://github.com/luoyuxia), [Jark Wu](https://github.com/wuchong), [Zheng Hu](https://www.linkedin.com/in/zheng-hu-37017683/)
+
+### [Metadata Indexing in Iceberg](https://tabular.io/blog/iceberg-metadata-indexing/)
+**Date**: 10 October 2021, **Company**: Tabular
+**Author**: [Ryan Blue](https://www.linkedin.com/in/rdblue/)
+
 ### [Using Debezium to Create a Data Lake with Apache Iceberg](https://debezium.io/blog/2021/10/20/using-debezium-create-data-lake-with-apache-iceberg/)
 **Date**: October 20th, 2021, **Company**: Memiiso Community
 **Author**: [Ismail Simsek](https://www.linkedin.com/in/ismailsimsek/)
diff --git a/landing-page/content/common/format/spec.md b/landing-page/content/common/format/spec.md
index 1527273..da4d085 100644
--- a/landing-page/content/common/format/spec.md
+++ b/landing-page/content/common/format/spec.md
@@ -217,6 +217,26 @@ Columns in Iceberg data files are selected by field id. The table schema's colum
 
 For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order.
 
+For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order.
+
+Tables may also define a property `schema.name-mapping.default` with a JSON name mapping containing a list of field mapping objects. These mappings provide fallback field ids to be used when a data file does not contain field id information. Each object should contain:
+
+* `names`: A required list of 0 or more names for a field.
+* `field-id`: An optional Iceberg field ID used when a field's name is present in `names`.
+* `fields`: An optional list of field mappings for child fields of structs, maps, and lists.
+
+Field mapping fields are constrained by the following rules:
+
+* A name may contain `.` but this refers to a literal name, not a nested field. For example, `a.b` refers to a field named `a.b`, not child field `b` of field `a`. 
+* Each child field should be defined with its own field mapping under `fields`.
+* Multiple values for `names` may be mapped to a single field ID to support cases where a field may have different names in different data files. For example, all Avro field aliases should be listed in `names`.
+* Fields which exist only in the Iceberg schema and not in imported data files may use an empty `names` list.
+* Fields that exist in imported files but not in the Iceberg schema may omit `field-id`.
+* List types should contain a mapping in `fields` for `element`. 
+* Map types should contain mappings in `fields` for `key` and `value`. 
+* Struct types should contain mappings in `fields` for their child fields.
+
+For details on serialization, see [Appendix C](#name-mapping-serialization).
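+
+As an illustrative, engine-specific sketch (not part of the spec itself; the table name and mapping below are only examples), the mapping is stored as an ordinary table property and can be attached with Spark SQL:
+```sql
+ALTER TABLE prod.db.sample SET TBLPROPERTIES (
+    'schema.name-mapping.default' = '[{"field-id": 1, "names": ["id", "record_id"]}, {"field-id": 2, "names": ["data"]}]'
+)
+```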
 
 #### Identifier Field IDs
 
@@ -553,6 +573,38 @@ Notes:
 1. An alternative, *strict projection*, creates a partition predicate that will match a file if all of the rows in the file must match the scan predicate. These projections are used to calculate the residual predicates for each file in a scan.
 2. For example, if `file_a` has rows with `id` between 1 and 10 and a delete file contains rows with `id` between 1 and 4, a scan for `id = 9` may ignore the delete file because none of the deletes can match a row that will be selected.
 
+#### Snapshot Reference
+
+Iceberg tables keep track of branches and tags using snapshot references. 
+Tags are labels for individual snapshots. Branches are mutable named references that can be updated by committing a new snapshot as the branch's referenced snapshot using the [Commit Conflict Resolution and Retry](#commit-conflict-resolution-and-retry) procedures.
+
+The snapshot reference object records all the information of a reference including snapshot ID, reference type and [Snapshot Retention Policy](#snapshot-retention-policy).
+
+| v1         | v2         | Field name                   | Type      | Description |
+| ---------- | ---------- | ---------------------------- | --------- | ----------- |
+| _required_ | _required_ | **`snapshot-id`**            | `long`    | A reference's snapshot ID. The tagged snapshot or latest snapshot of a branch. |
+| _required_ | _required_ | **`type`**                   | `string`  | Type of the reference, `tag` or `branch` |
+| _optional_ | _optional_ | **`min-snapshots-to-keep`**  | `int`     | For `branch` type only, a positive number for the minimum number of snapshots to keep in a branch while expiring snapshots. Defaults to table property `history.expire.min-snapshots-to-keep`. |
+| _optional_ | _optional_ | **`max-snapshot-age-ms`**    | `long`    | For `branch` type only, a positive number for the max age of snapshots to keep when expiring, including the latest snapshot. Defaults to table property `history.expire.max-snapshot-age-ms`. |
+| _optional_ | _optional_ | **`max-ref-age-ms`**         | `long`    | For snapshot references except the `main` branch, a positive number for the max age of the snapshot reference to keep while expiring snapshots. Defaults to table property `history.expire.max-ref-age-ms`. The `main` branch never expires. |
+
+Valid snapshot references are stored as the values of the `refs` map in table metadata. For serialization, see Appendix C.
+
+#### Snapshot Retention Policy
+
+Table snapshots expire and are removed from metadata to allow removed or replaced data files to be physically deleted.
+The snapshot expiration procedure removes snapshots from table metadata and applies the table's retention policy.
+Retention policy can be configured both globally and on snapshot reference through properties `min-snapshots-to-keep`, `max-snapshot-age-ms` and `max-ref-age-ms`.
+
+When expiring snapshots, retention policies in table and snapshot references are evaluated in the following way:
+
+1. Start with an empty set of snapshots to retain
+2. Remove any refs (other than main) where the referenced snapshot is older than `max-ref-age-ms`
+3. For each branch and tag, add the referenced snapshot to the retained set
+4. For each branch, add its ancestors to the retained set until:
+    1. The snapshot is older than `max-snapshot-age-ms`, AND
+    2. The snapshot is not one of the first `min-snapshots-to-keep` in the branch (including the branch's referenced snapshot)
+5. Expire any snapshot not in the set of snapshots to retain.
 
 ### Table Metadata
 
@@ -580,12 +632,13 @@ Table metadata consists of the following fields:
 | _optional_ | _required_ | **`default-spec-id`**| ID of the "current" spec that writers should use by default. |
 | _optional_ | _required_ | **`last-partition-id`**| An integer; the highest assigned partition field ID across all partition specs for the table. This is used to ensure partition fields are always assigned an unused ID when evolving specs. |
 | _optional_ | _optional_ | **`properties`**| A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, `commit.retry.num-retries` is used to control the number of commit retries. |
-| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the current table snapshot. |
+| _optional_ | _optional_ | **`current-snapshot-id`**| `long` ID of the current table snapshot; must be the same as the current ID of the `main` branch in `refs`. |
 | _optional_ | _optional_ | **`snapshots`**| A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected. |
 | _optional_ | _optional_ | **`snapshot-log`**| A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed. |
 | _optional_ | _optional_ | **`metadata-log`**| A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. |
 | _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. |
 | _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. |
+|            | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. |
 
 For serialization details, see Appendix C.
 
@@ -993,6 +1046,27 @@ Table metadata is serialized as a JSON object according to the following table.
 |**`metadata-log`**|`JSON list of objects: [`<br />&nbsp;&nbsp;`{`<br />&nbsp;&nbsp;`"metadata-file": ,`<br />&nbsp;&nbsp;`"timestamp-ms": `<br />&nbsp;&nbsp;`},`<br />&nbsp;&nbsp;`...`<br />`]`|`[ {`<br />&nbsp;&nbsp;`"metadata-file": "s3://bucket/.../v1.json",`<br />&nbsp;&nbsp;`"timestamp-ms": 1515100...`<br />`} ]` |
 |**`sort-orders`**|`JSON sort orders (list of sort field object)`|`See above`|
 |**`default-sort-order-id`**|`JSON int`|`0`|
+|**`refs`**|`JSON map with string key and object value:`<br />`{`<br />&nbsp;&nbsp;`"<name>": {`<br />&nbsp;&nbsp;`"snapshot-id": <id>,`<br />&nbsp;&nbsp;`"type": <type>,`<br />&nbsp;&nbsp;`"max-ref-age-ms": <long>,`<br />&nbsp;&nbsp;`...`<br />&nbsp;&nbsp;`}`<br />&nbsp;&nbsp;`...`<br />`}`|`{`<br />&nbsp;&nbsp;`"test": {`<br />&nbsp;&nbsp;`"snapshot-id": 123456789000,`<br />&nbsp;&nbsp;`"type": "tag",`<br />&nbsp;&nbsp;`"max-ref-age-ms": 10000000`<br />&nbsp;&nbsp;`}`<br />`}`|
+
+### Name Mapping Serialization
+
+Name mapping is serialized as a list of field mapping JSON objects, which are serialized as follows:
+
+|Field mapping field|JSON representation|Example|
+|--- |--- |--- |
+|**`names`**|`JSON list of strings`|`["latitude", "lat"]`|
+|**`field-id`**|`JSON int`|`1`|
+|**`fields`**|`JSON field mappings (list of objects)`|`[{ `<br />&nbsp;&nbsp;`"field-id": 4,`<br />&nbsp;&nbsp;`"names": ["latitude", "lat"]`<br />`}, {`<br />&nbsp;&nbsp;`"field-id": 5,`<br />&nbsp;&nbsp;`"names": ["longitude", "long"]`<br />`}]`|
+
+Example
+```json
+[ { "field-id": 1, "names": ["id", "record_id"] },
+   { "field-id": 2, "names": ["data"] },
+   { "field-id": 3, "names": ["location"], "fields": [
+       { "field-id": 4, "names": ["latitude", "lat"] },
+       { "field-id": 5, "names": ["longitude", "long"] }
+     ] } ]
+```
 
 
 ## Appendix D: Single-value serialization
diff --git a/landing-page/content/common/releases/how-to-verify-a-release.md b/landing-page/content/common/releases/how-to-verify-a-release.md
new file mode 100644
index 0000000..82cccc2
--- /dev/null
+++ b/landing-page/content/common/releases/how-to-verify-a-release.md
@@ -0,0 +1,154 @@
+---
+url: how-to-verify-a-release
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# How to Verify a Release
+
+Each Apache Iceberg release is validated by the community by holding a vote. A community release manager
+will prepare a release candidate and call a vote on the Iceberg
+[dev list](https://iceberg.apache.org/#community/#mailing-lists).
+To validate the release candidate, community members will test it out in their downstream projects and environments.
+It's recommended to report the Java, Scala, Spark, Flink and Hive versions you have tested against when you vote.
+
+In addition to testing in downstream projects, community members also check the release's signatures, checksums, and
+license documentation.
+
+## Validating a source release candidate
+
+Release announcements include links to the following:
+
+- **A source tarball**
+- **A signature (.asc)**
+- **A checksum (.sha512)**
+- **KEYS file**
+- **GitHub change comparison**
+
+After downloading the source tarball, signature, checksum, and KEYS file, here are instructions on how to
+verify signatures, checksums, and documentation.
+
+### Verifying Signatures
+
+First, import the keys.
+```bash
+curl https://dist.apache.org/repos/dist/dev/iceberg/KEYS -o KEYS
+gpg --import KEYS
+```
+
+Next, verify the `.asc` file.
+```bash
+gpg --verify apache-iceberg-{{% icebergVersion %}}.tar.gz.asc
+```
+
+### Verifying Checksums
+
+```bash
+shasum -a 512 apache-iceberg-{{% icebergVersion %}}.tar.gz.sha512
+```
+
+### Verifying License Documentation
+
+Untar the archive and change into the source directory.
+```bash
+tar xzf apache-iceberg-{{% icebergVersion %}}.tar.gz
+cd apache-iceberg-{{% icebergVersion %}}
+```
+
+Run RAT checks to validate license headers.
+```bash
+dev/check-license
+```
+
+### Verifying Build and Test
+
+To verify that the release candidate builds properly, run the following command.
+```bash
+./gradlew build
+```
+
+## Testing release binaries
+
+Release announcements will also include a Maven repository location. You can use this
+location to test downstream dependencies by adding it to your Maven or Gradle build.
+
+To use the release in your Maven build, add the following to your `POM` or `settings.xml`:
+```xml
+...
+  <repositories>
+    <repository>
+      <id>iceberg-release-candidate</id>
+      <name>Iceberg Release Candidate</name>
+      <url>${MAVEN_URL}</url>
+    </repository>
+  </repositories>
+...
+```
+
+To use the release in your Gradle build, add the following to your `build.gradle`:
+```groovy
+repositories {
+    mavenCentral()
+    maven {
+        url "${MAVEN_URL}"
+    }
+}
+```
+
+!!! Note
+    Replace `${MAVEN_URL}` with the URL provided in the release announcement
+
+### Verifying with Spark
+
+To verify using Spark, start a `spark-shell` with a command like the following:
+```bash
+spark-shell \
+    --conf spark.jars.repositories=${MAVEN_URL} \
+    --packages org.apache.iceberg:iceberg-spark3-runtime:{{% icebergVersion %}} \
+    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
+    --conf spark.sql.catalog.local.type=hadoop \
+    --conf spark.sql.catalog.local.warehouse=${LOCAL_WAREHOUSE_PATH} \
+    --conf spark.sql.catalog.local.default-namespace=default \
+    --conf spark.sql.defaultCatalog=local
+```
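+
+Once the shell is up, a minimal smoke test can be run through `spark.sql(...)`; the statements below are only an illustrative sketch (the table name `rc_test` is arbitrary, and the `local` catalog comes from the configuration above):
+```sql
+CREATE TABLE local.default.rc_test (id bigint, data string) USING iceberg;
+INSERT INTO local.default.rc_test VALUES (1, 'a'), (2, 'b');
+SELECT * FROM local.default.rc_test;
+DROP TABLE local.default.rc_test;
+```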
+
+### Verifying with Flink
+
+To verify using Flink, start a Flink SQL Client with the following command:
+```bash
+wget ${MAVEN_URL}/iceberg-flink-runtime/{{% icebergVersion %}}/iceberg-flink-runtime-{{% icebergVersion %}}.jar
+
+sql-client.sh embedded \
+    -j iceberg-flink-runtime-{{% icebergVersion %}}.jar \
+    -j ${FLINK_CONNECTOR_PACKAGE}-${HIVE_VERSION}_${SCALA_VERSION}-${FLINK_VERSION}.jar \
+    shell
+```
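+
+Inside the SQL client, a similar smoke test can be run against a Hadoop catalog; the sketch below is only illustrative (catalog name, warehouse path, and table name are placeholders):
+```sql
+CREATE CATALOG rc_catalog WITH (
+    'type' = 'iceberg',
+    'catalog-type' = 'hadoop',
+    'warehouse' = 'file:///tmp/iceberg-rc-warehouse'
+);
+CREATE TABLE rc_catalog.`default`.rc_test (id BIGINT, data STRING);
+INSERT INTO rc_catalog.`default`.rc_test VALUES (1, 'a');
+SELECT * FROM rc_catalog.`default`.rc_test;
+```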
+
+## Voting
+
+Votes are cast by replying to the release candidate announcement email on the dev mailing list
+with either `+1`, `0`, or `-1`.
+
+> [ ] +1 Release this as Apache Iceberg {{% icebergVersion %}}
+> [ ] +0
+> [ ] -1 Do not release this because...
+
+In addition to your vote, it's customary to specify if your vote is binding or non-binding. Only members
+of the Project Management Committee have formally binding votes. If you're unsure, you can specify that your
+vote is non-binding. To read more about voting in the Apache framework, check out the
+[Voting](https://www.apache.org/foundation/voting.html) information page on the Apache foundation's website.