Posted to commits@iceberg.apache.org by bl...@apache.org on 2019/07/15 04:40:57 UTC

[incubator-iceberg] 06/06: Deployed 089343d5 with MkDocs version: 1.0.4

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git

commit 8a4757b13ebfa4e52c9595c5fc4759d037c8db1b
Author: Ryan Blue <bl...@apache.org>
AuthorDate: Sun Jul 14 20:40:39 2019 -0800

    Deployed 089343d5 with MkDocs version: 1.0.4
---
 api-quickstart/index.html |   4 +-
 api/index.html            |   1 -
 configuration/index.html  |   5 ++
 css/extra.css             |  23 ++++++++
 index.html                |   2 +-
 sitemap.xml               |  30 +++++-----
 sitemap.xml.gz            | Bin 216 -> 216 bytes
 spark/index.html          | 136 +++++++++++++++++++++++++++++++++++++++-------
 spec/index.html           |  41 ++++++++++++--
 9 files changed, 198 insertions(+), 44 deletions(-)

diff --git a/api-quickstart/index.html b/api-quickstart/index.html
index cf098d7..860553f 100644
--- a/api-quickstart/index.html
+++ b/api-quickstart/index.html
@@ -269,7 +269,7 @@
 
 <h1 id="api-quickstart">API Quickstart<a class="headerlink" href="#api-quickstart" title="Permanent link">&para;</a></h1>
 <h2 id="create-a-table">Create a table<a class="headerlink" href="#create-a-table" title="Permanent link">&para;</a></h2>
-<p>Tables are created using either a <code>Catalog</code> or an implementation of the <code>Tables</code> interface.</p>
+<p>Tables are created using either a <a href="/javadoc/master/index.html?org/apache/iceberg/catalog/Catalog.html"><code>Catalog</code></a> or an implementation of the <a href="/javadoc/master/index.html?org/apache/iceberg/Tables.html"><code>Tables</code></a> interface.</p>
 <h3 id="using-a-hive-catalog">Using a Hive catalog<a class="headerlink" href="#using-a-hive-catalog" title="Permanent link">&para;</a></h3>
 <p>The Hive catalog connects to a Hive MetaStore to keep track of Iceberg tables. This example uses Spark&rsquo;s Hadoop configuration to get a Hive catalog:</p>
 <pre><code class="scala">import org.apache.iceberg.hive.HiveCatalog
@@ -285,7 +285,7 @@ val table = catalog.createTable(name, schema, spec)
 // write into the new logs table with Spark 2.4
 logsDF.write
     .format(&quot;iceberg&quot;)
-    .save(&quot;db.table&quot;)
+    .save(&quot;logging.logs&quot;)
 </code></pre>
 
 <p>The logs <a href="#create-a-schema">schema</a> and <a href="#create-a-partition-spec">partition spec</a> are created below.</p>
diff --git a/api/index.html b/api/index.html
index 5d72643..a1e18e3 100644
--- a/api/index.html
+++ b/api/index.html
@@ -414,7 +414,6 @@ ListType list = ListType.ofRequired(1, IntegerType.get());
 <li><code>iceberg-data</code> is a client library used to read Iceberg tables from JVM applications</li>
 <li><code>iceberg-pig</code> is an implementation of Pig&rsquo;s LoadFunc API for Iceberg</li>
 <li><code>iceberg-runtime</code> generates a shaded runtime jar for Spark to integrate with Iceberg tables</li>
-<li><code>iceberg-presto-runtime</code> generates a shaded runtime jar that is used by presto to integrate with iceberg tables</li>
 </ul></div>
         
         
diff --git a/configuration/index.html b/configuration/index.html
index cd68905..61f354d 100644
--- a/configuration/index.html
+++ b/configuration/index.html
@@ -328,6 +328,11 @@
 <td>gzip</td>
 <td>Avro compression codec</td>
 </tr>
+<tr>
+<td>write.metadata.compression-codec</td>
+<td>none</td>
+<td>Metadata compression codec; none or gzip</td>
+</tr>
 </tbody>
 </table>
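+<p>For example, to switch an existing table to gzip-compressed metadata, the property can be set through the table API. This is a minimal sketch; <code>table</code> is assumed to be an already-loaded Iceberg table:</p>
+<pre><code class="scala">table.updateProperties()
+    .set(&quot;write.metadata.compression-codec&quot;, &quot;gzip&quot;)
+    .commit()
+</code></pre>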
 <h3 id="table-behavior-properties">Table behavior properties<a class="headerlink" href="#table-behavior-properties" title="Permanent link">&para;</a></h3>
diff --git a/css/extra.css b/css/extra.css
index ea1ac09..b9b9f1e 100644
--- a/css/extra.css
+++ b/css/extra.css
@@ -47,6 +47,11 @@ h3:target .headerlink {
   opacity: 1;
 }
 
+h4 {
+  font-weight: 500;
+  font-size: 22px;
+}
+
 h4:target .headerlink {
   color: #008cba;
   opacity: 1;
@@ -56,3 +61,21 @@ h5:target .headerlink {
   color: #008cba;
   opacity: 1;
 }
+
+code {
+  color: #458;
+}
+
+pre {
+  width: max-content;
+  min-width: 60em;
+  margin-top: 0.5em;
+  margin-bottom: 0.5em;
+}
+
+.admonition {
+  margin: 0.5em;
+  margin-left: 0em;
+  padding: 0.5em;
+  padding-left: 1em;
+}
diff --git a/index.html b/index.html
index 5de9ac9..dde4213 100644
--- a/index.html
+++ b/index.html
@@ -368,5 +368,5 @@
 
 <!--
 MkDocs version : 1.0.4
-Build Date UTC : 2019-07-05 23:28:31
+Build Date UTC : 2019-07-15 04:40:39
 -->
diff --git a/sitemap.xml b/sitemap.xml
index b7b102c..55f7197 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,42 +2,42 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -47,17 +47,17 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -67,12 +67,12 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -87,12 +87,12 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-07-05</lastmod>
+     <lastmod>2019-07-14</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index d25f0b9..ea50037 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ
diff --git a/spark/index.html b/spark/index.html
index 48aa525..743ed2c 100644
--- a/spark/index.html
+++ b/spark/index.html
@@ -257,6 +257,7 @@
                 <li class="third-level"><a href="#querying-with-sql">Querying with SQL</a></li>
                 <li class="third-level"><a href="#appending-data">Appending data</a></li>
                 <li class="third-level"><a href="#overwriting-data">Overwriting data</a></li>
+                <li class="third-level"><a href="#inspecting-tables">Inspecting tables</a></li>
     </ul>
 </div></div>
         <div class="col-md-9" role="main">
@@ -274,70 +275,76 @@
 </thead>
 <tbody>
 <tr>
-<td>SQL create table</td>
-<td></td>
+<td><a href="#reading-an-iceberg-table">DataFrame reads</a></td>
+<td>✔️</td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL alter table</td>
-<td></td>
+<td><a href="#appending-data">DataFrame append</a></td>
+<td>✔️</td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL drop table</td>
-<td></td>
+<td><a href="#overwriting-data">DataFrame overwrite</a></td>
+<td>✔️</td>
+<td>✔️</td>
+<td>Overwrite mode replaces partitions dynamically</td>
+</tr>
+<tr>
+<td><a href="#inspecting-tables">Metadata tables</a></td>
+<td>✔️</td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL select</td>
+<td>SQL create table</td>
 <td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL create table as</td>
+<td>SQL alter table</td>
 <td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL replace table as</td>
+<td>SQL drop table</td>
 <td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL insert into</td>
+<td>SQL select</td>
 <td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td>SQL insert overwrite</td>
+<td>SQL create table as</td>
 <td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td><a href="#reading-an-iceberg-table">DataFrame reads</a></td>
-<td>✔️</td>
+<td>SQL replace table as</td>
+<td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td><a href="#appending-data">DataFrame append</a></td>
-<td>✔️</td>
+<td>SQL insert into</td>
+<td></td>
 <td>✔️</td>
 <td></td>
 </tr>
 <tr>
-<td><a href="#overwriting-data">DataFrame overwrite</a></td>
-<td>✔️</td>
+<td>SQL insert overwrite</td>
+<td></td>
 <td>✔️</td>
-<td>Overwrite mode replaces partitions dynamically</td>
+<td></td>
 </tr>
 </tbody>
 </table>
@@ -405,7 +412,98 @@ data.write
 <div class="admonition warning">
 <p class="admonition-title">Warning</p>
 <p><strong>Spark does not define the behavior of DataFrame overwrite</strong>. Like most sources, Iceberg will dynamically overwrite a partition when the dataframe contains rows for that partition. Unpartitioned tables are completely overwritten.</p>
-</div></div>
+</div>
+<h3 id="inspecting-tables">Inspecting tables<a class="headerlink" href="#inspecting-tables" title="Permanent link">&para;</a></h3>
+<p>To inspect a table&rsquo;s history, snapshots, and other metadata, Iceberg supports metadata tables.</p>
+<p>Metadata tables are identified by adding the metadata table name after the original table name. For example, history for <code>db.table</code> is read using <code>db.table.history</code>.</p>
+<h4 id="history">History<a class="headerlink" href="#history" title="Permanent link">&para;</a></h4>
+<p>To show table history, run:</p>
+<pre><code class="scala">spark.read.format(&quot;iceberg&quot;).load(&quot;db.table.history&quot;).show(truncate = false)
+</code></pre>
+
+<pre><code class="text">+-------------------------+---------------------+---------------------+---------------------+
+| made_current_at         | snapshot_id         | parent_id           | is_current_ancestor |
++-------------------------+---------------------+---------------------+---------------------+
+| 2019-02-08 03:29:51.215 | 5781947118336215154 | NULL                | true                |
+| 2019-02-08 03:47:55.948 | 5179299526185056830 | 5781947118336215154 | true                |
+| 2019-02-09 16:24:30.13  | 296410040247533544  | 5179299526185056830 | false               |
+| 2019-02-09 16:32:47.336 | 2999875608062437330 | 5179299526185056830 | true                |
+| 2019-02-09 19:42:03.919 | 8924558786060583479 | 2999875608062437330 | true                |
+| 2019-02-09 19:49:16.343 | 6536733823181975045 | 8924558786060583479 | true                |
++-------------------------+---------------------+---------------------+---------------------+
+</code></pre>
+
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p><strong>This shows a commit that was rolled back.</strong> The example has two snapshots with the same parent, and one is <em>not</em> an ancestor of the current table state.</p>
+</div>
+<h4 id="snapshots">Snapshots<a class="headerlink" href="#snapshots" title="Permanent link">&para;</a></h4>
+<p>To show the valid snapshots for a table, run:</p>
+<pre><code class="scala">spark.read.format(&quot;iceberg&quot;).load(&quot;db.table.snapshots&quot;).show(truncate = false)
+</code></pre>
+
+<pre><code class="text">+-------------------------+----------------+-----------+-----------+----------------------------------------------------+-------------------------------------------------------+
+| committed_at            | snapshot_id    | parent_id | operation | manifest_list                                      | summary                                               |
++-------------------------+----------------+-----------+-----------+----------------------------------------------------+-------------------------------------------------------+
+| 2019-02-08 03:29:51.215 | 57897183625154 | null      | append    | s3://.../table/metadata/snap-57897183625154-1.avro | { added-records -&gt; 2478404, total-records -&gt; 2478404, |
+|                         |                |           |           |                                                    |   added-data-files -&gt; 438, total-data-files -&gt; 438,   |
+|                         |                |           |           |                                                    |   spark.app.id -&gt; application_1520379288616_155055 }  |
+| ...                     | ...            | ...       | ...       | ...                                                | ...                                                   |
++-------------------------+----------------+-----------+-----------+----------------------------------------------------+-------------------------------------------------------+
+</code></pre>
+
+<p>You can also join snapshots to table history. For example, this query will show table history, with the application ID that wrote each snapshot:</p>
+<pre><code class="scala">spark.read.format(&quot;iceberg&quot;).load(&quot;db.table.history&quot;).createOrReplaceTempView(&quot;history&quot;)
+spark.read.format(&quot;iceberg&quot;).load(&quot;db.table.snapshots&quot;).createOrReplaceTempView(&quot;snapshots&quot;)
+</code></pre>
+
+<pre><code class="sql">select
+    h.made_current_at,
+    s.operation,
+    h.snapshot_id,
+    h.is_current_ancestor,
+    s.summary['spark.app.id']
+from history h
+join snapshots s
+  on h.snapshot_id = s.snapshot_id
+order by made_current_at
+</code></pre>
+
+<pre><code class="text">+-------------------------+-----------+----------------+---------------------+----------------------------------+
+| made_current_at         | operation | snapshot_id    | is_current_ancestor | summary[spark.app.id]            |
++-------------------------+-----------+----------------+---------------------+----------------------------------+
+| 2019-02-08 03:29:51.215 | append    | 57897183625154 | true                | application_1520379288616_155055 |
+| 2019-02-09 16:24:30.13  | delete    | 29641004024753 | false               | application_1520379288616_151109 |
+| 2019-02-09 16:32:47.336 | append    | 57897183625154 | true                | application_1520379288616_155055 |
+| 2019-02-08 03:47:55.948 | overwrite | 51792995261850 | true                | application_1520379288616_152431 |
++-------------------------+-----------+----------------+---------------------+----------------------------------+
+</code></pre>
+
+<h4 id="manifests">Manifests<a class="headerlink" href="#manifests" title="Permanent link">&para;</a></h4>
+<p>To show a table&rsquo;s file manifests and each manifest&rsquo;s metadata, run:</p>
+<pre><code class="scala">spark.read.format(&quot;iceberg&quot;).load(&quot;db.table.manifests&quot;).show(truncate = false)
+</code></pre>
+
+<pre><code class="text">+----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+---------------------------------+
+| path                                                                 | length | partition_spec_id | added_snapshot_id   | added_data_files_count | existing_data_files_count | deleted_data_files_count | partitions                      |
++----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+---------------------------------+
+| s3://.../table/metadata/45b5290b-ee61-4788-b324-b1e2735c0e10-m0.avro | 4479   | 0                 | 6668963634911763636 | 8                      | 0                         | 0                        | [[false,2019-05-13,2019-05-15]] |
++----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+---------------------------------+
+</code></pre>
+
+<h4 id="files">Files<a class="headerlink" href="#files" title="Permanent link">&para;</a></h4>
+<p>To show a table&rsquo;s data files and each file&rsquo;s metadata, run:</p>
+<pre><code class="scala">spark.read.format(&quot;iceberg&quot;).load(&quot;db.table.files&quot;).show(truncate = false)
+</code></pre>
+
+<pre><code class="text">+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
+| file_path                                                               | file_format | record_count | file_size_in_bytes | column_sizes       | value_counts     | null_value_counts | lower_bounds    | upper_bounds    | key_metadata | split_offsets |
++-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
+| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET     | 1            | 597                | [1 -&gt; 90, 2 -&gt; 62] | [1 -&gt; 1, 2 -&gt; 1] | [1 -&gt; 0, 2 -&gt; 0]  | [1 -&gt; , 2 -&gt; c] | [1 -&gt; , 2 -&gt; c] | null         | [4]           |
+| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET     | 1            | 597                | [1 -&gt; 90, 2 -&gt; 62] | [1 -&gt; 1, 2 -&gt; 1] | [1 -&gt; 0, 2 -&gt; 0]  | [1 -&gt; , 2 -&gt; b] | [1 -&gt; , 2 -&gt; b] | null         | [4]           |
+| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET     | 1            | 597                | [1 -&gt; 90, 2 -&gt; 62] | [1 -&gt; 1, 2 -&gt; 1] | [1 -&gt; 0, 2 -&gt; 0]  | [1 -&gt; , 2 -&gt; a] | [1 -&gt; , 2 -&gt; a] | null         | [4]           |
++-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
+</code></pre></div>
         
         
     </div>
diff --git a/spec/index.html b/spec/index.html
index baa862a..4e68733 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -499,10 +499,14 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 <p>All transforms must return <code>null</code> for a <code>null</code> input value.</p>
 <h4 id="bucket-transform-details">Bucket Transform Details<a class="headerlink" href="#bucket-transform-details" title="Permanent link">&para;</a></h4>
 <p>Bucket partition transforms use a 32-bit hash of the source value. The 32-bit hash implementation is the 32-bit Murmur3 hash, x86 variant, seeded with 0.</p>
-<p>Transforms are parameterized by a number of buckets[^3], <code>N</code>. The hash mod <code>N</code> must produce a positive value by first discarding the sign bit of the hash value. In pseudo-code, the function is:</p>
+<p>Transforms are parameterized by a number of buckets [1], <code>N</code>. The hash mod <code>N</code> must produce a positive value by first discarding the sign bit of the hash value. In pseudo-code, the function is:</p>
 <pre><code>  def bucket_N(x) = (murmur3_x86_32_hash(x) &amp; Integer.MAX_VALUE) % N
 </code></pre>
 
+<p>Notes:</p>
+<ol>
+<li>Changing the number of buckets as a table grows is possible by evolving the partition spec.</li>
+</ol>
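+<p>As an illustration, here is a minimal Scala sketch of <code>bucket_N</code> for long values, using Guava&rsquo;s Murmur3 (x86 32-bit variant, seed 0). It ignores the per-type hashing rules in Appendix B and is not a reference implementation:</p>
+<pre><code class="scala">import com.google.common.hash.Hashing
+
+// discard the sign bit so the modulo result is non-negative, then take the bucket
+def bucketN(value: Long, n: Int): Int =
+  (Hashing.murmur3_32().hashLong(value).asInt() &amp; Integer.MAX_VALUE) % n
+</code></pre>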
 <p>For hash function details by type, see Appendix B.</p>
 <h4 id="truncate-transform-details">Truncate Transform Details<a class="headerlink" href="#truncate-transform-details" title="Permanent link">&para;</a></h4>
 <table>
@@ -679,7 +683,11 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 <h4 id="manifest-entry-fields">Manifest Entry Fields<a class="headerlink" href="#manifest-entry-fields" title="Permanent link">&para;</a></h4>
 <p>The manifest entry fields are used to keep track of the snapshot in which files were added or logically deleted. The <code>data_file</code> struct is nested inside of the manifest entry so that it can be easily passed to job planning without the manifest entry fields.</p>
 <p>When a data file is added to the dataset, its manifest entry should store the snapshot ID in which the file was added and set status to 1 (added).</p>
-<p>When a data file is replaced or deleted from the dataset, it’s manifest entry fields store the snapshot ID in which the file was deleted and status 2 (deleted). The file may be deleted from the file system when the snapshot in which it was deleted is garbage collected, assuming that older snapshots have also been garbage collected[^4].</p>
+<p>When a data file is replaced or deleted from the dataset, its manifest entry fields store the snapshot ID in which the file was deleted and status 2 (deleted). The file may be deleted from the file system when the snapshot in which it was deleted is garbage collected, assuming that older snapshots have also been garbage collected [1].</p>
+<p>Notes:</p>
+<ol>
+<li>Technically, data files can be deleted when the last snapshot that contains the file as “live” data is garbage collected. But this is harder to detect and requires finding the diff of multiple snapshots. It is easier to track what files are deleted in a snapshot and delete them when that snapshot expires.</li>
+</ol>
 <h3 id="snapshots">Snapshots<a class="headerlink" href="#snapshots" title="Permanent link">&para;</a></h3>
 <p>A snapshot consists of the following fields:</p>
 <ul>
@@ -706,8 +714,12 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 <h4 id="scan-planning">Scan Planning<a class="headerlink" href="#scan-planning" title="Permanent link">&para;</a></h4>
 <p>Scans are planned by reading the manifest files for the current snapshot listed in the table metadata. Deleted entries in a manifest are not included in the scan.</p>
 <p>For each manifest, scan predicates, which filter data rows, are converted to partition predicates, which filter data files, and are used to select the data files in the manifest. This conversion uses the partition spec that was used to write the manifest file.</p>
-<p>Scan predicates are converted to partition predicates using an inclusive projection: if a scan predicate matches a row, then the partition predicate must match that row’s partition. This is an <em>inclusive projection</em>[^5] because rows that do not match the scan predicate may be included in the scan by the partition predicate.</p>
+<p>Scan predicates are converted to partition predicates using an inclusive projection: if a scan predicate matches a row, then the partition predicate must match that row’s partition. This is an <em>inclusive projection</em> [1] because rows that do not match the scan predicate may be included in the scan by the partition predicate.</p>
 <p>For example, an <code>events</code> table with a timestamp column named <code>ts</code> that is partitioned by <code>ts_day=day(ts)</code> is queried by users with ranges over the timestamp column: <code>ts &gt; X</code>. The inclusive projection is <code>ts_day &gt;= day(X)</code>, which is used to select files that may have matching rows. Note that, in most cases, timestamps just before <code>X</code> will be included in the scan because the file contains rows that match the predicate and rows that do not match the predicate.</p>
+<p>Notes:</p>
+<ol>
+<li>An alternative, <em>strict projection</em>, creates a partition predicate that will match a file if all of the rows in the file must match the scan predicate. These projections are used to calculate the residual predicates for each file in a scan.</li>
+</ol>
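+<p>A minimal Scala sketch of how an inclusive projection might be derived with the Iceberg expressions API (the schema, partition spec, and timestamp literal here are illustrative, and API details may differ by version):</p>
+<pre><code class="scala">import org.apache.iceberg.{PartitionSpec, Schema}
+import org.apache.iceberg.expressions.{Expressions, Projections}
+import org.apache.iceberg.types.Types
+
+val schema = new Schema(
+  Types.NestedField.required(1, &quot;ts&quot;, Types.TimestampType.withZone()))
+val spec = PartitionSpec.builderFor(schema).day(&quot;ts&quot;, &quot;ts_day&quot;).build()
+
+// ts &gt; X, with X as a microsecond timestamp literal
+val rowFilter = Expressions.greaterThan(&quot;ts&quot;, 1546300800000000L)
+
+// inclusive projection: a predicate on ts_day that matches every file that may contain matching rows
+val partitionFilter = Projections.inclusive(spec).project(rowFilter)
+</code></pre>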
 <h4 id="manifest-lists">Manifest Lists<a class="headerlink" href="#manifest-lists" title="Permanent link">&para;</a></h4>
 <p>Snapshots are embedded in table metadata, but the list of manifests for a snapshot can be stored in a separate manifest list file.</p>
 <p>A manifest list encodes extra fields that can be used to avoid scanning all of the manifests in a snapshot when planning a table scan. </p>
@@ -817,7 +829,11 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 <tbody>
 <tr>
 <td><strong><code>format-version</code></strong></td>
-<td>An integer version number for the format. Currently, this is always 1.</td>
+<td>An integer version number for the format. Currently, this is always 1. Implementations must throw an exception if a table&rsquo;s version is higher than the supported version.</td>
+</tr>
+<tr>
+<td><strong><code>table-uuid</code></strong></td>
+<td>A UUID that identifies the table, generated when the table is created. Implementations must throw an exception if a table&rsquo;s UUID does not match the expected UUID after refreshing metadata.</td>
 </tr>
 <tr>
 <td><strong><code>location</code></strong></td>
@@ -867,7 +883,7 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 </table>
 <p>For serialization details, see Appendix C.</p>
 <h4 id="file-system-tables">File System Tables<a class="headerlink" href="#file-system-tables" title="Permanent link">&para;</a></h4>
-<p>An atomic swap can be implemented using atomic rename in file systems that support it, like HDFS or most local file systems[^6].</p>
+<p>An atomic swap can be implemented using atomic rename in file systems that support it, like HDFS or most local file systems [1].</p>
 <p>Each version of table metadata is stored in a metadata folder under the table’s base location using a file naming scheme that includes a version number, <code>V</code>: <code>v&lt;V&gt;.metadata.json</code>. To commit a new metadata version, <code>V+1</code>, the writer performs the following steps:</p>
 <ol>
 <li>Read the current table metadata version <code>V</code>.</li>
@@ -879,8 +895,12 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 </ol>
 </li>
 </ol>
+<p>Notes:</p>
+<ol>
+<li>The file system table scheme is implemented in <a href="/javadoc/master/index.html?org/apache/iceberg/hadoop/HadoopTableOperations.html">HadoopTableOperations</a>.</li>
+</ol>
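+<p>A minimal Scala sketch of the atomic swap, assuming a file system with atomic rename such as HDFS (illustrative only; the actual implementation is HadoopTableOperations, noted above):</p>
+<pre><code class="scala">import org.apache.hadoop.fs.{FileSystem, Path}
+
+// returns true only if this writer created the next version; rename fails if another writer committed first
+def commitVersion(fs: FileSystem, metadataDir: Path, temp: Path, nextVersion: Int): Boolean = {
+  val target = new Path(metadataDir, s&quot;v$nextVersion.metadata.json&quot;)
+  fs.rename(temp, target)
+}
+</code></pre>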
 <h4 id="metastore-tables">Metastore Tables<a class="headerlink" href="#metastore-tables" title="Permanent link">&para;</a></h4>
-<p>The atomic swap needed to commit new versions of table metadata can be implemented by storing a pointer in a metastore or database that is updated with a check-and-put operation[^7]. The check-and-put validates that the version of the table that a write is based on is still current and then makes the new metadata from the write the current version.</p>
+<p>The atomic swap needed to commit new versions of table metadata can be implemented by storing a pointer in a metastore or database that is updated with a check-and-put operation [1]. The check-and-put validates that the version of the table that a write is based on is still current and then makes the new metadata from the write the current version.</p>
 <p>Each version of table metadata is stored in a metadata folder under the table’s base location using a naming scheme that includes a version and UUID: <code>&lt;V&gt;-&lt;uuid&gt;.metadata.json</code>. To commit a new metadata version, <code>V+1</code>, the writer performs the following steps:</p>
 <ol start="2">
 <li>Create a new table metadata file based on the current metadata.</li>
@@ -891,6 +911,10 @@ Timestamps <em>without time zone</em> represent a date and time of day regardles
 </ol>
 </li>
 </ol>
+<p>Notes:</p>
+<ol>
+<li>The metastore table scheme is partly implemented in <a href="/javadoc/master/index.html?org/apache/iceberg/BaseMetastoreTableOperations.html">BaseMetastoreTableOperations</a>.</li>
+</ol>
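+<p>A minimal Scala sketch of the check-and-put, using an in-memory pointer in place of a real metastore (illustrative only; the scheme is partly implemented in BaseMetastoreTableOperations, noted above):</p>
+<pre><code class="scala">import java.util.concurrent.atomic.AtomicReference
+
+// the pointer a metastore would hold: the location of the current metadata file
+val currentMetadataLocation = new AtomicReference[String](&quot;&lt;V&gt;-&lt;uuid&gt;.metadata.json&quot;)
+
+// succeeds only if the metadata location the writer based its changes on is still current
+def checkAndPut(base: String, updated: String): Boolean =
+  currentMetadataLocation.compareAndSet(base, updated)
+</code></pre>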
 <h2 id="appendix-a-format-specific-requirements">Appendix A: Format-specific Requirements<a class="headerlink" href="#appendix-a-format-specific-requirements" title="Permanent link">&para;</a></h2>
 <h3 id="avro">Avro<a class="headerlink" href="#avro" title="Permanent link">&para;</a></h3>
 <p><strong>Data Type Mappings</strong></p>
@@ -1567,6 +1591,11 @@ Hash results are not dependent on decimal scale, which is part of the type, not
 <td><code>1</code></td>
 </tr>
 <tr>
+<td><strong><code>table-uuid</code></strong></td>
+<td><code>JSON string</code></td>
+<td><code>"fb072c92-a02b-11e9-ae9c-1bb7bc9eca94"</code></td>
+</tr>
+<tr>
 <td><strong><code>location</code></strong></td>
 <td><code>JSON string</code></td>
 <td><code>"s3://b/wh/data.db/table"</code></td>