Posted to commits@iceberg.apache.org by bl...@apache.org on 2019/12/18 19:56:05 UTC

[incubator-iceberg] branch asf-site updated: Deployed dbc753a5 with MkDocs version: 1.0.4

This is an automated email from the ASF dual-hosted git repository.

blue pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 90539f9  Deployed dbc753a5 with MkDocs version: 1.0.4
90539f9 is described below

commit 90539f920c513b13fc7b1d9a7929258ce33b83e9
Author: Ryan Blue <bl...@apache.org>
AuthorDate: Wed Dec 18 11:55:54 2019 -0800

    Deployed dbc753a5 with MkDocs version: 1.0.4
---
 api-quickstart/index.html |  17 ++++++++---------
 configuration/index.html  |  32 +++++++++++++++++++++++++++++++-
 index.html                |   2 +-
 sitemap.xml               |  44 ++++++++++++++++++++++----------------------
 sitemap.xml.gz            | Bin 222 -> 220 bytes
 spark/index.html          |  12 +++++++++++-
 6 files changed, 73 insertions(+), 34 deletions(-)

diff --git a/api-quickstart/index.html b/api-quickstart/index.html
index 9fafeb4..f37db68 100644
--- a/api-quickstart/index.html
+++ b/api-quickstart/index.html
@@ -342,7 +342,7 @@
 <p>The Hive catalog connects to a Hive MetaStore to keep track of Iceberg tables. This example uses Spark&rsquo;s Hadoop configuration to get a Hive catalog:</p>
 <pre><code class="scala">import org.apache.iceberg.hive.HiveCatalog
 
-val catalog = new HiveCatalog(spark.sparkContext.hadoopConfiguration)
+val catalog = new HiveCatalog(spark.sessionState.newHadoopConf())
 </code></pre>
 
 <p>The <code>Catalog</code> interface defines methods for working with tables, like <code>createTable</code>, <code>loadTable</code>, <code>renameTable</code>, and <code>dropTable</code>.</p>
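 <p>For example, a table created through the catalog can be loaded and dropped again by name. This is a minimal sketch, not from the original page; it assumes the <code>logging.logs</code> table used in the example below:</p>
 <pre><code class="scala">import org.apache.iceberg.catalog.TableIdentifier
 
 // load an existing table by its database and table name
 val logs = catalog.loadTable(TableIdentifier.of(&quot;logging&quot;, &quot;logs&quot;))
 
 // drop the table using the same identifier
 catalog.dropTable(TableIdentifier.of(&quot;logging&quot;, &quot;logs&quot;))
 </code></pre>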
@@ -353,6 +353,7 @@ val table = catalog.createTable(name, schema, spec)
 // write into the new logs table with Spark 2.4
 logsDF.write
     .format(&quot;iceberg&quot;)
+    .mode(&quot;append&quot;)
     .save(&quot;logging.logs&quot;)
 </code></pre>
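 
 <p>After the write, the new rows can be read back with the same <code>iceberg</code> format. A minimal sketch:</p>
 <pre><code class="scala">// read the logs table back as a DataFrame
 val logs = spark.read.format(&quot;iceberg&quot;).load(&quot;logging.logs&quot;)
 </code></pre>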
 
@@ -362,13 +363,14 @@ logsDF.write
 <p>To create a table in HDFS, use <code>HadoopTables</code>:</p>
 <pre><code class="scala">import org.apache.iceberg.hadoop.HadoopTables
 
-val tables = new HadoopTables(conf)
+val tables = new HadoopTables(spark.sessionState.newHadoopConf())
 
 val table = tables.create(schema, spec, &quot;hdfs:/tables/logging/logs&quot;)
 
 // write into the new logs table with Spark 2.4
 logsDF.write
     .format(&quot;iceberg&quot;)
+    .mode(&quot;append&quot;)
     .save(&quot;hdfs:/tables/logging/logs&quot;)
 </code></pre>
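 
 <p>An existing table can be loaded back from its location. A minimal sketch, assuming the table created above:</p>
 <pre><code class="scala">// load the table using its HDFS path
 val logs = tables.load(&quot;hdfs:/tables/logging/logs&quot;)
 </code></pre>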
 
@@ -402,15 +404,12 @@ val schema = new Schema(
 <p>When a table is created, all IDs in the schema are re-assigned to ensure uniqueness.</p>
 <h3 id="convert-a-schema-from-avro">Convert a schema from Avro<a class="headerlink" href="#convert-a-schema-from-avro" title="Permanent link">&para;</a></h3>
 <p>To create an Iceberg schema from an existing Avro schema, use converters in <code>AvroSchemaUtil</code>:</p>
-<pre><code class="scala">import org.apache.iceberg.avro.AvroSchemaUtil
-import org.apache.avro.Schema.Parser
+<pre><code class="scala">import org.apache.avro.Schema.Parser
+import org.apache.iceberg.avro.AvroSchemaUtil
 
-val avroSchema = new Parser().parse(
-    &quot;&quot;&quot;{ &quot;type&quot;: &quot;record&quot;, &quot;name&quot;: &quot;com.example.AvroType&quot;,
-      |  &quot;fields&quot;: [ ... ]
-      |}&quot;&quot;&quot;.stripMargin
+val avroSchema = new Parser().parse(&quot;&quot;&quot;{&quot;type&quot;: &quot;record&quot;, ... }&quot;&quot;&quot;)
 
-val schema = AvroSchemaUtil.convert(avroSchema)
+val icebergSchema = AvroSchemaUtil.toIceberg(avroSchema)
 </code></pre>
 
 <h3 id="convert-a-schema-from-spark">Convert a schema from Spark<a class="headerlink" href="#convert-a-schema-from-spark" title="Permanent link">&para;</a></h3>
diff --git a/configuration/index.html b/configuration/index.html
index a480afb..cc70c50 100644
--- a/configuration/index.html
+++ b/configuration/index.html
@@ -397,6 +397,11 @@
 <td>Parquet compression codec</td>
 </tr>
 <tr>
+<td>write.parquet.compression-level</td>
+<td>null</td>
+<td>Parquet compression level</td>
+</tr>
+<tr>
 <td>write.avro.compression-codec</td>
 <td>gzip</td>
 <td>Avro compression codec</td>
@@ -419,7 +424,22 @@
 <tr>
 <td>write.target-file-size-bytes</td>
 <td>Long.MAX_VALUE</td>
-<td>Controls the size of files generated to target about this many bytes.</td>
+<td>Controls the size of files generated to target about this many bytes</td>
+</tr>
+<tr>
+<td>write.wap.enabled</td>
+<td>false</td>
+<td>Enables write-audit-publish writes</td>
+</tr>
+<tr>
+<td>write.metadata.delete-after-commit.enabled</td>
+<td>false</td>
+<td>Controls whether to delete the oldest version metadata files after commit</td>
+</tr>
+<tr>
+<td>write.metadata.previous-versions-max</td>
+<td>100</td>
+<td>The maximum number of previous version metadata files to keep; older files are deleted after each commit</td>
 </tr>
 </tbody>
 </table>
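 <p>Write properties can also be changed after a table is created using <code>updateProperties</code>. A minimal sketch of setting the metadata-retention properties above, assuming an existing <code>table</code> reference:</p>
 <pre><code class="scala">// retain at most 10 previous metadata files, deleting older ones on commit
 table.updateProperties()
     .set(&quot;write.metadata.delete-after-commit.enabled&quot;, &quot;true&quot;)
     .set(&quot;write.metadata.previous-versions-max&quot;, &quot;10&quot;)
     .commit()
 </code></pre>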
@@ -463,6 +483,11 @@
 <td>100</td>
 <td>Minimum number of manifests to accumulate before merging</td>
 </tr>
+<tr>
+<td>commit.manifest-merge.enabled</td>
+<td>true</td>
+<td>Controls whether to automatically merge manifests on writes</td>
+</tr>
 </tbody>
 </table>
 <h2 id="spark-options">Spark options<a class="headerlink" href="#spark-options" title="Permanent link">&para;</a></h2>
@@ -539,6 +564,11 @@ df.write
 <td>As per table property</td>
 <td>Overrides this table&rsquo;s write.target-file-size-bytes</td>
 </tr>
+<tr>
+<td>check-nullability</td>
+<td>true</td>
+<td>Controls whether to validate the nullability of written fields against the table schema</td>
+</tr>
 </tbody>
 </table></div>
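 <p>Write options are passed per write using <code>option</code>. A minimal sketch of overriding the nullability check, assuming a DataFrame <code>df</code> and the <code>logging.logs</code> table from the quickstart:</p>
 <pre><code class="scala">// skip the nullability check for this write only
 df.write
     .format(&quot;iceberg&quot;)
     .option(&quot;check-nullability&quot;, &quot;false&quot;)
     .mode(&quot;append&quot;)
     .save(&quot;logging.logs&quot;)
 </code></pre>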
         
diff --git a/index.html b/index.html
index b06e7a2..7f50dcc 100644
--- a/index.html
+++ b/index.html
@@ -419,5 +419,5 @@
 
 <!--
 MkDocs version : 1.0.4
-Build Date UTC : 2019-10-26 00:33:02
+Build Date UTC : 2019-12-18 19:55:54
 -->
diff --git a/sitemap.xml b/sitemap.xml
index c3600f3..b01fec1 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,57 +2,57 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -62,17 +62,17 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -82,12 +82,12 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -97,7 +97,7 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
@@ -107,27 +107,27 @@
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
      <loc>None</loc>
-     <lastmod>2019-10-25</lastmod>
+     <lastmod>2019-12-18</lastmod>
      <changefreq>daily</changefreq>
     </url>
     <url>
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 50a8112..2578750 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ
diff --git a/spark/index.html b/spark/index.html
index a132bcc..edad352 100644
--- a/spark/index.html
+++ b/spark/index.html
@@ -331,7 +331,7 @@
  -->
 
 <h1 id="spark">Spark<a class="headerlink" href="#spark" title="Permanent link">&para;</a></h1>
-<p>Iceberg uses Spark&rsquo;s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions.</p>
+<p>Iceberg uses Apache Spark&rsquo;s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions.</p>
 <table>
 <thead>
 <tr>
@@ -423,6 +423,16 @@
 <h2 id="spark-24">Spark 2.4<a class="headerlink" href="#spark-24" title="Permanent link">&para;</a></h2>
 <p>To use Iceberg in Spark 2.4, add the <code>iceberg-spark-runtime</code> Jar to Spark&rsquo;s <code>jars</code> folder.</p>
 <p>Spark 2.4 is limited to reading and writing existing Iceberg tables. Use the <a href="../api">Iceberg API</a> to create Iceberg tables.</p>
+<p>The recommended way is to include Iceberg&rsquo;s latest release using the <code>--packages</code> option:</p>
+<pre><code class="sh">spark-shell --packages org.apache.iceberg:iceberg-spark-runtime:0.7.0-incubating
+</code></pre>
+
+<p>You can also build Iceberg locally and add the jar to Spark&rsquo;s classpath. This is helpful for testing unreleased features or developing something new:</p>
+<pre><code class="sh">./gradlew assemble
+spark-shell --jars spark-runtime/build/libs/iceberg-spark-runtime-93990904.jar
+</code></pre>
+
+<p>Replace <code>93990904</code> with the git hash of the commit that you&rsquo;re building.</p>
 <h3 id="reading-an-iceberg-table">Reading an Iceberg table<a class="headerlink" href="#reading-an-iceberg-table" title="Permanent link">&para;</a></h3>
 <p>To read an Iceberg table, use the <code>iceberg</code> format in <code>DataFrameReader</code>:</p>
 <pre><code class="scala">spark.read.format(&quot;iceberg&quot;).load(&quot;db.table&quot;)