Posted to commits@iceberg.apache.org by bl...@apache.org on 2019/12/18 19:56:05 UTC
[incubator-iceberg] branch asf-site updated: Deployed dbc753a5 with
MkDocs version: 1.0.4
This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-iceberg.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 90539f9 Deployed dbc753a5 with MkDocs version: 1.0.4
90539f9 is described below
commit 90539f920c513b13fc7b1d9a7929258ce33b83e9
Author: Ryan Blue <bl...@apache.org>
AuthorDate: Wed Dec 18 11:55:54 2019 -0800
Deployed dbc753a5 with MkDocs version: 1.0.4
---
api-quickstart/index.html | 17 ++++++++---------
configuration/index.html | 32 +++++++++++++++++++++++++++++++-
index.html | 2 +-
sitemap.xml | 44 ++++++++++++++++++++++----------------------
sitemap.xml.gz | Bin 222 -> 220 bytes
spark/index.html | 12 +++++++++++-
6 files changed, 73 insertions(+), 34 deletions(-)
diff --git a/api-quickstart/index.html b/api-quickstart/index.html
index 9fafeb4..f37db68 100644
--- a/api-quickstart/index.html
+++ b/api-quickstart/index.html
@@ -342,7 +342,7 @@
<p>The Hive catalog connects to a Hive MetaStore to keep track of Iceberg tables. This example uses Spark’s Hadoop configuration to get a Hive catalog:</p>
<pre><code class="scala">import org.apache.iceberg.hive.HiveCatalog
-val catalog = new HiveCatalog(spark.sparkContext.hadoopConfiguration)
+val catalog = new HiveCatalog(spark.sessionState.newHadoopConf())
</code></pre>
<p>The <code>Catalog</code> interface defines methods for working with tables, like <code>createTable</code>, <code>loadTable</code>, <code>renameTable</code>, and <code>dropTable</code>.</p>
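The `Catalog` methods listed above can be sketched as one sequence. This is an illustrative sketch, not part of the commit: it assumes `catalog` was created as shown, and the table name, schema, and partition spec are hypothetical.

```scala
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.{PartitionSpec, Schema}
import org.apache.iceberg.types.Types

// Hypothetical identifier for a logs table in the "logging" database
val name = TableIdentifier.of("logging", "logs")

val schema = new Schema(
  Types.NestedField.required(1, "level", Types.StringType.get()),
  Types.NestedField.required(2, "message", Types.StringType.get())
)
val spec = PartitionSpec.builderFor(schema).identity("level").build()

val table = catalog.createTable(name, schema, spec)  // create a new table
val loaded = catalog.loadTable(name)                 // load an existing table
catalog.renameTable(name, TableIdentifier.of("logging", "logs_v2"))
catalog.dropTable(TableIdentifier.of("logging", "logs_v2"))
```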
@@ -353,6 +353,7 @@ val table = catalog.createTable(name, schema, spec)
// write into the new logs table with Spark 2.4
logsDF.write
.format("iceberg")
+ .mode("append")
.save("logging.logs")
</code></pre>
@@ -362,13 +363,14 @@ logsDF.write
<p>To create a table in HDFS, use <code>HadoopTables</code>:</p>
<pre><code class="scala">import org.apache.iceberg.hadoop.HadoopTables
-val tables = new HadoopTables(conf)
+val tables = new HadoopTables(spark.sessionState.newHadoopConf())
val table = tables.create(schema, spec, "hdfs:/tables/logging/logs")
// write into the new logs table with Spark 2.4
logsDF.write
.format("iceberg")
+ .mode("append")
.save("hdfs:/tables/logging/logs")
</code></pre>
@@ -402,15 +404,12 @@ val schema = new Schema(
<p>When a table is created, all IDs in the schema are re-assigned to ensure uniqueness.</p>
<h3 id="convert-a-schema-from-avro">Convert a schema from Avro<a class="headerlink" href="#convert-a-schema-from-avro" title="Permanent link">¶</a></h3>
<p>To create an Iceberg schema from an existing Avro schema, use converters in <code>AvroSchemaUtil</code>:</p>
-<pre><code class="scala">import org.apache.iceberg.avro.AvroSchemaUtil
-import org.apache.avro.Schema.Parser
+<pre><code class="scala">import org.apache.avro.Schema.Parser
+import org.apache.iceberg.avro.AvroSchemaUtil
-val avroSchema = new Parser().parse(
- """{ "type": "record", "name": "com.example.AvroType",
- | "fields": [ ... ]
- |}""".stripMargin
+val avroSchema = new Parser().parse("""{"type": "record", ... }""")
-val schema = AvroSchemaUtil.convert(avroSchema)
+val icebergSchema = AvroSchemaUtil.toIceberg(avroSchema)
</code></pre>
<h3 id="convert-a-schema-from-spark">Convert a schema from Spark<a class="headerlink" href="#convert-a-schema-from-spark" title="Permanent link">¶</a></h3>
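The Spark conversion body is not shown in this hunk; as a hedged sketch, `SparkSchemaUtil` converts a Spark `StructType` to an Iceberg schema (`df` is an assumed existing DataFrame):

```scala
import org.apache.iceberg.spark.SparkSchemaUtil

// Convert the Spark schema of an existing DataFrame to an Iceberg Schema
val icebergSchema = SparkSchemaUtil.convert(df.schema)
```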
diff --git a/configuration/index.html b/configuration/index.html
index a480afb..cc70c50 100644
--- a/configuration/index.html
+++ b/configuration/index.html
@@ -397,6 +397,11 @@
<td>Parquet compression codec</td>
</tr>
<tr>
+<td>write.parquet.compression-level</td>
+<td>null</td>
+<td>Parquet compression level</td>
+</tr>
+<tr>
<td>write.avro.compression-codec</td>
<td>gzip</td>
<td>Avro compression codec</td>
@@ -419,7 +424,22 @@
<tr>
<td>write.target-file-size-bytes</td>
<td>Long.MAX_VALUE</td>
-<td>Controls the size of files generated to target about this many bytes.</td>
+<td>Controls the size of files generated to target about this many bytes</td>
+</tr>
+<tr>
+<td>write.wap.enabled</td>
+<td>false</td>
+<td>Enables write-audit-publish writes</td>
+</tr>
+<tr>
+<td>write.metadata.delete-after-commit.enabled</td>
+<td>false</td>
+<td>Controls whether to delete the oldest version metadata files after commit</td>
+</tr>
+<tr>
+<td>write.metadata.previous-versions-max</td>
+<td>100</td>
+<td>The maximum number of previous version metadata files to keep before deleting after commit</td>
</tr>
</tbody>
</table>
@@ -463,6 +483,11 @@
<td>100</td>
<td>Minimum number of manifests to accumulate before merging</td>
</tr>
+<tr>
+<td>commit.manifest-merge.enabled</td>
+<td>true</td>
+<td>Controls whether to automatically merge manifests on writes</td>
+</tr>
</tbody>
</table>
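As an illustrative sketch (not part of this commit), table properties like those above can be set through Iceberg's `UpdateProperties` API; `table` is assumed to be a `Table` already loaded from a catalog:

```scala
// Assumes `table` is an org.apache.iceberg.Table loaded from a catalog
table.updateProperties()
  .set("commit.manifest-merge.enabled", "false")          // disable automatic manifest merging
  .set("write.metadata.delete-after-commit.enabled", "true")
  .set("write.metadata.previous-versions-max", "50")      // keep at most 50 old metadata files
  .commit()
```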
<h2 id="spark-options">Spark options<a class="headerlink" href="#spark-options" title="Permanent link">¶</a></h2>
@@ -539,6 +564,11 @@ df.write
<td>As per table property</td>
<td>Overrides this table’s write.target-file-size-bytes</td>
</tr>
+<tr>
+<td>check-nullability</td>
+<td>true</td>
+<td>Sets the nullable check on fields</td>
+</tr>
</tbody>
</table></div>
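The per-write options above are passed through `DataFrameWriter.option`. A minimal sketch, assuming an existing `logsDF` DataFrame and Spark 2.4 syntax:

```scala
// Option names come from the Spark options table above
logsDF.write
  .format("iceberg")
  .mode("append")
  .option("target-file-size-bytes", (128L * 1024 * 1024).toString)  // override the table default
  .option("check-nullability", "false")                             // relax the nullable check on fields
  .save("logging.logs")
```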
diff --git a/index.html b/index.html
index b06e7a2..7f50dcc 100644
--- a/index.html
+++ b/index.html
@@ -419,5 +419,5 @@
<!--
MkDocs version : 1.0.4
-Build Date UTC : 2019-10-26 00:33:02
+Build Date UTC : 2019-12-18 19:55:54
-->
diff --git a/sitemap.xml b/sitemap.xml
index c3600f3..b01fec1 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,57 +2,57 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
@@ -62,17 +62,17 @@
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
@@ -82,12 +82,12 @@
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
@@ -97,7 +97,7 @@
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
@@ -107,27 +107,27 @@
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>None</loc>
- <lastmod>2019-10-25</lastmod>
+ <lastmod>2019-12-18</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 50a8112..2578750 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ
diff --git a/spark/index.html b/spark/index.html
index a132bcc..edad352 100644
--- a/spark/index.html
+++ b/spark/index.html
@@ -331,7 +331,7 @@
-->
<h1 id="spark">Spark<a class="headerlink" href="#spark" title="Permanent link">¶</a></h1>
-<p>Iceberg uses Spark’s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions.</p>
+<p>Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions.</p>
<table>
<thead>
<tr>
@@ -423,6 +423,16 @@
<h2 id="spark-24">Spark 2.4<a class="headerlink" href="#spark-24" title="Permanent link">¶</a></h2>
<p>To use Iceberg in Spark 2.4, add the <code>iceberg-spark-runtime</code> Jar to Spark’s <code>jars</code> folder.</p>
<p>Spark 2.4 is limited to reading and writing existing Iceberg tables. Use the <a href="../api">Iceberg API</a> to create Iceberg tables.</p>
+<p>The recommended way is to include Iceberg’s latest release using the <code>--packages</code> option:</p>
+<pre><code class="sh">spark-shell --packages org.apache.iceberg:iceberg-spark-runtime:0.7.0-incubating
+</code></pre>
+
+<p>You can also build Iceberg locally and add the jar to Spark’s classpath. This is helpful for testing unreleased features or developing something new:</p>
+<pre><code class="sh">./gradlew assemble
+spark-shell --jars spark-runtime/build/libs/iceberg-spark-runtime-93990904.jar
+</code></pre>
+
+<p>Replace <code>93990904</code> with the git hash of the build you’re using.</p>
<h3 id="reading-an-iceberg-table">Reading an Iceberg table<a class="headerlink" href="#reading-an-iceberg-table" title="Permanent link">¶</a></h3>
<p>To read an Iceberg table, use the <code>iceberg</code> format in <code>DataFrameReader</code>:</p>
<pre><code class="scala">spark.read.format("iceberg").load("db.table")