Posted to commits@iceberg.apache.org by bl...@apache.org on 2021/09/14 22:48:38 UTC
[iceberg] branch master updated: Docs: Add UPDATE description for Spark (#2897)
This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new f6ce6cd Docs: Add UPDATE description for Spark (#2897)
f6ce6cd is described below
commit f6ce6cd77821afa5489c93346af9cc93009b1b98
Author: Peidian li <38...@users.noreply.github.com>
AuthorDate: Wed Sep 15 06:48:25 2021 +0800
Docs: Add UPDATE description for Spark (#2897)
---
site/docs/spark-writes.md | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/site/docs/spark-writes.md b/site/docs/spark-writes.md
index 1fd2121..6f042ce 100644
--- a/site/docs/spark-writes.md
+++ b/site/docs/spark-writes.md
@@ -29,6 +29,7 @@ Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog impleme
| [SQL merge into](#merge-into) | ✔️ | | ⚠ Requires Iceberg Spark extensions |
| [SQL insert overwrite](#insert-overwrite) | ✔️ | | |
| [SQL delete from](#delete-from) | ✔️ | | ⚠ Row-level delete requires Spark extensions |
+| [SQL update](#update) | ✔️ | | ⚠ Requires Iceberg Spark extensions |
| [DataFrame append](#appending-data) | ✔️ | ✔️ | |
| [DataFrame overwrite](#overwriting-data) | ✔️ | ✔️ | ⚠ Behavior changed in Spark 3.0 |
| [DataFrame CTAS and RTAS](#creating-tables) | ✔️ | | |
@@ -171,6 +172,20 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
If the delete filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
+### `UPDATE`
+
+Spark 3.1 added support for `UPDATE` queries that update matching rows in tables.
+
+Update queries accept a filter to match rows to update.
+
+```sql
+UPDATE prod.db.table
+SET c1 = 'update_c1', c2 = 'update_c2'
+WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
+```
+
+For more complex row-level updates based on incoming data, see the section on `MERGE INTO`.
+
## Writing with DataFrames
Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons:
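The added `UPDATE` section points readers to `MERGE INTO` for more complex row-level updates driven by incoming data. For context, a minimal sketch of such a statement is shown below; the table names `prod.db.target` and `prod.db.updates` and the columns `id` and `c1` are hypothetical, and running it assumes the Iceberg Spark extensions are enabled (as noted in the feature table above):

```sql
MERGE INTO prod.db.target t          -- the Iceberg table to update
USING (SELECT * FROM prod.db.updates) s
ON t.id = s.id                       -- match incoming rows to existing rows
WHEN MATCHED THEN UPDATE SET t.c1 = s.c1
WHEN NOT MATCHED THEN INSERT *
```

Unlike a plain `UPDATE`, which only modifies rows matching a static filter, `MERGE INTO` can update, insert, or delete rows depending on whether each source row matches a target row.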