Posted to commits@flink.apache.org by cz...@apache.org on 2022/12/28 02:48:07 UTC
[flink-table-store] branch release-0.3 updated: [FLINK-30506] Add documentation for writing Table Store with Spark3
This is an automated email from the ASF dual-hosted git repository.
czweng pushed a commit to branch release-0.3
in repository https://gitbox.apache.org/repos/asf/flink-table-store.git
The following commit(s) were added to refs/heads/release-0.3 by this push:
new d02276de [FLINK-30506] Add documentation for writing Table Store with Spark3
d02276de is described below
commit d02276de2fdcf90b50b3cbb1da75055b960704e1
Author: tsreaper <ts...@gmail.com>
AuthorDate: Wed Dec 28 10:46:42 2022 +0800
[FLINK-30506] Add documentation for writing Table Store with Spark3
This closes #445.
(cherry picked from commit 33c30f2e79e85ee15130320d6e712601c1b44d6c)
---
docs/content/docs/engines/overview.md | 10 ++---
docs/content/docs/engines/spark2.md | 2 +-
docs/content/docs/engines/spark3.md | 59 +++++++++++++++++++++++-------
docs/content/docs/how-to/writing-tables.md | 10 +++++
4 files changed, 61 insertions(+), 20 deletions(-)
diff --git a/docs/content/docs/engines/overview.md b/docs/content/docs/engines/overview.md
index 04e064eb..678d310e 100644
--- a/docs/content/docs/engines/overview.md
+++ b/docs/content/docs/engines/overview.md
@@ -34,8 +34,8 @@ Apache Spark and Apache Hive.
| Engine | Version | Feature | Read Pushdown |
|---|---|---|---|
-| Flink | 1.16/1.15/1.14 | read, write, create/drop table, create/drop database | Projection, Filter |
-| Hive | 3.1/2.3/2.2/2.1/2.1 CDH 6.3 | read | Projection, Filter |
-| Spark | 3.3/3.2/3.1/3.0 | read, create/drop table, create/drop database | Projection, Filter |
-| Spark | 2.4 | read | Projection, Filter |
-| Trino | 388/358 | read | Projection, Filter |
+| Flink | 1.16/1.15/1.14 | batch/streaming read, batch/streaming write, create/drop table, create/drop database | Projection, Filter |
+| Hive | 3.1/2.3/2.2/2.1/2.1 CDH 6.3 | batch read | Projection, Filter |
+| Spark | 3.3/3.2/3.1/3.0 | batch read, batch write, create/drop table, create/drop database | Projection, Filter |
+| Spark | 2.4 | batch read | Projection, Filter |
+| Trino | 388/358 | batch read | Projection, Filter |
diff --git a/docs/content/docs/engines/spark2.md b/docs/content/docs/engines/spark2.md
index 8a4e9ea9..bce84516 100644
--- a/docs/content/docs/engines/spark2.md
+++ b/docs/content/docs/engines/spark2.md
@@ -68,7 +68,7 @@ If you are using HDFS, make sure that the environment variable `HADOOP_HOME` or
**Step 1: Prepare Test Data**
-Table Store currently only supports reading tables through Spark. To create a Table Store table with records, please follow our [Flink quick start guide]({{< ref "docs/engines/flink#quick-start" >}}).
+Table Store currently only supports reading tables through Spark2. To create a Table Store table with records, please follow our [Flink quick start guide]({{< ref "docs/engines/flink#quick-start" >}}).
After the guide, all table files should be stored under the path `/tmp/table_store`, or the warehouse path you've specified.
diff --git a/docs/content/docs/engines/spark3.md b/docs/content/docs/engines/spark3.md
index 51393091..ce5fa814 100644
--- a/docs/content/docs/engines/spark3.md
+++ b/docs/content/docs/engines/spark3.md
@@ -62,15 +62,9 @@ If you are using HDFS, make sure that the environment variable `HADOOP_HOME` or
{{< /hint >}}
-**Step 1: Prepare Test Data**
+**Step 1: Specify Table Store Jar File**
-Table Store currently only supports reading tables through Spark. To create a Table Store table with records, please follow our [Flink quick start guide]({{< ref "docs/engines/flink#quick-start" >}}).
-
-After the guide, all table files should be stored under the path `/tmp/table_store`, or the warehouse path you've specified.
-
-**Step 2: Specify Table Store Jar File**
-
-You can append path to table store jar file to the `--jars` argument when starting `spark-sql`.
+Append the path of the table store jar file to the `--jars` argument when starting `spark-sql`.
```bash
spark-sql ... --jars /path/to/flink-table-store-spark-{{< version >}}.jar
@@ -78,7 +72,7 @@ spark-sql ... --jars /path/to/flink-table-store-spark-{{< version >}}.jar
Alternatively, you can copy `flink-table-store-spark-{{< version >}}.jar` under `spark/jars` in your Spark installation directory.
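For readers following along, that alternative amounts to a one-line copy (a sketch; it assumes the environment variable `SPARK_HOME` points at your Spark installation and that the jar has already been downloaded locally):

```bash
# Hypothetical local path; adjust it to where you downloaded the jar.
cp /path/to/flink-table-store-spark-{{< version >}}.jar "$SPARK_HOME/jars/"
```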
-**Step 3: Specify Table Store Catalog**
+**Step 2: Specify Table Store Catalog**
When starting `spark-sql`, use the following command to register Table Store’s Spark catalog with the name `tablestore`. Table files of the warehouse are stored under `/tmp/table_store`.
@@ -88,13 +82,50 @@ spark-sql ... \
--conf spark.sql.catalog.tablestore.warehouse=file:/tmp/table_store
```
+After the `spark-sql` command line has started, run the following SQL to create and switch to database `tablestore.default`.
+
+```sql
+CREATE DATABASE tablestore.default;
+USE tablestore.default;
+```
+
+**Step 3: Create a Table and Write Some Records**
+
+```sql
+CREATE TABLE my_table (
+    k INT,
+    v STRING
+) TBLPROPERTIES (
+    'primary-key' = 'k'
+);
+
+INSERT INTO my_table VALUES (1, 'Hi'), (2, 'Hello');
+```
+
**Step 4: Query Table with SQL**
```sql
-SELECT * FROM tablestore.default.word_count;
+SELECT * FROM my_table;
+/*
+1 Hi
+2 Hello
+*/
+```
+
+**Step 5: Update the Records**
+
+```sql
+INSERT INTO my_table VALUES (1, 'Hi Again'), (3, 'Test');
+
+SELECT * FROM my_table;
+/*
+1 Hi Again
+2 Hello
+3 Test
+*/
```
-**Step 5: Query Table with Scala API**
+**Step 6: Query Table with Scala API**
If you don't want to use the Table Store catalog, you can also run `spark-shell` and query the table with the Scala API.
@@ -103,9 +134,9 @@ spark-shell ... --jars /path/to/flink-table-store-spark-{{< version >}}.jar
```
```scala
-val dataset = spark.read.format("tablestore").load("file:/tmp/table_store/default.db/word_count")
-dataset.createOrReplaceTempView("word_count")
-spark.sql("SELECT * FROM word_count").show()
+val dataset = spark.read.format("tablestore").load("file:/tmp/table_store/default.db/my_table")
+dataset.createOrReplaceTempView("my_table")
+spark.sql("SELECT * FROM my_table").show()
```
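Once the temp view is registered, ordinary Spark SQL works against it; for instance (a sketch, reusing the hypothetical `my_table` view from the snippet above):

```scala
// Projection and filter can be pushed down to Table Store
// (see the engine feature matrix in the overview).
spark.sql("SELECT v FROM my_table WHERE k > 1").show()
```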
## Spark Type Conversion
diff --git a/docs/content/docs/how-to/writing-tables.md b/docs/content/docs/how-to/writing-tables.md
index 1c44833d..fd3e3ba5 100644
--- a/docs/content/docs/how-to/writing-tables.md
+++ b/docs/content/docs/how-to/writing-tables.md
@@ -40,6 +40,16 @@ INSERT INTO MyTable SELECT ...
{{< /tab >}}
+{{< tab "Spark3" >}}
+
+Use `INSERT INTO` to apply records and changes to tables.
+
+```sql
+INSERT INTO MyTable SELECT ...
+```
+
+{{< /tab >}}
+
{{< /tabs >}}
## Overwriting the Whole Table