Posted to commits@flink.apache.org by cz...@apache.org on 2022/12/28 02:48:07 UTC

[flink-table-store] branch release-0.3 updated: [FLINK-30506] Add documentation for writing Table Store with Spark3

This is an automated email from the ASF dual-hosted git repository.

czweng pushed a commit to branch release-0.3
in repository https://gitbox.apache.org/repos/asf/flink-table-store.git


The following commit(s) were added to refs/heads/release-0.3 by this push:
     new d02276de [FLINK-30506] Add documentation for writing Table Store with Spark3
d02276de is described below

commit d02276de2fdcf90b50b3cbb1da75055b960704e1
Author: tsreaper <ts...@gmail.com>
AuthorDate: Wed Dec 28 10:46:42 2022 +0800

    [FLINK-30506] Add documentation for writing Table Store with Spark3
    
    This closes #445.
    
    (cherry picked from commit 33c30f2e79e85ee15130320d6e712601c1b44d6c)
---
 docs/content/docs/engines/overview.md      | 10 ++---
 docs/content/docs/engines/spark2.md        |  2 +-
 docs/content/docs/engines/spark3.md        | 59 +++++++++++++++++++++++-------
 docs/content/docs/how-to/writing-tables.md | 10 +++++
 4 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/docs/content/docs/engines/overview.md b/docs/content/docs/engines/overview.md
index 04e064eb..678d310e 100644
--- a/docs/content/docs/engines/overview.md
+++ b/docs/content/docs/engines/overview.md
@@ -34,8 +34,8 @@ Apache Spark and Apache Hive.
 
 | Engine | Version | Feature | Read Pushdown |
 |---|---|---|---|
-| Flink | 1.16/1.15/1.14 | read, write, create/drop table, create/drop database | Projection, Filter |
-| Hive      | 3.1/2.3/2.2/2.1/2.1 CDH 6.3 | read | Projection, Filter |
-| Spark     | 3.3/3.2/3.1/3.0 | read, create/drop table, create/drop database | Projection, Filter |
-| Spark     | 2.4 | read | Projection, Filter |
-| Trino     | 388/358 | read | Projection, Filter |
+| Flink | 1.16/1.15/1.14 | batch/streaming read, batch/streaming write, create/drop table, create/drop database | Projection, Filter |
+| Hive      | 3.1/2.3/2.2/2.1/2.1 CDH 6.3 | batch read | Projection, Filter |
+| Spark     | 3.3/3.2/3.1/3.0 | batch read, batch write, create/drop table, create/drop database | Projection, Filter |
+| Spark     | 2.4 | batch read | Projection, Filter |
+| Trino     | 388/358 | batch read | Projection, Filter |
diff --git a/docs/content/docs/engines/spark2.md b/docs/content/docs/engines/spark2.md
index 8a4e9ea9..bce84516 100644
--- a/docs/content/docs/engines/spark2.md
+++ b/docs/content/docs/engines/spark2.md
@@ -68,7 +68,7 @@ If you are using HDFS, make sure that the environment variable `HADOOP_HOME` or
 
 **Step 1: Prepare Test Data**
 
-Table Store currently only supports reading tables through Spark. To create a Table Store table with records, please follow our [Flink quick start guide]({{< ref "docs/engines/flink#quick-start" >}}).
+Table Store currently only supports reading tables through Spark 2. To create a Table Store table with records, please follow our [Flink quick start guide]({{< ref "docs/engines/flink#quick-start" >}}).
 
 After the guide, all table files should be stored under the path `/tmp/table_store`, or the warehouse path you've specified.
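+
+To verify, you can list the warehouse directory (a quick sanity check, assuming the default warehouse path; each database appears as a `<name>.db` folder):
+
+```bash
+ls /tmp/table_store/default.db
+```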
 
diff --git a/docs/content/docs/engines/spark3.md b/docs/content/docs/engines/spark3.md
index 51393091..ce5fa814 100644
--- a/docs/content/docs/engines/spark3.md
+++ b/docs/content/docs/engines/spark3.md
@@ -62,15 +62,9 @@ If you are using HDFS, make sure that the environment variable `HADOOP_HOME` or
 
 {{< /hint >}}
 
-**Step 1: Prepare Test Data**
+**Step 1: Specify Table Store Jar File**
 
-Table Store currently only supports reading tables through Spark. To create a Table Store table with records, please follow our [Flink quick start guide]({{< ref "docs/engines/flink#quick-start" >}}).
-
-After the guide, all table files should be stored under the path `/tmp/table_store`, or the warehouse path you've specified.
-
-**Step 2: Specify Table Store Jar File**
-
-You can append path to table store jar file to the `--jars` argument when starting `spark-sql`.
+Append the path to the Table Store jar file to the `--jars` argument when starting `spark-sql`.
 
 ```bash
 spark-sql ... --jars /path/to/flink-table-store-spark-{{< version >}}.jar
@@ -78,7 +72,7 @@ spark-sql ... --jars /path/to/flink-table-store-spark-{{< version >}}.jar
 
 Alternatively, you can copy `flink-table-store-spark-{{< version >}}.jar` into `spark/jars` in your Spark installation directory.
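+
+For example, a copy command might look like the following (a minimal sketch; `$SPARK_HOME` is assumed to point at your Spark installation, and the source path is hypothetical):
+
+```bash
+# Adjust the source path to wherever you downloaded the jar.
+cp /path/to/flink-table-store-spark-{{< version >}}.jar $SPARK_HOME/jars/
+```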
 
-**Step 3: Specify Table Store Catalog**
+**Step 2: Specify Table Store Catalog**
 
 When starting `spark-sql`, use the following command to register Table Store’s Spark catalog with the name `tablestore`. Table files of the warehouse are stored under `/tmp/table_store`.
 
@@ -88,13 +82,50 @@ spark-sql ... \
     --conf spark.sql.catalog.tablestore.warehouse=file:/tmp/table_store
 ```
 
+After the `spark-sql` command line has started, run the following SQL to create and switch to database `tablestore.default`.
+
+```sql
+CREATE DATABASE tablestore.default;
+USE tablestore.default;
+```
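+
+To confirm the database was created, you can list the databases in the current catalog (a quick check; the exact output format depends on your Spark version):
+
+```sql
+SHOW DATABASES;
+```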
+
+**Step 3: Create a Table and Write Some Records**
+
+```sql
+CREATE TABLE my_table (
+    k INT,
+    v STRING
+) TBLPROPERTIES (
+    'primary-key' = 'k'
+);
+
+INSERT INTO my_table VALUES (1, 'Hi'), (2, 'Hello');
+```
+
 **Step 4: Query Table with SQL**
 
 ```sql
-SELECT * FROM tablestore.default.word_count;
+SELECT * FROM my_table;
+/*
+1	Hi
+2	Hello
+*/
+```
+
+**Step 5: Update the Records**
+
+```sql
+INSERT INTO my_table VALUES (1, 'Hi Again'), (3, 'Test');
+
+SELECT * FROM my_table;
+/*
+1	Hi Again
+2	Hello
+3	Test
+*/
 ```
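+
+Note that because `my_table` declares `'primary-key' = 'k'`, the second `INSERT INTO` acts as an upsert: the row with key `1` is replaced rather than duplicated. Queries can also take advantage of filter pushdown; a keyed lookup (a minimal illustration using the data above) looks like:
+
+```sql
+SELECT * FROM my_table WHERE k = 1;
+/*
+1	Hi Again
+*/
+```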
 
-**Step 5: Query Table with Scala API**
+**Step 6: Query Table with Scala API**
 
 If you don't want to use Table Store catalog, you can also run `spark-shell` and query the table with Scala API.
 
@@ -103,9 +134,9 @@ spark-shell ... --jars /path/to/flink-table-store-spark-{{< version >}}.jar
 ```
 
 ```scala
-val dataset = spark.read.format("tablestore").load("file:/tmp/table_store/default.db/word_count")
-dataset.createOrReplaceTempView("word_count")
-spark.sql("SELECT * FROM word_count").show()
+val dataset = spark.read.format("tablestore").load("file:/tmp/table_store/default.db/my_table")
+dataset.createOrReplaceTempView("my_table")
+spark.sql("SELECT * FROM my_table").show()
 ```
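+
+The loaded `dataset` is a regular Spark `DataFrame`, so standard transformations apply. For example (a minimal sketch using the columns of `my_table`):
+
+```scala
+// Project both columns and keep only rows whose key is greater than 1.
+dataset.select("k", "v").where("k > 1").show()
+```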
 
 ## Spark Type Conversion
diff --git a/docs/content/docs/how-to/writing-tables.md b/docs/content/docs/how-to/writing-tables.md
index 1c44833d..fd3e3ba5 100644
--- a/docs/content/docs/how-to/writing-tables.md
+++ b/docs/content/docs/how-to/writing-tables.md
@@ -40,6 +40,16 @@ INSERT INTO MyTable SELECT ...
 
 {{< /tab >}}
 
+{{< tab "Spark3" >}}
+
+Use `INSERT INTO` to apply records and changes to tables.
+
+```sql
+INSERT INTO MyTable SELECT ...
+```
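+
+For example, assuming `MyTable` has columns `k INT` and `v STRING` (an illustrative schema only), a literal insert looks like:
+
+```sql
+INSERT INTO MyTable VALUES (1, 'Hi'), (2, 'Hello');
+```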
+
+{{< /tab >}}
+
 {{< /tabs >}}
 
 ## Overwriting the Whole Table