You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/12/20 08:42:50 UTC

[GitHub] [flink-table-store] tsreaper opened a new pull request, #443: [FLINK-30458] Refactor Table Store Documentation

tsreaper opened a new pull request, #443:
URL: https://github.com/apache/flink-table-store/pull/443

   Currently the documents of table store are messy and are lacking basic concepts. This ticket refactors table store documentation and at the mean time adding basic concepts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053934427


##########
docs/content/docs/sql-api/writing-tables.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Writing Tables"
+weight: 4
+type: docs
+aliases:
+- /sql-api/writing-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Writing Tables
+
+## Applying Records/Changes to Tables
+
+{{< tabs "insert-into-example" >}}
+
+{{< tab "Flink" >}}
+
+Use `INSERT INTO` to apply records and changes to tables.
+
+```sql
+INSERT INTO MyTable SELECT ...
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Overwriting the Whole Table

Review Comment:
   It means removing all records from the table / partition and then add the given records.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053935750


##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Types
+
+Table Store supports various types of tables. Users can specify `write-mode` table property to specify table types when creating tables.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can also specify `'write-mode' = 'change-log'` explicitly in table properties when creating the table.
+
+Primary keys are a set of columns that are unique for each record. Table Store imposes an ordering of data, which means the system will sort the primary key within each bucket. Using this feature, users can achieve high performance by adding filter conditions on the primary key.
+
+By [defining primary keys]({{< ref "docs/sql-api/creating-tables#tables-with-primary-keys" >}}) on a changelog table, users can access the following features.
+
+### Merge Engines
+
+When Table Store sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
+
+#### Deduplicate
+
+`deduplicate` merge engine is the default merge engine. Table Store will only keep the latest record and throw away other records with the same primary keys.
+
+Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
+
+#### Partial Update
+
+By specifying `'merge-engine' = 'partial-update'`, users can set columns of a record across multiple updates and finally get a complete record. Specifically, value fields are updated to the latest data one by one under the same primary key, but null values are not overwritten.
+
+For example, let's say Table Store receives three records `<1, 23.0, 10, NULL>`, `<1, NULL, NULL, 'This is a book'>` and `<1, 25.2, NULL, NULL>`, where the first column is the primary key. The final result will be `<1, 25.2, 10, 'This is a book'>`.
+
+NOTE: For streaming queries, `partial-update` merge engine must be used together with `full-compaction` [changelog producer]({{< ref "docs/features/table-types#changelog-producers" >}}).
+
+#### Aggregation
+
+Sometimes users only care about aggregated results. The `aggregation` merge engine aggregates each value field with the latest data one by one under the same primary key according to the aggregate function.
+
+Each field not part of the primary keys must be given an aggregate function, specified by the `fields.<field-name>.aggregate-function` table property. For example, consider the following table definition.
+
+{{< tabs "aggregation-merge-engine-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    product_id BIGINT,
+    price DOUBLE,
+    sales BIGINT,
+    PRIMARY KEY (product_id) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'aggregation',
+    'fields.price.aggregate-function' = 'max',
+    'fields.sales.aggregate-function' = 'sum'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+Field `price` will be aggregated by the `max` function, and field `sales` will be aggregated by the `sum` function. Given two input records `<1, 23.0, 15>` and `<1, 30.2, 20>`, the final result will be `<1, 30.2, 35>`.
+
+Current supported aggregate functions are data types are:
+
+* `sum`: supports DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT and DOUBLE.
+* `min`/`max`: support DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and TIMESTAMP_LTZ.
+* `last_value` / `last_non_null_value`: support all data types.
+* `listagg`: supports STRING data type.
+* `bool_and` / `bool_or`: support BOOLEAN data type.
+
+### Changelog Producers
+
+Streaming queries will continuously produce latest changes. These changes can come from the underlying table files or from an [external log system]({{< ref "docs/features/external-log-systems" >}}) like Kafka. Compared to the external log system, changes from table files have lower cost but higher latency (depending on how often snapshots are created).
+
+By specifying the `changelog-producer` table property when creating the table, users can choose the pattern of changes produced from files.
+
+#### None
+
+By default, no extra changelog producer will be applied to the writer of table. Table Store source can only see the merged changes across snapshots, like what keys are removed and what are the new values of some keys.
+
+However, these merged changes cannot form a complete changelog, because we can't read the old values of the keys directly from them. Merged changes require the consumers to "remember" the values of each key and to rewrite the values without seeing the old ones.

Review Comment:
   Consider the following scenario: The consumer wants to calculate the sum on some group keys (might not be equal to the primary keys). If we don't have `before` values, the consumer can only see the new values of a record and cannot determine what should be added to the result.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053934087


##########
docs/content/docs/maintenance-actions/write-performance.md:
##########
@@ -0,0 +1,118 @@
+---
+title: "Write Performance"
+weight: 1
+type: docs
+aliases:
+- /maintenance-actions/write-performance.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Write Performance
+
+Performance of Table Store writers are related with the following factors.
+
+## Parallelism
+
+It is recommended that the parallelism of sink should be less than or equal to the number of buckets, preferably equal. You can control the parallelism of the sink with the `sink.parallelism` table property.
+
+<table class="table table-bordered">
+    <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Option</th>
+      <th class="text-left" style="width: 5%">Required</th>
+      <th class="text-left" style="width: 5%">Default</th>
+      <th class="text-left" style="width: 10%">Type</th>
+      <th class="text-left" style="width: 60%">Description</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td><h5>sink.parallelism</h5></td>
+      <td>No</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>Integer</td>
+      <td>Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator.</td>
+    </tr>
+    </tbody>
+</table>
+
+## Number of Sorted Runs to Trigger Compaction
+
+Table Store uses LSM which supports a large number of updates. LSM organizes files in several "sorted runs". When querying records from an LSM, all sorted runs must be combined to produce a complete view of all records.

Review Comment:
   I guess I'll need to add a separate document to introduce all these concepts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] Gerrrr commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
Gerrrr commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053671570


##########
docs/content/docs/sql-api/writing-tables.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Writing Tables"
+weight: 4
+type: docs
+aliases:
+- /sql-api/writing-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Writing Tables
+
+## Applying Records/Changes to Tables
+
+{{< tabs "insert-into-example" >}}
+
+{{< tab "Flink" >}}
+
+Use `INSERT INTO` to apply records and changes to tables.

Review Comment:
   Under `changelog-mode: all`, how does FTS keep `before` values for the insertions into the log store?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1054066928


##########
docs/content/docs/concepts/file-layouts.md:
##########
@@ -0,0 +1,53 @@
+---
+title: "File Layouts"
+weight: 3
+type: docs
+aliases:
+- /concepts/file-layouts.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# File Layouts
+
+All files of a table are stored under one base directory. Table Store files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Table Store readers can recursively access all records from the table.
+
+{{< img src="/img/file-layout.png">}}
+
+## Snapshot Files
+
+All snapshot files are stored in the `snapshot` directory.
+
+A snapshot file is a JSON file containing information about this snapshot, including
+
+* the schema file in use
+* the manifest list containing all changes prior to this snapshot
+* the manifest list containing all changes of this snapshot

Review Comment:
   It is hard to understand these two cases. Can we just provide only one?
   `the manifest list containing all changes of this snapshot`



##########
docs/content/docs/sql-api/querying-tables.md:
##########
@@ -24,99 +24,98 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Query Table
-
-You can directly SELECT the table in batch runtime mode of Flink SQL.
-
-```sql
--- Batch mode, read latest snapshot
-SET 'execution.runtime-mode' = 'batch';
-SELECT * FROM MyTable;
-```
-
-## Query Engines
-
-Table Store not only supports Flink SQL queries natively but also provides
-queries from other popular engines. See [Engines]({{< ref "docs/engines/overview" >}})
-
-## Query Optimization
-
-It is highly recommended to specify partition and primary key filters
-along with the query, which will speed up the data skipping of the query.
-
-The filter functions that can accelerate data skipping are:
-- `=`
-- `<`
-- `<=`
-- `>`
-- `>=`
-- `IN (...)`
-- `LIKE 'abc%'`
-- `IS NULL`
-
-Table Store will sort the data by primary key, which speeds up the point queries
-and range queries. When using a composite primary key, it is best for the query
-filters to form a [leftmost prefix](https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html)
-of the primary key for good acceleration.
-
-Suppose that a table has the following specification:
-
-```sql
-CREATE TABLE orders (
-    catalog_id BIGINT,
-    order_id BIGINT,
-    .....,
-    PRIMARY KEY (catalog_id, order_id) NOT ENFORCED -- composite primary key
-)
-```
-
-The query obtains a good acceleration by specifying a range filter for
-the leftmost prefix of the primary key.
-
-```sql
-SELECT * FROM orders WHERE catalog_id=1025;
-
-SELECT * FROM orders WHERE catalog_id=1025 AND order_id=29495;
-
-SELECT * FROM orders
-  WHERE catalog_id=1025
-  AND order_id>2035 AND order_id<6000;
-```
-
-However, the following filter cannot accelerate the query well.
-
-```sql
-SELECT * FROM orders WHERE order_id=29495;
-
-SELECT * FROM orders WHERE catalog_id=1025 OR order_id=29495;
-```
-
-## Snapshots Table
-
-You can query the snapshot history information of the table through Flink SQL.
+# Querying Tables
+
+Just like all other tables, Table Store tables can be queried with `SELECT` statement.
+
+## Scan Mode
+
+By specifying the `scan.mode` table property, users can specify where and how Table Store sources should produce records.
+
+<table class="table table-bordered">
+<thead>
+<tr>
+<th>Scan Mode</th>
+<th>Batch Source Behavior</th>
+<th>Streaming Source Behavior</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>default</td>
+<td colspan="2">
+The default scan mode. Determines actual scan mode according to other table properties. If "scan.timestamp-millis" is set the actual scan mode will be "from-timestamp". Otherwise the actual scan mode will be "latest-full".
+</td>
+</tr>
+<tr>
+<td>latest-full</td>
+<td>
+Produces the latest snapshot of table.
+</td>
+<td>
+Produces the latest snapshot on the table upon first startup, and continues to read the following changes.
+</td>
+</tr>
+<tr>
+<td>compacted-full</td>
+<td>
+Produces the snapshot after the latest <a href="{{< ref "docs/concepts/lsm-trees#compactions" >}}">compaction</a>.
+</td>
+<td>
+Produces the snapshot after the latest compaction on the table upon first startup, and continues to read the following changes.
+</td>
+</tr>
+<tr>
+<td>latest</td>

Review Comment:
   We can remove this legacy one



##########
docs/content/docs/features/external-log-systems.md:
##########
@@ -0,0 +1,64 @@
+---
+title: "External Log Systems"
+weight: 2
+type: docs
+aliases:
+- /features/external-log-systems.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# External Log Systems
+
+Aside from [underlying table files]({{< ref "docs/features/table-types#changelog-producers" >}}), changelog of Table Store can also be stored into or consumed from an external log system, such as Kafka. By specifying `log.system` table property, users can choose which external log system to use.
+
+If an external log system is used, all records written into table files will also be written into the log system. Changes produced by the streaming queries will thus come from the log system instead of table files.
+
+## Consistency Guarantees
+
+By default, changes in the log systems are visible to consumers only after a snapshot, just like table files. This behavior guarantees the exactly-once semantics. That is, each record is seen by the consumers exactly once.
+
+However, users can also specify the table property `'log.consistency' = 'eventual'` so that changelog written into the log system can be immediately consumed by the consumers, without waiting for the next snapshot. This behavior decreases the latency of changelog, but it can only guarantee the at-least-once semantics (that is, consumers might see duplicated records) due to possible failures.

Review Comment:
   Therefore, in order to achieve correct calculation results, FlinkSQL automatically adds a normalized node. Similarly, when the changelog-log is none, it not only generates update-before, but also achieves the effect of de duplication.



##########
docs/content/docs/features/lookup-joins.md:
##########
@@ -0,0 +1,97 @@
+---
+title: "Lookup Joins"

Review Comment:
   I prefer to put this into SQL API, and add a `Flink-Only`.



##########
docs/content/docs/maintenance-actions/_index.md:
##########
@@ -0,0 +1,26 @@
+---
+title: Maintenance Actions

Review Comment:
   Maybe just `Tuning`?
   I think `Actions` can not cover all in here.



##########
docs/content/docs/sql-api/creating-tables.md:
##########
@@ -0,0 +1,261 @@
+---
+title: "Creating Tables"
+weight: 2
+type: docs
+aliases:
+- /sql-api/creating-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Creating Tables
+
+## Creating Catalog Managed Tables
+
+Tables created in Table Store [catalogs]({{< ref "docs/sql-api/creating-catalogs" >}}) are managed by the catalog. When the table is dropped from catalog, its table files will also be deleted.
+
+The following SQL assumes that you have registered and are using a Table Store catalog. It creates a managed table named `MyTable` with five columns in the catalog's `default` database.
+
+{{< tabs "catalog-managed-table-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Tables with Primary Keys
+
+The following SQL creates a table named `MyTable` with five columns, where `dt`, `hh` and `user_id` are the primary keys.
+
+{{< tabs "primary-keys-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING,
+    PRIMARY KEY (dt, hh, user_id) NOT ENFORCED
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) TBLPROPERTIES (
+    'primary-key' = 'dt,hh,user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Partitioned Tables
+
+The following SQL creates a table named `MyTable` with five columns partitioned by `dt` and `hh`.
+
+{{< tabs "partitions-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+{{< hint info >}}
+
+Partition keys must be a subset of primary keys if primary keys are defined.
+
+{{< /hint >}}
+
+### Table Properties
+
+Users can specify table properties to enable features or improve performance of Table Store. For a complete list of such properties, see [configurations]({{< ref "docs/maintenance-actions/configurations" >}}).
+
+The following SQL creates a table named `MyTable` with five columns partitioned by `dt` and `hh`. This table has two properties: `'bucket' = '2'` and `'bucket-key' = 'user_id'`.
+
+{{< tabs "table-properties-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh) WITH (
+    'bucket' = '2',
+    'bucket-key' = 'user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh) TBLPROPERTIES (
+    'bucket' = '2',
+    'bucket-key' = 'user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Creating External Tables
+
+External tables are recorded but not managed by catalogs. If an external table is dropped, its table files will not be deleted.
+
+Table Store external tables can be used in any catalog. If you do not want to create a Table Store catalog and just want to read / write a table, you can consider external tables.
+
+{{< tabs "external-table-example" >}}
+
+{{< tab "Flink" >}}
+
+Flink SQL supports reading and writing an external table. External Table Store tables are created by specifying the `connector` and `path` table properties. The following SQL creates an external table named `MyTable` with five columns, where the base path of table files is `hdfs://path/to/table`.
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) WITH (
+    'connector' = 'table-store',
+    'path' = 'hdfs://path/to/table',
+    'auto-create' = 'true' -- this table property creates table files for an empty table if table path does not exist
+                           -- currently only supported by Flink
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Hive" >}}
+
+Hive SQL only supports reading from an external table. The following SQL creates an external table named `my_table`, where the base path of table files is `hdfs://path/to/table`. As schemas are stored in table files, users do not need to write column definitions.
+
+```sql
+CREATE EXTERNAL TABLE my_table
+STORED BY 'org.apache.flink.table.store.hive.TableStoreHiveStorageHandler'
+LOCATION 'hdfs://path/to/table';
+```
+
+{{< /tab >}}
+

Review Comment:
   Here can add a Spark3 like: `Query Table with Scala API`.



##########
docs/content/docs/sql-api/altering-tables.md:
##########
@@ -0,0 +1,130 @@
+---
+title: "Altering Tables"
+weight: 3
+type: docs
+aliases:
+- /sql-api/altering-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Altering Tables
+
+## Changing/Adding Table Properties
+
+The following SQL sets `write-buffer-size` table property to `256 MB`.
+
+{{< tabs "set-properties-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+ALTER TABLE my_table SET (
+    'write-buffer-size' = '256 MB'
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+ALTER TABLE tablestore.default.my_table SET TBLPROPERTIES (

Review Comment:
   we can just use `my_table`. Spark also has `use catalog.database`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1054140330


##########
docs/content/docs/maintenance-actions/_index.md:
##########
@@ -0,0 +1,26 @@
+---
+title: Maintenance Actions

Review Comment:
   Just `Maintenance`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1054155956


##########
docs/content/docs/sql-api/creating-tables.md:
##########
@@ -0,0 +1,261 @@
+---
+title: "Creating Tables"
+weight: 2
+type: docs
+aliases:
+- /sql-api/creating-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Creating Tables
+
+## Creating Catalog Managed Tables
+
+Tables created in Table Store [catalogs]({{< ref "docs/sql-api/creating-catalogs" >}}) are managed by the catalog. When the table is dropped from catalog, its table files will also be deleted.
+
+The following SQL assumes that you have registered and are using a Table Store catalog. It creates a managed table named `MyTable` with five columns in the catalog's `default` database.
+
+{{< tabs "catalog-managed-table-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Tables with Primary Keys
+
+The following SQL creates a table named `MyTable` with five columns, where `dt`, `hh` and `user_id` are the primary keys.
+
+{{< tabs "primary-keys-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING,
+    PRIMARY KEY (dt, hh, user_id) NOT ENFORCED
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) TBLPROPERTIES (
+    'primary-key' = 'dt,hh,user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Partitioned Tables
+
+The following SQL creates a table named `MyTable` with five columns partitioned by `dt` and `hh`.
+
+{{< tabs "partitions-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+{{< hint info >}}
+
+Partition keys must be a subset of primary keys if primary keys are defined.
+
+{{< /hint >}}
+
+### Table Properties
+
+Users can specify table properties to enable features or improve performance of Table Store. For a complete list of such properties, see [configurations]({{< ref "docs/maintenance-actions/configurations" >}}).
+
+The following SQL creates a table named `MyTable` with five columns partitioned by `dt` and `hh`. This table has two properties: `'bucket' = '2'` and `'bucket-key' = 'user_id'`.
+
+{{< tabs "table-properties-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh) WITH (
+    'bucket' = '2',
+    'bucket-key' = 'user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE tablestore.default.MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh) TBLPROPERTIES (
+    'bucket' = '2',
+    'bucket-key' = 'user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Creating External Tables
+
+External tables are recorded but not managed by catalogs. If an external table is dropped, its table files will not be deleted.
+
+Table Store external tables can be used in any catalog. If you do not want to create a Table Store catalog and just want to read / write a table, you can consider external tables.
+
+{{< tabs "external-table-example" >}}
+
+{{< tab "Flink" >}}
+
+Flink SQL supports reading and writing an external table. External Table Store tables are created by specifying the `connector` and `path` table properties. The following SQL creates an external table named `MyTable` with five columns, where the base path of table files is `hdfs://path/to/table`.
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) WITH (
+    'connector' = 'table-store',
+    'path' = 'hdfs://path/to/table',
+    'auto-create' = 'true' -- this table property creates table files for an empty table if table path does not exist
+                           -- currently only supported by Flink
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Hive" >}}
+
+Hive SQL only supports reading from an external table. The following SQL creates an external table named `my_table`, where the base path of table files is `hdfs://path/to/table`. As schemas are stored in table files, users do not need to write column definitions.
+
+```sql
+CREATE EXTERNAL TABLE my_table
+STORED BY 'org.apache.flink.table.store.hive.TableStoreHiveStorageHandler'
+LOCATION 'hdfs://path/to/table';
+```
+
+{{< /tab >}}
+

Review Comment:
   In that case we'll have to change the category name from "SQL API" to something else.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1054137393


##########
docs/content/docs/sql-api/querying-tables.md:
##########
@@ -24,99 +24,98 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Query Table
-
-You can directly SELECT the table in batch runtime mode of Flink SQL.
-
-```sql
--- Batch mode, read latest snapshot
-SET 'execution.runtime-mode' = 'batch';
-SELECT * FROM MyTable;
-```
-
-## Query Engines
-
-Table Store not only supports Flink SQL queries natively but also provides
-queries from other popular engines. See [Engines]({{< ref "docs/engines/overview" >}})
-
-## Query Optimization
-
-It is highly recommended to specify partition and primary key filters
-along with the query, which will speed up the data skipping of the query.
-
-The filter functions that can accelerate data skipping are:
-- `=`
-- `<`
-- `<=`
-- `>`
-- `>=`
-- `IN (...)`
-- `LIKE 'abc%'`
-- `IS NULL`
-
-Table Store will sort the data by primary key, which speeds up the point queries
-and range queries. When using a composite primary key, it is best for the query
-filters to form a [leftmost prefix](https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html)
-of the primary key for good acceleration.
-
-Suppose that a table has the following specification:
-
-```sql
-CREATE TABLE orders (
-    catalog_id BIGINT,
-    order_id BIGINT,
-    .....,
-    PRIMARY KEY (catalog_id, order_id) NOT ENFORCED -- composite primary key
-)
-```
-
-The query obtains a good acceleration by specifying a range filter for
-the leftmost prefix of the primary key.
-
-```sql
-SELECT * FROM orders WHERE catalog_id=1025;
-
-SELECT * FROM orders WHERE catalog_id=1025 AND order_id=29495;
-
-SELECT * FROM orders
-  WHERE catalog_id=1025
-  AND order_id>2035 AND order_id<6000;
-```
-
-However, the following filter cannot accelerate the query well.
-
-```sql
-SELECT * FROM orders WHERE order_id=29495;
-
-SELECT * FROM orders WHERE catalog_id=1025 OR order_id=29495;
-```
-
-## Snapshots Table
-
-You can query the snapshot history information of the table through Flink SQL.
+# Querying Tables
+
+Just like all other tables, Table Store tables can be queried with `SELECT` statement.
+
+## Scan Mode
+
+By specifying the `scan.mode` table property, users can specify where and how Table Store sources should produce records.
+
+<table class="table table-bordered">
+<thead>
+<tr>
+<th>Scan Mode</th>
+<th>Batch Source Behavior</th>
+<th>Streaming Source Behavior</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>default</td>
+<td colspan="2">
+The default scan mode. Determines actual scan mode according to other table properties. If "scan.timestamp-millis" is set the actual scan mode will be "from-timestamp". Otherwise the actual scan mode will be "latest-full".
+</td>
+</tr>
+<tr>
+<td>latest-full</td>
+<td>
+Produces the latest snapshot of table.
+</td>
+<td>
+Produces the latest snapshot on the table upon first startup, and continues to read the following changes.
+</td>
+</tr>
+<tr>
+<td>compacted-full</td>
+<td>
+Produces the snapshot after the latest <a href="{{< ref "docs/concepts/lsm-trees#compactions" >}}">compaction</a>.
+</td>
+<td>
+Produces the snapshot after the latest compaction on the table upon first startup, and continues to read the following changes.
+</td>
+</tr>
+<tr>
+<td>latest</td>

Review Comment:
   `latest` is not the legacy one. That's `full`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper merged pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper merged PR #443:
URL: https://github.com/apache/flink-table-store/pull/443


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1054332744


##########
docs/content/docs/how-to/creating-tables.md:
##########
@@ -0,0 +1,281 @@
+---
+title: "Creating Tables"
+weight: 2
+type: docs
+aliases:
+- /how-to/creating-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Creating Tables
+
+## Creating Catalog Managed Tables
+
+Tables created in Table Store [catalogs]({{< ref "docs/how-to/creating-catalogs" >}}) are managed by the catalog. When the table is dropped from catalog, its table files will also be deleted.
+
+The following SQL assumes that you have registered and are using a Table Store catalog. It creates a managed table named `MyTable` with five columns in the catalog's `default` database.
+
+{{< tabs "catalog-managed-table-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Tables with Primary Keys
+
+The following SQL creates a table named `MyTable` with five columns, where `dt`, `hh` and `user_id` are the primary keys.
+
+{{< tabs "primary-keys-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING,
+    PRIMARY KEY (dt, hh, user_id) NOT ENFORCED
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) TBLPROPERTIES (
+    'primary-key' = 'dt,hh,user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Partitioned Tables
+
+The following SQL creates a table named `MyTable` with five columns partitioned by `dt` and `hh`.
+
+{{< tabs "partitions-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+{{< hint info >}}
+
+Partition keys must be a subset of primary keys if primary keys are defined.
+
+{{< /hint >}}
+
+### Table Properties
+
+Users can specify table properties to enable features or improve performance of Table Store. For a complete list of such properties, see [configurations]({{< ref "docs/maintenance/configurations" >}}).
+
+The following SQL creates a table named `MyTable` with five columns partitioned by `dt` and `hh`. This table has two properties: `'bucket' = '2'` and `'bucket-key' = 'user_id'`.
+
+{{< tabs "table-properties-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh) WITH (
+    'bucket' = '2',
+    'bucket-key' = 'user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) PARTITIONED BY (dt, hh) TBLPROPERTIES (
+    'bucket' = '2',
+    'bucket-key' = 'user_id'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Creating External Tables
+
+External tables are recorded but not managed by catalogs. If an external table is dropped, its table files will not be deleted.
+
+Table Store external tables can be used in any catalog. If you do not want to create a Table Store catalog and just want to read / write a table, you can consider external tables.
+
+{{< tabs "external-table-example" >}}
+
+{{< tab "Flink" >}}
+
+Flink SQL supports reading and writing an external table. External Table Store tables are created by specifying the `connector` and `path` table properties. The following SQL creates an external table named `MyTable` with five columns, where the base path of table files is `hdfs://path/to/table`.
+
+```sql
+CREATE TABLE MyTable (
+    user_id BIGINT,
+    item_id BIGINT,
+    behavior STRING,
+    dt STRING,
+    hh STRING
+) WITH (
+    'connector' = 'table-store',
+    'path' = 'hdfs://path/to/table',
+    'auto-create' = 'true' -- this table property creates table files for an empty table if table path does not exist
+                           -- currently only supported by Flink
+);
+```
+
+{{< /tab >}}
+
+{{< tab "Spark3" >}}
+
+Spark3 only supports creating external tables through Scala API. The following Scala code loads the table located at `hdfs://path/to/table` into a `DataSet`.
+
+```scala
+val dataset = spark.read.format("tablestore").load("hdfs://path/to/table")
+```
+
+{{< /tab >}}
+
+{{< tab "Spark2" >}}
+
+Spark2 only supports creating external tables through Scala API. The following Scala code loads the table located at `hdfs://path/to/table` into a `DataSet`.
+
+```scala
+val dataset = spark.read.format("tablestore").load("hdfs://path/to/table")
+```
+
+{{< /tab >}}
+
+{{< tab "Hive" >}}
+
+Hive SQL only supports reading from an external table. The following SQL creates an external table named `my_table`, where the base path of table files is `hdfs://path/to/table`. As schemas are stored in table files, users do not need to write column definitions.
+
+```sql
+CREATE EXTERNAL TABLE my_table
+STORED BY 'org.apache.flink.table.store.hive.TableStoreHiveStorageHandler'
+LOCATION 'hdfs://path/to/table';
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Creating Temporary Tables
+
+Temporary tables are only supported by Flink. Like external tables, temporary tables are just recorded but not managed by the current Flink SQL session. If the temporary table is dropped, its resources will not be deleted. Temporary tables are also dropped when Flink SQL session is closed.
+
+If you want to use Table Store catalog along with other tables but do not want to store them in other catalogs, you can create a temporary table. The following Flink SQL creates a Table Store catalog and a temporary table and also illustrates how to use both tables together.
+
+```sql
+CREATE CATALOG my_catalog WITH (

Review Comment:
   use tab to note this is flink sql.



##########
docs/content/docs/how-to/creating-tables.md:
##########
@@ -0,0 +1,281 @@
+---
+title: "Creating Tables"
+weight: 2
+type: docs
+aliases:
+- /how-to/creating-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Creating Tables
+
+## Creating Catalog Managed Tables
+
+Tables created in Table Store [catalogs]({{< ref "docs/how-to/creating-catalogs" >}}) are managed by the catalog. When the table is dropped from catalog, its table files will also be deleted.
+
+The following SQL assumes that you have registered and are using a Table Store catalog. It creates a managed table named `MyTable` with five columns in the catalog's `default` database.
+
+{{< tabs "catalog-managed-table-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (

Review Comment:
   Can we remove this one?
   No pk changelog table is not recommented.
   I think pk one is enough.
   
   We can add pk in most of exmples.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] Gerrrr commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
Gerrrr commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053789276


##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Types
+
+Table Store supports various types of tables. Users can specify `write-mode` table property to specify table types when creating tables.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can also specify `'write-mode' = 'change-log'` explicitly in table properties when creating the table.
+
+Primary keys are a set of columns that are unique for each record. Table Store imposes an ordering of data, which means the system will sort the primary key within each bucket. Using this feature, users can achieve high performance by adding filter conditions on the primary key.
+
+By [defining primary keys]({{< ref "docs/sql-api/creating-tables#tables-with-primary-keys" >}}) on a changelog table, users can access the following features.
+
+### Merge Engines
+
+When Table Store sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
+
+#### Deduplicate
+
+`deduplicate` merge engine is the default merge engine. Table Store will only keep the latest record and throw away other records with the same primary keys.
+
+Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
+
+#### Partial Update
+
+By specifying `'merge-engine' = 'partial-update'`, users can set columns of a record across multiple updates and finally get a complete record. Specifically, value fields are updated to the latest data one by one under the same primary key, but null values are not overwritten.
+
+For example, let's say Table Store receives three records `<1, 23.0, 10, NULL>`, `<1, NULL, NULL, 'This is a book'>` and `<1, 25.2, NULL, NULL>`, where the first column is the primary key. The final result will be `<1, 25.2, 10, 'This is a book'>`.
+
+NOTE: For streaming queries, `partial-update` merge engine must be used together with `full-compaction` [changelog producer]({{< ref "docs/features/table-types#changelog-producers" >}}).
+
+#### Aggregation
+
+Sometimes users only care about aggregated results. The `aggregation` merge engine aggregates each value field with the latest data one by one under the same primary key according to the aggregate function.
+
+Each field not part of the primary keys must be given an aggregate function, specified by the `fields.<field-name>.aggregate-function` table property. For example, consider the following table definition.
+
+{{< tabs "aggregation-merge-engine-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    product_id BIGINT,
+    price DOUBLE,
+    sales BIGINT,
+    PRIMARY KEY (product_id) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'aggregation',
+    'fields.price.aggregate-function' = 'max',
+    'fields.sales.aggregate-function' = 'sum'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+Field `price` will be aggregated by the `max` function, and field `sales` will be aggregated by the `sum` function. Given two input records `<1, 23.0, 15>` and `<1, 30.2, 20>`, the final result will be `<1, 30.2, 35>`.
+
+Current supported aggregate functions are data types are:
+
+* `sum`: supports DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT and DOUBLE.
+* `min`/`max`: support DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and TIMESTAMP_LTZ.
+* `last_value` / `last_non_null_value`: support all data types.
+* `listagg`: supports STRING data type.
+* `bool_and` / `bool_or`: support BOOLEAN data type.
+
+### Changelog Producers
+
+Streaming queries will continuously produce latest changes. These changes can come from the underlying table files or from an [external log system]({{< ref "docs/features/external-log-systems" >}}) like Kafka. Compared to the external log system, changes from table files have lower cost but higher latency (depending on how often snapshots are created).
+
+By specifying the `changelog-producer` table property when creating the table, users can choose the pattern of changes produced from files.
+
+#### None
+
+By default, no extra changelog producer will be applied to the writer of table. Table Store source can only see the merged changes across snapshots, like what keys are removed and what are the new values of some keys.
+
+However, these merged changes cannot form a complete changelog, because we can't read the old values of the keys directly from them. Merged changes require the consumers to "remember" the values of each key and to rewrite the values without seeing the old ones.

Review Comment:
   Why do we need `before` values in the complete changelog in the general case? AFAIU we'd only need them for append-only tables.



##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Types
+
+Table Store supports various types of tables. Users can specify `write-mode` table property to specify table types when creating tables.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can also specify `'write-mode' = 'change-log'` explicitly in table properties when creating the table.

Review Comment:
   It might be worth elaborating that `'write-mode' = 'change-log'` generally means that the table supports inserts/updates/deletions. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] Gerrrr commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
Gerrrr commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053657989


##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Types
+
+Table Store supports various types of tables. Users can specify `write-mode` table property to specify table types when creating tables.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can also specify `'write-mode' = 'change-log'` explicitly in table properties when creating the table.
+
+Primary keys are a set of columns that are unique for each record. Table Store imposes an ordering of data, which means the system will sort the primary key within each bucket. Using this feature, users can achieve high performance by adding filter conditions on the primary key.
+
+By [defining primary keys]({{< ref "docs/sql-api/creating-tables#tables-with-primary-keys" >}}) on a changelog table, users can access the following features.
+
+### Merge Engines
+
+When Table Store sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
+
+#### Deduplicate
+
+`deduplicate` merge engine is the default merge engine. Table Store will only keep the latest record and throw away other records with the same primary keys.
+
+Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
+
+#### Partial Update
+
+By specifying `'merge-engine' = 'partial-update'`, users can set columns of a record across multiple updates and finally get a complete record. Specifically, value fields are updated to the latest data one by one under the same primary key, but null values are not overwritten.
+
+For example, let's say Table Store receives three records `<1, 23.0, 10, NULL>`, `<1, NULL, NULL, 'This is a book'>` and `<1, 25.2, NULL, NULL>`, where the first column is the primary key. The final result will be `<1, 25.2, 10, 'This is a book'>`.
+
+NOTE: For streaming queries, `partial-update` merge engine must be used together with `full-compaction` [changelog producer]({{< ref "docs/features/table-types#changelog-producers" >}}).
+
+#### Aggregation
+
+Sometimes users only care about aggregated results. The `aggregation` merge engine aggregates each value field with the latest data one by one under the same primary key according to the aggregate function.
+
+Each field not part of the primary keys must be given an aggregate function, specified by the `fields.<field-name>.aggregate-function` table property. For example, consider the following table definition.
+
+{{< tabs "aggregation-merge-engine-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    product_id BIGINT,
+    price DOUBLE,
+    sales BIGINT,
+    PRIMARY KEY (product_id) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'aggregation',
+    'fields.price.aggregate-function' = 'max',
+    'fields.sales.aggregate-function' = 'sum'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+Field `price` will be aggregated by the `max` function, and field `sales` will be aggregated by the `sum` function. Given two input records `<1, 23.0, 15>` and `<1, 30.2, 20>`, the final result will be `<1, 30.2, 35>`.
+
+Current supported aggregate functions are data types are:
+
+* `sum`: supports DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT and DOUBLE.
+* `min`/`max`: support DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and TIMESTAMP_LTZ.
+* `last_value` / `last_non_null_value`: support all data types.
+* `listagg`: supports STRING data type.
+* `bool_and` / `bool_or`: support BOOLEAN data type.
+
+### Changelog Producers
+
+Streaming queries will continuously produce latest changes. These changes can come from the underlying table files or from an [external log system]({{< ref "docs/features/external-log-systems" >}}) like Kafka. Compared to the external log system, changes from table files have lower cost but higher latency (depending on how often snapshots are created).
+
+By specifying the `changelog-producer` table property when creating the table, users can choose the pattern of changes produced from files.

Review Comment:
   Does it mean that the `changelog-producer` option is only relevant to the changelog from files? Does it do anything when Kafka is used?



##########
docs/content/docs/sql-api/writing-tables.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Writing Tables"
+weight: 4
+type: docs
+aliases:
+- /sql-api/writing-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Writing Tables
+
+## Applying Records/Changes to Tables
+
+{{< tabs "insert-into-example" >}}
+
+{{< tab "Flink" >}}
+
+Use `INSERT INTO` to apply records and changes to tables.
+
+```sql
+INSERT INTO MyTable SELECT ...
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Overwriting the Whole Table

Review Comment:
   What does it mean to overwrite a table or a partition?



##########
docs/content/docs/sql-api/querying-tables.md:
##########
@@ -24,99 +24,98 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Query Table
-
-You can directly SELECT the table in batch runtime mode of Flink SQL.
-
-```sql
--- Batch mode, read latest snapshot
-SET 'execution.runtime-mode' = 'batch';
-SELECT * FROM MyTable;
-```
-
-## Query Engines
-
-Table Store not only supports Flink SQL queries natively but also provides
-queries from other popular engines. See [Engines]({{< ref "docs/engines/overview" >}})
-
-## Query Optimization
-
-It is highly recommended to specify partition and primary key filters
-along with the query, which will speed up the data skipping of the query.
-
-The filter functions that can accelerate data skipping are:
-- `=`
-- `<`
-- `<=`
-- `>`
-- `>=`
-- `IN (...)`
-- `LIKE 'abc%'`
-- `IS NULL`
-
-Table Store will sort the data by primary key, which speeds up the point queries
-and range queries. When using a composite primary key, it is best for the query
-filters to form a [leftmost prefix](https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html)
-of the primary key for good acceleration.
-
-Suppose that a table has the following specification:
-
-```sql
-CREATE TABLE orders (
-    catalog_id BIGINT,
-    order_id BIGINT,
-    .....,
-    PRIMARY KEY (catalog_id, order_id) NOT ENFORCED -- composite primary key
-)
-```
-
-The query obtains a good acceleration by specifying a range filter for
-the leftmost prefix of the primary key.
-
-```sql
-SELECT * FROM orders WHERE catalog_id=1025;
-
-SELECT * FROM orders WHERE catalog_id=1025 AND order_id=29495;
-
-SELECT * FROM orders
-  WHERE catalog_id=1025
-  AND order_id>2035 AND order_id<6000;
-```
-
-However, the following filter cannot accelerate the query well.
-
-```sql
-SELECT * FROM orders WHERE order_id=29495;
-
-SELECT * FROM orders WHERE catalog_id=1025 OR order_id=29495;
-```
-
-## Snapshots Table
-
-You can query the snapshot history information of the table through Flink SQL.
+# Querying Tables
+
+Just like all other tables, Table Store tables can be queried with `SELECT` statement.

Review Comment:
   As I mentioned above, it would be very helpful to understand under what situations FTS adds a normalization node to the query topology.



##########
docs/content/docs/maintenance-actions/write-performance.md:
##########
@@ -0,0 +1,118 @@
+---
+title: "Write Performance"
+weight: 1
+type: docs
+aliases:
+- /maintenance-actions/write-performance.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Write Performance
+
+Performance of Table Store writers are related with the following factors.
+
+## Parallelism
+
+It is recommended that the parallelism of sink should be less than or equal to the number of buckets, preferably equal. You can control the parallelism of the sink with the `sink.parallelism` table property.
+
+<table class="table table-bordered">
+    <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Option</th>
+      <th class="text-left" style="width: 5%">Required</th>
+      <th class="text-left" style="width: 5%">Default</th>
+      <th class="text-left" style="width: 10%">Type</th>
+      <th class="text-left" style="width: 60%">Description</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td><h5>sink.parallelism</h5></td>
+      <td>No</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>Integer</td>
+      <td>Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator.</td>
+    </tr>
+    </tbody>
+</table>
+
+## Number of Sorted Runs to Trigger Compaction
+
+Table Store uses LSM which supports a large number of updates. LSM organizes files in several "sorted runs". When querying records from an LSM, all sorted runs must be combined to produce a complete view of all records.

Review Comment:
   Is a sorted run a collection of data files that contain non-overlapping keys? Can a single LSM level contain more than a single sorted run? Can a single data file belong to more than one sorted run?



##########
docs/content/docs/features/external-log-systems.md:
##########
@@ -0,0 +1,64 @@
+---
+title: "External Log Systems"
+weight: 2
+type: docs
+aliases:
+- /features/external-log-systems.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# External Log Systems
+
+Aside from [underlying table files]({{< ref "docs/features/table-types#changelog-producers" >}}), changelog of Table Store can also be stored into or consumed from an external log system, such as Kafka. By specifying `log.system` table property, users can choose which external log system to use.
+
+If an external log system is used, all records written into table files will also be written into the log system. Changes produced by the streaming queries will thus come from the log system instead of table files.
+
+## Consistency Guarantees
+
+By default, changes in the log systems are visible to consumers only after a snapshot, just like table files. This behavior guarantees the exactly-once semantics. That is, each record is seen by the consumers exactly once.
+
+However, users can also specify the table property `'log.consistency' = 'eventual'` so that changelog written into the log system can be immediately consumed by the consumers, without waiting for the next snapshot. This behavior decreases the latency of changelog, but it can only guarantee the at-least-once semantics (that is, consumers might see duplicated records) due to possible failures.

Review Comment:
   It might be worth mentioning somewhere that `'log.consistency' = 'eventual'` adds a normalization node to streaming queries. In general, it would very helpful to understand when FTS adds a normalization node in queries and insertions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1054194851


##########
docs/content/docs/features/lookup-joins.md:
##########
@@ -0,0 +1,97 @@
+---
+title: "Lookup Joins"

Review Comment:
   Lookup join seems more like a feature to me. The "SQL API" section should only contain basic and common usage of Table Store, while more advanced usage should be moved to the "Features" section.



##########
docs/content/docs/features/lookup-joins.md:
##########
@@ -0,0 +1,97 @@
+---
+title: "Lookup Joins"

Review Comment:
   Lookup join seems more like a feature to me. The "SQL API" section should only contain basic and common usage of Table Store, while more advanced usage should be moved to the "Features" section.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053932615


##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Types
+
+Table Store supports various types of tables. Users can specify `write-mode` table property to specify table types when creating tables.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can also specify `'write-mode' = 'change-log'` explicitly in table properties when creating the table.
+
+Primary keys are a set of columns that are unique for each record. Table Store imposes an ordering of data, which means the system will sort the primary key within each bucket. Using this feature, users can achieve high performance by adding filter conditions on the primary key.
+
+By [defining primary keys]({{< ref "docs/sql-api/creating-tables#tables-with-primary-keys" >}}) on a changelog table, users can access the following features.
+
+### Merge Engines
+
+When Table Store sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
+
+#### Deduplicate
+
+`deduplicate` merge engine is the default merge engine. Table Store will only keep the latest record and throw away other records with the same primary keys.
+
+Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
+
+#### Partial Update
+
+By specifying `'merge-engine' = 'partial-update'`, users can set columns of a record across multiple updates and finally get a complete record. Specifically, value fields are updated to the latest data one by one under the same primary key, but null values are not overwritten.
+
+For example, let's say Table Store receives three records `<1, 23.0, 10, NULL>`, `<1, NULL, NULL, 'This is a book'>` and `<1, 25.2, NULL, NULL>`, where the first column is the primary key. The final result will be `<1, 25.2, 10, 'This is a book'>`.
+
+NOTE: For streaming queries, `partial-update` merge engine must be used together with `full-compaction` [changelog producer]({{< ref "docs/features/table-types#changelog-producers" >}}).
+
+#### Aggregation
+
+Sometimes users only care about aggregated results. The `aggregation` merge engine aggregates each value field with the latest data one by one under the same primary key according to the aggregate function.
+
+Each field not part of the primary keys must be given an aggregate function, specified by the `fields.<field-name>.aggregate-function` table property. For example, consider the following table definition.
+
+{{< tabs "aggregation-merge-engine-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    product_id BIGINT,
+    price DOUBLE,
+    sales BIGINT,
+    PRIMARY KEY (product_id) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'aggregation',
+    'fields.price.aggregate-function' = 'max',
+    'fields.sales.aggregate-function' = 'sum'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+Field `price` will be aggregated by the `max` function, and field `sales` will be aggregated by the `sum` function. Given two input records `<1, 23.0, 15>` and `<1, 30.2, 20>`, the final result will be `<1, 30.2, 35>`.
+
+Current supported aggregate functions are data types are:
+
+* `sum`: supports DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT and DOUBLE.
+* `min`/`max`: support DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP and TIMESTAMP_LTZ.
+* `last_value` / `last_non_null_value`: support all data types.
+* `listagg`: supports STRING data type.
+* `bool_and` / `bool_or`: support BOOLEAN data type.
+
+### Changelog Producers
+
+Streaming queries will continuously produce latest changes. These changes can come from the underlying table files or from an [external log system]({{< ref "docs/features/external-log-systems" >}}) like Kafka. Compared to the external log system, changes from table files have lower cost but higher latency (depending on how often snapshots are created).
+
+By specifying the `changelog-producer` table property when creating the table, users can choose the pattern of changes produced from files.

Review Comment:
   It does not affect Kafka. Kafka always stores the complete input records just like the `input` changelog producer. I'll make it clear.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #443: [FLINK-30458] Refactor Table Store Documentation

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053934969


##########
docs/content/docs/sql-api/writing-tables.md:
##########
@@ -0,0 +1,79 @@
+---
+title: "Writing Tables"
+weight: 4
+type: docs
+aliases:
+- /sql-api/writing-tables.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Writing Tables
+
+## Applying Records/Changes to Tables
+
+{{< tabs "insert-into-example" >}}
+
+{{< tab "Flink" >}}
+
+Use `INSERT INTO` to apply records and changes to tables.

Review Comment:
   `changelog-mode` only affects external log systems like Kafka. There must be `before` values from the input.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org