Posted to commits@flink.apache.org by lz...@apache.org on 2022/11/17 03:43:32 UTC

[flink-table-store] branch master updated: [FLINK-30043] Some example SQLs in the flink table store rescale-bucket document are incorrect

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink-table-store.git


The following commit(s) were added to refs/heads/master by this push:
     new c2df4674 [FLINK-30043] Some example SQLs in the flink table store rescale-bucket document are incorrect
c2df4674 is described below

commit c2df467406e7817b1f308cd4a8eb1c5baaa5f294
Author: Stan <53...@users.noreply.github.com>
AuthorDate: Thu Nov 17 11:43:28 2022 +0800

    [FLINK-30043] Some example SQLs in the flink table store rescale-bucket document are incorrect
    
    This closes #385
---
 docs/content/docs/development/rescale-bucket.md | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/docs/content/docs/development/rescale-bucket.md b/docs/content/docs/development/rescale-bucket.md
index 337b1063..ba2e59e9 100644
--- a/docs/content/docs/development/rescale-bucket.md
+++ b/docs/content/docs/development/rescale-bucket.md
@@ -88,6 +88,21 @@ WITH (
     'bucket' = '16'
 );
 
+-- for example, a temporary table sourced from Kafka
+CREATE TEMPORARY TABLE raw_orders (
+    trade_order_id BIGINT,
+    item_id BIGINT,
+    item_price BIGINT,
+    gmt_create STRING,
+    order_status STRING
+) WITH (
+    'connector' = 'kafka',
+    'topic' = '...',
+    'properties.bootstrap.servers' = '...',
+    'format' = 'csv',
+    ...
+);
+
 -- streaming insert as bucket num = 16
 INSERT INTO verified_orders
 SELECT trade_order_id,
@@ -95,7 +110,7 @@ SELECT trade_order_id,
        item_price,
        DATE_FORMAT(gmt_create, 'yyyy-MM-dd') AS dt
 FROM raw_orders
-WHERE order_status = 'verified'
+WHERE order_status = 'verified';
 ```
The pipeline has been running well for the past few weeks. Recently, however, the data volume has grown quickly, 
and the job's latency keeps increasing. To improve data freshness, users can 
@@ -110,7 +125,7 @@ and the job's latency keeps increasing. To improve the data freshness, users can
 - Increase the bucket number
   ```sql
   -- scaling out
-  ALTER TABLE verified_orders SET ('bucket' = '32')
+  ALTER TABLE verified_orders SET ('bucket' = '32');
   ```
 - Switch to the batch mode and overwrite the current partition(s) to which the streaming job is writing
   ```sql
@@ -122,7 +137,7 @@ and the job's latency keeps increasing. To improve the data freshness, users can
          item_id,
          item_price
   FROM verified_orders
-  WHERE dt = '2022-06-22' AND order_status = 'verified'
+  WHERE dt = '2022-06-22';
   
   -- case 2: there are late events updating the historical partitions, but the range does not exceed 3 days
   INSERT OVERWRITE verified_orders
@@ -131,7 +146,7 @@ and the job's latency keeps increasing. To improve the data freshness, users can
          item_price,
          dt
   FROM verified_orders
-  WHERE dt IN ('2022-06-20', '2022-06-21', '2022-06-22') AND order_status = 'verified'
+  WHERE dt IN ('2022-06-20', '2022-06-21', '2022-06-22');
   ```
- After the overwrite job finishes, switch back to streaming mode. The parallelism can now be increased along with the bucket number when restoring the streaming job from the savepoint 
  (see [Start a SQL Job from a savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sqlclient/#start-a-sql-job-from-a-savepoint))
@@ -145,5 +160,5 @@ and the job's latency keeps increasing. To improve the data freshness, users can
        item_price,
        DATE_FORMAT(gmt_create, 'yyyy-MM-dd') AS dt
   FROM raw_orders
-  WHERE order_status = 'verified'
+  WHERE order_status = 'verified';
   ```
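For reference, the snippets this patch touches compose into one end-to-end rescale sequence. The following is a minimal sketch of that flow under stated assumptions: it runs from a Flink SQL client session after the streaming insert job has been suspended with a savepoint, and the savepoint path and parallelism value are hypothetical placeholders rather than part of the patched document.

```sql
-- Minimal sketch of the full rescale flow stitched together from the
-- snippets above; savepoint path and parallelism are hypothetical.

-- 1. After suspending the streaming insert job with a savepoint
--    (e.g. `flink stop <jobId>` on the CLI), switch to batch mode.
SET 'execution.runtime-mode' = 'batch';

-- 2. Scale out the bucket number.
ALTER TABLE verified_orders SET ('bucket' = '32');

-- 3. Rewrite the partition the streaming job was writing to
--    (dynamic-partition form, mirroring case 2 above).
INSERT OVERWRITE verified_orders
SELECT trade_order_id,
       item_id,
       item_price,
       dt
FROM verified_orders
WHERE dt = '2022-06-22';

-- 4. Switch back to streaming mode and resume from the savepoint,
--    raising parallelism to match the new bucket number.
SET 'execution.runtime-mode' = 'streaming';
SET 'execution.savepoint.path' = '/path/to/savepoint'; -- hypothetical path
SET 'parallelism.default' = '32';                      -- illustrative value
INSERT INTO verified_orders
SELECT trade_order_id,
       item_id,
       item_price,
       DATE_FORMAT(gmt_create, 'yyyy-MM-dd') AS dt
FROM raw_orders
WHERE order_status = 'verified';
```

Once the restored job is healthy, a `RESET 'execution.savepoint.path';` in the same session keeps later submissions from accidentally restoring the same savepoint again.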