You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by zy...@apache.org on 2023/06/26 11:24:51 UTC

[doris] branch master updated: [Docs](inverted index) update docs for build index (#21184)

This is an automated email from the ASF dual-hosted git repository.

zykkk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 65d81c04e6 [Docs](inverted index) update docs for build index (#21184)
65d81c04e6 is described below

commit 65d81c04e64e46e2883f0cc89843f599ae3091c6
Author: YueW <45...@users.noreply.github.com>
AuthorDate: Mon Jun 26 19:24:44 2023 +0800

    [Docs](inverted index) update docs for build index (#21184)
---
 docs/en/docs/data-table/index/inverted-index.md    | 63 ++++++++++++++++++++++
 docs/zh-CN/docs/data-table/index/inverted-index.md | 63 ++++++++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/docs/en/docs/data-table/index/inverted-index.md b/docs/en/docs/data-table/index/inverted-index.md
index 331fa90491..06102632f8 100644
--- a/docs/en/docs/data-table/index/inverted-index.md
+++ b/docs/en/docs/data-table/index/inverted-index.md
@@ -99,6 +99,8 @@ table_properties;
 ```
 
 - add an inverted index to existed table
+
+**Before version 2.0-beta:**
 ```sql
 -- syntax 1
 CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment'];
@@ -106,6 +108,26 @@ CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("par
 ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment'];
 ```
 
+**After version 2.0-beta (including 2.0-beta):**
+
+The above 'create/add index' operation only generates inverted index for incremental data. The syntax of build index is added to add inverted index to stock data:
+```sql
+-- syntax 1, add inverted index to the stock data of the whole table by default
+BUILD INDEX index_name ON table_name;
+-- syntax 2, partition can be specified, and one or more can be specified
+BUILD INDEX index_name ON table_name PARTITIONS(partition_name1, partition_name2);
+```
+(**The above 'create/add index' operation needs to be executed before executing the build index**)
+
+To view the progress of the `build index`, you can use the following statement
+```sql
+show build index [FROM db_name];
+-- Example 1: Viewing the progress of all build index tasks
+show build index;
+-- Example 2: Viewing the progress of the build index task for a specified table
+show build index where TableName = "table1";
+```
+
 - drop an inverted index
 ```sql
 -- syntax 1
@@ -327,6 +349,11 @@ mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00'
 mysql> CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED;
 Query OK, 0 rows affected (0.03 sec)
 ```
+**After 2.0-beta (including 2.0-beta), you need to execute `build index` to add inverted index to the stock data:**
+```sql
+mysql> BUILD INDEX idx_timestamp ON hackernews_1m;
+Query OK, 0 rows affected (0.01 sec)
+```
 
 - progress of building index can be view by SQL. It just costs 1s (compare FinishTime and CreateTime) to build index for timestamp column with 1 million rows.
 ```sql
@@ -339,6 +366,18 @@ mysql> SHOW ALTER TABLE COLUMN;
 1 row in set (0.00 sec)
 ```
 
+**After 2.0-beta (including 2.0-beta), you can view the progress of stock data creating index by `show build index`:**
+```sql
+-- If the table has no partitions, the PartitionName defaults to TableName
+mysql> show build index;
++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                                     | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 10191 | hackernews_1m | hackernews_1m | [ADD INDEX idx_timestamp (`timestamp`) USING INVERTED],  | 2023-06-26 15:32:33.894 | 2023-06-26 15:32:34.847 | 3             | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+1 row in set (0.04 sec)
+```
+
 - after the index is build, Doris will automaticaly use index for range query, but the performance is almost the same since it's already fast on the small dataset
 ```sql
 mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
@@ -365,6 +404,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189;
 mysql> ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED;
 Query OK, 0 rows affected (0.01 sec)
 
+-- After 2.0-beta (including 2.0-beta), you need to execute `build index` to add inverted index to the stock data:
+mysql> BUILD INDEX idx_parent ON hackernews_1m;
+Query OK, 0 rows affected (0.01 sec)
+
 mysql> SHOW ALTER TABLE COLUMN;
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
 | JobId | TableName     | CreateTime              | FinishTime              | IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | State    | Msg  | Progress | Timeout |
@@ -373,6 +416,14 @@ mysql> SHOW ALTER TABLE COLUMN;
 | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | hackernews_1m | 10054   | 10008         | 1:378856428   | 4             | FINISHED |      | NULL     | 2592000 |
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
 
+mysql> show build index;
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                               | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 11005 | hackernews_1m | hackernews_1m | [ADD INDEX idx_parent (`parent`) USING INVERTED],  | 2023-06-26 16:25:10.167 | 2023-06-26 16:25:10.838 | 1002          | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+1 row in set (0.01 sec)
+
 mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189;
 +---------+
 | count() |
@@ -396,6 +447,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster';
 mysql> ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED;
 Query OK, 0 rows affected (0.01 sec)
 
+-- After 2.0-beta (including 2.0-beta), you need to execute `build index` to add inverted index to the stock data:
+mysql> BUILD INDEX idx_author ON hackernews_1m;
+Query OK, 0 rows affected (0.01 sec)
+
 -- costs 1.5s to build index for author column with 1 million rows.
 mysql> SHOW ALTER TABLE COLUMN;
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
@@ -406,6 +461,14 @@ mysql> SHOW ALTER TABLE COLUMN;
 | 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | hackernews_1m | 10077   | 10008         | 1:1335127701  | 5             | FINISHED |      | NULL     | 2592000 |
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
 
+mysql> show build index order by CreateTime desc limit 1;
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                               | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) USING INVERTED],  | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004          | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+1 row in set (0.01 sec)
+
 -- equal qury on text field autor get 3x speedup
 mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster';
 +---------+
diff --git a/docs/zh-CN/docs/data-table/index/inverted-index.md b/docs/zh-CN/docs/data-table/index/inverted-index.md
index 15f7485d8e..e78ce537bc 100644
--- a/docs/zh-CN/docs/data-table/index/inverted-index.md
+++ b/docs/zh-CN/docs/data-table/index/inverted-index.md
@@ -97,6 +97,8 @@ table_properties;
 ```
 
 - 已有表增加倒排索引
+
+**2.0-beta版本之前:**
 ```sql
 -- 语法1
 CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment'];
@@ -104,6 +106,26 @@ CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("par
 ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment'];
 ```
 
+**2.0-beta版本(含2.0-beta)之后:**
+
+上述`create/add index`操作只对增量数据生成倒排索引,增加了build index的语法用于对存量数据加倒排索引:
+```sql
+-- 语法1,默认给全表的存量数据加上倒排索引
+BUILD INDEX index_name ON table_name;
+-- 语法2,可指定partition,可指定一个或多个
+BUILD INDEX index_name ON table_name PARTITIONS(partition_name1, partition_name2);
+```
+(**在执行build index之前需要已经执行了以上`create/add index`的操作**)
+
+查看`build index`进展,可通过以下语句进行查看:
+```sql
+show build index [FROM db_name];
+-- 示例1,查看所有的build index任务进展
+show build index;
+-- 示例2,查看指定table的build index任务进展
+show build index where TableName = "table1";
+```
+
 - 删除倒排索引
 ```sql
 -- 语法1
@@ -324,6 +346,11 @@ mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00'
 -- CREATE INDEX 是第一种建索引的语法,另外一种在后面展示
 mysql> CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED;
 Query OK, 0 rows affected (0.03 sec)
+```
+  **2.0-beta(含2.0-beta)后,需要再执行`build index`才能给存量数据加上倒排索引:**
+```sql
+mysql> BUILD INDEX idx_timestamp ON hackernews_1m;
+Query OK, 0 rows affected (0.01 sec)
 ```
 
 - 查看索引创建进度,通过FinishTime和CreateTime的差值,可以看到100万条数据对timestamp列建倒排索引只用了1s
@@ -337,6 +364,18 @@ mysql> SHOW ALTER TABLE COLUMN;
 1 row in set (0.00 sec)
 ```
 
+**2.0-beta(含2.0-beta)后,可通过`show builde index`来查看存量数据创建索引进展:**
+```sql
+-- 若table没有分区,PartitionName默认就是TableName
+mysql> show build index;
++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                                     | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 10191 | hackernews_1m | hackernews_1m | [ADD INDEX idx_timestamp (`timestamp`) USING INVERTED],  | 2023-06-26 15:32:33.894 | 2023-06-26 15:32:34.847 | 3             | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+1 row in set (0.04 sec)
+```
+
 - 索引创建后,范围查询用同样的查询方式,Doris会自动识别索引进行优化,但是这里由于数据量小性能差别不大
 ```sql
 mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
@@ -363,6 +402,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189;
 mysql> ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED;
 Query OK, 0 rows affected (0.01 sec)
 
+-- 2.0-beta(含2.0-beta)后,需要再执行build index才能给存量数据加上倒排索引:
+mysql> BUILD INDEX idx_parent ON hackernews_1m;
+Query OK, 0 rows affected (0.01 sec)
+
 mysql> SHOW ALTER TABLE COLUMN;
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
 | JobId | TableName     | CreateTime              | FinishTime              | IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | State    | Msg  | Progress | Timeout |
@@ -371,6 +414,14 @@ mysql> SHOW ALTER TABLE COLUMN;
 | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | hackernews_1m | 10054   | 10008         | 1:378856428   | 4             | FINISHED |      | NULL     | 2592000 |
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
 
+mysql> show build index;
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                               | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 11005 | hackernews_1m | hackernews_1m | [ADD INDEX idx_parent (`parent`) USING INVERTED],  | 2023-06-26 16:25:10.167 | 2023-06-26 16:25:10.838 | 1002          | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+1 row in set (0.01 sec)
+
 mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189;
 +---------+
 | count() |
@@ -394,6 +445,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster';
 mysql> ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED;
 Query OK, 0 rows affected (0.01 sec)
 
+-- 2.0-beta(含2.0-beta)后,需要再执行build index才能给存量数据加上倒排索引:
+mysql> BUILD INDEX idx_author ON hackernews_1m;
+Query OK, 0 rows affected (0.01 sec)
+
 -- 100万条author数据增量建索引仅消耗1.5s
 mysql> SHOW ALTER TABLE COLUMN;
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
@@ -404,6 +459,14 @@ mysql> SHOW ALTER TABLE COLUMN;
 | 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | hackernews_1m | 10077   | 10008         | 1:1335127701  | 5             | FINISHED |      | NULL     | 2592000 |
 +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
 
+mysql> show build index order by CreateTime desc limit 1;
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                               | CreateTime              | FinishTime              | TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) USING INVERTED],  | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004          | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+1 row in set (0.01 sec)
+
 -- 创建索引后,字符串等值匹配也有明显加速
 mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster';
 +---------+


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org