Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/17 03:43:57 UTC

[GitHub] [incubator-doris] morningman opened a new pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2

morningman opened a new pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2
URL: https://github.com/apache/incubator-doris/pull/3340
 
 
   This CL mainly makes the following changes:
   
   1. Reorganizes the SegmentV2 upgrade document.
   2. When the session variable `use_v2_rollup` is set to true, queries are forced onto the V2-format copy of the base index (the `__v2_<table_name>` rollup) so that the data can be verified.
   3. Fixes a problem where the schema change operation that performs the V2 conversion did not persist the table's storage format information.
   4. Allows users to create V2-format tables directly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2
URL: https://github.com/apache/incubator-doris/pull/3340#discussion_r410805271
 
 

 ##########
 File path: docs/documentation/cn/administrator-guide/segment-v2-usage.md
 ##########
 @@ -1,118 +1,128 @@
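-# Doris Segment V2 Rollout and Trial Manual
+# Segment V2 Upgrade Manual
 
 ## Background
 
-Doris 0.12 implements segment v2 (a new storage format), which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance. An alpha of 0.12 has been released and is being rolled out internally; the rollout plan and trial method are recorded below
+Doris 0.12 implements a new storage format, Segment v2, which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance.
+
+Version 0.12 supports reading and writing both the existing Segment V1 (hereafter V1) format and the new Segment V2 (hereafter V2) format. To use V2 features on existing data, you must convert it from V1 to V2 with a command.
+
+This document describes if to convert to and use the V2 format after upgrading from 0.11 to 0.12.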
 
 Review comment:
   ```suggestion
   This document describes how to convert to and use the V2 format after upgrading from 0.11 to 0.12.
   ```



[GitHub] [incubator-doris] morningman commented on a change in pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3340:
URL: https://github.com/apache/incubator-doris/pull/3340#discussion_r411387555



##########
File path: fe/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java
##########
@@ -944,7 +945,9 @@ private void createJob(long dbId, OlapTable olapTable, Map<Long, LinkedList<Colu
             } else if (hasIndexChange) {
                 needAlter = true;
             } else if (storageFormat == TStorageFormat.V2) {
-                needAlter = true;
+                if (olapTable.getTableProperty().getStorageFormat() != TStorageFormat.V2) {

Review comment:
       Yes. If the storage format is already V2, `needAlter` remains `false` and nothing is done.
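For readers following the `needAlter` discussion, here is a minimal Java sketch of the decision described in this reply. It is not the actual `SchemaChangeHandler` code: `StorageFormat` is a hypothetical stand-in for the Thrift enum `TStorageFormat`, and the branch is flattened into a single boolean method.

```java
/** Hypothetical stand-in for the Thrift enum TStorageFormat referenced in the diff. */
enum StorageFormat { DEFAULT, V1, V2 }

/**
 * Sketch of the storage-format branch in SchemaChangeHandler.createJob() after this PR:
 * a V2 conversion job is needed only when V2 is requested and the table is not
 * already recorded as V2. Otherwise needAlter stays false and no job is created.
 */
final class V2ConversionCheck {
    private V2ConversionCheck() {}

    static boolean needAlter(StorageFormat requested, StorageFormat current) {
        // Before the fix, this branch set needAlter = true whenever V2 was requested,
        // even if the table was already V2.
        return requested == StorageFormat.V2 && current != StorageFormat.V2;
    }
}
```

With this logic, `needAlter(V2, V1)` is true and `needAlter(V2, V2)` is false, which matches the reply that a table already in V2 produces no schema change job.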






[GitHub] [incubator-doris] morningman commented on a change in pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3340:
URL: https://github.com/apache/incubator-doris/pull/3340#discussion_r411386872



##########
File path: docs/documentation/cn/administrator-guide/segment-v2-usage.md
##########
@@ -1,118 +1,128 @@
-# Doris Segment V2 Rollout and Trial Manual

Review comment:
       ok






[GitHub] [incubator-doris] morningman commented on a change in pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3340:
URL: https://github.com/apache/incubator-doris/pull/3340#discussion_r411387864



##########
File path: fe/src/main/java/org/apache/doris/planner/MaterializedViewSelector.java
##########
@@ -109,9 +109,22 @@ public BestIndexInfo selectBestMV(ScanNode scanNode) throws UserException {
         long start = System.currentTimeMillis();
         Preconditions.checkState(scanNode instanceof OlapScanNode);
         OlapScanNode olapScanNode = (OlapScanNode) scanNode;
+
+        ConnectContext connectContext = ConnectContext.get();
+        if (connectContext != null && connectContext.getSessionVariable().isUseV2Rollup()) {

Review comment:
       Yes. If `use_v2_rollup` is true and the `__v2_<table_name>` rollup exists, that V2 copy of the base index is always selected.
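A minimal Java sketch of the behavior confirmed above, under stated assumptions: the `"__v2_"` prefix value is inferred from the `__v2_table_name` naming in the upgrade document (the real constant is `MaterializedViewHandler.NEW_STORAGE_FORMAT_INDEX_NAME_PREFIX`), and a plain `Map` replaces `OlapTable.getIndexIdByName()`. A `null` result means normal rollup selection proceeds.

```java
import java.util.Map;

/**
 * Simplified, hypothetical model of the short-circuit this PR adds to
 * MaterializedViewSelector.selectBestMV(). Index names map to index ids,
 * standing in for OlapTable.getIndexIdByName().
 */
final class V2RollupShortCircuit {
    // Assumed value of MaterializedViewHandler.NEW_STORAGE_FORMAT_INDEX_NAME_PREFIX,
    // inferred from the "__v2_table_name" naming in the upgrade document.
    static final String V2_PREFIX = "__v2_";

    private V2RollupShortCircuit() {}

    /**
     * Returns the id of the "__v2_" + tableName index when useV2Rollup is set and that
     * index exists; returns null to signal "fall through to normal rollup selection".
     */
    static Long pickV2Rollup(boolean useV2Rollup, String tableName, Map<String, Long> indexIdByName) {
        if (!useV2Rollup) {
            return null;
        }
        // No predicate or priority checks: the V2 copy of the base index wins outright.
        return indexIdByName.get(V2_PREFIX + tableName);
    }
}
```

Because this check runs before `predicates()` and `priorities()`, other rollups are never considered while the variable is on, which is why the document restricts `use_v2_rollup` to verification.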






[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #3340: [SegmentV2] Optimize the upgrade logic of SegmentV2

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on a change in pull request #3340:
URL: https://github.com/apache/incubator-doris/pull/3340#discussion_r411297291



##########
File path: docs/documentation/cn/administrator-guide/segment-v2-usage.md
##########
@@ -1,118 +1,128 @@
-# Doris Segment V2 Rollout and Trial Manual

Review comment:
       Please add the license header first.

##########
File path: docs/documentation/cn/administrator-guide/segment-v2-usage.md
##########
@@ -1,118 +1,128 @@
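-# Doris Segment V2 Rollout and Trial Manual
+# Segment V2 Upgrade Manual
 
 ## Background
 
-Doris 0.12 implements segment v2 (a new storage format), which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance. An alpha of 0.12 has been released and is being rolled out internally; the rollout plan and trial method are recorded below
+Doris 0.12 implements a new storage format, Segment v2, which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance.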

Review comment:
       ```suggestion
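   Doris 0.12 implements a new storage format, Segment V2, which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance.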
   ```

##########
File path: docs/documentation/cn/administrator-guide/segment-v2-usage.md
##########
@@ -1,118 +1,128 @@
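-# Doris Segment V2 Rollout and Trial Manual
+# Segment V2 Upgrade Manual
 
 ## Background
 
-Doris 0.12 implements segment v2 (a new storage format), which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance. An alpha of 0.12 has been released and is being rolled out internally; the rollout plan and trial method are recorded below
+Doris 0.12 implements a new storage format, Segment v2, which introduces optimizations such as dictionary compression, bitmap indexes, and page cache to improve system performance.
+
+Version 0.12 supports reading and writing both the existing Segment V1 (hereafter V1) format and the new Segment V2 (hereafter V2) format. To use V2 features on existing data, you must convert it from V1 to V2 with a command.
+
+This document describes how to convert to and use the V2 format after upgrading from 0.11 to 0.12.
 
-## Rollout
+Tables in V2 format support the following new features:
 
-To keep the rollout stable, it is split into three phases:
-Phase one deploys 0.12 without fully enabling segment v2; segment v2 tables (or indexes) are created only for verification
-Phase two fully enables segment v2 and replaces the existing segment storage format, so new tables get storage files in segment v2 format, while old tables rely on processes such as compaction and schema change to convert their format
-Phase three converts the old segment format to the new segment v2 format; once segment v2 has been verified to be correct and to perform well, this can be done gradually at the user's discretion.
+1. bitmap index
+2. in-memory table
+3. page cache
+4. dictionary compression
+5. lazy materialization
 
-### Rollout verification
+## Cluster upgrade
 
-- Correctness
+Version 0.12 can only be upgraded from 0.11; upgrading from versions earlier than 0.11 is not supported. Make sure the Doris cluster is at version 0.11 before upgrading.
 
-Correctness is the most important property to guarantee when rolling out segment v2. In phase one, to keep the production environment stable while verifying the correctness of segment v2, the following approach is used:
-1. Choose several tables to verify and use the following statement to create a rollup in segment v2 format with the same schema as the base table
+Version 0.12 has two important V2-related parameters:
 
-	alter table table_name add rollup table_name (columns) properties ("storage_format" = "v2");
+* `default_rowset_type`: an FE global variable, defaulting to "alpha", i.e. V1.
+* `default_rowset_type`: a BE configuration item, defaulting to "ALPHA", i.e. V1.
 
-	Where:
-	the index name after rollup is simply the base table's table name; the statement automatically generates __v2_table_name.
+If both settings keep their defaults, upgrading the cluster with the usual steps leaves the storage format of existing data unchanged, i.e. still V1. If you do not need the V2 format, keep using the cluster as usual; no extra steps are required. All existing data and all newly loaded data remain in V1 format.
 
-	columns can be any single column name; this keeps the existing syntax compatible and is convenient.
+## V2 format conversion
 
-2. The rollup index created above is named like: __v2_table_name
+### Converting existing table data to V2
 
-	Use the command:
+Doris provides two ways to convert the format of existing table data:
 
-	`desc table_name all;`
+1. Create a special Rollup in V2 format
 
-	to check whether the table has a rollup named __v2_table_name. Creating a segment v2 rollup is asynchronous, so it does not succeed immediately. If the command above does not show the new rollup, it may still be in progress.
+    This creates a special V2-format Rollup for the specified table. Once created, the new V2 Rollup coexists with the table's data in its original format, and users can explicitly query the V2 Rollup to verify it.
+    
+    This method is mainly for validating the V2 format. Because it does not modify the original table data, V2 data can be verified safely without worrying that the format conversion will corrupt the table. Typically you first verify the data this way, then use method 2 to convert the whole table.
+    
+    The steps are as follows:
+    
+    ```
+    ## Create a V2-format ROLLUP
+    
+    ALTER TABLE table_name ADD ROLLUP table_name (columns) PROPERTIES ("storage_format" = "v2");
+    ```
 
-    You can view in-progress rollups with the following command.
+    Here, the Rollup name must be the table name. The columns field can be anything; the system does not validate it. The statement automatically creates a Rollup named `__v2_table_name` that contains all columns of the table.
+    
+    Check the creation progress with:
+    
+    ```
+    SHOW ALTER TABLE ROLLUP;
+    ```
+    
+    Once creation finishes, `DESC table_name ALL;` shows a Rollup named `__v2_table_name`.
+    
+    Then switch queries to the V2 format with:
 
-	`show alter table rollup;`
+    ```
+    set use_v2_rollup = true;
+    select * from table_name limit 10;
+    ```
+    
+    `use_v2_rollup` forces queries onto the Rollup named `__v2_table_name` and ignores the matching conditions of other rollups. This variable is therefore intended only for verifying V2-format data.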

Review comment:
       ```suggestion
       `use_v2_rollup` forces queries onto the Rollup named `__v2_table_name` and ignores the matching conditions of other Rollups. This variable is therefore intended only for verifying V2-format data.
   ```
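The verification procedure in the document above can be summarized as an ordered list of statements. This is a hypothetical Java helper (the class and method names are illustrative, not part of Doris) that only builds the SQL strings quoted in the diff; it does not connect to Doris, and the table and column names passed in are placeholders.

```java
import java.util.List;

/**
 * Hypothetical helper that spells out the V2 verification steps from the upgrade
 * document as SQL strings. Sending them to the FE is left to the caller.
 */
final class V2VerificationSteps {
    private V2VerificationSteps() {}

    /** Name of the rollup the ALTER statement is expected to create. */
    static String v2RollupName(String table) {
        return "__v2_" + table;
    }

    static List<String> statements(String table, String anyColumn) {
        return List.of(
                // The rollup name must be the table name; the column list is not validated.
                "ALTER TABLE " + table + " ADD ROLLUP " + table + " (" + anyColumn + ")"
                        + " PROPERTIES (\"storage_format\" = \"v2\")",
                // Creation is asynchronous: repeat until the job has finished.
                "SHOW ALTER TABLE ROLLUP",
                // The __v2_<table> rollup should now be listed.
                "DESC " + table + " ALL",
                // Session variable that forces queries onto the V2 rollup.
                "SET use_v2_rollup = true",
                "SELECT * FROM " + table + " LIMIT 10");
    }
}
```

For a table `t1`, `statements("t1", "k1").get(0)` yields `ALTER TABLE t1 ADD ROLLUP t1 (k1) PROPERTIES ("storage_format" = "v2")`, and `v2RollupName("t1")` is `__v2_t1`.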

##########
File path: fe/src/main/java/org/apache/doris/planner/MaterializedViewSelector.java
##########
@@ -109,9 +109,22 @@ public BestIndexInfo selectBestMV(ScanNode scanNode) throws UserException {
         long start = System.currentTimeMillis();
         Preconditions.checkState(scanNode instanceof OlapScanNode);
         OlapScanNode olapScanNode = (OlapScanNode) scanNode;
+
+        ConnectContext connectContext = ConnectContext.get();
+        if (connectContext != null && connectContext.getSessionVariable().isUseV2Rollup()) {

Review comment:
       If the user sets `use_v2_rollup`, will the best rollup no longer be selected?

##########
File path: fe/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java
##########
@@ -944,7 +945,9 @@ private void createJob(long dbId, OlapTable olapTable, Map<Long, LinkedList<Colu
             } else if (hasIndexChange) {
                 needAlter = true;
             } else if (storageFormat == TStorageFormat.V2) {
-                needAlter = true;
+                if (olapTable.getTableProperty().getStorageFormat() != TStorageFormat.V2) {

Review comment:
       If the storage format is already V2, will Doris forbid the schema change job?

##########
File path: fe/src/main/java/org/apache/doris/planner/MaterializedViewSelector.java
##########
@@ -109,9 +109,22 @@ public BestIndexInfo selectBestMV(ScanNode scanNode) throws UserException {
         long start = System.currentTimeMillis();
         Preconditions.checkState(scanNode instanceof OlapScanNode);
         OlapScanNode olapScanNode = (OlapScanNode) scanNode;
+
+        ConnectContext connectContext = ConnectContext.get();
+        if (connectContext != null && connectContext.getSessionVariable().isUseV2Rollup()) {
+            // if user set `use_v2_rollup` variable to true, and there is a segment v2 rollup,
+            // just return the segment v2 rollup, because the user wants to check the v2 format data.
+            OlapTable tbl = olapScanNode.getOlapTable();
+            String v2RollupIndexName = MaterializedViewHandler.NEW_STORAGE_FORMAT_INDEX_NAME_PREFIX + tbl.getName();
+            Long v2RollupIndexId = tbl.getIndexIdByName(v2RollupIndexName);
+            if (v2RollupIndexId != null) {
+                return new BestIndexInfo(v2RollupIndexId, false, "use_v2_rollup is true");
+            }
+        }
+
         Map<Long, List<Column>> candidateIndexIdToSchema = predicates(olapScanNode);
         long bestIndexId = priorities(olapScanNode, candidateIndexIdToSchema);

Review comment:
       Maybe we should check the candidate indexes first, and then choose the best rollup in `priorities`.
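One possible reading of this suggestion, sketched in Java under assumptions: the candidate ids and the priority step are passed in as a `Set` and a `ToLongFunction`, standing in for the results of `predicates()` and the `priorities()` call, and the `"__v2_"` prefix is inferred from the document. The class name is hypothetical. The V2 rollup is forced only if it survives the candidate checks; otherwise the normal priority choice applies.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch of "candidates first": honor use_v2_rollup only when the
 * V2 rollup is among the candidate indexes, else fall back to priority selection.
 */
final class CandidateFirstSelection {
    // Assumed prefix, matching the "__v2_table_name" naming in the upgrade document.
    static final String V2_PREFIX = "__v2_";

    private CandidateFirstSelection() {}

    static long select(boolean useV2Rollup,
                       String tableName,
                       Map<String, Long> indexIdByName,
                       Set<Long> candidateIds,
                       ToLongFunction<Set<Long>> priorities) {
        if (useV2Rollup) {
            Long v2Id = indexIdByName.get(V2_PREFIX + tableName);
            // Only force the V2 rollup when it already passed the candidate checks.
            if (v2Id != null && candidateIds.contains(v2Id)) {
                return v2Id;
            }
        }
        // Stand-in for priorities(olapScanNode, candidateIndexIdToSchema).
        return priorities.applyAsLong(candidateIds);
    }
}
```

The trade-off is that `use_v2_rollup` becomes a preference rather than an override, so a verification query could fall back to a V1 index without notice if the V2 rollup were filtered out. Since the V2 rollup presumably carries all of the base table's columns, that should be rare.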



