You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/29 16:07:07 UTC

[GitHub] [incubator-doris] HappenLee opened a new issue #3438: [Proposal] Vectorization query optimization for Doris

HappenLee opened a new issue #3438:
URL: https://github.com/apache/incubator-doris/issues/3438


   #### Motivation
   At present, the underlying storage in Doris is column storage.Query execution needs to be transferred to the query layer for execution by row-to-column first. Such an implementation maybe cause the performance problem。
   
   * 1. Row-to-row loss.
   * 2. Can not get better CPU performance without vectorized execution.
   
   So we want to transform the query layer of Doris to vectorized execution so that it can be not only stored by columns but is processed by vectors (parts of columns), which allows achieving high CPU efficiency. This can benefit query performance.
   
   Here I simply implemented a POC to verify whether there is a performance improvement
   
   ###### Test environment:
   * **Data set**
   ```Star Schema Benchmark```
       
   *  **Data generation**
     ```
      git clone git@github.com:vadimtk/ssb-dbgen.git
      cd ssb-dbgen
      make
    ```
   Download the **SSBM** code from github and compile it. After the compilation is successful, execute the following command to generate 3000W customer data:
   ```
    ./dbgen -s 1000 -T c
   ```
   *  **Build Table and Data Import**
   Use the following statement to create a test table, and import the data **customer.tbl** into Doris, the data size is about 3.2GB
   ```
   customer | CREATE TABLE `customer` (
     `C_CUSTKEY` int(11) NULL COMMENT "",
     `C_NAME` varchar(255) NOT NULL COMMENT "",
     `C_ADDRESS` varchar(255) NOT NULL COMMENT "",
     `C_CITY` varchar(255) NOT NULL COMMENT "",
     `C_NATION` varchar(255) NOT NULL COMMENT "",
     `C_REGION` varchar(255) NOT NULL COMMENT "",
     `C_PHONE` varchar(255) NOT NULL COMMENT "",
     `C_MKTSEGMENT` varchar(255) NOT NULL COMMENT ""
   ) ENGINE=OLAP
   DUPLICATE KEY(`C_CUSTKEY`, `C_NAME`, `C_ADDRESS`)
   COMMENT "OLAP"
   DISTRIBUTED BY HASH(`C_CUSTKEY`) BUCKETS 10
   PROPERTIES (
   "storage_type" = "COLUMN"
   ); 
   ```
   *  **Environment**
   ```
   GNU/Linux CentOS 6.3 (Final) build 2.6.32_1-19-0-0
   
   Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
    2 physical CPU package(s)
    24 physical CPU core(s)
    48 logical CPU(s)
   Identifier: Intel64 Family 6 Model 79 Stepping 1
   ProcessorID: F1 06 04 00 FF FB EB BF
   Context Switches/Interrupts: 12174692729137 / 297015608902
   
   
   Memory: 119.5 GiB/125.9 GiB
   ```
   
   Single FE and Single BE in the same server.
   
   ###### Test：
   
     * Modify the logic of Doris' query layer to support the vectorized aggregation of column inventory during aggregation calculations. Record the time when the row transfer to column:
   
   ![在NewPartitionedAggregationNode之中增加计算器，并且在析构函数之中打印出来](https://upload-images.jianshu.io/upload_images/8552201-87de31d5ffa0a4d7.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
      
   Calculate the loss time of row transter to column 
   ![](https://upload-images.jianshu.io/upload_images/8552201-0853796248b9ada4.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
   
   * **Results**
    ```select max(C_PHONE) from customer group by C_MKTSEGMENT;```
   
        Statistic|Origin| Convert to column | origin(muti-thread) | Convert to column(muti-thread) 
     :-:|:-:|:-:|:-:|:-:
     Time | 4.19 Sec | 4.57 Sec - 2.17Sec (Convert Time) | 0.67 Sec |  0.69 Sec |  
     Context-Switches | 31,737 |  32,468 |  40,463 |  30,699
     Migrations | 506 | 662 | 4,920 | 3265
     Instructions | 48,890,013,173 | 47,963,367,976  | 49,111,783,565  | 48,113,904,685 
     IPC | 1.57 | 1.42 | 1.40 | 1.37
     Branches | 9,201,175,036 | 9,124,545,231  | 9,248,803,634| 9,154,186,301
     Branches-Miss % | 0.90% | 1.02%  | 0.91% | 1.02%
   
   #### Implementation
   Doris currently has a corresponding ```VectorizedRowBatch ```implementation. So we can gradually complete the optimization each exec node.
   
   1. Starting from ```olap_scan_node```,  using vectorization query  test and observe whether there is expected performance improvement
   2. ```exec_node``` need to implement method 
   for ```VectorizedRowBatch``` trans to ```RowBatch``` method, retaining compatibility with the original execution logic
   
   
   
     
    
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] HappenLee commented on issue #3438: [Proposal] Vectorization query optimization for Doris

Posted by GitBox <gi...@apache.org>.

HappenLee commented on issue #3438:
URL: https://github.com/apache/incubator-doris/issues/3438#issuecomment-625759402


   
   ### About the POC
   At present, only aggregatenode has been verified .
   * 1. First, we do tuple convert columnVector and **time consuming of conversion operation is recorded.**
   ```
   {
             Timer time;
             FOREACH_ROW_LIMIT(in_batch, group_start, cache_size, in_batch_iter) {
                 // Hoist lookups out of non-null branch to speed up non-null case.
                 TupleRow *in_row = in_batch_iter.get();
                 for (int i = 0; i < agg_fn_evals_.size(); ++i) {
                      void** values = static_cast<void **>(valuesVector[i].col_data());
                      values[count] = agg_fn_evals_[i]->getAggValue(in_row);
                 }
   
                 for (int i = 0; i < ht_ctx->get_key_size(); ++i) {
                      void** keys = static_cast<void **>(keysVector[i].col_data());
                      keys[count] = ht_ctx->get_key(in_row, i);
                 }
                 count++;
             }
             _cost_time += time.stop();
         }
   ```
   * 2. Then we do aggregate use column storage
   ```
           EvalAndHashPrefetchGroup(keysVector, count, ht_ctx, remaining_capacity, tuples);
           for (int i = 0; i < agg_fn_evals_.size(); ++i) {
                UpdateTuple(agg_fn_evals_[i], tuples, static_cast<void**>(valuesVector[i].col_data()), count);
           }
   ```
   
   * 3. Execute the following query statement in a table of 70 million。```select max(C_PHONE) from customer group by C_MKTSEGMENT;``` 
   
   Count the time of aggratetation. we can find deducte the time of row conversion, **the time cut in half**.
   
      Statistic|origin| row convert to column  | 
     :-:|:-:|:-:
     Time | 4.19 Sec | 2.47 Sec | 
    
   * 4. In this case, the performance is improved through the **execution model of the column store**。
      In addition to the aggregatenode, another part of POC is the expression calculation and filtering part, such as AND OR filtering calculations whether can be improved by vectorization.
   **Join and sort can be ignored in this period, so if the query contains join or sort, it needs to be able to identify and use non vectorized execution path for execution by query planning.**
   
   ### Implementation and Design Detail
   There is three important parts: **Rewriting of scannode, Rewriting of expression calculation, Rewriting of aggregatenode**.
   
   
   #### Rewrite the scanner section first:
   * At present, for agg key and unique key, it is necessary to perform data pre aggregation and
   other operations by heap sort. **It seems default to avoid the transformation of columns and
   rows. So we only implement vectorized execution in dup_key.** 
   we can use **Rowblockv2 to be an abstraction of the column storage model, it is used to transfer data between each node**。we don't need convert **Rowblockv2 to old Rowblock**.
   * Implement the vectorization of individual expression (For example, the first phase can only
   implement simple expressions such as =/</>).
   
   We can make a judgment when querying the plan. If query on dup_key and only have simple expressions,
   the vectorization interface will be called. The pseudo code is as follows:
   ```
   VectorizedBatch vbatch = _reader->next_batch_with_aggregation();
   vector<rowId> selectedRowId = Expr.eval(vbatch);
   convertVecBatchToRowBatch(vbatch, selectedRowId, output_rowbatch);
   ```
   **If there are other expressions that are not supported temporarily, continue to use the old interface.** The input of new and old interfaces is different (one is row, the other is vbatch), but the output is the same (both are the original rowbatch of query layer)
   
   * Support more expr calculations, such as IN, FUNCTION and so on. **Finally implement all vectorization
   execution in scanner stage.** The output of scanner to the upper layer is still the old rowbatch, **but the internal layer and the storage layer is vectorized.**
   ![After rewite the scanner](https://upload-images.jianshu.io/upload_images/8552201-e1ba42199edb1a47.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
   
   
   #### Rewrite the scannode section second:
   
   * **Rewrite the interface between scan node and scanner to vectorization.** The ultimate goal is output of the upper layer of the sctopan node is the old rowbatch, but the output of the lower layer of the scan node (scanner and storage layer) is vectorized。
   
   So far, the changes of scan part and expr part are basically completed. We can rewrite the aggregate node now.
   ![After rewite the olap_scan_node](https://upload-images.jianshu.io/upload_images/8552201-213bb59042197868.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
   
   #### Rewrite the aggregatenode section finally:
   * Design and implement a new hash table structure that is compatible with the column storage model
   * Rewrite the computing model of the corresponding aggregation operator to support the vectorization
   
   * After all of that，**the final goal of the first phase is to vectorize the aggregate query of a single table.**
   If there is a join of multiple tables or a sort node, the old execution logic is still used. 
   
   
   ![After rewite the Aggregation_node](https://upload-images.jianshu.io/upload_images/8552201-54681e865bbd8c1c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
   
   Later, we can gradually extend the vectorization calculation to other execution nodes. **It's going to be a long-term plan.**
   
   #### RelationShip between Rowblock, Rowblockv2, Columnblock
   @kangkaisen 
   * **Rowblock is an old definition. At present, the storage layer and query layer still use rowblock interface to pass data.** The query layer accesses the data inside rowblock through rowcursor. Data in rowblock is stored
   by row.
   * **Rowblockv2 is a new definition which is vectorized. It contains multiple columnblocks.** Each columnblock is a column of data. Rowblockv2 is actually the newly defined vectorizedrowbatch.New segment V2 data is read in rowblockv2 format. The BetaRowsetReader will convert rowblockv2 to rowblock to returns the data.
   
   * The reader gets the next line at a time through the internal collect iterator. The collect iter will merge all rowsetreaders and return by row. **The merger operations here are based on rowcursor and rowblock.So it needs to be rewrite reader first.**
   
   * Now Doris has dup_key, agg_key and unique_key storage models. Dup_key does not need to do aggregation logic. **So we can first transform this part that directly deliver the corresponding data to the upper layer through columnvector rather than row_cursor.** At present, for agg key and unique key, it is necessary to perform data pre aggregation and other operationsby heap sort. It seems default to avoid the transformation of columns and rows.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] kangkaisen commented on issue #3438: [Proposal] Vectorization query optimization for Doris

Posted by GitBox <gi...@apache.org>.

kangkaisen commented on issue #3438:
URL: https://github.com/apache/incubator-doris/issues/3438#issuecomment-621314331


   @HappenLee I don't see the performance improve for your POC?
   
   What's your detailed design?
   
   Do you plan to still use `RowBlockV2` and `ColumnBlock`?
   
   Do you plan to still convert `RowBlockV2` to `RowBlock`?
   
   Why do we still use `VectorizedRowBatch`?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [incubator-doris] imay commented on issue #3438: [Proposal] Vectorization query optimization for Doris

Posted by GitBox <gi...@apache.org>.

imay commented on issue #3438:
URL: https://github.com/apache/incubator-doris/issues/3438#issuecomment-621314562


   @HappenLee 
   Vectorization is a good way to accelerate execution. However in my point of view this will change all the execution framework. So before real code work, we should discuss the work in more details. 
   Could you please explain your implementation more clearly?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org