Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/18 15:10:48 UTC

[GitHub] [incubator-doris] Gabriel39 opened a new pull request, #9088: [Feature-wip] support vectorized compaction

Gabriel39 opened a new pull request, #9088:
URL: https://github.com/apache/incubator-doris/pull/9088

   # Proposed changes
   
   Issue Number: close #8445 
   This PR is a follow-up to PR [8438](https://github.com/apache/incubator-doris/pull/8438).
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on pull request #9088: [Feature-wip] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on PR #9088:
URL: https://github.com/apache/incubator-doris/pull/9088#issuecomment-1101483744

   I will attach a performance report later.




[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9088: [Feature] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9088:
URL: https://github.com/apache/incubator-doris/pull/9088#issuecomment-1109312958

   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] Gabriel39 closed pull request #9088: [Feature] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
Gabriel39 closed pull request #9088: [Feature] support vectorized compaction
URL: https://github.com/apache/incubator-doris/pull/9088




[GitHub] [incubator-doris] Gabriel39 commented on pull request #9088: [Feature] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on PR #9088:
URL: https://github.com/apache/incubator-doris/pull/9088#issuecomment-1104770352

   cc @morningman @HappenLee @yiguolei 




[GitHub] [incubator-doris] Gabriel39 commented on pull request #9088: [Feature] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on PR #9088:
URL: https://github.com/apache/incubator-doris/pull/9088#issuecomment-1104767917

   I have finished the performance test and added the progress and performance report to my own repo. Please refer to https://github.com/Gabriel39/doris_compaction_perf/tree/main/report .




[GitHub] [incubator-doris] morningman commented on a diff in pull request #9088: [Feature] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9088:
URL: https://github.com/apache/incubator-doris/pull/9088#discussion_r857743923


##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -99,6 +99,71 @@ Status BetaRowsetWriter::init(const RowsetWriterContext& rowset_writer_context)
     return Status::OK();
 }
 
+Status BetaRowsetWriter::add_block(const vectorized::Block* block) {
+    if (UNLIKELY(_segment_writer == nullptr)) {
+        RETURN_NOT_OK(_create_segment_writer(&_segment_writer));
+    }
+    size_t block_size_in_bytes = block->bytes();
+    size_t block_row_num = block->rows();
+    if (UNLIKELY(block_row_num == 0)) {
+        return Status::OK();
+    }
+    size_t row_avg_size_in_bytes = std::max((size_t)1, block_size_in_bytes / block_row_num);
+    size_t row_offset = 0;
+    int64_t segment_capacity_in_bytes = 0;
+    int64_t segment_capacity_in_rows = 0;
+    auto refresh_segment_capacity = [&]() {
+        segment_capacity_in_bytes =
+                (int64_t)MAX_SEGMENT_SIZE - (int64_t)_segment_writer->estimate_segment_size();
+        segment_capacity_in_rows = (int64_t)_context.max_rows_per_segment -
+                                   (int64_t)_segment_writer->num_rows_written();
+    };
+
+    refresh_segment_capacity();
+    if (UNLIKELY(segment_capacity_in_bytes <= 0 || segment_capacity_in_rows <= 0)) {
+        // no space for another single row, need flush now
+        RETURN_NOT_OK(_flush_segment_writer(&_segment_writer));
+        RETURN_NOT_OK(_create_segment_writer(&_segment_writer));
+        refresh_segment_capacity();
+    }
+
+    if (block_size_in_bytes > segment_capacity_in_bytes ||
+        block_row_num > segment_capacity_in_rows) {
+        size_t segment_max_row_num;
+        size_t input_row_num;
+        do {
+            assert(row_offset < block_row_num);
+            segment_max_row_num =
+                    std::max(std::min((size_t)segment_capacity_in_bytes / row_avg_size_in_bytes,
+                             (size_t)segment_capacity_in_rows), (size_t)1);
+            input_row_num = std::min(segment_max_row_num, block_row_num - row_offset);
+            assert(input_row_num > 0);

Review Comment:
   ```suggestion
               CHECK(input_row_num > 0);
   ```
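
The row-count arithmetic in the hunk above can be read in isolation as a small helper. The following is only a sketch of that calculation, not code from the PR: the name `rows_for_next_write` and its parameters are hypothetical, and unlike the diff it clamps negative capacities to zero before converting them to `size_t`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the PR): decide how many rows of the
// current block to write into the current segment.
//   capacity_bytes / capacity_rows : space left in the segment (may be <= 0)
//   avg_row_bytes                  : estimated average row size of the block
//   rows_left                      : rows of the block not yet written
inline size_t rows_for_next_write(int64_t capacity_bytes, int64_t capacity_rows,
                                  size_t avg_row_bytes, size_t rows_left) {
    // Assumption: clamp negative capacities to zero so the unsigned
    // conversion cannot wrap around to a huge value.
    const size_t cap_bytes = capacity_bytes > 0 ? static_cast<size_t>(capacity_bytes) : 0;
    const size_t cap_rows = capacity_rows > 0 ? static_cast<size_t>(capacity_rows) : 0;
    // Guard against division by zero, as the diff does with std::max(1, ...).
    const size_t row_bytes = std::max<size_t>(1, avg_row_bytes);
    // Take the tighter of the byte and row limits, but always allow at
    // least one row so the caller makes progress.
    const size_t fit = std::max<size_t>(1, std::min(cap_bytes / row_bytes, cap_rows));
    // Never hand over more rows than the block still holds.
    return std::min(fit, rows_left);
}
```

With this shape, the result is greater than zero whenever `rows_left` is greater than zero, which is the condition the assertion on `input_row_num` guards.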



##########
be/src/olap/rowset/segment_v2/segment_writer.h:
##########
@@ -83,6 +88,9 @@ class SegmentWriter {
     Status _write_footer();
     Status _write_raw_data(const std::vector<Slice>& slices);
 
+    std::string encode_short_keys(const std::vector<const void*> key_column_fields,

Review Comment:
   ```suggestion
       std::string encode_short_keys(const std::vector<const void*>& key_column_fields,
   ```



##########
be/src/olap/rowset/segment_v2/segment_writer.h:
##########
@@ -97,6 +105,11 @@ class SegmentWriter {
     std::vector<std::unique_ptr<ColumnWriter>> _column_writers;
     std::shared_ptr<MemTracker> _mem_tracker;
     uint32_t _row_count = 0;
+
+    vectorized::OlapBlockDataConvertor _olap_data_convertor;
+    std::vector< const KeyCoder* > _short_key_coders;

Review Comment:
   ```suggestion
       std::vector<const KeyCoder*> _short_key_coders;
   ```



##########
be/src/olap/rowset/beta_rowset_writer.cpp:
##########
@@ -99,6 +99,71 @@ Status BetaRowsetWriter::init(const RowsetWriterContext& rowset_writer_context)
     return Status::OK();
 }
 
+Status BetaRowsetWriter::add_block(const vectorized::Block* block) {
+    if (UNLIKELY(_segment_writer == nullptr)) {
+        RETURN_NOT_OK(_create_segment_writer(&_segment_writer));
+    }
+    size_t block_size_in_bytes = block->bytes();
+    size_t block_row_num = block->rows();
+    if (UNLIKELY(block_row_num == 0)) {
+        return Status::OK();
+    }
+    size_t row_avg_size_in_bytes = std::max((size_t)1, block_size_in_bytes / block_row_num);
+    size_t row_offset = 0;
+    int64_t segment_capacity_in_bytes = 0;
+    int64_t segment_capacity_in_rows = 0;
+    auto refresh_segment_capacity = [&]() {
+        segment_capacity_in_bytes =
+                (int64_t)MAX_SEGMENT_SIZE - (int64_t)_segment_writer->estimate_segment_size();
+        segment_capacity_in_rows = (int64_t)_context.max_rows_per_segment -
+                                   (int64_t)_segment_writer->num_rows_written();
+    };
+
+    refresh_segment_capacity();
+    if (UNLIKELY(segment_capacity_in_bytes <= 0 || segment_capacity_in_rows <= 0)) {
+        // no space for another single row, need flush now
+        RETURN_NOT_OK(_flush_segment_writer(&_segment_writer));
+        RETURN_NOT_OK(_create_segment_writer(&_segment_writer));
+        refresh_segment_capacity();
+    }
+
+    if (block_size_in_bytes > segment_capacity_in_bytes ||
+        block_row_num > segment_capacity_in_rows) {
+        size_t segment_max_row_num;
+        size_t input_row_num;
+        do {
+            assert(row_offset < block_row_num);

Review Comment:
   ```suggestion
               CHECK(row_offset < block_row_num);
   ```




Review Comment:
   Use `CHECK` instead of `assert` here, and likewise for the other `assert` calls.
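
For context on this request: the standard `assert` macro expands to nothing when `NDEBUG` is defined, so the invariant is not verified in release builds, while a glog-style `CHECK` stays active in every build. The sketch below illustrates the difference under stated assumptions: `SKETCH_CHECK` is a hypothetical stand-in that throws so the behavior can be observed in a test, whereas glog's `CHECK` logs a fatal message and aborts. The function names are illustrative and do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a CHECK-style macro. It is evaluated in every
// build type. Assumption: it throws instead of aborting, purely so the
// failure can be observed in a test.
#define SKETCH_CHECK(cond)                                                  \
    do {                                                                    \
        if (!(cond)) {                                                      \
            throw std::logic_error(std::string("check failed: ") + #cond);  \
        }                                                                   \
    } while (0)

// Variant using assert: the check disappears when NDEBUG is defined, and an
// out-of-range offset would then make the unsigned subtraction wrap around.
inline size_t remaining_rows_assert(size_t row_offset, size_t block_row_num) {
    assert(row_offset < block_row_num);
    return block_row_num - row_offset;
}

// Variant using the always-on check: the invariant is enforced regardless
// of build flags.
inline size_t remaining_rows_checked(size_t row_offset, size_t block_row_num) {
    SKETCH_CHECK(row_offset < block_row_num);
    return block_row_num - row_offset;
}
```

Calling `remaining_rows_checked(10, 10)` fails the check in any build, while the `assert` variant only catches the same mistake in builds compiled without `NDEBUG`.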



##########
be/src/olap/rowset/segment_v2/segment_writer.h:
##########
@@ -97,6 +105,11 @@ class SegmentWriter {
     std::vector<std::unique_ptr<ColumnWriter>> _column_writers;
     std::shared_ptr<MemTracker> _mem_tracker;
     uint32_t _row_count = 0;
+
+    vectorized::OlapBlockDataConvertor _olap_data_convertor;
+    std::vector< const KeyCoder* > _short_key_coders;
+    std::vector< uint16_t > _short_key_index_size;

Review Comment:
   ```suggestion
       std::vector<uint16_t> _short_key_index_size;
   ```





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9088: [Feature] support vectorized compaction

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9088:
URL: https://github.com/apache/incubator-doris/pull/9088#issuecomment-1109312977

   PR approved by anyone and no changes requested.

