Posted to commits@doris.apache.org by "HappenLee (via GitHub)" <gi...@apache.org> on 2023/04/07 04:33:58 UTC

[GitHub] [doris] HappenLee opened a new pull request, #18457: [Pipeline](exec) Support shared scan in colo agg

HappenLee opened a new pull request, #18457:
URL: https://github.com/apache/doris/pull/18457

   # Proposed changes
   ckbench before the optimization:
   21.8s
   
   ckbench after the optimization:
   18.9s
   
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Has unit tests been added
   * [ ] Has documentation been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (if NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1502162250

   run buildall
   




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1501593212

   run buildall




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #18457:
URL: https://github.com/apache/doris/pull/18457#discussion_r1162253923


##########
be/src/vec/exec/scan/pip_scanner_context.h:
##########
@@ -67,19 +69,57 @@ class PipScannerContext : public vectorized::ScannerContext {
         const int queue_size = _queue_mutexs.size();
         const int block_size = blocks.size();
         int64_t local_bytes = 0;
-        for (const auto& block : blocks) {
-            local_bytes += block->allocated_bytes();
-        }
 
-        for (int i = 0; i < queue_size && i < block_size; ++i) {
-            int queue = _next_queue_to_feed;
-            {
-                std::lock_guard<std::mutex> l(*_queue_mutexs[queue]);
-                for (int j = i; j < block_size; j += queue_size) {
-                    _blocks_queues[queue].emplace_back(std::move(blocks[j]));
+        if (_need_colocate_distribute) {
+            std::vector<uint64_t> hash_vals;
+            for (const auto& block : blocks) {
+                // vectorized calculate hash
+                int rows = block->rows();
+                const auto element_size = _max_queue_size;
+                hash_vals.resize(rows);
+                std::fill(hash_vals.begin(), hash_vals.end(), 0);
+                auto* __restrict hashes = hash_vals.data();
+
+                for (int j = 0; j < _col_distribute_ids.size(); ++j) {
+                    DCHECK_GT(block->columns(), _col_distribute_ids[j]) << "happen lee:" << print_id(_state->query_id());

Review Comment:
   warning: member access into incomplete type 'doris::RuntimeState' [clang-diagnostic-error]
   ```cpp
                       DCHECK_GT(block->columns(), _col_distribute_ids[j]) << "happen lee:" << print_id(_state->query_id());
                                                                                                              ^
   ```
   **be/src/udf/udf.h:41:** forward declaration of 'doris::RuntimeState'
   ```cpp
   class RuntimeState;
         ^
   ```
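
The diagnostic above fires because only a forward declaration of `doris::RuntimeState` is visible at the point where `_state->query_id()` is called. A forward declaration is enough to hold a pointer, but not to access members. The sketch below reproduces the situation in one file; `RuntimeStateSketch` and `ContextSketch` are hypothetical stand-ins, not Doris types. The usual remedy, which is an assumption about this PR rather than something the thread confirms, is to include the header that defines the class, or to move the member call into a source file that does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Forward declaration only: enough to declare pointers or references,
// but not enough to call member functions on the type.
class RuntimeStateSketch;

struct ContextSketch {
    RuntimeStateSketch* state = nullptr;

    // Only declared here. Defining it inline at this point would fail with
    // "member access into incomplete type", as in the review comment.
    std::string describe() const;
};

// Full definition. In a real code base this would live in its own header,
// and the fix would be to #include that header where the call is made.
class RuntimeStateSketch {
public:
    explicit RuntimeStateSketch(std::uint64_t id) : _query_id(id) {}
    std::uint64_t query_id() const { return _query_id; }

private:
    std::uint64_t _query_id;
};

// The type is complete here, so the member access compiles.
std::string ContextSketch::describe() const {
    assert(state != nullptr);
    return "query " + std::to_string(state->query_id());
}
```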
   



##########
be/src/vec/exec/scan/pip_scanner_context.h:
##########
@@ -67,19 +69,57 @@
         const int queue_size = _queue_mutexs.size();
         const int block_size = blocks.size();
         int64_t local_bytes = 0;
-        for (const auto& block : blocks) {
-            local_bytes += block->allocated_bytes();
-        }
 
-        for (int i = 0; i < queue_size && i < block_size; ++i) {
-            int queue = _next_queue_to_feed;
-            {
-                std::lock_guard<std::mutex> l(*_queue_mutexs[queue]);
-                for (int j = i; j < block_size; j += queue_size) {
-                    _blocks_queues[queue].emplace_back(std::move(blocks[j]));
+        if (_need_colocate_distribute) {
+            std::vector<uint64_t> hash_vals;
+            for (const auto& block : blocks) {
+                // vectorized calculate hash
+                int rows = block->rows();
+                const auto element_size = _max_queue_size;
+                hash_vals.resize(rows);
+                std::fill(hash_vals.begin(), hash_vals.end(), 0);
+                auto* __restrict hashes = hash_vals.data();
+
+                for (int j = 0; j < _col_distribute_ids.size(); ++j) {
+                    DCHECK_GT(block->columns(), _col_distribute_ids[j]) << "happen lee:" << print_id(_state->query_id());
+                    DCHECK_NE(block->get_by_position(_col_distribute_ids[j])
+                                      .column.get(), nullptr) << "happen lee:" << print_id(_state->query_id());

Review Comment:
   warning: member access into incomplete type 'doris::RuntimeState' [clang-diagnostic-error]
   ```cpp
                                         .column.get(), nullptr) << "happen lee:" << print_id(_state->query_id());
                                                                                                    ^
   ```
   **be/src/udf/udf.h:41:** forward declaration of 'doris::RuntimeState'
   ```cpp
   class RuntimeState;
         ^
   ```
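
The hunk above appears to compute one hash per row over the distribution columns, one column at a time, so that rows can be routed to per-queue blocks. The excerpt ends before the hash is used, so the following is only a sketch under stated assumptions: the columnar `BlockSketch` type, the `distribute_rows` helper, the hash-combine formula, and the final modulo by the queue count are all hypothetical and are not taken from the Doris source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical columnar block: columns[c][r] is the value of column c at row r.
using BlockSketch = std::vector<std::vector<std::int64_t>>;

// Returns, for each queue, the indices of the rows routed to it.
std::vector<std::vector<std::size_t>> distribute_rows(
        const BlockSketch& block, const std::vector<std::size_t>& distribute_cols,
        std::size_t queue_count) {
    assert(queue_count > 0);
    const std::size_t rows = block.empty() ? 0 : block[0].size();

    // One running hash per row, updated column by column.
    std::vector<std::uint64_t> hashes(rows, 0);
    for (std::size_t col : distribute_cols) {
        assert(col < block.size());
        const auto& values = block[col];
        for (std::size_t r = 0; r < rows; ++r) {
            // Assumption: a generic hash-combine step, not the Doris hash.
            const std::uint64_t h = std::hash<std::int64_t>{}(values[r]);
            hashes[r] ^= h + 0x9e3779b97f4a7c15ULL + (hashes[r] << 6) + (hashes[r] >> 2);
        }
    }

    // Assumption: the queue is chosen by taking the hash modulo the queue count.
    std::vector<std::vector<std::size_t>> per_queue(queue_count);
    for (std::size_t r = 0; r < rows; ++r) {
        per_queue[hashes[r] % queue_count].push_back(r);
    }
    return per_queue;
}
```

Whatever hash is used, the property that matters is that rows with equal values in the distribution columns always land in the same queue.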
   





[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1502717512

   run buildall




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1501935982

   run buildall




[GitHub] [doris] hello-stephen commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1501654658

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 33.03 seconds
    stream load tsv:          427 seconds loaded 74807831229 Bytes, about 167 MB/s
    stream load json:         22 seconds loaded 2358488459 Bytes, about 102 MB/s
    stream load orc:          71 seconds loaded 1101869774 Bytes, about 14 MB/s
    stream load parquet:      30 seconds loaded 861443392 Bytes, about 27 MB/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230410102641_clickbench_pr_126946.html




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1501779162

   run buildall




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1502680421

   run p0




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1499924462

   run buildall




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1502620357

   run buildall
   
   




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #18457:
URL: https://github.com/apache/doris/pull/18457#discussion_r1160430219


##########
be/src/vec/exec/scan/pip_scanner_context.h:
##########
@@ -95,18 +132,86 @@ class PipScannerContext : public vectorized::ScannerContext {
             _queue_mutexs.emplace_back(new std::mutex);
             _blocks_queues.emplace_back(std::list<vectorized::BlockUPtr>());
         }
+        if (_need_colocate_distribute) {
+            int real_block_size =
+                    limit == -1 ? _batch_size : std::min(static_cast<int64_t>(_batch_size), limit);
+            int64_t free_blocks_memory_usage = 0;
+            for (int i = 0; i < _max_queue_size; ++i) {
+                auto block = std::make_unique<vectorized::Block>(_output_tuple_desc->slots(),
+                                                                 real_block_size,
+                                                                 true /*ignore invalid slots*/);
+                free_blocks_memory_usage += block->allocated_bytes();
+                _colocate_mutable_blocks.emplace_back(new vectorized::MutableBlock(block.get()));
+                _colocate_blocks.emplace_back(std::move(block));
+                _colocate_block_mutexs.emplace_back(new std::mutex);
+            }
+            _free_blocks_memory_usage->add(free_blocks_memory_usage);
+        }
     }
 
     bool has_enough_space_in_blocks_queue() const override {
         return _current_used_bytes < _max_bytes_in_queue / 2 * _max_queue_size;
     }
 
+    virtual void _dispose_coloate_blocks_not_in_queue() override {

Review Comment:
   warning: 'virtual' is redundant since the function is already declared 'override' [modernize-use-override]
   
   ```suggestion
       void _dispose_coloate_blocks_not_in_queue() override {
   ```
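
The suggestion above relies on the fact that `override` only compiles when a matching virtual function exists in a base class, so repeating `virtual` on the overriding declaration adds nothing. A minimal sketch of that rule follows; the class and function names are hypothetical and only mirror the shape of the code under review.

```cpp
#include <cassert>
#include <string>

struct ScannerContextSketch {
    virtual ~ScannerContextSketch() = default;
    // `virtual` belongs on the base declaration.
    virtual std::string dispose_blocks() { return "base"; }
};

struct PipScannerContextSketch : ScannerContextSketch {
    // `override` alone is sufficient: the function is implicitly virtual, and
    // the compiler rejects it if no matching base virtual function exists.
    std::string dispose_blocks() override { return "pipeline"; }
};

// Usage: a call through a base reference still reaches the derived override.
inline void check_dispatch() {
    PipScannerContextSketch ctx;
    ScannerContextSketch& base = ctx;
    assert(base.dispose_blocks() == "pipeline");
}
```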
   





[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1502683876

   run p0




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1501935951

   run buildall




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1503003466

   run buildall
   




[GitHub] [doris] HappenLee commented on pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on PR #18457:
URL: https://github.com/apache/doris/pull/18457#issuecomment-1501609380

   run buildall
   




[GitHub] [doris] Gabriel39 merged pull request #18457: [Pipeline](exec) Support shared scan in colo agg

Posted by "Gabriel39 (via GitHub)" <gi...@apache.org>.
Gabriel39 merged PR #18457:
URL: https://github.com/apache/doris/pull/18457

