Posted to commits@doris.apache.org by "jacktengg (via GitHub)" <gi...@apache.org> on 2023/01/28 11:06:49 UTC

[GitHub] [doris] jacktengg opened a new pull request, #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

jacktengg opened a new pull request, #16166:
URL: https://github.com/apache/doris/pull/16166

   # Proposed changes
   
   Issue Number: close #16165
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei merged PR #16166:
URL: https://github.com/apache/doris/pull/16166




[GitHub] [doris] github-actions[bot] commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1408273677

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1409939967

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1411602341

   PR approved by anyone and no changes requested.




[GitHub] [doris] hello-stephen commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1407389152

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35.44 seconds
    load time: 525 seconds
    storage size: 17122289679 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230128123023_clickbench_pr_85639.html




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on code in PR #16166:
URL: https://github.com/apache/doris/pull/16166#discussion_r1089878994


##########
be/src/vec/exec/join/process_hash_table_probe_impl.h:
##########
@@ -409,99 +461,154 @@ Status ProcessHashTableProbe<JoinOpType>::do_process_with_other_join_conjuncts(
 
         bool all_match_one = true;
         int last_probe_index = probe_index;
-        while (probe_index < probe_rows) {
-            // ignore null rows
-            if constexpr (ignore_null && need_null_map_for_probe) {
-                if ((*null_map)[probe_index]) {
-                    if constexpr (probe_all) {
-                        _items_counts[probe_index++] = (uint32_t)1;
-                        same_to_prev.emplace_back(false);
-                        visited_map.emplace_back(nullptr);
-                        // only full outer / left outer need insert the data of right table
+
+        size_t probe_size = 0;
+        auto& probe_row_match_iter = hash_table_ctx.probe_row_match_iter;
+        if (probe_row_match_iter.ok()) {
+            auto origin_offset = current_offset;
+            while (probe_row_match_iter.ok()) {
+                if (LIKELY(current_offset < _build_block_rows.size())) {
+                    _build_block_offsets[current_offset] = probe_row_match_iter->block_offset;
+                    _build_block_rows[current_offset] = probe_row_match_iter->row_num;
+                } else {
+                    _build_block_offsets.emplace_back(probe_row_match_iter->block_offset);
+                    _build_block_rows.emplace_back(probe_row_match_iter->row_num);
+                }
+                visited_map.emplace_back(&probe_row_match_iter->visited);
+                ++probe_row_match_iter;
+                if (++current_offset >= _batch_size) {
+                    break;
+                }
+            }
+            same_to_prev.emplace_back(false);
+            for (int i = 0; i < current_offset - origin_offset - 1; ++i) {
+                same_to_prev.emplace_back(true);
+            }
+
+            all_match_one &= (current_offset == 1);
+            _items_counts[probe_index] = current_offset;
+            if (!probe_row_match_iter.ok()) {
+                ++probe_index;
+            }
+            probe_size = 1;
+        }
+        if (current_offset < _batch_size) {
+            while (probe_index < probe_rows) {
+                // ignore null rows
+                if constexpr (ignore_null && need_null_map_for_probe) {
+                    if ((*null_map)[probe_index]) {
+                        if constexpr (probe_all) {
+                            _items_counts[probe_index++] = (uint32_t)1;
+                            same_to_prev.emplace_back(false);
+                            visited_map.emplace_back(nullptr);
+                            // only full outer / left outer need insert the data of right table
+                            if (LIKELY(current_offset < _build_block_rows.size())) {
+                                _build_block_offsets[current_offset] = -1;
+                                _build_block_rows[current_offset] = -1;
+                            } else {
+                                _build_block_offsets.emplace_back(-1);
+                                _build_block_rows.emplace_back(-1);
+                            }
+                            ++current_offset;
+                        } else {
+                            _items_counts[probe_index++] = (uint32_t)0;
+                        }
+                        all_match_one = false;
+                        if constexpr (probe_all) {
+                            if (current_offset >= _batch_size) {
+                                break;
+                            }
+                        }
+                        continue;
+                    }
+                }
+
+                auto last_offset = current_offset;
+                auto find_result = !need_null_map_for_probe
+                                           ? key_getter.find_key(hash_table_ctx.hash_table,
+                                                                 probe_index, *_arena)
+                                   : (*null_map)[probe_index]
+                                           ? decltype(key_getter.find_key(hash_table_ctx.hash_table,
+                                                                          probe_index,
+                                                                          *_arena)) {nullptr, false}
+                                           : key_getter.find_key(hash_table_ctx.hash_table,
+                                                                 probe_index, *_arena);
+                if (probe_index + PREFETCH_STEP < probe_rows)

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
                   if (probe_index + PREFETCH_STEP < probe_rows) {
   ```
   
   be/src/vec/exec/join/process_hash_table_probe_impl.h:537:
   ```diff
   -                                                        probe_index + PREFETCH_STEP, *_arena);
   +                                                        probe_index + PREFETCH_STEP, *_arena);
   + }
   ```
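Going by the hunk quoted above, the first loop records one `same_to_prev` flag per output row: `false` for the first row produced by a probe row, `true` for each further match of that same probe row. A minimal sketch of that bookkeeping follows. The helper name `append_same_to_prev` and the use of `std::vector<bool>` are assumptions made for illustration and are not taken from Doris.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Doris function): append one flag per output row
// produced by a single probe row. The first row starts a new group (false);
// every later row belongs to the same probe row as its predecessor (true).
void append_same_to_prev(std::vector<bool>& same_to_prev, std::size_t rows_for_probe_row) {
    assert(rows_for_probe_row > 0);  // the caller only records probe rows that produced output
    same_to_prev.push_back(false);
    for (std::size_t i = 1; i < rows_for_probe_row; ++i) {
        same_to_prev.push_back(true);
    }
}
```

A later pass can then find the boundaries between probe rows by scanning for `false` entries, which is presumably what the non-equal join conjunct evaluation relies on.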
   





[GitHub] [doris] github-actions[bot] commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1411327884

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1408097232

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on code in PR #16166:
URL: https://github.com/apache/doris/pull/16166#discussion_r1089717648


##########
be/src/vec/exec/join/process_hash_table_probe_impl.h:
##########
@@ -208,123 +207,180 @@ Status ProcessHashTableProbe<JoinOpType>::do_process(HashTableType& hash_table_c
 
     bool all_match_one = true;
     int last_probe_index = probe_index;
+    size_t probe_size = 0;
+    auto& probe_row_match_iter = hash_table_ctx.probe_row_match_iter;
     {
         SCOPED_TIMER(_search_hashtable_timer);
-        while (probe_index < probe_rows) {
-            if constexpr (ignore_null && need_null_map_for_probe) {
-                if ((*null_map)[probe_index]) {
-                    if constexpr (probe_all) {
-                        _items_counts[probe_index++] = (uint32_t)1;
-                        // only full outer / left outer need insert the data of right table
-                        if (LIKELY(current_offset < _build_block_rows.size())) {
-                            _build_block_offsets[current_offset] = -1;
-                            _build_block_rows[current_offset] = -1;
-                        } else {
-                            _build_block_offsets.emplace_back(-1);
-                            _build_block_rows.emplace_back(-1);
-                        }
-                        ++current_offset;
+        if constexpr (!is_right_semi_anti_join) {
+            if (probe_row_match_iter.ok()) {
+                for (; probe_row_match_iter.ok(); ++probe_row_match_iter) {
+                    if (LIKELY(current_offset < _build_block_rows.size())) {
+                        _build_block_offsets[current_offset] = probe_row_match_iter->block_offset;
+                        _build_block_rows[current_offset] = probe_row_match_iter->row_num;
                     } else {
-                        _items_counts[probe_index++] = (uint32_t)0;
+                        _build_block_offsets.emplace_back(probe_row_match_iter->block_offset);
+                        _build_block_rows.emplace_back(probe_row_match_iter->row_num);
                     }
-                    all_match_one = false;
-                    continue;
-                }
-            }
-            int last_offset = current_offset;
-            auto find_result =
-                    !need_null_map_for_probe
-                            ? key_getter.find_key(hash_table_ctx.hash_table, probe_index, *_arena)
-                    : (*null_map)[probe_index]
-                            ? decltype(key_getter.find_key(hash_table_ctx.hash_table, probe_index,
-                                                           *_arena)) {nullptr, false}
-                            : key_getter.find_key(hash_table_ctx.hash_table, probe_index, *_arena);
-            if (probe_index + PREFETCH_STEP < probe_rows)
-                key_getter.template prefetch<true>(hash_table_ctx.hash_table,
-                                                   probe_index + PREFETCH_STEP, *_arena);
-
-            if constexpr (JoinOpType == TJoinOp::LEFT_ANTI_JOIN ||
-                          JoinOpType == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN) {
-                if (is_mark_join) {
-                    ++current_offset;
-                    assert_cast<doris::vectorized::ColumnVector<UInt8>&>(*mcol[mcol.size() - 1])
-                            .get_data()
-                            .template push_back(!find_result.is_found());
-                } else {
-                    if (!find_result.is_found()) {
-                        ++current_offset;
+                    if (++current_offset >= _batch_size) {
+                        break;
                     }
                 }
-            } else if constexpr (JoinOpType == TJoinOp::LEFT_SEMI_JOIN) {
-                if (is_mark_join) {
-                    ++current_offset;
-                    assert_cast<doris::vectorized::ColumnVector<UInt8>&>(*mcol[mcol.size() - 1])
-                            .get_data()
-                            .template push_back(find_result.is_found());
-                } else {
-                    if (find_result.is_found()) {
-                        ++current_offset;
-                    }
+                all_match_one &= (current_offset == 1);
+                _items_counts[probe_index] = current_offset;
+                if (!probe_row_match_iter.ok()) {
+                    ++probe_index;
                 }
-            } else {
-                DCHECK(!is_mark_join);
-                if (find_result.is_found()) {
-                    auto& mapped = find_result.get_mapped();
-                    // TODO: Iterators are currently considered to be a heavy operation and have a certain impact on performance.
-                    // We should rethink whether to use this iterator mode in the future. Now just opt the one row case
-                    if (mapped.get_row_count() == 1) {
-                        if constexpr (std::is_same_v<Mapped, RowRefListWithFlag>) {
-                            mapped.visited = true;
-                        }
+                probe_size = 1;
+            }
+        }
 
-                        if constexpr (!is_right_semi_anti_join) {
+        if (current_offset < _batch_size) {
+            bool more_matches_for_current_probe_row = false;
+            while (probe_index < probe_rows) {
+                if constexpr (ignore_null && need_null_map_for_probe) {
+                    if ((*null_map)[probe_index]) {
+                        if constexpr (probe_all) {
+                            _items_counts[probe_index++] = (uint32_t)1;
+                            // only full outer / left outer need insert the data of right table
                             if (LIKELY(current_offset < _build_block_rows.size())) {
-                                _build_block_offsets[current_offset] = mapped.block_offset;
-                                _build_block_rows[current_offset] = mapped.row_num;
+                                _build_block_offsets[current_offset] = -1;
+                                _build_block_rows[current_offset] = -1;
                             } else {
-                                _build_block_offsets.emplace_back(mapped.block_offset);
-                                _build_block_rows.emplace_back(mapped.row_num);
+                                _build_block_offsets.emplace_back(-1);
+                                _build_block_rows.emplace_back(-1);
+                            }
+                            ++current_offset;
+                        } else {
+                            _items_counts[probe_index++] = (uint32_t)0;
+                        }
+                        all_match_one = false;
+                        if constexpr (probe_all) {
+                            if (current_offset >= _batch_size) {
+                                break;
                             }
+                        }
+                        continue;
+                    }
+                }
+                int last_offset = current_offset;
+                auto find_result = !need_null_map_for_probe
+                                           ? key_getter.find_key(hash_table_ctx.hash_table,
+                                                                 probe_index, *_arena)
+                                   : (*null_map)[probe_index]
+                                           ? decltype(key_getter.find_key(hash_table_ctx.hash_table,
+                                                                          probe_index,
+                                                                          *_arena)) {nullptr, false}
+                                           : key_getter.find_key(hash_table_ctx.hash_table,
+                                                                 probe_index, *_arena);
+                if (probe_index + PREFETCH_STEP < probe_rows)

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
                   if (probe_index + PREFETCH_STEP < probe_rows) {
   ```
   
   be/src/vec/exec/join/process_hash_table_probe_impl.h:276:
   ```diff
   -                                                        probe_index + PREFETCH_STEP, *_arena);
   +                                                        probe_index + PREFETCH_STEP, *_arena);
   + }
   ```
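For readers unfamiliar with the check, the warning asks that the body of the `if` guarding the prefetch be wrapped in braces. The sketch below shows the braced form of such a lookahead guard in isolation. The function `count_prefetches` and its `prefetch_step` parameter are hypothetical stand-ins: the real code calls `key_getter.template prefetch<true>(...)`, and the value of `PREFETCH_STEP` is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the probe loop: counts how many times a prefetch
// would be issued. A prefetch is only issued while the lookahead index
// (probe_index + prefetch_step) still points at a valid probe row.
std::size_t count_prefetches(std::size_t probe_rows, std::size_t prefetch_step) {
    std::size_t prefetches = 0;
    for (std::size_t probe_index = 0; probe_index < probe_rows; ++probe_index) {
        // Braced body, as requested by readability-braces-around-statements.
        if (probe_index + prefetch_step < probe_rows) {
            ++prefetches;
        }
    }
    return prefetches;
}
```

With braces in place, adding a second statement to the guarded body later cannot silently fall outside the condition.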
   





[GitHub] [doris] github-actions[bot] commented on a diff in pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on code in PR #16166:
URL: https://github.com/apache/doris/pull/16166#discussion_r1090130844


##########
be/src/vec/exec/join/process_hash_table_probe_impl.h:
##########
@@ -208,123 +207,179 @@ Status ProcessHashTableProbe<JoinOpType>::do_process(HashTableType& hash_table_c
 
     bool all_match_one = true;
     int last_probe_index = probe_index;
+    size_t probe_size = 0;
+    auto& probe_row_match_iter = hash_table_ctx.probe_row_match_iter;
     {
         SCOPED_TIMER(_search_hashtable_timer);
-        while (probe_index < probe_rows) {
-            if constexpr (ignore_null && need_null_map_for_probe) {
-                if ((*null_map)[probe_index]) {
-                    if constexpr (probe_all) {
-                        _items_counts[probe_index++] = (uint32_t)1;
-                        // only full outer / left outer need insert the data of right table
-                        if (LIKELY(current_offset < _build_block_rows.size())) {
-                            _build_block_offsets[current_offset] = -1;
-                            _build_block_rows[current_offset] = -1;
-                        } else {
-                            _build_block_offsets.emplace_back(-1);
-                            _build_block_rows.emplace_back(-1);
-                        }
-                        ++current_offset;
+        if constexpr (!is_right_semi_anti_join) {
+            // handle remaining matched rows from last probe row
+            if (probe_row_match_iter.ok()) {
+                for (; probe_row_match_iter.ok() && current_offset < _batch_size;
+                     ++probe_row_match_iter) {
+                    if (LIKELY(current_offset < _build_block_rows.size())) {
+                        _build_block_offsets[current_offset] = probe_row_match_iter->block_offset;
+                        _build_block_rows[current_offset] = probe_row_match_iter->row_num;
                     } else {
-                        _items_counts[probe_index++] = (uint32_t)0;
+                        _build_block_offsets.emplace_back(probe_row_match_iter->block_offset);
+                        _build_block_rows.emplace_back(probe_row_match_iter->row_num);
                     }
-                    all_match_one = false;
-                    continue;
-                }
-            }
-            int last_offset = current_offset;
-            auto find_result =
-                    !need_null_map_for_probe
-                            ? key_getter.find_key(hash_table_ctx.hash_table, probe_index, *_arena)
-                    : (*null_map)[probe_index]
-                            ? decltype(key_getter.find_key(hash_table_ctx.hash_table, probe_index,
-                                                           *_arena)) {nullptr, false}
-                            : key_getter.find_key(hash_table_ctx.hash_table, probe_index, *_arena);
-            if (probe_index + PREFETCH_STEP < probe_rows)
-                key_getter.template prefetch<true>(hash_table_ctx.hash_table,
-                                                   probe_index + PREFETCH_STEP, *_arena);
-
-            if constexpr (JoinOpType == TJoinOp::LEFT_ANTI_JOIN ||
-                          JoinOpType == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN) {
-                if (is_mark_join) {
                     ++current_offset;
-                    assert_cast<doris::vectorized::ColumnVector<UInt8>&>(*mcol[mcol.size() - 1])
-                            .get_data()
-                            .template push_back(!find_result.is_found());
-                } else {
-                    if (!find_result.is_found()) {
-                        ++current_offset;
-                    }
                 }
-            } else if constexpr (JoinOpType == TJoinOp::LEFT_SEMI_JOIN) {
-                if (is_mark_join) {
-                    ++current_offset;
-                    assert_cast<doris::vectorized::ColumnVector<UInt8>&>(*mcol[mcol.size() - 1])
-                            .get_data()
-                            .template push_back(find_result.is_found());
-                } else {
-                    if (find_result.is_found()) {
-                        ++current_offset;
-                    }
+                all_match_one &= (current_offset == 1);
+                _items_counts[probe_index] = current_offset;
+                if (!probe_row_match_iter.ok()) {
+                    ++probe_index;
                 }
-            } else {
-                DCHECK(!is_mark_join);
-                if (find_result.is_found()) {
-                    auto& mapped = find_result.get_mapped();
-                    // TODO: Iterators are currently considered to be a heavy operation and have a certain impact on performance.
-                    // We should rethink whether to use this iterator mode in the future. Now just opt the one row case
-                    if (mapped.get_row_count() == 1) {
-                        if constexpr (std::is_same_v<Mapped, RowRefListWithFlag>) {
-                            mapped.visited = true;
-                        }
+                probe_size = 1;
+            }
+        }
 
-                        if constexpr (!is_right_semi_anti_join) {
+        if (current_offset < _batch_size) {
+            while (probe_index < probe_rows) {
+                if constexpr (ignore_null && need_null_map_for_probe) {
+                    if ((*null_map)[probe_index]) {
+                        if constexpr (probe_all) {
+                            _items_counts[probe_index++] = (uint32_t)1;
+                            // only full outer / left outer need insert the data of right table
                             if (LIKELY(current_offset < _build_block_rows.size())) {
-                                _build_block_offsets[current_offset] = mapped.block_offset;
-                                _build_block_rows[current_offset] = mapped.row_num;
+                                _build_block_offsets[current_offset] = -1;
+                                _build_block_rows[current_offset] = -1;
                             } else {
-                                _build_block_offsets.emplace_back(mapped.block_offset);
-                                _build_block_rows.emplace_back(mapped.row_num);
+                                _build_block_offsets.emplace_back(-1);
+                                _build_block_rows.emplace_back(-1);
                             }
                             ++current_offset;
+                        } else {
+                            _items_counts[probe_index++] = (uint32_t)0;
                         }
+                        all_match_one = false;
+                        if constexpr (probe_all) {
+                            if (current_offset >= _batch_size) {
+                                break;
+                            }
+                        }
+                        continue;
+                    }
+                }
+                int last_offset = current_offset;
+                auto find_result = !need_null_map_for_probe
+                                           ? key_getter.find_key(hash_table_ctx.hash_table,
+                                                                 probe_index, *_arena)
+                                   : (*null_map)[probe_index]
+                                           ? decltype(key_getter.find_key(hash_table_ctx.hash_table,
+                                                                          probe_index,
+                                                                          *_arena)) {nullptr, false}
+                                           : key_getter.find_key(hash_table_ctx.hash_table,
+                                                                 probe_index, *_arena);
+                if (probe_index + PREFETCH_STEP < probe_rows)

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
                   if (probe_index + PREFETCH_STEP < probe_rows) {
   ```
   
   be/src/vec/exec/join/process_hash_table_probe_impl.h:275:
   ```diff
   -                                                        probe_index + PREFETCH_STEP, *_arena);
   +                                                        probe_index + PREFETCH_STEP, *_arena);
   + }
   ```
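The hunk quoted above caps each output block at `_batch_size` rows by stopping in the middle of a probe row's match list and resuming from `probe_row_match_iter` on the next call. The following is a minimal, self-contained sketch of that idea only, not the Doris implementation. It assumes a `std::unordered_multimap` as the build table, and every name in it (`BuildTable`, `ProbeState`, `probe_batch`) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names come from Doris.
using BuildTable = std::unordered_multimap<int, int>;  // join key -> build row id
using OutputRow = std::pair<std::size_t, int>;         // (probe row index, build row id)

struct ProbeState {
    std::size_t probe_index = 0;              // next probe row to look up
    bool has_pending = false;                 // true if the last batch stopped mid-row
    BuildTable::const_iterator pending{};     // next match to emit for that probe row
    BuildTable::const_iterator pending_end{}; // end of that probe row's matches
};

// Emits at most batch_size joined rows per call and remembers where it stopped,
// so a probe row with many matches is spread over several output batches.
std::vector<OutputRow> probe_batch(const BuildTable& table, const std::vector<int>& probe_keys,
                                   ProbeState& state, std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<OutputRow> out;
    while (state.probe_index < probe_keys.size() && out.size() < batch_size) {
        if (!state.has_pending) {
            // Fresh probe row: look up all of its matches in the build table.
            auto range = table.equal_range(probe_keys[state.probe_index]);
            state.pending = range.first;
            state.pending_end = range.second;
            state.has_pending = true;
        }
        // Drain matches for the current probe row until the batch is full.
        while (state.pending != state.pending_end && out.size() < batch_size) {
            out.emplace_back(state.probe_index, state.pending->second);
            ++state.pending;
        }
        // Advance to the next probe row only once all of its matches are emitted.
        if (state.pending == state.pending_end) {
            state.has_pending = false;
            ++state.probe_index;
        }
    }
    return out;
}
```

Calling `probe_batch` in a loop until `state.probe_index` reaches `probe_keys.size()` yields every joined row exactly once, with no batch exceeding `batch_size`. The sketch covers inner-join semantics only and leaves out the null handling and outer-join padding visible in the diff.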
   





[GitHub] [doris] github-actions[bot] commented on pull request #16166: [fix](hashjoin) join produce blocks with rows larger than batch size

Posted by github-actions.
github-actions[bot] commented on PR #16166:
URL: https://github.com/apache/doris/pull/16166#issuecomment-1408406701

   clang-tidy review says "All clean, LGTM! :+1:"

