You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/18 08:30:50 UTC

[GitHub] [incubator-doris] mrhhsg opened a new pull request, #9663: [Enhancement]Add prefetch join node

mrhhsg opened a new pull request, #9663:
URL: https://github.com/apache/incubator-doris/pull/9663

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9663:
URL: https://github.com/apache/doris/pull/9663#issuecomment-1325788689

   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and feel free a maintainer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r881422957


##########
be/src/vec/exec/join/vhash_join_node.cpp:
##########
@@ -77,18 +81,44 @@ struct ProcessHashTableBuild {
             inserted_rows.reserve(_batch_size);
         }
 
+        std::vector<size_t> cached_hash_values(CACHE_NUM);
+        std::vector<KeyHolderType> cached_key_holders(CACHE_NUM);
+
+        size_t cached_count = 0;
         for (size_t k = 0; k < _rows; ++k) {
+            if (UNLIKELY(k % CACHE_NUM == 0)) {
+                cached_count = std::min(CACHE_NUM, _rows - k);
+                for (size_t i = k; i < _rows && i < (k + CACHE_NUM); ++i) {

Review Comment:
   for (size_t i = 0; i < cache_count; ++i) {
                       if constexpr (ignore_null) {
                   if ((*null_map)[i+k]) {
                           if ((*null_map)[i+k]) {
                               continue;
                           }
                       }
                       cached_key_holders[i ] =
                               key_getter.get_key_holder(i + k, _join_node->_arena);
                       cached_hash_values[i ] = key_getter.get_hash(
                               hash_table_ctx.hash_table, cached_key_holders[i ]);
                   }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 closed pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
Gabriel39 closed pull request #9663: [Enhancement]Add prefetch in join node
URL: https://github.com/apache/doris/pull/9663


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r881422957


##########
be/src/vec/exec/join/vhash_join_node.cpp:
##########
@@ -77,18 +81,44 @@ struct ProcessHashTableBuild {
             inserted_rows.reserve(_batch_size);
         }
 
+        std::vector<size_t> cached_hash_values(CACHE_NUM);
+        std::vector<KeyHolderType> cached_key_holders(CACHE_NUM);
+
+        size_t cached_count = 0;
         for (size_t k = 0; k < _rows; ++k) {
+            if (UNLIKELY(k % CACHE_NUM == 0)) {
+                cached_count = std::min(CACHE_NUM, _rows - k);
+                for (size_t i = k; i < _rows && i < (k + CACHE_NUM); ++i) {

Review Comment:
   for (size_t i = 0; i < cache_count; ++i) {
                       if constexpr (ignore_null) {
                   if ((*null_map)[i+k]) {
                           if ((*null_map)[i+k]) {
                               continue;
                           }
                       }
   
                       cached_key_holders[i ] =
                               key_getter.get_key_holder(i + k, _join_node->_arena);
                       cached_hash_values[i ] = key_getter.get_hash(
                               hash_table_ctx.hash_table, cached_key_holders[i ]);
                   }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r881422957


##########
be/src/vec/exec/join/vhash_join_node.cpp:
##########
@@ -77,18 +81,44 @@ struct ProcessHashTableBuild {
             inserted_rows.reserve(_batch_size);
         }
 
+        std::vector<size_t> cached_hash_values(CACHE_NUM);
+        std::vector<KeyHolderType> cached_key_holders(CACHE_NUM);
+
+        size_t cached_count = 0;
         for (size_t k = 0; k < _rows; ++k) {
+            if (UNLIKELY(k % CACHE_NUM == 0)) {
+                cached_count = std::min(CACHE_NUM, _rows - k);
+                for (size_t i = k; i < _rows && i < (k + CACHE_NUM); ++i) {

Review Comment:
   for (size_t i = 0; i < cache_count; ++i) {
                       if constexpr (ignore_null) {
                   if ((*null_map)[i+k]) {
                           if ((*null_map)[i+k]) {
                               continue;
                           }
                       }
                       cached_key_holders[i ] =
                               key_getter.get_key_holder(i + k, _join_node->_arena);
                       cached_hash_values[i ] = key_getter.get_hash(
                               hash_table_ctx.hash_table, cached_key_holders[i ]);
                   }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r877852699


##########
be/src/vec/common/columns_hashing_impl.h:
##########
@@ -132,12 +132,25 @@ class HashMethodBase {
         return emplaceImpl(key_holder, data);
     }
 
+    template <typename Data>
+    ALWAYS_INLINE EmplaceResult emplace_key(Data& data, size_t row, size_t hash_value,
+                                            Arena& pool) {
+        auto key_holder = static_cast<Derived&>(*this).get_key_holder(row, pool);

Review Comment:
   `get_key_holder` will cause memory copy for join keys. But this copy happens twice which will hurt performance. First we copy the keys to compute hash value for it and then emplace_key here which also copies keys. We should simplify this behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r881047186


##########
be/src/vec/common/hash_table/hash_table_key_holder.h:
##########
@@ -120,7 +120,7 @@ namespace doris::vectorized {
   */
 struct SerializedKeyHolder {
     StringRef key;
-    Arena& pool;
+    Arena* pool;

Review Comment:
   why do you change this and I find you change a lot of related code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] mrhhsg commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
mrhhsg commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r881109495


##########
be/src/vec/common/hash_table/hash_table_key_holder.h:
##########
@@ -120,7 +120,7 @@ namespace doris::vectorized {
   */
 struct SerializedKeyHolder {
     StringRef key;
-    Arena& pool;
+    Arena* pool;

Review Comment:
   To avoid calling `get_key_holder` for each row twice, `KeyHolder` should be cached in a vector, but `SerializedKeyHolder` contains reference of Arena cannot be copied into the vector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on a diff in pull request #9663: [Enhancement]Add prefetch in join node

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #9663:
URL: https://github.com/apache/incubator-doris/pull/9663#discussion_r877799221


##########
be/src/vec/exec/join/vhash_join_node.cpp:
##########
@@ -26,6 +26,8 @@
 #include "vec/exprs/vexpr_context.h"
 #include "vec/utils/util.hpp"
 
+#define PREFETCH_COUNT (size_t(32 * 1024) / CACHE_LINE_SIZE)

Review Comment:
   To achieve best performance, we should take more information about cache size.
   For example, l1d cache size is 64KB in AMD Epyc server processor[1] while 48KB in Intel Icelake one[2]. 
   
   [1] https://www.club386.com/amd-unleashes-epyc-7003-series-with-3d-v-cache
   [2] https://www.techpowerup.com/248825/intel-increases-l1d-and-l2-cache-sizes-with-ice-lake



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org