Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/07 06:22:44 UTC

[GitHub] [incubator-doris] mrhhsg opened a new pull request, #9432: [Feature]Use multi hash tables to aggregate

mrhhsg opened a new pull request, #9432:
URL: https://github.com/apache/incubator-doris/pull/9432

   # Proposed changes
   
   Issue Number: close #9428
   
   ## Problem Summary:
   
   
   SSB Flat 100G
   |Query | Multiple hash tables | Single hash table|
   |--|--|--|
   |q1.1 | 94 | 121|
   |q1.2 | 87 | 110|
   |q1.3 | 101 | 117|
   |q2.1 | 431 | 535|
   |q2.2 | 409 | 449|
   |q2.3 | 299 | 331|
   |q3.1 | 518 | 599|
   |q3.2 | 357 | 406|
   |q3.3 | 270 | 294|
   |q3.4 | 52 | 62|
   |q4.1 | 559 | 645|
   |q4.2 | 212 | 241|
   |q4.3 | 168 | 188|
   |Total | 3557 | 4098|
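
   To put the totals in perspective, the relative reduction can be worked out directly from the last row of the table. The snippet below is only an illustrative calculation; the constant and function names are made up for this example, and the unit of the measurements is not stated in the PR.

```cpp
#include <cassert>

// Totals copied from the last row of the table above.
// The unit is not stated in the PR, so none is assumed here.
constexpr double MULTI_TABLE_TOTAL = 3557.0;
constexpr double SINGLE_TABLE_TOTAL = 4098.0;

// Fraction by which `candidate` is smaller than `baseline`.
constexpr double relative_reduction(double baseline, double candidate) {
    return (baseline - candidate) / baseline;
}
```

   Calling `relative_reduction(SINGLE_TABLE_TOTAL, MULTI_TABLE_TOTAL)` gives (4098 - 3557) / 4098, a reduction of roughly 13% over the whole query set.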
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9432: [Feature]Use multi hash tables to aggregate

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9432:
URL: https://github.com/apache/incubator-doris/pull/9432#discussion_r867598416


##########
be/src/vec/common/hash_table/hash_table_proxy_for_multi_tables.h:
##########
@@ -0,0 +1,189 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "hash_table.h"
+
+namespace doris::vectorized {
+
+/** HashTableProxyForMultiTables contains (1 << bits_for_tables_num) tables.
+  */
+template <typename TData, size_t bits_for_tables_num = 2>
+class HashTableProxyForMultiTables {
+public:
+    using Data = TData;
+    using Self = HashTableProxyForMultiTables;
+    using LookupResult = typename Data::LookupResult;
+    using key_type = typename Data::key_type;
+    using mapped_type = typename Data::mapped_type;
+    using value_type = typename Data::value_type;
+
+    static constexpr size_t tables_num = 1 << bits_for_tables_num;

Review Comment:
   Use an upper-case name, `TABLES_NUM`, for this constexpr value.
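
   For context, a minimal sketch of a proxy that spreads keys over `1 << bits` sub-tables, with the constant written in upper case as suggested. This is not the code from the PR: `MultiTableSketch`, `mix` and `table_index` are hypothetical names, `std::unordered_map` stands in for the Doris hash table, and picking the sub-table from the top bits of a mixed 64-bit hash is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical illustration only; not the class from the PR.
template <typename Key, typename Mapped, size_t BITS_FOR_TABLES_NUM = 2>
class MultiTableSketch {
    static_assert(BITS_FOR_TABLES_NUM > 0 && BITS_FOR_TABLES_NUM < 64,
                  "the shift below requires 0 < bits < 64");

public:
    static constexpr size_t TABLES_NUM = size_t(1) << BITS_FOR_TABLES_NUM;

    // Scramble the hash so that small integer keys do not all share the same top bits.
    static uint64_t mix(uint64_t x) {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return x;
    }

    // Assumption: the sub-table is chosen by the top bits of the 64-bit hash.
    static size_t table_index(uint64_t hash) {
        return static_cast<size_t>(hash >> (64 - BITS_FOR_TABLES_NUM));
    }

    // Route the key to its sub-table and insert a default value if it is absent.
    Mapped& operator[](const Key& key) {
        uint64_t hash = mix(static_cast<uint64_t>(std::hash<Key>{}(key)));
        return _tables[table_index(hash)][key];
    }

    // Total number of keys across all sub-tables.
    size_t size() const {
        size_t total = 0;
        for (const auto& table : _tables) total += table.size();
        return total;
    }

private:
    std::array<std::unordered_map<Key, Mapped>, TABLES_NUM> _tables;
};
```

   Because a key always maps to the same sub-table, lookups and inserts touch only one of the smaller tables, which is the property the benchmark in the PR description is meant to exercise.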



##########
be/src/vec/common/hash_table/hash_table_proxy_for_multi_tables.h:
##########
@@ -0,0 +1,189 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "hash_table.h"
+
+namespace doris::vectorized {
+
+/** HashTableProxyForMultiTables contains (1 << bits_for_tables_num) tables.
+  */
+template <typename TData, size_t bits_for_tables_num = 2>
+class HashTableProxyForMultiTables {
+public:
+    using Data = TData;
+    using Self = HashTableProxyForMultiTables;
+    using LookupResult = typename Data::LookupResult;
+    using key_type = typename Data::key_type;
+    using mapped_type = typename Data::mapped_type;
+    using value_type = typename Data::value_type;
+
+    static constexpr size_t tables_num = 1 << bits_for_tables_num;
+    static constexpr size_t max_hash_value_shift = std::max(bits_for_tables_num, size_t(32));
+
+    HashTableProxyForMultiTables() { tail_table = &tables[tables_num - 1]; }
+
+private:
+    Data tables[tables_num];
+    Data* tail_table;
+
+    template <typename Derived, bool is_const>
+    class iterator_base {
+    public:
+        using it_type = std::conditional_t<is_const, typename Data::const_iterator,
+                                           typename Data::iterator>;
+
+    private:
+        using Container = std::conditional_t<is_const, const Self, Self>;
+
+        Container* container;

Review Comment:
   Use a leading underscore for member variables: `Container* _container;`
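
   As an illustration of the naming convention in the context of an iterator over several sub-tables, here is a small sketch. It is not the `iterator_base` from the PR: `MultiTableCursor` is a hypothetical name, and `std::vector` is used as a stand-in for each sub-table so that the example stays self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an iterator over several sub-tables.
template <typename T, size_t TABLES_NUM>
class MultiTableCursor {
public:
    using Container = std::array<std::vector<T>, TABLES_NUM>;

    explicit MultiTableCursor(const Container* container) : _container(container) {
        _skip_exhausted_tables();
    }

    // True while the cursor points at an element.
    bool valid() const { return _table_index < TABLES_NUM; }

    const T& value() const { return (*_container)[_table_index][_offset]; }

    void next() {
        ++_offset;
        _skip_exhausted_tables();
    }

private:
    // Advance to the next non-empty sub-table once the current one is exhausted.
    void _skip_exhausted_tables() {
        while (_table_index < TABLES_NUM &&
               _offset >= (*_container)[_table_index].size()) {
            ++_table_index;
            _offset = 0;
        }
    }

    // Member variables carry a leading underscore, as the review asks.
    const Container* _container;
    size_t _table_index = 0;
    size_t _offset = 0;
};
```

   A caller would loop with `for (MultiTableCursor<int, 4> cur(&tables); cur.valid(); cur.next())` and read `cur.value()` in the body; empty sub-tables are skipped transparently.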
   
   





[GitHub] [incubator-doris] Gabriel39 commented on pull request #9432: [Feature]Use multi hash tables to aggregate

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on PR #9432:
URL: https://github.com/apache/incubator-doris/pull/9432#issuecomment-1120184258

   I still think we need a detailed analysis of this.
   I have some questions about the SSB Flat 100G test:
   1. What are the fill factor and conflict factor of the aggregation hash table (an estimate is fine)?
   2. What are the data characteristics (such as data size and ordering)?
   
   I think performance is strongly correlated with the workload, so we should be careful to avoid potential performance regressions on other workloads.
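
   One way such numbers could be estimated is sketched below. It is an assumption-laden illustration rather than a measurement of the Doris table: `std::unordered_map` (a chained table) stands in for the aggregation hash table, and `TableStats` and `measure` are hypothetical names. "Conflict factor" is interpreted here as the share of keys that landed in a bucket already holding another key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct TableStats {
    double fill_factor;     // stored keys divided by the number of buckets
    double conflict_factor; // share of keys sharing a bucket with an earlier key
};

// Works with any container exposing the std::unordered_map bucket interface.
template <typename Map>
TableStats measure(const Map& map) {
    size_t conflicting = 0;
    for (size_t b = 0; b < map.bucket_count(); ++b) {
        size_t n = map.bucket_size(b);
        if (n > 1) conflicting += n - 1; // every key after the first one in a bucket
    }
    TableStats stats;
    stats.fill_factor = map.load_factor();
    stats.conflict_factor =
            map.empty() ? 0.0
                        : static_cast<double>(conflicting) / static_cast<double>(map.size());
    return stats;
}
```

   Running `measure` on tables built from different key distributions (sorted, random, skewed) would give a first impression of how sensitive the result is to the workload.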




[GitHub] [doris] github-actions[bot] commented on pull request #9432: [Feature]Use multi hash tables to aggregate

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9432:
URL: https://github.com/apache/doris/pull/9432#issuecomment-1304673788

   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and feel free to ask a maintainer to remove the Stale tag!




[GitHub] [incubator-doris] morningman commented on pull request #9432: [Feature]Use multi hash tables to aggregate

Posted by GitBox <gi...@apache.org>.
morningman commented on PR #9432:
URL: https://github.com/apache/incubator-doris/pull/9432#issuecomment-1120149747

   Please reformat your BE C++ code:
   http://doris.incubator.apache.org/zh-CN/developer-guide/cpp-format-code.html




[GitHub] [incubator-doris] mrhhsg commented on pull request #9432: [Feature]Use multi hash tables to aggregate

Posted by GitBox <gi...@apache.org>.
mrhhsg commented on PR #9432:
URL: https://github.com/apache/incubator-doris/pull/9432#issuecomment-1120151279

   > Please reformat your BE C++ code: http://doris.incubator.apache.org/zh-CN/developer-guide/cpp-format-code.html
   
   done.




[GitHub] [doris] github-actions[bot] closed pull request #9432: [Feature]Use multi hash tables to aggregate

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #9432: [Feature]Use multi hash tables to aggregate
URL: https://github.com/apache/doris/pull/9432

