Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/25 10:35:17 UTC

[GitHub] [doris] jacktengg opened a new pull request, #13654: [improvement](hashjoin) support two-level hash table in hash join

jacktengg opened a new pull request, #13654:
URL: https://github.com/apache/doris/pull/13654

   # Proposed changes
   
   Issue Number: close #13653
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org

[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1020998126


##########
be/src/vec/common/hash_table/hash_table.h:
##########
@@ -1072,6 +1072,13 @@ class HashTable : private boost::noncopyable,
 
     size_t size() const { return m_size; }
 
+    size_t* sizes(size_t& num_buckets) const {
+        num_buckets = 1;
+        size_t* sizes = new size_t[1];

Review Comment:
   This may cause memory leaks




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021148611


##########
be/src/vec/exec/join/vhash_join_node.cpp:
##########
@@ -825,6 +873,126 @@ Status HashJoinNode::_process_build_block(RuntimeState* state, Block& block, uin
     return st;
 }
 
+void HashJoinNode::_hash_table_convert_to_partitioned(HashTableVariants& new_hash_table_variants,

Review Comment:
   Just create new_hash_table_variants in _hash_table_init; then this code is not needed.




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021054030


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
+    static constexpr size_t MAX_BUCKET = NUM_BUCKETS - 1;
+
+    //factor that will trigger growing the hash table on insert.
+    static constexpr float MAX_BUCKET_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t getBucketFromHash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
+    }
+
+    float get_factor() const { return MAX_BUCKET_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = NUM_BUCKETS;
+        size_t* sizes = new size_t[NUM_BUCKETS];
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            sizes[i] = impls[i].get_buffer_size_in_cells();
+        }
+        return sizes;
+    }
+
+    void reset_resize_timer() {
+        for (auto& impl : impls) {
+            impl.reset_resize_timer();
+        }
+    }
+    int64_t get_resize_timer_value() const {
+        int64_t resize_timer_ns = 0;
+        for (const auto& impl : impls) {
+            resize_timer_ns += impl.get_resize_timer_value();
+        }
+        return resize_timer_ns;
+    }
+
+protected:
+    typename Impl::iterator beginOfNextNonEmptyBucket(size_t& bucket) {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+    typename Impl::const_iterator beginOfNextNonEmptyBucket(size_t& bucket) const {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+public:
+    using key_type = typename Impl::key_type;
+    using mapped_type = typename Impl::mapped_type;
+    using value_type = typename Impl::value_type;
+    using cell_type = typename Impl::cell_type;
+
+    using LookupResult = typename Impl::LookupResult;
+    using ConstLookupResult = typename Impl::ConstLookupResult;
+
+    Impl impls[NUM_BUCKETS];
+
+    PartitionedHashTable() = default;
+
+    explicit PartitionedHashTable(size_t size_hint) {
+        for (auto& impl : impls) impl.reserve(size_hint / NUM_BUCKETS);
+    }
+
+    /// Copy the data from another (normal) hash table. It should have the same hash function.
+    template <typename Source>
+    explicit PartitionedHashTable(const Source& src) {
+        typename Source::const_iterator it = src.begin();
+
+        /// It is assumed that the zero key (stored separately) is first in iteration order.
+        if (it != src.end() && it.get_ptr()->is_zero(src)) {
+            insert(it->get_value());
+            ++it;
+        }
+
+        for (; it != src.end(); ++it) {
+            const Cell* cell = it.get_ptr();
+            size_t hash_value = cell->get_hash(src);
+            size_t buck = getBucketFromHash(hash_value);
+            impls[buck].insert_unique_non_zero(cell, hash_value);
+        }
+    }
+
+    PartitionedHashTable(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            impls[i] = std::move(rhs.impls[i]);
+        }
+    }
+
+    PartitionedHashTable& operator=(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            impls[i] = std::move(rhs.impls[i]);
+        }
+        return *this;
+    }
+
+    class iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        iterator(Self* container_, size_t bucket_, typename Impl::iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        iterator() = default;
+
+        bool operator==(const iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const iterator& rhs) const { return !(*this == rhs); }
+
+        iterator& operator++() {
+            ++current_it;
+            if (current_it == container->impls[bucket].end()) {
+                ++bucket;
+                current_it = container->beginOfNextNonEmptyBucket(bucket);
+            }
+
+            return *this;
+        }
+
+        Cell& operator*() const { return *current_it; }
+        Cell* operator->() const { return current_it.get_ptr(); }
+
+        Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    class const_iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::const_iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        const_iterator(Self* container_, size_t bucket_, typename Impl::const_iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        const_iterator() = default;
+        const_iterator(const iterator& rhs)
+                : container(rhs.container),
+                  bucket(rhs.bucket),
+                  current_it(rhs.current_it) {} /// NOLINT
+
+        bool operator==(const const_iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const const_iterator& rhs) const { return !(*this == rhs); }
+
+        const_iterator& operator++() {
+            ++current_it;
+            if (current_it == container->impls[bucket].end()) {
+                ++bucket;
+                current_it = container->beginOfNextNonEmptyBucket(bucket);
+            }
+
+            return *this;
+        }
+
+        const Cell& operator*() const { return *current_it; }
+        const Cell* operator->() const { return current_it->get_ptr(); }
+
+        const Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    const_iterator begin() const {
+        size_t buck = 0;
+        typename Impl::const_iterator impl_it = beginOfNextNonEmptyBucket(buck);
+        return {this, buck, impl_it};
+    }
+
+    iterator begin() {
+        size_t buck = 0;
+        typename Impl::iterator impl_it = beginOfNextNonEmptyBucket(buck);
+        return {this, buck, impl_it};
+    }
+
+    const_iterator end() const { return {this, MAX_BUCKET, impls[MAX_BUCKET].end()}; }
+    iterator end() { return {this, MAX_BUCKET, impls[MAX_BUCKET].end()}; }
+
+    void expanse_for_add_elem(size_t num_elem) {
+        size_t num_elem_per_bucket = (num_elem + NUM_BUCKETS - 1) / NUM_BUCKETS;
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) impls[i].expanse_for_add_elem(num_elem_per_bucket);
+    }
+
+    /// Insert a value. In the case of any more complex values, it is better to use the `emplace` function.
+    std::pair<LookupResult, bool> ALWAYS_INLINE insert(const value_type& x) {
+        size_t hash_value = hash(Cell::get_key(x));
+
+        std::pair<LookupResult, bool> res;
+        emplace(Cell::get_key(x), res.first, res.second, hash_value);
+
+        if (res.second) insert_set_mapped(lookup_result_get_mapped(res.first), x);
+
+        return res;
+    }
+
+    template <typename KeyHolder>
+    void ALWAYS_INLINE prefetch(KeyHolder& key_holder) {
+        const auto& key = key_holder_get_key(key_holder);
+        const auto key_hash = hash(key);
+        const auto bucket = getBucketFromHash(key_hash);
+        impls[bucket].prefetch(key_holder);
+    }
+
+    template <bool READ>
+    void ALWAYS_INLINE prefetch_by_hash(size_t hash_value) {
+        const auto bucket = getBucketFromHash(hash_value);
+        impls[bucket].template prefetch_by_hash<READ>(hash_value);
+    }
+
+    template <bool READ, typename KeyHolder>
+    void ALWAYS_INLINE prefetch(KeyHolder& key_holder) {
+        const auto& key = key_holder_get_key(key_holder);
+        const auto key_hash = hash(key);
+        const auto bucket = getBucketFromHash(key_hash);
+        impls[bucket].template prefetch<READ>(key_holder);
+    }
+
+    /** Insert the key,
+      * return an iterator to a position that can be used for `placement new` of value,
+      * as well as the flag - whether a new key was inserted.
+      *
+      * You have to make `placement new` values if you inserted a new key,
+      * since when destroying a hash table, the destructor will be invoked for it!
+      *
+      * Example usage:
+      *
+      * Map::iterator it;
+      * bool inserted;
+      * map.emplace(key, it, inserted);
+      * if (inserted)
+      *     new(&it->second) Mapped(value);
+      */
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, bool& inserted) {
+        size_t hash_value = hash(key_holder_get_key(key_holder));
+        emplace(key_holder, it, inserted, hash_value);
+    }
+
+    /// Same, but with a precalculated values of hash function.
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, bool& inserted,
+                               size_t hash_value) {
+        size_t buck = getBucketFromHash(hash_value);
+        impls[buck].emplace(key_holder, it, inserted, hash_value);
+    }
+
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, size_t hash_value,
+                               bool& inserted) {
+        emplace(key_holder, it, inserted, hash_value);
+    }
+
+    LookupResult ALWAYS_INLINE find(Key x, size_t hash_value) {
+        size_t buck = getBucketFromHash(hash_value);
+        return impls[buck].find(x, hash_value);
+    }
+
+    ConstLookupResult ALWAYS_INLINE find(Key x, size_t hash_value) const {
+        return const_cast<std::decay_t<decltype(*this)>*>(this)->find(x, hash_value);
+    }
+
+    LookupResult ALWAYS_INLINE find(Key x) { return find(x, hash(x)); }
+
+    ConstLookupResult ALWAYS_INLINE find(Key x) const { return find(x, hash(x)); }
+
+    void write(doris::vectorized::BufferWritable& wb) const {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) impls[i].write(wb);
+    }
+
+    /*
+    void writeText(DB::WriteBuffer & wb) const
+    {

Review Comment:
   Delete this commented-out code.




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1020999618


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;

Review Comment:
   NUM_BUCKETS or NUM_SUB_TABLES?




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1020999475


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
+    static constexpr size_t MAX_BUCKET = NUM_BUCKETS - 1;
+
+    //factor that will trigger growing the hash table on insert.
+    static constexpr float MAX_BUCKET_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t getBucketFromHash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
+    }
+
+    float get_factor() const { return MAX_BUCKET_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = NUM_BUCKETS;
+        size_t* sizes = new size_t[NUM_BUCKETS];
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            sizes[i] = impls[i].get_buffer_size_in_cells();
+        }
+        return sizes;
+    }
+
+    void reset_resize_timer() {
+        for (auto& impl : impls) {
+            impl.reset_resize_timer();
+        }
+    }
+    int64_t get_resize_timer_value() const {
+        int64_t resize_timer_ns = 0;
+        for (const auto& impl : impls) {
+            resize_timer_ns += impl.get_resize_timer_value();
+        }
+        return resize_timer_ns;
+    }
+
+protected:
+    typename Impl::iterator beginOfNextNonEmptyBucket(size_t& bucket) {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+    typename Impl::const_iterator beginOfNextNonEmptyBucket(size_t& bucket) const {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+public:
+    using key_type = typename Impl::key_type;
+    using mapped_type = typename Impl::mapped_type;
+    using value_type = typename Impl::value_type;
+    using cell_type = typename Impl::cell_type;
+
+    using LookupResult = typename Impl::LookupResult;
+    using ConstLookupResult = typename Impl::ConstLookupResult;
+
+    Impl impls[NUM_BUCKETS];

Review Comment:
   Rename impls to sub_table or another clearer name.
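
For illustration only, here is a minimal sketch of how the sub-table selection could read after such a rename. The names `SubTableSelector`, `BITS_FOR_SUB_TABLE`, `NUM_SUB_TABLES`, `MAX_SUB_TABLE`, and `sub_table_from_hash` are hypothetical and do not come from the PR; the arithmetic mirrors `getBucketFromHash` in the diff above.

```cpp
#include <cstddef>

// Hypothetical stand-alone helper; in the PR this logic lives inside PartitionedHashTable.
template <std::size_t BITS_FOR_SUB_TABLE = 4>
struct SubTableSelector {
    // Number of first-level sub-tables (16 when BITS_FOR_SUB_TABLE is 4).
    static constexpr std::size_t NUM_SUB_TABLES = 1ULL << BITS_FOR_SUB_TABLE;
    static constexpr std::size_t MAX_SUB_TABLE = NUM_SUB_TABLES - 1;

    // Same arithmetic as getBucketFromHash in the diff: shift so that the
    // BITS_FOR_SUB_TABLE bits ending at bit 31 land in the low bits, then mask.
    static constexpr std::size_t sub_table_from_hash(std::size_t hash_value) {
        return (hash_value >> (32 - BITS_FOR_SUB_TABLE)) & MAX_SUB_TABLE;
    }
};
```

With the default of 4 bits, a hash of 0xF0000000 selects sub-table 15 and a hash of 0 selects sub-table 0.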




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021055381


##########
be/src/vec/common/hash_table/hash_table.h:
##########
@@ -1111,6 +1118,12 @@ class HashTable : private boost::noncopyable,
     size_t get_buffer_size_in_bytes() const { return grower.buf_size() * sizeof(Cell); }
 
     size_t get_buffer_size_in_cells() const { return grower.buf_size(); }
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = 1;

Review Comment:
   I think we should not return a raw pointer that the caller has to delete outside this method.
   Just return a vector; C++ will apply return value optimization. Maybe you could test it.
   Another option: get_buffer_sizes_in_cells(size_t& num_buckets, std::vector<xxx>* outputvalue)
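
The first suggestion could be sketched roughly as follows. This is not the code from the PR: `FlatTable` and `PartitionedTable` are simplified stand-ins for `HashTable` and `PartitionedHashTable`, and the fixed buffer size of 256 is an assumption made only so the example is self-contained.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for a single-level hash table (hypothetical type).
struct FlatTable {
    std::size_t buf_size = 256; // assumed value, for illustration only

    std::size_t get_buffer_size_in_cells() const { return buf_size; }

    // Returns a one-element vector; the vector owns its storage,
    // so the caller has nothing to delete.
    std::vector<std::size_t> get_buffer_sizes_in_cells() const { return {buf_size}; }
};

// Simplified stand-in for the partitioned table (hypothetical type).
template <std::size_t NUM_SUB_TABLES>
struct PartitionedTable {
    FlatTable sub_tables[NUM_SUB_TABLES];

    // One entry per sub-table. The count is carried by sizes.size(),
    // so no separate out-parameter is needed.
    std::vector<std::size_t> get_buffer_sizes_in_cells() const {
        std::vector<std::size_t> sizes;
        sizes.reserve(NUM_SUB_TABLES);
        for (const auto& table : sub_tables) {
            sizes.push_back(table.get_buffer_size_in_cells());
        }
        return sizes; // moved or elided on return; never deep-copied
    }
};
```

A caller would then write `auto sizes = table.get_buffer_sizes_in_cells();` and iterate over `sizes` without any matching `delete[]`.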




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021054512


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
+    static constexpr size_t MAX_BUCKET = NUM_BUCKETS - 1;
+
+    //factor that will trigger growing the hash table on insert.
+    static constexpr float MAX_BUCKET_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t getBucketFromHash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
+    }
+
+    float get_factor() const { return MAX_BUCKET_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = NUM_BUCKETS;

Review Comment:
   Is this method called?



##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
+    static constexpr size_t MAX_BUCKET = NUM_BUCKETS - 1;
+
+    //factor that will trigger growing the hash table on insert.
+    static constexpr float MAX_BUCKET_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t getBucketFromHash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
+    }
+
+    float get_factor() const { return MAX_BUCKET_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = NUM_BUCKETS;

Review Comment:
   Is this method used?
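
   If the method is kept, note that it hands back a `new[]` array that every caller has to `delete[]`. A minimal sketch of a value-returning alternative follows. `SubTable` and the free function are hypothetical stand-ins for the real `Impl` and the member function, and the count of 16 assumes the default of 4 bits shown in the diff.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one sub table (Impl in the diff).
struct SubTable {
    std::size_t cells = 0;
    std::size_t get_buffer_size_in_cells() const { return cells; }
};

// Assumed count: 1ULL << 4, matching the default template argument in the diff.
constexpr std::size_t NUM_SUB_TABLES = 16;

// Returns one entry per sub table; the vector owns its memory,
// so the caller has nothing to free.
std::vector<std::size_t> get_buffer_sizes_in_cells(
        const std::array<SubTable, NUM_SUB_TABLES>& sub_tables) {
    std::vector<std::size_t> sizes;
    sizes.reserve(sub_tables.size());
    for (const auto& sub_table : sub_tables) {
        sizes.push_back(sub_table.get_buffer_size_in_cells());
    }
    return sizes;
}
```

   The vector also carries its own length, so the `num_buckets` out-parameter is no longer needed.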




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021053893


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
+    static constexpr size_t MAX_BUCKET = NUM_BUCKETS - 1;
+
+    //factor that will trigger growing the hash table on insert.
+    static constexpr float MAX_BUCKET_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t getBucketFromHash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
+    }
+
+    float get_factor() const { return MAX_BUCKET_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = NUM_BUCKETS;
+        size_t* sizes = new size_t[NUM_BUCKETS];
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            sizes[i] = impls[i].get_buffer_size_in_cells();
+        }
+        return sizes;
+    }
+
+    void reset_resize_timer() {
+        for (auto& impl : impls) {
+            impl.reset_resize_timer();
+        }
+    }
+    int64_t get_resize_timer_value() const {
+        int64_t resize_timer_ns = 0;
+        for (const auto& impl : impls) {
+            resize_timer_ns += impl.get_resize_timer_value();
+        }
+        return resize_timer_ns;
+    }
+
+protected:
+    typename Impl::iterator beginOfNextNonEmptyBucket(size_t& bucket) {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+    typename Impl::const_iterator beginOfNextNonEmptyBucket(size_t& bucket) const {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+public:
+    using key_type = typename Impl::key_type;
+    using mapped_type = typename Impl::mapped_type;
+    using value_type = typename Impl::value_type;
+    using cell_type = typename Impl::cell_type;
+
+    using LookupResult = typename Impl::LookupResult;
+    using ConstLookupResult = typename Impl::ConstLookupResult;
+
+    Impl impls[NUM_BUCKETS];
+
+    PartitionedHashTable() = default;
+
+    explicit PartitionedHashTable(size_t size_hint) {
+        for (auto& impl : impls) impl.reserve(size_hint / NUM_BUCKETS);
+    }
+
+    /// Copy the data from another (normal) hash table. It should have the same hash function.
+    template <typename Source>
+    explicit PartitionedHashTable(const Source& src) {
+        typename Source::const_iterator it = src.begin();
+
+        /// It is assumed that the zero key (stored separately) is first in iteration order.
+        if (it != src.end() && it.get_ptr()->is_zero(src)) {
+            insert(it->get_value());
+            ++it;
+        }
+
+        for (; it != src.end(); ++it) {
+            const Cell* cell = it.get_ptr();
+            size_t hash_value = cell->get_hash(src);
+            size_t buck = getBucketFromHash(hash_value);
+            impls[buck].insert_unique_non_zero(cell, hash_value);
+        }
+    }
+
+    PartitionedHashTable(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            impls[i] = std::move(rhs.impls[i]);
+        }
+    }
+
+    PartitionedHashTable& operator=(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            impls[i] = std::move(rhs.impls[i]);
+        }
+        return *this;
+    }
+
+    class iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        iterator(Self* container_, size_t bucket_, typename Impl::iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        iterator() = default;
+
+        bool operator==(const iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const iterator& rhs) const { return !(*this == rhs); }
+
+        iterator& operator++() {
+            ++current_it;
+            if (current_it == container->impls[bucket].end()) {
+                ++bucket;
+                current_it = container->beginOfNextNonEmptyBucket(bucket);
+            }
+
+            return *this;
+        }
+
+        Cell& operator*() const { return *current_it; }
+        Cell* operator->() const { return current_it.get_ptr(); }
+
+        Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    class const_iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::const_iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        const_iterator(Self* container_, size_t bucket_, typename Impl::const_iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        const_iterator() = default;
+        const_iterator(const iterator& rhs)
+                : container(rhs.container),
+                  bucket(rhs.bucket),
+                  current_it(rhs.current_it) {} /// NOLINT
+
+        bool operator==(const const_iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const const_iterator& rhs) const { return !(*this == rhs); }
+
+        const_iterator& operator++() {
+            ++current_it;
+            if (current_it == container->impls[bucket].end()) {
+                ++bucket;
+                current_it = container->beginOfNextNonEmptyBucket(bucket);
+            }
+
+            return *this;
+        }
+
+        const Cell& operator*() const { return *current_it; }
+        const Cell* operator->() const { return current_it->get_ptr(); }
+
+        const Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    const_iterator begin() const {
+        size_t buck = 0;
+        typename Impl::const_iterator impl_it = beginOfNextNonEmptyBucket(buck);
+        return {this, buck, impl_it};
+    }
+
+    iterator begin() {
+        size_t buck = 0;
+        typename Impl::iterator impl_it = beginOfNextNonEmptyBucket(buck);
+        return {this, buck, impl_it};
+    }
+
+    const_iterator end() const { return {this, MAX_BUCKET, impls[MAX_BUCKET].end()}; }
+    iterator end() { return {this, MAX_BUCKET, impls[MAX_BUCKET].end()}; }
+
+    void expanse_for_add_elem(size_t num_elem) {
+        size_t num_elem_per_bucket = (num_elem + NUM_BUCKETS - 1) / NUM_BUCKETS;
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) impls[i].expanse_for_add_elem(num_elem_per_bucket);
+    }
+
+    /// Insert a value. In the case of any more complex values, it is better to use the `emplace` function.
+    std::pair<LookupResult, bool> ALWAYS_INLINE insert(const value_type& x) {
+        size_t hash_value = hash(Cell::get_key(x));
+
+        std::pair<LookupResult, bool> res;
+        emplace(Cell::get_key(x), res.first, res.second, hash_value);
+
+        if (res.second) insert_set_mapped(lookup_result_get_mapped(res.first), x);
+
+        return res;
+    }
+
+    template <typename KeyHolder>
+    void ALWAYS_INLINE prefetch(KeyHolder& key_holder) {
+        const auto& key = key_holder_get_key(key_holder);
+        const auto key_hash = hash(key);
+        const auto bucket = getBucketFromHash(key_hash);
+        impls[bucket].prefetch(key_holder);
+    }
+
+    template <bool READ>
+    void ALWAYS_INLINE prefetch_by_hash(size_t hash_value) {
+        const auto bucket = getBucketFromHash(hash_value);
+        impls[bucket].template prefetch_by_hash<READ>(hash_value);
+    }
+
+    template <bool READ, typename KeyHolder>
+    void ALWAYS_INLINE prefetch(KeyHolder& key_holder) {
+        const auto& key = key_holder_get_key(key_holder);
+        const auto key_hash = hash(key);
+        const auto bucket = getBucketFromHash(key_hash);
+        impls[bucket].template prefetch<READ>(key_holder);
+    }
+
+    /** Insert the key,
+      * return an iterator to a position that can be used for `placement new` of value,
+      * as well as the flag - whether a new key was inserted.
+      *
+      * You have to make `placement new` values if you inserted a new key,
+      * since when destroying a hash table, the destructor will be invoked for it!
+      *
+      * Example usage:
+      *
+      * Map::iterator it;
+      * bool inserted;
+      * map.emplace(key, it, inserted);
+      * if (inserted)
+      *     new(&it->second) Mapped(value);
+      */
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, bool& inserted) {
+        size_t hash_value = hash(key_holder_get_key(key_holder));
+        emplace(key_holder, it, inserted, hash_value);
+    }
+
+    /// Same, but with a precalculated values of hash function.
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, bool& inserted,
+                               size_t hash_value) {
+        size_t buck = getBucketFromHash(hash_value);

Review Comment:
   Rename `getBucketFromHash` to `get_sub_table_from_hash`.




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1020999930


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).

Review Comment:
   The comment says 256, but with the default BITS_FOR_BUCKET = 4 the table count is 16, not 256?
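
   For reference, the sub-table count follows directly from the template parameter. A minimal sketch, assuming the default `BITS_FOR_BUCKET = 4` from this diff; `bucket_from_hash` is a hypothetical free-function copy of the static member `getBucketFromHash`, written out only to show the arithmetic.

```cpp
#include <cstddef>

// Constants as declared in the diff, with the default template argument filled in.
constexpr std::size_t BITS_FOR_BUCKET = 4;
constexpr std::size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET; // 16 sub tables
constexpr std::size_t MAX_BUCKET = NUM_BUCKETS - 1;          // 15, doubles as the bit mask

// Hypothetical free-function copy of getBucketFromHash:
// selects bits 28..31 of the hash value as the sub-table index.
constexpr std::size_t bucket_from_hash(std::size_t hash_value) {
    return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
}

static_assert(NUM_BUCKETS == 16, "4 bits select one of 16 sub tables");
```

   The figure 256 would only hold for 8 bits, which is the default in the ClickHouse file this header was copied from.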




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021052509


##########
be/src/vec/common/hash_table/hash_table.h:
##########
@@ -1111,6 +1118,12 @@ class HashTable : private boost::noncopyable,
     size_t get_buffer_size_in_bytes() const { return grower.buf_size() * sizeof(Cell); }
 
     size_t get_buffer_size_in_cells() const { return grower.buf_size(); }
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = 1;

Review Comment:
   And here.




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021148778


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,387 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 16 (or 1ULL << BITS_FOR_SUB_TABLE) small hash tables (sub table count of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by sub tables;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_SUB_TABLE = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_SUB_TABLES = 1ULL << BITS_FOR_SUB_TABLE;
+    static constexpr size_t MAX_SUB_TABLE = NUM_SUB_TABLES - 1;
+
+    //factor that will trigger growing the hash table on insert.
+    static constexpr float MAX_SUB_TABLE_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t get_sub_table_from_hash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_SUB_TABLE)) & MAX_SUB_TABLE;
+    }
+
+    float get_factor() const { return MAX_SUB_TABLE_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : sub_tables) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : sub_tables) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    std::vector<size_t> get_buffer_sizes_in_cells() const {
+        std::vector<size_t> sizes;
+        for (size_t i = 0; i < NUM_SUB_TABLES; ++i) {
+            sizes.push_back(sub_tables[i].get_buffer_size_in_cells());
+        }
+        return sizes;
+    }
+
+    void reset_resize_timer() {
+        for (auto& impl : sub_tables) {
+            impl.reset_resize_timer();
+        }
+    }
+    int64_t get_resize_timer_value() const {
+        int64_t resize_timer_ns = 0;
+        for (const auto& impl : sub_tables) {
+            resize_timer_ns += impl.get_resize_timer_value();
+        }
+        return resize_timer_ns;
+    }
+
+protected:
+    typename Impl::iterator begin_of_next_non_empty_bucket(size_t& bucket) {
+        while (bucket != NUM_SUB_TABLES && sub_tables[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_SUB_TABLES) return sub_tables[bucket].begin();
+
+        --bucket;
+        return sub_tables[MAX_SUB_TABLE].end();
+    }
+
+    typename Impl::const_iterator begin_of_next_non_empty_bucket(size_t& bucket) const {
+        while (bucket != NUM_SUB_TABLES && sub_tables[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_SUB_TABLES) return sub_tables[bucket].begin();
+
+        --bucket;
+        return sub_tables[MAX_SUB_TABLE].end();
+    }
+
+public:
+    using key_type = typename Impl::key_type;
+    using mapped_type = typename Impl::mapped_type;
+    using value_type = typename Impl::value_type;
+    using cell_type = typename Impl::cell_type;
+
+    using LookupResult = typename Impl::LookupResult;
+    using ConstLookupResult = typename Impl::ConstLookupResult;
+
+    Impl sub_tables[NUM_SUB_TABLES];
+
+    PartitionedHashTable() = default;
+
+    explicit PartitionedHashTable(size_t size_hint) {
+        for (auto& impl : sub_tables) impl.reserve(size_hint / NUM_SUB_TABLES);
+    }
+
+    /// Copy the data from another (normal) hash table. It should have the same hash function.
+    template <typename Source>
+    explicit PartitionedHashTable(const Source& src) {
+        typename Source::const_iterator it = src.begin();
+
+        /// It is assumed that the zero key (stored separately) is first in iteration order.
+        if (it != src.end() && it.get_ptr()->is_zero(src)) {
+            insert(it->get_value());
+            ++it;
+        }
+
+        for (; it != src.end(); ++it) {
+            const Cell* cell = it.get_ptr();
+            size_t hash_value = cell->get_hash(src);
+            size_t buck = get_sub_table_from_hash(hash_value);
+            sub_tables[buck].insert_unique_non_zero(cell, hash_value);
+        }
+    }
+
+    PartitionedHashTable(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_SUB_TABLES; ++i) {
+            sub_tables[i] = std::move(rhs.sub_tables[i]);
+        }
+    }
+
+    PartitionedHashTable& operator=(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_SUB_TABLES; ++i) {
+            sub_tables[i] = std::move(rhs.sub_tables[i]);
+        }
+        return *this;
+    }
+
+    class iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        iterator(Self* container_, size_t bucket_, typename Impl::iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        iterator() = default;
+
+        bool operator==(const iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const iterator& rhs) const { return !(*this == rhs); }
+
+        iterator& operator++() {
+            ++current_it;
+            if (current_it == container->sub_tables[bucket].end()) {
+                ++bucket;
+                current_it = container->begin_of_next_non_empty_bucket(bucket);
+            }
+
+            return *this;
+        }
+
+        Cell& operator*() const { return *current_it; }
+        Cell* operator->() const { return current_it.get_ptr(); }
+
+        Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    class const_iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::const_iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        const_iterator(Self* container_, size_t bucket_, typename Impl::const_iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        const_iterator() = default;
+        const_iterator(const iterator& rhs)
+                : container(rhs.container),
+                  bucket(rhs.bucket),
+                  current_it(rhs.current_it) {} /// NOLINT
+
+        bool operator==(const const_iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const const_iterator& rhs) const { return !(*this == rhs); }
+
+        const_iterator& operator++() {
+            ++current_it;
+            if (current_it == container->sub_tables[bucket].end()) {
+                ++bucket;
+                current_it = container->begin_of_next_non_empty_bucket(bucket);
+            }
+
+            return *this;
+        }
+
+        const Cell& operator*() const { return *current_it; }
+        const Cell* operator->() const { return current_it.get_ptr(); }
+
+        const Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    const_iterator begin() const {
+        size_t buck = 0;
+        typename Impl::const_iterator impl_it = begin_of_next_non_empty_bucket(buck);
+        return {this, buck, impl_it};
+    }
+
+    iterator begin() {
+        size_t buck = 0;
+        typename Impl::iterator impl_it = begin_of_next_non_empty_bucket(buck);
+        return {this, buck, impl_it};
+    }
+
+    const_iterator end() const { return {this, MAX_SUB_TABLE, sub_tables[MAX_SUB_TABLE].end()}; }
+    iterator end() { return {this, MAX_SUB_TABLE, sub_tables[MAX_SUB_TABLE].end()}; }
+
+    void expanse_for_add_elem(size_t num_elem) {
+        size_t num_elem_per_bucket = (num_elem + NUM_SUB_TABLES - 1) / NUM_SUB_TABLES;
+        for (size_t i = 0; i < NUM_SUB_TABLES; ++i)
+            sub_tables[i].expanse_for_add_elem(num_elem_per_bucket);
+    }
+
+    /// Insert a value. In the case of any more complex values, it is better to use the `emplace` function.
+    std::pair<LookupResult, bool> ALWAYS_INLINE insert(const value_type& x) {
+        size_t hash_value = hash(Cell::get_key(x));
+
+        std::pair<LookupResult, bool> res;
+        emplace(Cell::get_key(x), res.first, res.second, hash_value);
+
+        if (res.second) insert_set_mapped(lookup_result_get_mapped(res.first), x);
+
+        return res;
+    }
+
+    template <typename KeyHolder>
+    void ALWAYS_INLINE prefetch(KeyHolder& key_holder) {
+        const auto& key = key_holder_get_key(key_holder);
+        const auto key_hash = hash(key);
+        const auto bucket = get_sub_table_from_hash(key_hash);
+        sub_tables[bucket].prefetch(key_holder);
+    }
+
+    template <bool READ>
+    void ALWAYS_INLINE prefetch_by_hash(size_t hash_value) {
+        const auto bucket = get_sub_table_from_hash(hash_value);
+        sub_tables[bucket].template prefetch_by_hash<READ>(hash_value);
+    }
+
+    template <bool READ, typename KeyHolder>
+    void ALWAYS_INLINE prefetch(KeyHolder& key_holder) {
+        const auto& key = key_holder_get_key(key_holder);
+        const auto key_hash = hash(key);
+        const auto bucket = get_sub_table_from_hash(key_hash);
+        sub_tables[bucket].template prefetch<READ>(key_holder);
+    }
+
+    /** Insert the key,
+      * return an iterator to a position that can be used for `placement new` of value,
+      * as well as the flag - whether a new key was inserted.
+      *
+      * You have to make `placement new` values if you inserted a new key,
+      * since when destroying a hash table, the destructor will be invoked for it!
+      *
+      * Example usage:
+      *
+      * Map::iterator it;
+      * bool inserted;
+      * map.emplace(key, it, inserted);
+      * if (inserted)
+      *     new(&it->second) Mapped(value);
+      */
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, bool& inserted) {
+        size_t hash_value = hash(key_holder_get_key(key_holder));
+        emplace(key_holder, it, inserted, hash_value);
+    }
+
+    /// Same, but with a precalculated value of the hash function.
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, bool& inserted,
+                               size_t hash_value) {
+        size_t buck = get_sub_table_from_hash(hash_value);
+        sub_tables[buck].emplace(key_holder, it, inserted, hash_value);
+    }
+
+    template <typename KeyHolder>
+    void ALWAYS_INLINE emplace(KeyHolder&& key_holder, LookupResult& it, size_t hash_value,
+                               bool& inserted) {
+        emplace(key_holder, it, inserted, hash_value);
+    }
+
+    LookupResult ALWAYS_INLINE find(Key x, size_t hash_value) {
+        size_t buck = get_sub_table_from_hash(hash_value);
+        return sub_tables[buck].find(x, hash_value);
+    }
+
+    ConstLookupResult ALWAYS_INLINE find(Key x, size_t hash_value) const {
+        return const_cast<std::decay_t<decltype(*this)>*>(this)->find(x, hash_value);
+    }
+
+    LookupResult ALWAYS_INLINE find(Key x) { return find(x, hash(x)); }
+
+    ConstLookupResult ALWAYS_INLINE find(Key x) const { return find(x, hash(x)); }
+
+    void write(doris::vectorized::BufferWritable& wb) const {

Review Comment:
   The read and write methods are useless.
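
   For readers following the hunk above, the dispatch that `emplace` and `find` perform can be sketched in isolation. This is only an illustration under stated assumptions: `TwoLevelMap` is a hypothetical name, `std::unordered_map` stands in for the real sub-table type, and the bit selection copies the `(hash >> (32 - BITS)) & mask` form used elsewhere in this PR. Unlike the real sub-tables, the stand-in does not reuse the precalculated hash internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical two-level map; std::unordered_map is a stand-in sub-table.
class TwoLevelMap {
public:
    static constexpr std::size_t BITS = 4;
    static constexpr std::size_t NUM_SUB_TABLES = 1ULL << BITS;

    // Pick a sub-table from a few bits of the hash value.
    static std::size_t sub_table_from_hash(std::size_t hash_value) {
        return (hash_value >> (32 - BITS)) & (NUM_SUB_TABLES - 1);
    }

    std::size_t hash(int key) const { return std::hash<int> {}(key); }

    // Insert the key if absent. Returns a pointer to the mapped value and
    // reports through `inserted` whether a new key was created.
    int* emplace(int key, bool& inserted, std::size_t hash_value) {
        const std::size_t index = sub_table_from_hash(hash_value);
        assert(index < NUM_SUB_TABLES);
        auto res = sub_tables[index].insert(std::make_pair(key, 0));
        inserted = res.second;
        return &res.first->second;
    }

    // Convenience overload that computes the hash first.
    int* emplace(int key, bool& inserted) { return emplace(key, inserted, hash(key)); }

    // Look up a key in the sub-table selected by its hash.
    const int* find(int key, std::size_t hash_value) const {
        const auto& sub = sub_tables[sub_table_from_hash(hash_value)];
        auto it = sub.find(key);
        return it == sub.end() ? nullptr : &it->second;
    }

    const int* find(int key) const { return find(key, hash(key)); }

    // Total number of keys across all sub-tables.
    std::size_t size() const {
        std::size_t total = 0;
        for (const auto& sub : sub_tables) total += sub.size();
        return total;
    }

private:
    std::array<std::unordered_map<int, int>, NUM_SUB_TABLES> sub_tables;
};
```

   The caller sets the mapped value only when `inserted` is true, which mirrors the `emplace` contract described in the doc comment of the diff.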




[GitHub] [doris] hello-stephen commented on pull request #13654: [improvement](hashjoin) support two-level hash table in hash join

Posted by GitBox <gi...@apache.org>.

hello-stephen commented on PR #13654:
URL: https://github.com/apache/doris/pull/13654#issuecomment-1309834948

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35.83 seconds
    load time: 446 seconds
    storage size: 17181733580 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221110063054_clickbench_pr_43161.html



[GitHub] [doris] yiguolei closed pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei closed pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join
URL: https://github.com/apache/doris/pull/13654



[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1020998204


##########
be/src/vec/common/hash_table/hash_table.h:
##########
@@ -1072,6 +1072,13 @@ class HashTable : private boost::noncopyable,
 
     size_t size() const { return m_size; }
 
+    size_t* sizes(size_t& num_buckets) const {
+        num_buckets = 1;
+        size_t* sizes = new size_t[1];

Review Comment:
   Why do we need to return a pointer rather than just a value?
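
   One possible shape for a value-returning interface is sketched below. It is an assumption about what the alternative could look like, not the change that was eventually made: `SingleTable` and `PartitionedTable` are hypothetical stand-ins that model only the size bookkeeping, and `sizes()` returns a `std::vector<size_t>` so that the caller neither receives an out-parameter nor has to `delete[]` anything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-level hash table; only the size
// bookkeeping is modelled here.
class SingleTable {
public:
    explicit SingleTable(std::size_t size) : m_size(size) {}

    std::size_t size() const { return m_size; }

    // Per-bucket sizes by value. A single-level table has exactly one
    // bucket, so the vector holds one element.
    std::vector<std::size_t> sizes() const { return {m_size}; }

private:
    std::size_t m_size;
};

// Hypothetical stand-in for a partitioned table with N sub-tables.
template <std::size_t N>
class PartitionedTable {
public:
    void set_sub_table_size(std::size_t i, std::size_t size) {
        assert(i < N);
        m_sizes[i] = size;
    }

    // Same interface as SingleTable::sizes(): one entry per sub-table.
    std::vector<std::size_t> sizes() const {
        return std::vector<std::size_t>(m_sizes, m_sizes + N);
    }

private:
    std::size_t m_sizes[N] = {};
};
```

   The number of buckets is then simply `sizes().size()`, and ownership of the result is unambiguous.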




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021055702


##########
gensrc/thrift/PaloInternalService.thrift:
##########
@@ -179,6 +179,8 @@ struct TQueryOptions {
   51: optional bool enable_new_shuffle_hash_method
 
   52: optional i32 be_exec_version = 0
+
+  53: optional bool enable_partitioned_hash_join = false

Review Comment:
   Please set the default value to true. Otherwise we cannot be sure that this change passes the test pipeline.




[GitHub] [doris] yiguolei commented on a diff in pull request #13654: [improvement](hashjoin) support partitioned hash table in hash join

Posted by GitBox <gi...@apache.org>.

yiguolei commented on code in PR #13654:
URL: https://github.com/apache/doris/pull/13654#discussion_r1021049070


##########
be/src/vec/common/hash_table/partitioned_hash_table.h:
##########
@@ -0,0 +1,412 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/HashTable/TwoLevelHashTable.h
+// and modified by Doris
+#pragma once
+
+#include "vec/common/hash_table/hash_table.h"
+
+/** Two-level hash table.
+  * Represents 256 (or 1ULL << BITS_FOR_BUCKET) small hash tables (buckets of the first level).
+  * To determine which one to use, one of the bytes of the hash function is taken.
+  *
+  * Usually works a little slower than a simple hash table.
+  * However, it has advantages in some cases:
+  * - if you need to merge two hash tables together, then you can easily parallelize it by buckets;
+  * - delay during resizes is amortized, since the small hash tables will be resized separately;
+  * - in theory, resizes are cache-local in a larger range of sizes.
+  */
+
+template <size_t initial_size_degree = 8>
+struct PartitionedHashTableGrower : public HashTableGrowerWithPrecalculation<initial_size_degree> {
+    /// Increase the size of the hash table.
+    void increase_size() { this->increase_size_degree(this->size_degree() >= 15 ? 1 : 2); }
+};
+
+template <typename Key, typename Cell, typename Hash, typename Grower, typename Allocator,
+          typename ImplTable = HashTable<Key, Cell, Hash, Grower, Allocator>,
+          size_t BITS_FOR_BUCKET = 4>
+class PartitionedHashTable : private boost::noncopyable,
+                             protected Hash /// empty base optimization
+{
+protected:
+    friend class const_iterator;
+    friend class iterator;
+
+    using HashValue = size_t;
+    using Self = PartitionedHashTable;
+
+public:
+    using Impl = ImplTable;
+
+    static constexpr size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
+    static constexpr size_t MAX_BUCKET = NUM_BUCKETS - 1;
+
+    // Occupancy factor that triggers growing the hash table on insert.
+    static constexpr float MAX_BUCKET_OCCUPANCY_FRACTION = 0.5f;
+
+    size_t hash(const Key& x) const { return Hash::operator()(x); }
+
+    /// NOTE Bad for hash tables with more than 2^32 cells.
+    static size_t getBucketFromHash(size_t hash_value) {
+        return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
+    }
+
+    float get_factor() const { return MAX_BUCKET_OCCUPANCY_FRACTION; }
+
+    bool should_be_shrink(int64_t valid_row) { return false; }
+
+    void init_buf_size(size_t reserve_for_num_elements) {}
+
+    void delete_zero_key(Key key) {}
+
+    size_t get_buffer_size_in_bytes() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_bytes();
+        return buff_size;
+    }
+
+    size_t get_buffer_size_in_cells() const {
+        size_t buff_size = 0;
+        for (const auto& impl : impls) buff_size += impl.get_buffer_size_in_cells();
+        return buff_size;
+    }
+
+    size_t* get_buffer_sizes_in_cells(size_t& num_buckets) const {
+        num_buckets = NUM_BUCKETS;
+        size_t* sizes = new size_t[NUM_BUCKETS];
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            sizes[i] = impls[i].get_buffer_size_in_cells();
+        }
+        return sizes;
+    }
+
+    void reset_resize_timer() {
+        for (auto& impl : impls) {
+            impl.reset_resize_timer();
+        }
+    }
+    int64_t get_resize_timer_value() const {
+        int64_t resize_timer_ns = 0;
+        for (const auto& impl : impls) {
+            resize_timer_ns += impl.get_resize_timer_value();
+        }
+        return resize_timer_ns;
+    }
+
+protected:
+    typename Impl::iterator beginOfNextNonEmptyBucket(size_t& bucket) {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+    typename Impl::const_iterator beginOfNextNonEmptyBucket(size_t& bucket) const {
+        while (bucket != NUM_BUCKETS && impls[bucket].empty()) ++bucket;
+
+        if (bucket != NUM_BUCKETS) return impls[bucket].begin();
+
+        --bucket;
+        return impls[MAX_BUCKET].end();
+    }
+
+public:
+    using key_type = typename Impl::key_type;
+    using mapped_type = typename Impl::mapped_type;
+    using value_type = typename Impl::value_type;
+    using cell_type = typename Impl::cell_type;
+
+    using LookupResult = typename Impl::LookupResult;
+    using ConstLookupResult = typename Impl::ConstLookupResult;
+
+    Impl impls[NUM_BUCKETS];
+
+    PartitionedHashTable() = default;
+
+    explicit PartitionedHashTable(size_t size_hint) {
+        for (auto& impl : impls) impl.reserve(size_hint / NUM_BUCKETS);
+    }
+
+    /// Copy the data from another (normal) hash table. It should have the same hash function.
+    template <typename Source>
+    explicit PartitionedHashTable(const Source& src) {
+        typename Source::const_iterator it = src.begin();
+
+        /// It is assumed that the zero key (stored separately) is first in iteration order.
+        if (it != src.end() && it.get_ptr()->is_zero(src)) {
+            insert(it->get_value());
+            ++it;
+        }
+
+        for (; it != src.end(); ++it) {
+            const Cell* cell = it.get_ptr();
+            size_t hash_value = cell->get_hash(src);
+            size_t buck = getBucketFromHash(hash_value);
+            impls[buck].insert_unique_non_zero(cell, hash_value);
+        }
+    }
+
+    PartitionedHashTable(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            impls[i] = std::move(rhs.impls[i]);
+        }
+    }
+
+    PartitionedHashTable& operator=(PartitionedHashTable&& rhs) {
+        for (size_t i = 0; i < NUM_BUCKETS; ++i) {
+            impls[i] = std::move(rhs.impls[i]);
+        }
+        return *this;
+    }
+
+    class iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        iterator(Self* container_, size_t bucket_, typename Impl::iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        iterator() = default;
+
+        bool operator==(const iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const iterator& rhs) const { return !(*this == rhs); }
+
+        iterator& operator++() {
+            ++current_it;
+            if (current_it == container->impls[bucket].end()) {
+                ++bucket;
+                current_it = container->beginOfNextNonEmptyBucket(bucket);
+            }
+
+            return *this;
+        }
+
+        Cell& operator*() const { return *current_it; }
+        Cell* operator->() const { return current_it.get_ptr(); }
+
+        Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    class const_iterator /// NOLINT
+    {
+        Self* container {};
+        size_t bucket {};
+        typename Impl::const_iterator current_it {};
+
+        friend class PartitionedHashTable;
+
+        const_iterator(Self* container_, size_t bucket_, typename Impl::const_iterator current_it_)
+                : container(container_), bucket(bucket_), current_it(current_it_) {}
+
+    public:
+        const_iterator() = default;
+        const_iterator(const iterator& rhs)
+                : container(rhs.container),
+                  bucket(rhs.bucket),
+                  current_it(rhs.current_it) {} /// NOLINT
+
+        bool operator==(const const_iterator& rhs) const {
+            return bucket == rhs.bucket && current_it == rhs.current_it;
+        }
+        bool operator!=(const const_iterator& rhs) const { return !(*this == rhs); }
+
+        const_iterator& operator++() {
+            ++current_it;
+            if (current_it == container->impls[bucket].end()) {
+                ++bucket;
+                current_it = container->beginOfNextNonEmptyBucket(bucket);
+            }
+
+            return *this;
+        }
+
+        const Cell& operator*() const { return *current_it; }
+        const Cell* operator->() const { return current_it.get_ptr(); }
+
+        const Cell* get_ptr() const { return current_it.get_ptr(); }
+        size_t get_hash() const { return current_it.get_hash(); }
+    };
+
+    const_iterator begin() const {
+        size_t buck = 0;
+        typename Impl::const_iterator impl_it = beginOfNextNonEmptyBucket(buck);

Review Comment:
   Please rename beginOfNextNonEmptyBucket to begin_of_next_non_empty_bucket.
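
   For context, the two pieces of logic this helper relies on (choosing a bucket from the hash and skipping empty buckets during iteration) can be sketched on their own. This is a minimal sketch under assumptions: `SubTable` is a hypothetical alias for `std::vector<int>`, and `bucket_from_hash` and `next_non_empty_bucket` are illustrative free functions in the snake_case style requested above, not the names used in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t BITS_FOR_BUCKET = 4;
constexpr std::size_t NUM_BUCKETS = 1ULL << BITS_FOR_BUCKET;
constexpr std::size_t MAX_BUCKET = NUM_BUCKETS - 1;

// Same bit selection as in the diff: use the BITS_FOR_BUCKET bits that sit
// just below bit 32 of the hash value.
constexpr std::size_t bucket_from_hash(std::size_t hash_value) {
    return (hash_value >> (32 - BITS_FOR_BUCKET)) & MAX_BUCKET;
}

// Hypothetical sub-table type; only emptiness matters for this sketch.
using SubTable = std::vector<int>;

// Advance `bucket` to the first non-empty sub-table at or after it.
// Returns true if one was found. Otherwise `bucket` is left at MAX_BUCKET,
// matching the end position used by the iterators in the diff.
bool next_non_empty_bucket(const std::array<SubTable, NUM_BUCKETS>& tables,
                           std::size_t& bucket) {
    assert(bucket <= NUM_BUCKETS);
    while (bucket != NUM_BUCKETS && tables[bucket].empty()) ++bucket;
    if (bucket != NUM_BUCKETS) return true;
    --bucket;
    return false;
}
```

   Because only bits below bit 32 are consulted, the bucket choice ignores the upper half of a 64-bit hash, which is what the "bad for hash tables with more than 2^32 cells" note in the diff refers to.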


