You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kvrocks.apache.org by "PragmaTwice (via GitHub)" <gi...@apache.org> on 2023/09/18 02:02:14 UTC

[GitHub] [kvrocks] PragmaTwice opened a new pull request, #1764: Make BlockSplitBloomFilter a view

PragmaTwice opened a new pull request, #1764:
URL: https://github.com/apache/kvrocks/pull/1764

   Besides, we supplement licenses and introduce span-lite to use std::span in c++20.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] PragmaTwice commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "PragmaTwice (via GitHub)" <gi...@apache.org>.
PragmaTwice commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328247671


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   the bitset can be changed via `InsertHash`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] mapleFU commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328252580


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   Yeah, I mean can we provide something like:
   
   ```
   StatusOr<unique_ptr<BlockSplitBloomFilter>> CreateBlockSplitBloomFilter(uint8_t* bitset, uint32_t num_bytes);
   StatusOr<unique_ptr<BlockSplitBloomFilter>> CreateBlockSplitBloomFilter(std::string& bitset);
   StatusOr<unique_ptr<const BlockSplitBloomFilter>> CreateBlockSplitBloomFilter(const uint8_t* bitset, uint32_t num_bytes);
   StatusOr<unique_ptr<const BlockSplitBloomFilter>> CreateBlockSplitBloomFilter(const std::string& bitset);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] PragmaTwice commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "PragmaTwice (via GitHub)" <gi...@apache.org>.
PragmaTwice commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328207560


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset because the given bitset may not satisfy the 32-byte alignment requirement

Review Comment:
   it is just a view, does not own anything. If you want to a owned filter, you can have a std::tuple<BloomFilter, std::string>.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] zncleon commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "zncleon (via GitHub)" <gi...@apache.org>.
zncleon commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328230735


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   Maybe we should modify some comments like this?
   ```suggestion
   /// responsibility to ensure the bitset would not be changed when using the bloomfilter.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] mapleFU commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328206140


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset because the given bitset may not satisfy the 32-byte alignment requirement

Review Comment:
   ```suggestion
   /// bitset when the given bitset may not satisfy the 32-byte alignment requirement
   ```
   
   Would this be better?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] PragmaTwice commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "PragmaTwice (via GitHub)" <gi...@apache.org>.
PragmaTwice commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328251117


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   it seems not related to this function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] PragmaTwice commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "PragmaTwice (via GitHub)" <gi...@apache.org>.
PragmaTwice commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328252657


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   if a readonly version is needed, you can have a `const BlockSplitBloomFilter`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] PragmaTwice commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "PragmaTwice (via GitHub)" <gi...@apache.org>.
PragmaTwice commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328263966


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   Got your point. I think you can add it when you need it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] PragmaTwice merged pull request #1764: Make BlockSplitBloomFilter a view

Posted by "PragmaTwice (via GitHub)" <gi...@apache.org>.
PragmaTwice merged PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] mapleFU commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328206694


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset because the given bitset may not satisfy the 32-byte alignment requirement

Review Comment:
   Oh, seems this bloom never copy, and when copy it's happening, it doesn't have owned-`std::string` as return value?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kvrocks] mapleFU commented on a diff in pull request #1764: Make BlockSplitBloomFilter a view

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #1764:
URL: https://github.com/apache/kvrocks/pull/1764#discussion_r1328250047


##########
src/types/bloom_filter.h:
##########
@@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
 // This value will be reconsidered when implementing Bloom filter producer.
 static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
 
+/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
+static constexpr uint32_t kMinimumBloomFilterBytes = 32;
+
+class BlockSplitBloomFilter;
+
+using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;
+
+/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
+/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
+/// rounded up/down to lower/upper bound if num_bytes is out of range and also
+/// will be rounded up to a power of 2.
+///
+/// @param num_bytes The number of bytes to store Bloom filter bitset.
+OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);
+
+/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
+/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
+/// which may lead to segfault when performing SIMD instructions. It is the caller's
+/// responsibility to free the bitset passed in.

Review Comment:
   I guess @zncleon means the mutability. Sometime we may need `const` ctor



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org