Posted to commits@orc.apache.org by do...@apache.org on 2021/03/26 07:10:58 UTC
[orc] branch master updated: MINOR: Update ORCv1.md: Fix
documentation about bitset in bloom filter section (#670)
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/master by this push:
new d7814a8 MINOR: Update ORCv1.md: Fix documentation about bitset in bloom filter section (#670)
d7814a8 is described below
commit d7814a8a2d1600fc3856076eaf0617f7456e3af8
Author: yan.zhang <di...@gmail.com>
AuthorDate: Fri Mar 26 15:10:48 2021 +0800
MINOR: Update ORCv1.md: Fix documentation about bitset in bloom filter section (#670)
The corresponding code is https://github.com/apache/orc/blob/master/c%2B%2B/src/BloomFilter.cc#L47.
The code is correct, but the documentation is not. The code uses `index % 64`, which has the same effect as `index & 0x3f`.
```
constexpr uint64_t BITS_OF_LONG = 64;
constexpr uint8_t SHIFT_6_BITS = 6;
void BitSet::set(uint64_t index) {
  mData[index >> SHIFT_6_BITS] |= (1ULL << (index % BITS_OF_LONG));
}
```
The documentation fixes are:
1. There is no need to use `>>>`. We have already flipped `combinedHash` if it is negative, so the sign bit of `position` will always be zero.
2. To set the bit, we have to mask `position` with 0x3f to keep the least significant 6 bits.
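
To illustrate the indexing described above, here is a minimal sketch. `SketchBitSet` and `moduloMatchesMask` are hypothetical names introduced only for this example; the real class in BloomFilter.cc may be structured differently. Only the two constants and the expression inside `set` are taken from the snippet quoted earlier in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t BITS_OF_LONG = 64;
constexpr uint8_t SHIFT_6_BITS = 6;

// Hypothetical stand-in for the real bit set; storage layout is an assumption.
struct SketchBitSet {
  std::vector<uint64_t> data;

  // Allocate enough 64-bit words to hold numBits bits, all cleared.
  explicit SketchBitSet(uint64_t numBits)
      : data((numBits + BITS_OF_LONG - 1) / BITS_OF_LONG, 0) {}

  // The bits above the low 6 select the word; the low 6 bits select the bit.
  void set(uint64_t index) {
    assert((index >> SHIFT_6_BITS) < data.size());
    data[index >> SHIFT_6_BITS] |= (1ULL << (index % BITS_OF_LONG));
  }

  // Same lookup, written with the mask form to show it is interchangeable.
  bool get(uint64_t index) const {
    assert((index >> SHIFT_6_BITS) < data.size());
    return (data[index >> SHIFT_6_BITS] & (1ULL << (index & 0x3f))) != 0;
  }
};

// Returns true when `i % 64` and `i & 0x3f` agree for every i below limit.
bool moduloMatchesMask(uint64_t limit) {
  for (uint64_t i = 0; i < limit; ++i) {
    if ((i % BITS_OF_LONG) != (i & 0x3f)) {
      return false;
    }
  }
  return true;
}
```

For example, with a 128-bit set, `set(70)` touches word `70 >> 6 == 1` and bit `70 % 64 == 6`, so only the second word changes.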
---
site/specification/ORCv1.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index d4cdbc5..3e7165e 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -1297,7 +1297,7 @@ in a bloom filter is as follows:
* position = combinedHash % m
6. Set the position in bit set. The LSB 6 bits identifies the long index
within bitset and bit position within the long uses little endian order.
- * bitset[position >>> 6] \|= (1L << position);
+ * bitset[position >> 6] \|= (1L << (position % 64));
Bloom filter streams are interlaced with row group indexes. This placement
makes it convenient to read the bloom filter stream and row index stream