Posted to commits@orc.apache.org by do...@apache.org on 2021/03/26 07:10:58 UTC

[orc] branch master updated: MINOR: Update ORCv1.md: Fix documentation about bitset in bloom filter section (#670)

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/master by this push:
     new d7814a8  MINOR: Update ORCv1.md: Fix documentation about bitset in bloom filter section (#670)
d7814a8 is described below

commit d7814a8a2d1600fc3856076eaf0617f7456e3af8
Author: yan.zhang <di...@gmail.com>
AuthorDate: Fri Mar 26 15:10:48 2021 +0800

    MINOR: Update ORCv1.md: Fix documentation about bitset in bloom filter section (#670)
    
    The corresponding code is https://github.com/apache/orc/blob/master/c%2B%2B/src/BloomFilter.cc#L47.
    
    The code is correct, but the documentation is not. The code uses `index % 64`, which has the same effect as `index & 0x3f`.
    ```
      constexpr uint64_t BITS_OF_LONG = 64;
      constexpr uint8_t  SHIFT_6_BITS = 6;
      void BitSet::set(uint64_t index) {
        mData[index >> SHIFT_6_BITS] |= (1ULL << (index % BITS_OF_LONG));
      }
    ```
    
    The fixes to the documentation are:
    1. There is no need to use `>>>`. `combinedHash` has already been flipped if it was negative, so the sign bit of `position` is always zero.
    2. To set the bit, `position` has to be masked with 0x3f to keep the least significant 6 bits.
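
The two points above can be checked with a small standalone sketch. It is not the ORC implementation: `SketchBitSet`, `word`, and `positionFor` are hypothetical names introduced here for illustration, and `positionFor` assumes that "flip" means a bitwise complement of a negative `combinedHash`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t BITS_OF_LONG = 64;
constexpr uint8_t SHIFT_6_BITS = 6;

// Hypothetical stand-in for a bloom filter bit set; not the ORC class.
class SketchBitSet {
 public:
  explicit SketchBitSet(uint64_t numBits)
      : mData((numBits + BITS_OF_LONG - 1) / BITS_OF_LONG, 0) {}

  // The upper bits of index select the 64-bit word;
  // the low 6 bits select the bit inside that word.
  void set(uint64_t index) {
    assert((index >> SHIFT_6_BITS) < mData.size());
    mData[index >> SHIFT_6_BITS] |= (1ULL << (index % BITS_OF_LONG));
  }

  // Same addressing as set(), written with the equivalent 0x3f mask.
  bool get(uint64_t index) const {
    assert((index >> SHIFT_6_BITS) < mData.size());
    return (mData[index >> SHIFT_6_BITS] & (1ULL << (index & 0x3f))) != 0;
  }

  // Exposes one raw word so the bit layout can be inspected.
  uint64_t word(uint64_t i) const { return mData[i]; }

 private:
  std::vector<uint64_t> mData;
};

// Assumption: a negative combinedHash is flipped with bitwise complement,
// which clears the sign bit, so the resulting position is non-negative
// and a plain right shift is enough.
uint64_t positionFor(int64_t combinedHash, uint64_t m) {
  if (combinedHash < 0) {
    combinedHash = ~combinedHash;
  }
  return static_cast<uint64_t>(combinedHash) % m;
}
```

With a 128-bit set, `set(70)` touches word `70 >> 6 = 1` and bit `70 % 64 = 6`. The mask matters in C++ because shifting a 64-bit value by 64 or more is undefined behavior, so `1ULL << position` without the mask would not be safe for positions past the first word.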
---
 site/specification/ORCv1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index d4cdbc5..3e7165e 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -1297,7 +1297,7 @@ in a bloom filter is as follows:
   * position = combinedHash % m
 6. Set the position in bit set. The LSB 6 bits identifies the long index
    within bitset and bit position within the long uses little endian order.
-  * bitset[position >>> 6] \|= (1L << position);
+  * bitset[position >> 6] \|= (1L << (position % 64));
 
 Bloom filter streams are interlaced with row group indexes. This placement
 makes it convenient to read the bloom filter stream and row index stream