Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/05/18 22:59:37 UTC

[GitHub] [incubator-pinot] siddharthteotia opened a new pull request #5409: Faster bit unpacking (Part 1)

siddharthteotia opened a new pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409


   A couple of improvements have been made to bit unpacking.
   
   - Use hand-written unpack methods when the number of bits used to encode the dictionaryId is a power of 2 (1, 2, 4, 8, 16, 32). The hand-written methods are faster than the generic one due to simplified bit math (see the sketch after this list).
   - Amortize the overhead of function calls.
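   
   For illustration, the sketch below shows why a power of 2 width keeps the bit math simple: values never straddle a byte boundary, so a fixed shift and mask suffice. The method name and standalone form are hypothetical; this is not the exact code in the PR.
   
   ```java
   import org.apache.pinot.core.segment.memory.PinotDataBuffer;

   public final class Unpack4BitSketch {
     // Hand-written unpack for 4 bits per value: each value is exactly one nibble,
     // so there is no generic loop over bits and no cross-byte handling.
     static int readInt4Bit(PinotDataBuffer dataBuffer, int index) {
       long bitOffset = (long) index * 4;
       int byteOffset = (int) (bitOffset / Byte.SIZE);
       int packed = dataBuffer.getByte(byteOffset) & 0xff;
       // the value is either the high nibble or the low nibble of the byte
       return ((bitOffset & 7) == 0) ? (packed >>> 4) : (packed & 0xF);
     }
   }
   ```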
   
   Right now, the new code isn't yet wired into the existing bit reader and writer. A couple of follow-ups will come soon:
   
   - Evaluate this optimization for a non power of 2 number of bits. It is quite possible, but the performance benefit of a special hand-written unpack function seems to get lost because the bit math becomes complicated, with branches, when the number of bits is not a power of 2.
   - Consider using a new format where, if the number of bits needed to encode is not a power of 2, we round it up to the nearest power of 2 (a possible shape of this rounding is sketched after this list). This means that if more than 16 bits are needed, we use 32 bits (raw value). The returns from keeping a non power of 2 width diminish, since the unpacking overhead grows for a saving of only 10-12 bits per value.
   - Integrate the new changes with existing code. 
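   
   A possible shape of that rounding (a hypothetical helper, not part of this PR):
   
   ```java
   // Round the bit width up to the nearest power of 2 supported by the
   // hand-written unpackers (1, 2, 4, 8, 16, 32). Hypothetical helper only.
   static int roundUpToPowerOfTwo(int numBits) {
     int floor = Integer.highestOneBit(numBits);
     return (floor == numBits) ? numBits : floor << 1;
   }
   ```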
   
   **Description of changes:**
   
   A new version of FixedBitIntReaderWriter has been written that underneath uses a new fast bit-unpack reader, PinotDataBitSetV2.
   
   There are 3 important APIs here:
   
   `public int readInt(int index)` 
   Exists in the current code as well. Used by the scan operator to read through the forward index and get the dictId for each docId.
   
   `public void readInt(int startDocId, int length, int[] buffer)`
   Exists in the current code as well. Used by the multi-value bit reader to get the dictIds for all MVs in a given cell.
   
   `public void readValues(int[] docIds, int docIdStartIndex, int docIdLength, int[] values, int valuesStartIndex)`
   Exists at the FixedBitSingleColumnSingleValueReader interface and is used by the dictionary based group-by executor to get dictIds for a set of docIds (monotonically increasing but not necessarily contiguous). However, that API still issues single read calls underneath. This PR introduces the API at the FixedBitIntReaderWriterV2 level so that the group-by executor can leverage bulk read semantics.
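   
   For illustration only, the three APIs might be exercised as in the sketch below. The wrapper method, its arguments, and the sample docIds are hypothetical and not taken from this PR.
   
   ```java
   import org.apache.pinot.core.io.util.FixedBitIntReaderWriterV2;
   import org.apache.pinot.core.segment.memory.PinotDataBuffer;

   public final class ReaderUsageSketch {
     static void readDictIds(PinotDataBuffer dataBuffer, int numDocs, int numBitsPerValue) {
       FixedBitIntReaderWriterV2 reader = new FixedBitIntReaderWriterV2(dataBuffer, numDocs, numBitsPerValue);

       // 1. Single read: the dictId for one docId (as the scan operator would do per docId)
       int dictId = reader.readInt(0);

       // 2. Bulk read over a contiguous docId range [startDocId, startDocId + length)
       int startDocId = 0;
       int length = Math.min(100, numDocs);
       int[] dictIdBuffer = new int[length];
       reader.readInt(startDocId, length, dictIdBuffer);

       // 3. Bulk read for monotonically increasing, not necessarily contiguous, docIds
       //    (e.g. the filtered docIds handed to the dictionary based group-by executor)
       int[] filteredDocIds = {1, 3, 7, 9, 11, 12};
       int[] dictIds = new int[filteredDocIds.length];
       reader.readValues(filteredDocIds, 0, filteredDocIds.length, dictIds, 0);
     }
   }
   ```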
   
   When this code is wired in, the scan operator will start using either the second or the third API.
   
   Please see the [spreadsheet](https://docs.google.com/spreadsheets/d/1mz_TQe0rXadWPtA_Xov6cXwYrSvUpQB1p1b_ZqROTDQ/edit?usp=sharing) for performance numbers.
   
   Two kinds of tests were done:
   
   - Compare the performance of sequential consecutive reads through the single read API `getInt(index)`, with and without the faster bit unpacking code.
   - Compare the performance of sequential consecutive reads through the array API `readInt(int startDocId, int length, int[] buffer)`, with and without the faster bit unpacking code (a minimal JMH sketch of the shape of such a comparison follows this list).
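   
   The sketch below only shows the general shape of such a JMH comparison; it is not the benchmark committed in this PR (that is `BenchmarkPinotDataBitSet`), the buffer construction is omitted, and the class/field names and sizes are assumptions.
   
   ```java
   import java.util.concurrent.TimeUnit;
   import org.apache.pinot.core.io.util.FixedBitIntReaderWriter;
   import org.apache.pinot.core.io.util.FixedBitIntReaderWriterV2;
   import org.openjdk.jmh.annotations.*;

   @State(Scope.Benchmark)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   public class BitUnpackComparisonSketch {
     private static final int NUM_VALUES = 1_000_000;

     private FixedBitIntReaderWriter existingReader;   // generic bit unpacking
     private FixedBitIntReaderWriterV2 fastReader;     // hand-written bit unpacking
     private int[] out = new int[NUM_VALUES];

     @Setup
     public void setUp() {
       // build both readers over the same bit-packed PinotDataBuffer (construction omitted in this sketch)
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     public void bulkContiguousExisting() {
       existingReader.readInt(0, NUM_VALUES, out);
     }

     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     public void bulkContiguousFast() {
       fastReader.readInt(0, NUM_VALUES, out);
     }
   }
   ```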
   
   Will be adding some unit tests. The current PR has a performance test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635650062


   Here are the JMH results: cc @kishoreg  @Jackie-Jiang 
   
   
   
   
   Benchmark | Score | Mode
   -- | -- | --
   BenchmarkPinotDataBitSet.twoBitBulkWithGaps | 28.537   ops/ms | Throughput
   BenchmarkPinotDataBitSet.twoBitBulkWithGapsFast | 44.332   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGaps | 28.262   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGapsFast | 42.449   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGaps | 26.816   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGapsFast | 38.777   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGaps | 21.095   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGapsFast | 38.380   ops/ms | Throughput
     |   |  
   BenchmarkPinotDataBitSet.twoBitContiguous | 26.188 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitContiguousFast | 15.692 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguous | 26.113 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguousFast | 15.178 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguous | 26.693 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguousFast | 15.726 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguous | 43.968 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguousFast | 21.390 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguous | 32.504 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousFast | 14.614 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguous | 30.794 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousFast | 14.583 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguous | 16.525 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousFast | 10.777 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguous | 54.731 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousFast | 19.312 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnaligned | 32.018 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnalignedFast | 14.344 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnaligned | 21.204   ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnalignedFast | 15.393   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnaligned | 20.125   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnalignedFast | 14.836   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnaligned | 58.086   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnalignedFast | 22.002   ms/op | Avgt
   
   
   
   




[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635647360


   Addressed the review comments. Please take another look. This is standalone code at this point, so I want to get this in sooner and put up follow-ups in the next couple of days.
   
   




[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r426967646



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {
+    Preconditions
+        .checkState(dataBuffer.size() == (int) (((long) numValues * numBitsPerValue + Byte.SIZE - 1) / Byte.SIZE));
+    _dataBitSet = PinotDataBitSetV2.createBitSet(dataBuffer, numBitsPerValue);
+    _numBitsPerValue = numBitsPerValue;
+  }
+
+  /**
+   * Read dictionaryId for a particular docId
+   * @param index docId to get the dictionaryId for
+   * @return dictionaryId
+   */
+  public int readInt(int index) {
+    return _dataBitSet.readInt(index);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for a contiguous
+   * range of docIds starting at startDocId for a given length
+   * @param startDocId docId range start
+   * @param length length of contiguous docId range
+   * @param buffer out buffer to read dictionaryIds into
+   */
+  public void readInt(int startDocId, int length, int[] buffer) {
+    _dataBitSet.readInt(startDocId, length, buffer);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for an array of docIds
+   * which are monotonically increasing but not necessarily contiguous
+   * @param docIds array of docIds to read the dictionaryIds for
+   * @param docIdStartIndex start index in docIds array
+   * @param docIdLength length to process in docIds array
+   * @param values out array to store the dictionaryIds into
+   * @param valuesStartIndex start index in values array
+   */
+  public void readValues(int[] docIds, int docIdStartIndex, int docIdLength, int[] values, int valuesStartIndex) {
+    int docIdEndIndex = docIdStartIndex + docIdLength - 1;
+    if (shouldBulkRead(docIds, docIdStartIndex, docIdEndIndex)) {
+      _dataBitSet.readInt(docIds, docIdStartIndex, docIdLength, values, valuesStartIndex);
+    } else {
+      for (int i = docIdStartIndex; i <= docIdEndIndex; i++) {
+        values[valuesStartIndex++] = _dataBitSet.readInt(docIds[i]);
+      }
+    }
+  }
+
+  private boolean shouldBulkRead(int[] docIds, int startIndex, int endIndex) {
+    int numDocsToRead = endIndex - startIndex + 1;
+    int docIdRange = docIds[endIndex] - docIds[startIndex] + 1;
+    if (docIdRange > DocIdSetPlanNode.MAX_DOC_PER_CALL) {
+      return false;
+    }
+    return numDocsToRead >= ((double)docIdRange * 0.7);

Review comment:
       Reasoning for magic number?

##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/ForwardIndexBenchmark.java
##########
@@ -0,0 +1,269 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import com.google.common.base.Stopwatch;
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.FileReader;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+import me.lemire.integercompression.BitPacking;
+import org.apache.commons.math.util.MathUtils;
+import org.apache.pinot.core.io.reader.impl.v1.FixedBitSingleValueReader;
+import org.apache.pinot.core.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.core.io.util.FixedBitIntReaderWriterV2;
+import org.apache.pinot.core.io.util.FixedByteValueReaderWriter;
+import org.apache.pinot.core.io.util.PinotDataBitSet;
+import org.apache.pinot.core.io.writer.impl.v1.FixedBitSingleValueWriter;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+public class ForwardIndexBenchmark {

Review comment:
       Would be good to use JMH to make benchmark more accurate.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {

Review comment:
       Consider writing a header for backward/forward compatibility.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,418 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {

Review comment:
       This class needs exhaustive unit tests to ensure all cases are covered.






[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635650062


   Here are the JMH results: cc @kishoreg  @Jackie-Jiang 
   
   Please see the [spreadsheet](https://docs.google.com/spreadsheets/d/1mz_TQe0rXadWPtA_Xov6cXwYrSvUpQB1p1b_ZqROTDQ/edit#gid=497160959) for additional performance numbers obtained via manual instrumentation
   
   
   
   
   Benchmark | Score | Mode
   -- | -- | --
   BenchmarkPinotDataBitSet.twoBitBulkWithGaps | 28.537   ops/ms | Throughput
   BenchmarkPinotDataBitSet.twoBitBulkWithGapsFast | 44.332   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGaps | 28.262   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGapsFast | 42.449   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGaps | 26.816   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGapsFast | 38.777   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGaps | 21.095   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGapsFast | 38.380   ops/ms | Throughput
     |   |  
   BenchmarkPinotDataBitSet.twoBitContiguous | 26.188 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitContiguousFast | 15.692 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguous | 26.113 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguousFast | 15.178 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguous | 26.693 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguousFast | 15.726 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguous | 43.968 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguousFast | 21.390 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguous | 32.504 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousFast | 14.614 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguous | 30.794 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousFast | 14.583 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguous | 16.525 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousFast | 10.777 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguous | 54.731 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousFast | 19.312 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnaligned | 32.018 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnalignedFast | 14.344 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnaligned | 21.204   ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnalignedFast | 15.393   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnaligned | 20.125   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnalignedFast | 14.836   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnaligned | 58.086   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnalignedFast | 22.002   ms/op | Avgt
   
   
   
   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r429077026



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,418 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {

Review comment:
       Added several tests covering all possible cases. Will do another round in a follow-up.

##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/ForwardIndexBenchmark.java
##########
@@ -0,0 +1,269 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import com.google.common.base.Stopwatch;
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.FileReader;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+import me.lemire.integercompression.BitPacking;
+import org.apache.commons.math.util.MathUtils;
+import org.apache.pinot.core.io.reader.impl.v1.FixedBitSingleValueReader;
+import org.apache.pinot.core.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.core.io.util.FixedBitIntReaderWriterV2;
+import org.apache.pinot.core.io.util.FixedByteValueReaderWriter;
+import org.apache.pinot.core.io.util.PinotDataBitSet;
+import org.apache.pinot.core.io.writer.impl.v1.FixedBitSingleValueWriter;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+public class ForwardIndexBenchmark {

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-634394494


   > For the benchmark, you should also compare the worst case scenario such as 3, 5, 9, 17 bits
   
   I am still working on adding faster methods for a non power of 2 number of bits. A follow-up will address this.
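   
   For context, a hypothetical single-value unpack for 3 bits (a sketch only, not code from this PR) shows where the extra complexity comes from: a 3-bit value can straddle a byte boundary, so the read needs a branch or a wider load, which eats into the benefit of a hand-written method.
   
   ```java
   // Hypothetical sketch, assuming the MSB-first layout used by the existing bit set.
   static int readInt3Bit(PinotDataBuffer dataBuffer, int index) {
     long bitOffset = (long) index * 3;
     int byteOffset = (int) (bitOffset / Byte.SIZE);
     int startBit = (int) (bitOffset & 7);
     if (startBit <= 5) {
       // the 3 bits fit inside a single byte
       int packed = dataBuffer.getByte(byteOffset) & 0xff;
       return (packed >>> (5 - startBit)) & 7;
     }
     // the 3 bits straddle two bytes: load both and shift within a 16-bit window
     int packed = ((dataBuffer.getByte(byteOffset) & 0xff) << 8) | (dataBuffer.getByte(byteOffset + 1) & 0xff);
     return (packed >>> (13 - startBit)) & 7;
   }
   ```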




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r429084860



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {
+    Preconditions
+        .checkState(dataBuffer.size() == (int) (((long) numValues * numBitsPerValue + Byte.SIZE - 1) / Byte.SIZE));
+    _dataBitSet = PinotDataBitSetV2.createBitSet(dataBuffer, numBitsPerValue);
+    _numBitsPerValue = numBitsPerValue;
+  }
+
+  /**
+   * Read dictionaryId for a particular docId
+   * @param index docId to get the dictionaryId for
+   * @return dictionaryId
+   */
+  public int readInt(int index) {
+    return _dataBitSet.readInt(index);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for a contiguous
+   * range of docIds starting at startDocId for a given length
+   * @param startDocId docId range start
+   * @param length length of contiguous docId range
+   * @param buffer out buffer to read dictionaryIds into
+   */
+  public void readInt(int startDocId, int length, int[] buffer) {
+    _dataBitSet.readInt(startDocId, length, buffer);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for an array of docIds
+   * which are monotonically increasing but not necessarily contiguous
+   * @param docIds array of docIds to read the dictionaryIds for
+   * @param docIdStartIndex start index in docIds array
+   * @param docIdLength length to process in docIds array
+   * @param values out array to store the dictionaryIds into
+   * @param valuesStartIndex start index in values array
+   */
+  public void readValues(int[] docIds, int docIdStartIndex, int docIdLength, int[] values, int valuesStartIndex) {
+    int docIdEndIndex = docIdStartIndex + docIdLength - 1;
+    if (shouldBulkRead(docIds, docIdStartIndex, docIdEndIndex)) {
+      _dataBitSet.readInt(docIds, docIdStartIndex, docIdLength, values, valuesStartIndex);
+    } else {
+      for (int i = docIdStartIndex; i <= docIdEndIndex; i++) {
+        values[valuesStartIndex++] = _dataBitSet.readInt(docIds[i]);
+      }
+    }
+  }
+
+  private boolean shouldBulkRead(int[] docIds, int startIndex, int endIndex) {
+    int numDocsToRead = endIndex - startIndex + 1;
+    int docIdRange = docIds[endIndex] - docIds[startIndex] + 1;
+    if (docIdRange > DocIdSetPlanNode.MAX_DOC_PER_CALL) {
+      return false;
+    }
+    return numDocsToRead >= ((double)docIdRange * 0.7);

Review comment:
       I have provided details in javadoc. Let me know if that is helpful.






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r430592580



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,151 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;

Review comment:
       You don't need to make this volatile and set it to null in close(). This issue has been addressed in #4764

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient

Review comment:
       Don't limit it to `dictId` and `docId` in the javadoc. The `BitSet` is a general reader/writer which can be used for different purposes.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);
+    out[outpos] = dictIds[0];
+    // set the unpacked dictId correctly. this is needed since there could
+    // be gaps and some dictIds may have to be thrown/ignored.
+    for (int i = 1; i < length; i++) {
+      out[outpos + i] = dictIds[docIds[docIdsStartIndex + i] - startDocId];
+    }
+  }
+
+  public static PinotDataBitSetV2 createBitSet(PinotDataBuffer pinotDataBuffer, int numBitsPerValue) {
+    switch (numBitsPerValue) {
+      case 2:
+        return new Bit2Encoded(pinotDataBuffer, numBitsPerValue);
+      case 4:
+        return new Bit4Encoded(pinotDataBuffer, numBitsPerValue);
+      case 8:
+        return new Bit8Encoded(pinotDataBuffer, numBitsPerValue);
+      case 16:
+        return new Bit16Encoded(pinotDataBuffer, numBitsPerValue);
+      case 32:
+        return new RawInt(pinotDataBuffer, numBitsPerValue);
+      default:
+        throw new UnsupportedOperationException(numBitsPerValue + " not supported by PinotDataBitSetV2");
+    }
+  }
+
+  public static class Bit2Encoded extends PinotDataBitSetV2 {
+    Bit2Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);

Review comment:
       Use long to index the dataBuffer so that we can handle big buffer (> 2G)
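       
       A minimal sketch of what the suggested change might look like for the 2-bit case (an assumption about the eventual fix, not code from this PR), keeping the byte offset in a long so buffers larger than 2GB can be addressed:
       
       ```java
       // Sketch only; assumes PinotDataBuffer exposes a long-offset getByte accessor.
       @Override
       public int readInt(int index) {
         long bitOffset = (long) index * _numBitsPerValue;
         long byteOffset = bitOffset / Byte.SIZE;              // keep the offset in long, no narrowing cast
         int val = (int) _dataBuffer.getByte(byteOffset) & 0xff;
         bitOffset = bitOffset & 7;
         return (val >>> (6 - (int) bitOffset)) & 3;
       }
       ```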

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {

Review comment:
       For performance concern, we can remove the `docIdsStartIndex` and `outpos` and always assume they are 0

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);
+    out[outpos] = dictIds[0];
+    // set the unpacked dictId correctly. this is needed since there could
+    // be gaps and some dictIds may have to be thrown/ignored.
+    for (int i = 1; i < length; i++) {
+      out[outpos + i] = dictIds[docIds[docIdsStartIndex + i] - startDocId];
+    }
+  }
+
+  public static PinotDataBitSetV2 createBitSet(PinotDataBuffer pinotDataBuffer, int numBitsPerValue) {
+    switch (numBitsPerValue) {

Review comment:
       We might also have 0 (all zeros) and 1 (0/1)

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);

Review comment:
       This won't work because `docIds` might not be contiguous and `endDocId - startDocId + 1` could be much larger than `DocIdSetPlanNode.MAX_DOC_PER_CALL`. Also, we should not always do such bulk read, `docIds` can be very sparse.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);
+    out[outpos] = dictIds[0];
+    // set the unpacked dictId correctly. this is needed since there could
+    // be gaps and some dictIds may have to be thrown/ignored.
+    for (int i = 1; i < length; i++) {
+      out[outpos + i] = dictIds[docIds[docIdsStartIndex + i] - startDocId];
+    }
+  }
+
+  public static PinotDataBitSetV2 createBitSet(PinotDataBuffer pinotDataBuffer, int numBitsPerValue) {
+    switch (numBitsPerValue) {
+      case 2:
+        return new Bit2Encoded(pinotDataBuffer, numBitsPerValue);
+      case 4:
+        return new Bit4Encoded(pinotDataBuffer, numBitsPerValue);
+      case 8:
+        return new Bit8Encoded(pinotDataBuffer, numBitsPerValue);
+      case 16:
+        return new Bit16Encoded(pinotDataBuffer, numBitsPerValue);
+      case 32:
+        return new RawInt(pinotDataBuffer, numBitsPerValue);
+      default:
+        throw new UnsupportedOperationException(numBitsPerValue + " not supported by PinotDataBitSetV2");
+    }
+  }
+
+  public static class Bit2Encoded extends PinotDataBitSetV2 {
+    Bit2Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int val = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+      bitOffset = bitOffset & 7;
+      return  (val >>> (6 - bitOffset)) & 3;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      bitOffset = bitOffset & 7;
+      int packed = 0;
+      int i = 0;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [1 byte] - read from either the 2nd/4th/6th bit to 7th bit to unpack 1/2/3 integers
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 16 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 8 integers
+       * [1 byte] - read the byte to unpack 4 integers
+       * [1 byte] - unpack 1/2/3 integers from first 2/4/6 bits
+       */
+
+      // unaligned read within a byte
+      if (bitOffset != 0) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        if (bitOffset == 2) {
+          // unpack 3 integers from bits 2-7
+          out[0] = (packed >>> 4) & 3;
+          out[1] = (packed >>> 2) & 3;
+          out[2] = packed & 3;
+          i = 3;
+          length -= 3;
+        } else if (bitOffset == 4) {
+          // unpack 2 integers from bits 4 to 7
+          out[0] = (packed >>> 2) & 3;
+          out[1] = packed & 3;
+          i = 2;
+          length -= 2;
+        } else {
+          // unpack integer from bits 6 to 7
+          out[0] = packed & 3;
+          i = 1;
+          length -= 1;
+        }
+        byteOffset++;
+      }
+
+      // aligned reads at 4-byte boundary to unpack 16 integers
+      while (length >= 16) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 30;
+        out[i + 1] = (packed >>> 28) & 3;
+        out[i + 2] = (packed >>> 26) & 3;
+        out[i + 3] = (packed >>> 24) & 3;
+        out[i + 4] = (packed >>> 22) & 3;
+        out[i + 5] = (packed >>> 20) & 3;
+        out[i + 6] = (packed >>> 18) & 3;
+        out[i + 7] = (packed >>> 16) & 3;
+        out[i + 8] = (packed >>> 14) & 3;
+        out[i + 9] = (packed >>> 12) & 3;
+        out[i + 10] = (packed >>> 10) & 3;
+        out[i + 11] = (packed >>> 8) & 3;
+        out[i + 12] = (packed >>> 6) & 3;
+        out[i + 13] = (packed >>> 4) & 3;
+        out[i + 14] = (packed >>> 2) & 3;
+        out[i + 15] = packed & 3;
+        length -= 16;
+        byteOffset += 4;
+        i += 16;
+      }
+
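+      // aligned read at 2-byte boundary to unpack 8 integers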
+      if (length >= 8) {
+        packed = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        out[i] = (packed >>> 14) & 3;
+        out[i + 1] = (packed >>> 12) & 3;
+        out[i + 2] = (packed >>> 10) & 3;
+        out[i + 3] = (packed >>> 8) & 3;
+        out[i + 4] = (packed >>> 6) & 3;
+        out[i + 5] = (packed >>> 4) & 3;
+        out[i + 6] = (packed >>> 2) & 3;
+        out[i + 7] = packed & 3;
+        length -= 8;
+        byteOffset += 2;
+        i += 8;
+      }
+
+      // aligned read at byte boundary to unpack 4 integers
+      if (length >= 4) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 6;
+        out[i + 1] = (packed >>> 4) & 3;
+        out[i + 2] = (packed >>> 2) & 3;
+        out[i + 3] = packed & 3;
+        length -= 4;
+        byteOffset++;
+        i += 4;
+      }
+
+      // handle spill-over
+
+      if (length > 0) {
+        // unpack from bits 0-1
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 6;
+        length--;
+      }
+
+      if (length > 0) {
+        // unpack from bits 2-3
+        out[i + 1] = (packed >>> 4) & 3;
+        length--;
+      }
+
+      if (length > 0) {
+        // unpack from bits 4-5
+        out[i + 2] = (packed >>> 2) & 3;
+        length--;
+      }
+    }
+  }
+
+  public static class Bit4Encoded extends PinotDataBitSetV2 {
+    Bit4Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int val = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+      bitOffset = bitOffset & 7;
+      return (bitOffset == 0) ? val >>> 4 : val & 0xf;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      bitOffset = bitOffset & 7;
+      int packed = 0;
+      int i = 0;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [1 byte] - read from the 4th bit to 7th bit to unpack 1 integer
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 8 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 4 integers
+       * [1 byte] - read the byte to unpack 2 integers
+       * [1 byte] - unpack 1 integer from first 4 bits
+       */
+
+      // unaligned read within a byte from bits 4-7
+      if (bitOffset != 0) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[0] = packed & 0xf;
+        i = 1;
+        byteOffset++;
+        length--;
+      }
+
+      // aligned read at 4-byte boundary to unpack 8 integers
+      while (length >= 8) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 28;
+        out[i + 1] = (packed >>> 24) & 0xf;
+        out[i + 2] = (packed >>> 20) & 0xf;
+        out[i + 3] = (packed >>> 16) & 0xf;
+        out[i + 4] = (packed >>> 12) & 0xf;
+        out[i + 5] = (packed >>> 8) & 0xf;
+        out[i + 6] = (packed >>> 4) & 0xf;
+        out[i + 7] = packed & 0xf;
+        length -= 8;
+        i += 8;
+        byteOffset += 4;
+      }
+
+      // aligned read at 2-byte boundary to unpack 4 integers
+      if (length >= 4) {
+        packed = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        out[i] = (packed >>> 12) & 0xf;
+        out[i + 1] = (packed >>> 8) & 0xf;
+        out[i + 2] = (packed >>> 4) & 0xf;
+        out[i + 3] = packed & 0xf;
+        length -= 4;
+        i += 4;
+        byteOffset += 2;
+      }
+
+      // aligned read at byte boundary to unpack 2 integers
+      if (length >= 2) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 4;
+        out[i + 1] = packed & 0xf;
+        length -= 2;
+        i += 2;
+        byteOffset++;
+      }
+
+      // handle spill over -- unpack from bits 0-3
+      if (length > 0) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 4;
+        length--;
+      }
+    }
+  }
+
+  public static class Bit8Encoded extends PinotDataBitSetV2 {
+    Bit8Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      return ((int)_dataBuffer.getByte(byteOffset)) & 0xff;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int i = 0;
+      int packed = 0;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 4 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 2 integers
+       * [1 byte] - read the byte to unpack 1 integer
+       */
+
+      // aligned read at 4-byte boundary to unpack 4 integers
+      while (length >= 4) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 24;
+        out[i + 1] = (packed >>> 16) & 0xff;
+        out[i + 2] = (packed >>> 8) & 0xff;
+        out[i + 3] = packed & 0xff;
+        length -= 4;
+        byteOffset += 4;
+        i += 4;
+      }
+
+      // aligned read at 2-byte boundary to unpack 2 integers
+      if (length >= 2) {
+        packed = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        out[i] = (packed >>> 8) & 0xff;
+        out[i + 1] = packed & 0xff;
+        length -= 2;
+        byteOffset += 2;
+        i += 2;
+      }
+
+      // handle spill over at byte boundary to unpack 1 integer
+      if (length > 0) {
+        out[i] = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        length--;
+      }
+    }
+  }
+
+  public static class Bit16Encoded extends PinotDataBitSetV2 {
+    Bit16Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      return ((int)_dataBuffer.getShort(byteOffset)) & 0xffff;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int i = 0;
+      int packed;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 2 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 1 integer
+       */
+
+      // aligned reads at 4-byte boundary to unpack 2 integers
+      while (length >= 2) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 16;
+        out[i + 1] = packed & 0xffff;
+        length -= 2;
+        i += 2;
+        byteOffset += 4;
+      }
+
+      // handle spill over at 2-byte boundary to unpack 1 integer
+      if (length > 0) {
+        out[i] = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        length--;
+      }
+    }
+  }
+
+  public static class RawInt extends PinotDataBitSetV2 {
+    RawInt(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      return _dataBuffer.getInt(index * Integer.BYTES);
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      int byteOffset = startIndex * Integer.BYTES;
+      for (int i = 0; i < length; i++) {
+        out[i] = _dataBuffer.getInt(byteOffset);
+        byteOffset += 4;
+      }
+    }
+  }
+
+  protected void writeInt(int index, int value) {
+    long bitOffset = (long) index * _numBitsPerValue;
+    int byteOffset = (int) (bitOffset / Byte.SIZE);
+    int bitOffsetInFirstByte = (int) (bitOffset % Byte.SIZE);
+
+    int firstByte = _dataBuffer.getByte(byteOffset);
+
+    int firstByteMask = BYTE_MASK >>> bitOffsetInFirstByte;
+    int numBitsLeft = _numBitsPerValue - (Byte.SIZE - bitOffsetInFirstByte);
+    if (numBitsLeft <= 0) {
+      // The value is inside the first byte
+      firstByteMask &= BYTE_MASK << -numBitsLeft;
+      _dataBuffer.putByte(byteOffset, (byte) ((firstByte & ~firstByteMask) | (value << -numBitsLeft)));
+    } else {
+      // The value is in multiple bytes
+      _dataBuffer
+          .putByte(byteOffset, (byte) ((firstByte & ~firstByteMask) | ((value >>> numBitsLeft) & firstByteMask)));
+      while (numBitsLeft > Byte.SIZE) {
+        numBitsLeft -= Byte.SIZE;
+        _dataBuffer.putByte(++byteOffset, (byte) (value >> numBitsLeft));
+      }
+      int lastByte = _dataBuffer.getByte(++byteOffset);
+      _dataBuffer.putByte(byteOffset,
+          (byte) ((lastByte & (BYTE_MASK >>> numBitsLeft)) | (value << (Byte.SIZE - numBitsLeft))));
+    }
+  }
+
+  public void writeInt(int startIndex, int length, int[] values) {
+    long startBitOffset = (long) startIndex * _numBitsPerValue;
+    int byteOffset = (int) (startBitOffset / Byte.SIZE);
+    int bitOffsetInFirstByte = (int) (startBitOffset % Byte.SIZE);
+
+    int firstByte = _dataBuffer.getByte(byteOffset);
+
+    for (int i = 0; i < length; i++) {
+      int value = values[i];
+      if (bitOffsetInFirstByte == Byte.SIZE) {
+        bitOffsetInFirstByte = 0;
+        firstByte = _dataBuffer.getByte(++byteOffset);
+      }
+      int firstByteMask = BYTE_MASK >>> bitOffsetInFirstByte;
+      int numBitsLeft = _numBitsPerValue - (Byte.SIZE - bitOffsetInFirstByte);
+      if (numBitsLeft <= 0) {
+        // The value is inside the first byte
+        firstByteMask &= BYTE_MASK << -numBitsLeft;
+        firstByte = ((firstByte & ~firstByteMask) | (value << -numBitsLeft));
+        _dataBuffer.putByte(byteOffset, (byte) firstByte);
+        bitOffsetInFirstByte = Byte.SIZE + numBitsLeft;
+      } else {
+        // The value is in multiple bytes
+        _dataBuffer
+            .putByte(byteOffset, (byte) ((firstByte & ~firstByteMask) | ((value >>> numBitsLeft) & firstByteMask)));
+        while (numBitsLeft > Byte.SIZE) {
+          numBitsLeft -= Byte.SIZE;
+          _dataBuffer.putByte(++byteOffset, (byte) (value >> numBitsLeft));
+        }
+        int lastByte = _dataBuffer.getByte(++byteOffset);
+        firstByte = (lastByte & (0xFF >>> numBitsLeft)) | (value << (Byte.SIZE - numBitsLeft));
+        _dataBuffer.putByte(byteOffset, (byte) firstByte);
+        bitOffsetInFirstByte = numBitsLeft;
+      }
+    }
+  }
+
+  @Override
+  public void close()
+      throws IOException {
+    _dataBuffer.close();

Review comment:
       Do not close the buffer (see #5400)
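
       For illustration, a minimal sketch of what that change could look like, assuming the
       buffer lifecycle stays with whichever component created the PinotDataBuffer (the no-op
       body below is an assumption, not the final implementation):

           @Override
           public void close()
               throws IOException {
             // Sketch only: per the review, do not close the shared PinotDataBuffer here;
             // the component that mapped/allocated the buffer is responsible for releasing it.
           }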






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r430205187



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {
+    Preconditions
+        .checkState(dataBuffer.size() == (int) (((long) numValues * numBitsPerValue + Byte.SIZE - 1) / Byte.SIZE));
+    _dataBitSet = PinotDataBitSetV2.createBitSet(dataBuffer, numBitsPerValue);
+    _numBitsPerValue = numBitsPerValue;
+  }
+
+  /**
+   * Read dictionaryId for a particular docId
+   * @param index docId to get the dictionaryId for
+   * @return dictionaryId
+   */
+  public int readInt(int index) {
+    return _dataBitSet.readInt(index);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for a contiguous
+   * range of docIds starting at startDocId for a given length
+   * @param startDocId docId range start
+   * @param length length of contiguous docId range
+   * @param buffer out buffer to read dictionaryIds into
+   */
+  public void readInt(int startDocId, int length, int[] buffer) {
+    _dataBitSet.readInt(startDocId, length, buffer);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for an array of docIds
+   * which are monotonically increasing but not necessarily contiguous
+   * @param docIds array of docIds to read the dictionaryIds for
+   * @param docIdStartIndex start index in docIds array
+   * @param docIdLength length to process in docIds array
+   * @param values out array to store the dictionaryIds into
+   * @param valuesStartIndex start index in values array
+   */
+  public void readValues(int[] docIds, int docIdStartIndex, int docIdLength, int[] values, int valuesStartIndex) {
+    int docIdEndIndex = docIdStartIndex + docIdLength - 1;
+    if (shouldBulkRead(docIds, docIdStartIndex, docIdEndIndex)) {
+      _dataBitSet.readInt(docIds, docIdStartIndex, docIdLength, values, valuesStartIndex);
+    } else {
+      for (int i = docIdStartIndex; i <= docIdEndIndex; i++) {
+        values[valuesStartIndex++] = _dataBitSet.readInt(docIds[i]);
+      }
+    }
+  }
+
+  private boolean shouldBulkRead(int[] docIds, int startIndex, int endIndex) {
+    int numDocsToRead = endIndex - startIndex + 1;
+    int docIdRange = docIds[endIndex] - docIds[startIndex] + 1;
+    if (docIdRange > DocIdSetPlanNode.MAX_DOC_PER_CALL) {

Review comment:
       I think this is coming from the previous commit. The latest version of the PR doesn't have this check.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,151 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);

Review comment:
       See the outer API in FixedBitIntReaderWriterV2 that calls this. That API tries to judge sparseness before deciding to do a bulk read. The decision is made on a chunk of values at a time.
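
       For illustration, a rough sketch of such a sparseness check (the threshold and exact
       heuristic are assumptions, not necessarily what the final PR uses):

           // Sketch: bulk-unpack the whole docId range only when the docIds are dense
           // enough that the extra values unpacked and thrown away stay cheap.
           private boolean shouldBulkRead(int[] docIds, int startIndex, int endIndex) {
             int numDocsToRead = endIndex - startIndex + 1;
             int docIdRange = docIds[endIndex] - docIds[startIndex] + 1;
             // hypothetical density threshold: at least half of the range is actually needed
             return docIdRange <= numDocsToRead * 2;
           }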

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {

Review comment:
       > For the benchmark, you should also compare the worst case scenario such as 3, 5, 9, 17 bits
   
   I am still working on adding faster methods for non-power-of-2 bit counts. A follow-up will address this.
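
   For context, a small illustrative sketch (not code from this PR) of why an odd width such
   as 3 bits is harder to hand-unroll: a value can straddle a byte boundary, which forces an
   extra branch and a wider read compared to the power-of-2 cases above:

       // Illustration only: read the i-th 3-bit value from a big-endian packed byte[].
       static int read3BitValue(byte[] data, int index) {
         long bitOffset = (long) index * 3;
         int byteOffset = (int) (bitOffset >>> 3);
         int bitsIntoByte = (int) (bitOffset & 7);
         if (bitsIntoByte <= 5) {
           // the 3 bits fit inside a single byte
           return ((data[byteOffset] & 0xff) >>> (5 - bitsIntoByte)) & 7;
         }
         // the 3 bits straddle two bytes: combine them before shifting
         int packed = ((data[byteOffset] & 0xff) << 8) | (data[byteOffset + 1] & 0xff);
         return (packed >>> (13 - bitsIntoByte)) & 7;
       }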

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);
+    out[outpos] = dictIds[0];
+    // set the unpacked dictId correctly. this is needed since there could
+    // be gaps and some dictIds may have to be thrown/ignored.
+    for (int i = 1; i < length; i++) {
+      out[outpos + i] = dictIds[docIds[docIdsStartIndex + i] - startDocId];
+    }
+  }
+
+  public static PinotDataBitSetV2 createBitSet(PinotDataBuffer pinotDataBuffer, int numBitsPerValue) {
+    switch (numBitsPerValue) {
+      case 2:
+        return new Bit2Encoded(pinotDataBuffer, numBitsPerValue);
+      case 4:
+        return new Bit4Encoded(pinotDataBuffer, numBitsPerValue);
+      case 8:
+        return new Bit8Encoded(pinotDataBuffer, numBitsPerValue);
+      case 16:
+        return new Bit16Encoded(pinotDataBuffer, numBitsPerValue);
+      case 32:
+        return new RawInt(pinotDataBuffer, numBitsPerValue);
+      default:
+        throw new UnsupportedOperationException(numBitsPerValue + "not supported by PinotDataBitSetV2");
+    }
+  }
+
+  public static class Bit2Encoded extends PinotDataBitSetV2 {
+    Bit2Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {

Review comment:
       See the outer API in FixedBitIntReaderWriterV2 that calls this. That API tries to judge sparseness before deciding to do a bulk read. The decision is made on a chunk of values at a time, and this bulk API is called for each chunk.






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r429077524



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {

Review comment:
       Yes, in the follow-up when this code is wired with reader and writer (FixedBitSingleValueReader and FixedBitSingleValueWriter) and the scan operator, I will consider if the format has to be changed and bump the version and write a header. Ri
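
       Purely as an illustration of that follow-up (no header format is defined in this PR),
       a version field at the start of the forward-index buffer could let readers pick the
       V1 or V2 unpacker at load time; the offsets and constants below are hypothetical:

           // Hypothetical header sketch, not part of this PR.
           static final int HEADER_VERSION_OFFSET = 0;
           static final int FORWARD_INDEX_VERSION_V2 = 2;

           static void writeVersion(java.nio.ByteBuffer buffer) {
             buffer.putInt(HEADER_VERSION_OFFSET, FORWARD_INDEX_VERSION_V2);
           }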






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635650062


   Here are the JMH results:
   
   
   
   Benchmark | Score | Mode
   -- | -- | --
   BenchmarkPinotDataBitSet.twoBitBulkWithGaps | 28.537   ops/ms | Throughput
   BenchmarkPinotDataBitSet.twoBitBulkWithGapsFast | 44.332   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGaps | 28.262   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGapsFast | 42.449   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGaps | 26.816   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGapsFast | 38.777   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGaps | 21.095   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGapsFast | 38.380   ops/ms | Throughput
     |   |  
   BenchmarkPinotDataBitSet.twoBitContiguous | 26.188 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitContiguousFast | 15.692 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguous | 26.113 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguousFast | 15.178 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguous | 26.693 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguousFast | 15.726 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguous | 43.968 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguousFast | 21.390 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguous | 32.504 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousFast | 14.614 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguous | 30.794 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousFast | 14.583 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguous | 16.525 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousFast | 10.777 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguous | 54.731 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousFast | 19.312 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnaligned | 32.018 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnalignedFast | 14.344 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnaligned | 21.204   ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnalignedFast | 15.393   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnaligned | 20.125   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnalignedFast | 14.836   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnaligned | 58.086   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnalignedFast | 22.002   ms/op | Avgt
   
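
   For readers unfamiliar with the harness, a minimal self-contained sketch of how one such
   JMH benchmark can be structured (the real BenchmarkPinotDataBitSet drives PinotDataBitSetV2 /
   FixedBitIntReaderWriterV2; this stand-in unpacks 2-bit values from a plain byte[] purely to
   show the shape of a bulk-contiguous benchmark method):

       import java.util.concurrent.ThreadLocalRandom;
       import org.openjdk.jmh.annotations.*;
       import org.openjdk.jmh.infra.Blackhole;

       @State(Scope.Benchmark)
       public class BenchmarkBitUnpackSketch {
         private static final int NUM_VALUES = 1_000_000;
         private byte[] encoded; // 2-bit packed values, 4 per byte
         private int[] out;

         @Setup(Level.Trial)
         public void setUp() {
           encoded = new byte[NUM_VALUES / 4];
           ThreadLocalRandom.current().nextBytes(encoded);
           out = new int[NUM_VALUES];
         }

         @Benchmark
         @BenchmarkMode(Mode.AverageTime)
         public void twoBitBulkContiguousSketch(Blackhole bh) {
           // unpack all values in one pass, 4 two-bit values per byte
           for (int i = 0; i < encoded.length; i++) {
             int packed = encoded[i] & 0xff;
             out[4 * i] = packed >>> 6;
             out[4 * i + 1] = (packed >>> 4) & 3;
             out[4 * i + 2] = (packed >>> 2) & 3;
             out[4 * i + 3] = packed & 3;
           }
           bh.consume(out);
         }
       }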
   
   




[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635647360


   Addressed the review comments. Please take another look. This is standalone code at this point, so I want to get this in sooner and put up follow-ups in the next couple of days.
   
   Here are the JMH results:
   Benchmark                                                	     Mode  Cnt   Score  Units
   BenchmarkPinotDataBitSet.twoBitBulkWithGaps            		  	thrpt    3  28.537   ops/ms
   BenchmarkPinotDataBitSet.twoBitBulkWithGapsFast        		  	thrpt    3  44.332   ops/ms
   BenchmarkPinotDataBitSet.fourBitBulkWithGaps           		  	thrpt    3  28.262   ops/ms
   BenchmarkPinotDataBitSet.fourBitBulkWithGapsFast       		  	thrpt    3  42.449   ops/ms
   BenchmarkPinotDataBitSet.eightBitBulkWithGaps          		  	thrpt    3  26.816   ops/ms
   BenchmarkPinotDataBitSet.eightBitBulkWithGapsFast      		  	thrpt    3  38.777   ops/ms
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGaps        		  	thrpt    3  21.095   ops/ms
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGapsFast    	 	  	thrpt    3  38.380   ops/ms
   
   BenchmarkPinotDataBitSet.twoBitContiguous               	  	avgt     3  26.188   ms/op
   BenchmarkPinotDataBitSet.twoBitContiguousFast           	  	avgt     3  15.692   ms/op
   BenchmarkPinotDataBitSet.fourBitContiguous              	  	avgt     3  26.113   ms/op
   BenchmarkPinotDataBitSet.fourBitContiguousFast          	  	avgt     3  15.178   ms/op
   BenchmarkPinotDataBitSet.eightBitContiguous             	  	avgt     3  26.693   ms/op
   BenchmarkPinotDataBitSet.eightBitContiguousFast          	  	avgt     3  15.726   ms/op
   BenchmarkPinotDataBitSet.sixteenBitContiguous           	  	avgt     3  43.968   ms/op
   BenchmarkPinotDataBitSet.sixteenBitContiguousFast       	  	avgt     3  21.390   ms/op
   
   BenchmarkPinotDataBitSet.twoBitBulkContiguous           	  	avgt     3  32.504   ms/op
   BenchmarkPinotDataBitSet.twoBitBulkContiguousFast       	  	avgt     3  14.614   ms/op
   BenchmarkPinotDataBitSet.fourBitBulkContiguous          	  	avgt     3  30.794   ms/op
   BenchmarkPinotDataBitSet.fourBitBulkContiguousFast      	  	avgt     3  14.583   ms/op
   BenchmarkPinotDataBitSet.eightBitBulkContiguous         	  	avgt     3  16.525   ms/op
   BenchmarkPinotDataBitSet.eightBitBulkContiguousFast     	  	avgt     3  10.777    ms/op
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguous       	  	avgt     3  54.731   ms/op
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousFast   	  	avgt     3  19.312   ms/op
   
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnaligned          avgt   	 2  32.018   ms/op
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnalignedFast    	avgt     2  14.344   ms/op
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnaligned        avgt     2  21.204   ms/op
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnalignedFast    avgt     2  15.393   ms/op
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnaligned         avgt     2  20.125   ms/op
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnalignedFast     avgt     2  14.836   ms/op
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnaligned      avgt     2  58.086   ms/op
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnalignedFast  avgt     2  22.002   ms/op
   




[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635650062


   Here are the JMH results: cc @kishoreg  @Jackie-Jiang 
   
   
   
   Benchmark | Score | Mode
   -- | -- | --
   BenchmarkPinotDataBitSet.twoBitBulkWithGaps | 28.537   ops/ms | Throughput
   BenchmarkPinotDataBitSet.twoBitBulkWithGapsFast | 44.332   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGaps | 28.262   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGapsFast | 42.449   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGaps | 26.816   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGapsFast | 38.777   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGaps | 21.095   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGapsFast | 38.380   ops/ms | Throughput
     |   |  
   BenchmarkPinotDataBitSet.twoBitContiguous | 26.188 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitContiguousFast | 15.692 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguous | 26.113 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguousFast | 15.178 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguous | 26.693 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguousFast | 15.726 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguous | 43.968 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguousFast | 21.390 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguous | 32.504 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousFast | 14.614 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguous | 30.794 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousFast | 14.583 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguous | 16.525 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousFast | 10.777 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguous | 54.731 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousFast | 19.312 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnaligned | 32.018 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnalignedFast | 14.344 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnaligned | 21.204   ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnalignedFast | 15.393   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnaligned | 20.125   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnalignedFast | 14.836   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnaligned | 58.086   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnalignedFast | 22.002   ms/op | Avgt
   
   
   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r432163975



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/PinotDataBitSetV2.java
##########
@@ -0,0 +1,516 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public abstract class PinotDataBitSetV2 implements Closeable {
+  private static final int BYTE_MASK = 0xFF;
+  static final int MAX_VALUES_UNPACKED_SINGLE_ALIGNED_READ = 16; // comes from 2-bit encoding
+
+  private static final ThreadLocal<int[]> THREAD_LOCAL_DICT_IDS =
+      ThreadLocal.withInitial(() -> new int[DocIdSetPlanNode.MAX_DOC_PER_CALL]);
+
+  protected PinotDataBuffer _dataBuffer;
+  protected int _numBitsPerValue;
+
+  /**
+   * Unpack single dictId at the given docId. This is efficient
+   * because of simplified bitmath.
+   * @param index docId
+   * @return unpacked dictId
+   */
+  public abstract int readInt(int index);
+
+  /**
+   * Unpack dictIds for a contiguous range of docIds represented by startIndex
+   * and length. This uses vectorization as much as possible for all the aligned
+   * reads and also takes care of the small byte-sized window of unaligned read.
+   * @param startIndex start docId
+   * @param length length
+   * @param out out array to store the unpacked dictIds
+   */
+  public abstract void readInt(int startIndex, int length, int[] out);
+
+  /**
+   * Unpack dictIds for an array of docIds[] which is not necessarily
+   * contiguous. So there could be gaps in the array:
+   * e.g: [1, 3, 7, 9, 11, 12]
+   * The actual read is done by the previous API since that is efficient
+   * as it exploits contiguity and uses vectorization. However, since
+   * the out[] array has to be correctly populated with the unpacked dictId
+   * for each docId, a post-processing step is needed after the bulk contiguous
+   * read to correctly set the unpacked dictId into the out array throwing away
+   * the unnecessary dictIds unpacked as part of contiguous read
+   * @param docIds docIds array
+   * @param docIdsStartIndex starting index in the docIds array
+   * @param length length to read (number of docIds to read in the array)
+   * @param out out array to store the unpacked dictIds
+   * @param outpos starting index in the out array
+   */
+  public void readInt(int[] docIds, int docIdsStartIndex, int length, int[] out, int outpos) {
+    int startDocId = docIds[docIdsStartIndex];
+    int endDocId = docIds[docIdsStartIndex + length - 1];
+    int[] dictIds = THREAD_LOCAL_DICT_IDS.get();
+    // do a contiguous bulk read
+    readInt(startDocId, endDocId - startDocId + 1, dictIds);
+    out[outpos] = dictIds[0];
+    // set the unpacked dictId correctly. this is needed since there could
+    // be gaps and some dictIds may have to be thrown/ignored.
+    for (int i = 1; i < length; i++) {
+      out[outpos + i] = dictIds[docIds[docIdsStartIndex + i] - startDocId];
+    }
+  }
+
+  public static PinotDataBitSetV2 createBitSet(PinotDataBuffer pinotDataBuffer, int numBitsPerValue) {
+    switch (numBitsPerValue) {
+      case 2:
+        return new Bit2Encoded(pinotDataBuffer, numBitsPerValue);
+      case 4:
+        return new Bit4Encoded(pinotDataBuffer, numBitsPerValue);
+      case 8:
+        return new Bit8Encoded(pinotDataBuffer, numBitsPerValue);
+      case 16:
+        return new Bit16Encoded(pinotDataBuffer, numBitsPerValue);
+      case 32:
+        return new RawInt(pinotDataBuffer, numBitsPerValue);
+      default:
+        throw new UnsupportedOperationException(numBitsPerValue + "not supported by PinotDataBitSetV2");
+    }
+  }
+
+  public static class Bit2Encoded extends PinotDataBitSetV2 {
+    Bit2Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int val = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+      bitOffset = bitOffset & 7;
+      return  (val >>> (6 - bitOffset)) & 3;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      bitOffset = bitOffset & 7;
+      int packed = 0;
+      int i = 0;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [1 byte] - read from either the 2nd/4th/6th bit to 7th bit to unpack 1/2/3 integers
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 16 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 8 integers
+       * [1 byte] - read the byte to unpack 4 integers
+       * [1 byte] - unpack 1/2/3 integers from first 2/4/6 bits
+       */
+
+      // unaligned read within a byte
+      if (bitOffset != 0) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        if (bitOffset == 2) {
+          // unpack 3 integers from bits 2-7
+          out[0] = (packed >>> 4) & 3;
+          out[1] = (packed >>> 2) & 3;
+          out[2] = packed & 3;
+          i = 3;
+          length -= 3;
+        }
+        else if (bitOffset == 4) {
+          // unpack 2 integers from bits 4 to 7
+          out[0] = (packed >>> 2) & 3;
+          out[1] = packed & 3;
+          i = 2;
+          length -= 2;
+        } else {
+          // unpack integer from bits 6 to 7
+          out[0] = packed & 3;
+          i = 1;
+          length -= 1;
+        }
+        byteOffset++;
+      }
+
+      // aligned reads at 4-byte boundary to unpack 16 integers
+      while (length >= 16) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 30;
+        out[i + 1] = (packed >>> 28) & 3;
+        out[i + 2] = (packed >>> 26) & 3;
+        out[i + 3] = (packed >>> 24) & 3;
+        out[i + 4] = (packed >>> 22) & 3;
+        out[i + 5] = (packed >>> 20) & 3;
+        out[i + 6] = (packed >>> 18) & 3;
+        out[i + 7] = (packed >>> 16) & 3;
+        out[i + 8] = (packed >>> 14) & 3;
+        out[i + 9] = (packed >>> 12) & 3;
+        out[i + 10] = (packed >>> 10) & 3;
+        out[i + 11] = (packed >>> 8) & 3;
+        out[i + 12] = (packed >>> 6) & 3;
+        out[i + 13] = (packed >>> 4) & 3;
+        out[i + 14] = (packed >>> 2) & 3;
+        out[i + 15] = packed & 3;
+        length -= 16;
+        byteOffset += 4;
+        i += 16;
+      }
+
+      if (length >= 8) {
+        packed = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        out[i] = (packed >>> 14) & 3;
+        out[i + 1] = (packed >>> 12) & 3;
+        out[i + 2] = (packed >>> 10) & 3;
+        out[i + 3] = (packed >>> 8) & 3;
+        out[i + 4] = (packed >>> 6) & 3;
+        out[i + 5] = (packed >>> 4) & 3;
+        out[i + 6] = (packed >>> 2) & 3;
+        out[i + 7] = packed & 3;
+        length -= 8;
+        byteOffset += 2;
+        i += 8;
+      }
+
+      // aligned read at byte boundary to unpack 4 integers
+      if (length >= 4) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 6;
+        out[i + 1] = (packed >>> 4) & 3;
+        out[i + 2] = (packed >>> 2) & 3;
+        out[i + 3] = packed & 3;
+        length -= 4;
+        byteOffset++;
+        i += 4;
+      }
+
+      // handle spill-over
+
+      if (length > 0) {
+        // unpack from bits 0-1
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 6;
+        length--;
+      }
+
+      if (length > 0) {
+        // unpack from bits 2-3
+        out[i + 1] = (packed >>> 4) & 3;
+        length--;
+      }
+
+      if (length > 0) {
+        // unpack from bits 4-5
+        out[i + 2] = (packed >>> 2) & 3;
+        length--;
+      }
+    }
+  }
+
+  public static class Bit4Encoded extends PinotDataBitSetV2 {
+    Bit4Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int val = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+      bitOffset = bitOffset & 7;
+      return (bitOffset == 0) ? val >>> 4 : val & 0xf;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      bitOffset = bitOffset & 7;
+      int packed = 0;
+      int i = 0;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [1 byte] - read from the 4th bit to 7th bit to unpack 1 integer
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 8 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 4 integers
+       * [1 byte] - read the byte to unpack 2 integers
+       * [1 byte] - unpack 1 integer from the first 4 bits
+       */
+
+      // unaligned read within a byte from bits 4-7
+      if (bitOffset != 0) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[0] = packed & 0xf;
+        i = 1;
+        byteOffset++;
+        length--;
+      }
+
+      // aligned read at 4-byte boundary to unpack 8 integers
+      while (length >= 8) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 28;
+        out[i + 1] = (packed >>> 24) & 0xf;
+        out[i + 2] = (packed >>> 20) & 0xf;
+        out[i + 3] = (packed >>> 16) & 0xf;
+        out[i + 4] = (packed >>> 12) & 0xf;
+        out[i + 5] = (packed >>> 8) & 0xf;
+        out[i + 6] = (packed >>> 4) & 0xf;
+        out[i + 7] = packed & 0xf;
+        length -= 8;
+        i += 8;
+        byteOffset += 4;
+      }
+
+      // aligned read at 2-byte boundary to unpack 4 integers
+      if (length >= 4) {
+        packed = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        out[i] = (packed >>> 12) & 0xf;
+        out[i + 1] = (packed >>> 8) & 0xf;
+        out[i + 2] = (packed >>> 4) & 0xf;
+        out[i + 3] = packed & 0xf;
+        length -= 4;
+        i += 4;
+        byteOffset += 2;
+      }
+
+      // aligned read at byte boundary to unpack 2 integers
+      if (length >= 2) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 4;
+        out[i + 1] = packed & 0xf;
+        length -= 2;
+        i += 2;
+        byteOffset++;
+      }
+
+      // handle spill over -- unpack from bits 0-3
+      if (length > 0) {
+        packed = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        out[i] = packed >>> 4;
+        length--;
+      }
+    }
+  }
+
+  public static class Bit8Encoded extends PinotDataBitSetV2 {
+    Bit8Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      return ((int)_dataBuffer.getByte(byteOffset)) & 0xff;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int i = 0;
+      int packed = 0;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 4 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 2 integers
+       * [1 byte] - read the byte to unpack 1 integer
+       */
+
+      // aligned read at 4-byte boundary to unpack 4 integers
+      while (length >= 4) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 24;
+        out[i + 1] = (packed >>> 16) & 0xff;
+        out[i + 2] = (packed >>> 8) & 0xff;
+        out[i + 3] = packed & 0xff;
+        length -= 4;
+        byteOffset += 4;
+        i += 4;
+      }
+
+      // aligned read at 2-byte boundary to unpack 2 integers
+      if (length >= 2) {
+        packed = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        out[i] = (packed >>> 8) & 0xff;
+        out[i + 1] = packed & 0xff;
+        length -= 2;
+        byteOffset += 2;
+        i += 2;
+      }
+
+      // handle spill over at byte boundary to unpack 1 integer
+      if (length > 0) {
+        out[i] = (int)_dataBuffer.getByte(byteOffset) & 0xff;
+        length--;
+      }
+    }
+  }
+
+  public static class Bit16Encoded extends PinotDataBitSetV2 {
+    Bit16Encoded(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      long bitOffset = (long) index * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      return ((int)_dataBuffer.getShort(byteOffset)) & 0xffff;
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      long bitOffset = (long) startIndex * _numBitsPerValue;
+      int byteOffset = (int) (bitOffset / Byte.SIZE);
+      int i = 0;
+      int packed;
+
+      /*
+       * Bytes are read as follows to get maximum vectorization
+       *
+       * [chunks of 4 bytes] - read 4 bytes at a time to unpack 2 integers
+       * [1 chunk of 2 bytes] - read 2 bytes to unpack 1 integer
+       */
+
+      // aligned reads at 4-byte boundary to unpack 2 integers
+      while (length >= 2) {
+        packed = _dataBuffer.getInt(byteOffset);
+        out[i] = packed >>> 16;
+        out[i + 1] = packed & 0xffff;
+        length -= 2;
+        i += 2;
+        byteOffset += 4;
+      }
+
+      // handle spill over at 2-byte boundary to unpack 1 integer
+      if (length > 0) {
+        out[i] = (int)_dataBuffer.getShort(byteOffset) & 0xffff;
+        length--;
+      }
+    }
+  }
+
+  public static class RawInt extends PinotDataBitSetV2 {
+    RawInt(PinotDataBuffer dataBuffer, int numBits) {
+      _dataBuffer = dataBuffer;
+      _numBitsPerValue = numBits;
+    }
+
+    @Override
+    public int readInt(int index) {
+      return _dataBuffer.getInt(index * Integer.BYTES);
+    }
+
+    @Override
+    public void readInt(int startIndex, int length, int[] out) {
+      int byteOffset = startIndex * Integer.BYTES;
+      for (int i = 0; i < length; i++) {
+        out[i] = _dataBuffer.getInt(byteOffset);
+        byteOffset += 4;
+      }
+    }
+  }
+
+  protected void writeInt(int index, int value) {
+    long bitOffset = (long) index * _numBitsPerValue;
+    int byteOffset = (int) (bitOffset / Byte.SIZE);
+    int bitOffsetInFirstByte = (int) (bitOffset % Byte.SIZE);
+
+    int firstByte = _dataBuffer.getByte(byteOffset);
+
+    int firstByteMask = BYTE_MASK >>> bitOffsetInFirstByte;
+    int numBitsLeft = _numBitsPerValue - (Byte.SIZE - bitOffsetInFirstByte);
+    if (numBitsLeft <= 0) {
+      // The value is inside the first byte
+      firstByteMask &= BYTE_MASK << -numBitsLeft;
+      _dataBuffer.putByte(byteOffset, (byte) ((firstByte & ~firstByteMask) | (value << -numBitsLeft)));
+    } else {
+      // The value is in multiple bytes
+      _dataBuffer
+          .putByte(byteOffset, (byte) ((firstByte & ~firstByteMask) | ((value >>> numBitsLeft) & firstByteMask)));
+      while (numBitsLeft > Byte.SIZE) {
+        numBitsLeft -= Byte.SIZE;
+        _dataBuffer.putByte(++byteOffset, (byte) (value >> numBitsLeft));
+      }
+      int lastByte = _dataBuffer.getByte(++byteOffset);
+      _dataBuffer.putByte(byteOffset,
+          (byte) ((lastByte & (BYTE_MASK >>> numBitsLeft)) | (value << (Byte.SIZE - numBitsLeft))));
+    }
+  }
+
+  public void writeInt(int startIndex, int length, int[] values) {
+    long startBitOffset = (long) startIndex * _numBitsPerValue;
+    int byteOffset = (int) (startBitOffset / Byte.SIZE);
+    int bitOffsetInFirstByte = (int) (startBitOffset % Byte.SIZE);
+
+    int firstByte = _dataBuffer.getByte(byteOffset);
+
+    for (int i = 0; i < length; i++) {
+      int value = values[i];
+      if (bitOffsetInFirstByte == Byte.SIZE) {
+        bitOffsetInFirstByte = 0;
+        firstByte = _dataBuffer.getByte(++byteOffset);
+      }
+      int firstByteMask = BYTE_MASK >>> bitOffsetInFirstByte;
+      int numBitsLeft = _numBitsPerValue - (Byte.SIZE - bitOffsetInFirstByte);
+      if (numBitsLeft <= 0) {
+        // The value is inside the first byte
+        firstByteMask &= BYTE_MASK << -numBitsLeft;
+        firstByte = ((firstByte & ~firstByteMask) | (value << -numBitsLeft));
+        _dataBuffer.putByte(byteOffset, (byte) firstByte);
+        bitOffsetInFirstByte = Byte.SIZE + numBitsLeft;
+      } else {
+        // The value is in multiple bytes
+        _dataBuffer
+            .putByte(byteOffset, (byte) ((firstByte & ~firstByteMask) | ((value >>> numBitsLeft) & firstByteMask)));
+        while (numBitsLeft > Byte.SIZE) {
+          numBitsLeft -= Byte.SIZE;
+          _dataBuffer.putByte(++byteOffset, (byte) (value >> numBitsLeft));
+        }
+        int lastByte = _dataBuffer.getByte(++byteOffset);
+        firstByte = (lastByte & (0xFF >>> numBitsLeft)) | (value << (Byte.SIZE - numBitsLeft));
+        _dataBuffer.putByte(byteOffset, (byte) firstByte);
+        bitOffsetInFirstByte = numBitsLeft;
+      }
+    }
+  }
+
+  @Override
+  public void close()
+      throws IOException {
+    _dataBuffer.close();

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-635650062


   Here are the JMH results: cc @kishoreg  @Jackie-Jiang 
   
   Please see the spreadsheet for additional performance numbers obtained via manual instrumentation
   
   
   
   
   Benchmark | Score | Mode
   -- | -- | --
   BenchmarkPinotDataBitSet.twoBitBulkWithGaps | 28.537   ops/ms | Throughput
   BenchmarkPinotDataBitSet.twoBitBulkWithGapsFast | 44.332   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGaps | 28.262   ops/ms | Throughput
   BenchmarkPinotDataBitSet.fourBitBulkWithGapsFast | 42.449   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGaps | 26.816   ops/ms | Throughput
   BenchmarkPinotDataBitSet.eightBitBulkWithGapsFast | 38.777   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGaps | 21.095   ops/ms | Throughput
   BenchmarkPinotDataBitSet.sixteenBitBulkWithGapsFast | 38.380   ops/ms | Throughput
     |   |  
   BenchmarkPinotDataBitSet.twoBitContiguous | 26.188 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitContiguousFast | 15.692 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguous | 26.113 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitContiguousFast | 15.178 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguous | 26.693 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitContiguousFast | 15.726 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguous | 43.968 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitContiguousFast | 21.390 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguous | 32.504 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousFast | 14.614 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguous | 30.794 ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousFast | 14.583 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguous | 16.525 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousFast | 10.777 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguous | 54.731 ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousFast | 19.312 ms/op | Avgt
     |   |  
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnaligned | 32.018 ms/op | Avgt
   BenchmarkPinotDataBitSet.twoBitBulkContiguousUnalignedFast | 14.344 ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnaligned | 21.204   ms/op | Avgt
   BenchmarkPinotDataBitSet.eightBitBulkContiguousUnalignedFast | 15.393   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnaligned | 20.125   ms/op | Avgt
   BenchmarkPinotDataBitSet.fourBitBulkContiguousUnalignedFast | 14.836   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnaligned | 58.086   ms/op | Avgt
   BenchmarkPinotDataBitSet.sixteenBitBulkContiguousUnalignedFast | 22.002   ms/op | Avgt
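   
   For illustration only, below is a minimal, self-contained JMH sketch of the kind of comparison these numbers capture. It is not the PR's BenchmarkPinotDataBitSet; the class name, data sizes and setup are assumptions, and the generic loop is a simplified stand-in for the shift/mask bit reader.
   
   ```java
   import java.util.Random;
   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.*;
   
   @State(Scope.Benchmark)
   @BenchmarkMode(Mode.AverageTime)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   public class BitUnpackSketch {
     private static final int NUM_VALUES = 1_000_000; // assumed size, divisible by 4
     private byte[] _packed; // 2-bit packed values, 4 per byte
     private int[] _out;
   
     @Setup
     public void setUp() {
       _packed = new byte[NUM_VALUES / 4];
       new Random(42).nextBytes(_packed);
       _out = new int[NUM_VALUES];
     }
   
     @Benchmark
     public int[] genericUnpack() {
       // compute byte offset and in-byte bit offset for every value, like a generic bit reader
       for (int i = 0; i < NUM_VALUES; i++) {
         int bitOffset = i * 2;
         int b = _packed[bitOffset >>> 3] & 0xff;
         _out[i] = (b >>> (6 - (bitOffset & 7))) & 3;
       }
       return _out;
     }
   
     @Benchmark
     public int[] specializedUnpack() {
       // straight-line shifts unpacking 4 values per byte, mirroring the Bit2Encoded approach
       for (int i = 0, j = 0; i < NUM_VALUES; i += 4, j++) {
         int b = _packed[j] & 0xff;
         _out[i] = b >>> 6;
         _out[i + 1] = (b >>> 4) & 3;
         _out[i + 2] = (b >>> 2) & 3;
         _out[i + 3] = b & 3;
       }
       return _out;
     }
   }
   ```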
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r426940117



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {
+    Preconditions
+        .checkState(dataBuffer.size() == (int) (((long) numValues * numBitsPerValue + Byte.SIZE - 1) / Byte.SIZE));
+    _dataBitSet = PinotDataBitSetV2.createBitSet(dataBuffer, numBitsPerValue);
+    _numBitsPerValue = numBitsPerValue;
+  }
+
+  /**
+   * Read dictionaryId for a particular docId
+   * @param index docId to get the dictionaryId for
+   * @return dictionaryId
+   */
+  public int readInt(int index) {
+    return _dataBitSet.readInt(index);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for a contiguous
+   * range of docIds starting at startDocId for a given length
+   * @param startDocId docId range start
+   * @param length length of contiguous docId range
+   * @param buffer out buffer to read dictionaryIds into
+   */
+  public void readInt(int startDocId, int length, int[] buffer) {
+    _dataBitSet.readInt(startDocId, length, buffer);
+  }
+
+  /**
+   * Array based API to read dictionaryIds for an array of docIds
+   * which are monotonically increasing but not necessarily contiguous
+   * @param docIds array of docIds to read the dictionaryIds for
+   * @param docIdStartIndex start index in docIds array
+   * @param docIdLength length to process in docIds array
+   * @param values out array to store the dictionaryIds into
+   * @param valuesStartIndex start index in values array
+   */
+  public void readValues(int[] docIds, int docIdStartIndex, int docIdLength, int[] values, int valuesStartIndex) {
+    int docIdEndIndex = docIdStartIndex + docIdLength - 1;
+    if (shouldBulkRead(docIds, docIdStartIndex, docIdEndIndex)) {
+      _dataBitSet.readInt(docIds, docIdStartIndex, docIdLength, values, valuesStartIndex);
+    } else {
+      for (int i = docIdStartIndex; i <= docIdEndIndex; i++) {
+        values[valuesStartIndex++] = _dataBitSet.readInt(docIds[i]);
+      }
+    }
+  }
+
+  private boolean shouldBulkRead(int[] docIds, int startIndex, int endIndex) {
+    int numDocsToRead = endIndex - startIndex + 1;
+    int docIdRange = docIds[endIndex] - docIds[startIndex] + 1;
+    if (docIdRange > DocIdSetPlanNode.MAX_DOC_PER_CALL) {

Review comment:
       why is this check needed?
   

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {

Review comment:
       when you do this, please write a standalone header buffer that can be used in other places




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia merged pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia merged pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-636192732


   Merging it. 
   
   Follow-ups coming next:
   
   - Wire with the reader and writer. Currently the fast methods here can be used by the existing reader and writer if the index is encoded with a power of 2 number of bits. 
   - Consider changing the format to Little Endian
   - Consider aligning the bytes on the writer side. This will remove a few branches for 2/4 bit encoding to handle unaligned reads. It will also make it easier to add fast methods for non power of 2 bit encodings (a minimal alignment sketch follows below). 
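   
   A minimal sketch of what byte-aligning on the writer side could mean (illustrative only, not code from this PR): pad the write cursor to the next byte boundary before each logical block of values, so bulk reads always start with bitOffset % 8 == 0 and the unaligned-read branches in the 2/4 bit readers are never taken.
   
   ```java
   // hypothetical helper, not part of the PR: round a bit cursor up to the next byte boundary
   static long alignToByteBoundary(long bitCursor) {
     return (bitCursor + Byte.SIZE - 1) & ~(long) (Byte.SIZE - 1);
   }
   ```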


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-634189690


   For the benchmark, you should also compare the worst case scenario such as 3, 5, 9, 17 bits
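   
   To illustrate why these widths are the worst case, a generic single-value unpack for 3 bits would look roughly like the sketch below (illustrative only, not code from this PR; PinotDataBuffer.getByte is assumed to behave as in the classes above). A value can straddle a byte boundary, so an extra branch and a second byte read are needed, which is exactly what the power of 2 paths avoid.
   
   ```java
   // hypothetical 3-bit unpack, shown only to make the straddle branch explicit
   static int readInt3(PinotDataBuffer buffer, int index) {
     long bitOffset = (long) index * 3;
     int byteOffset = (int) (bitOffset / Byte.SIZE);
     int bitOffsetInByte = (int) (bitOffset & 7);
     int firstByte = buffer.getByte(byteOffset) & 0xff;
     int numBitsLeft = 3 - (Byte.SIZE - bitOffsetInByte);
     if (numBitsLeft <= 0) {
       // the value fits entirely within the first byte
       return (firstByte >>> -numBitsLeft) & 7;
     }
     // the value straddles two bytes
     int secondByte = buffer.getByte(byteOffset + 1) & 0xff;
     return ((firstByte << numBitsLeft) | (secondByte >>> (Byte.SIZE - numBitsLeft))) & 7;
   }
   ```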


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#discussion_r429077524



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/util/FixedBitIntReaderWriterV2.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.io.util;
+
+import com.google.common.base.Preconditions;
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+
+
+public final class FixedBitIntReaderWriterV2 implements Closeable {
+  private volatile PinotDataBitSetV2 _dataBitSet;
+  private final int _numBitsPerValue;
+
+  public FixedBitIntReaderWriterV2(PinotDataBuffer dataBuffer, int numValues, int numBitsPerValue) {

Review comment:
       Yes, in the follow-up when this code is wired with the reader and writer (FixedBitSingleValueReader and FixedBitSingleValueWriter) and the scan operator, I will consider whether the format has to be changed, bump the version, and write a header. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5409: Faster bit unpacking (Part 1)

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5409:
URL: https://github.com/apache/incubator-pinot/pull/5409#issuecomment-634394494


   > For the benchmark, you should also compare the worst case scenario such as 3, 5, 9, 17 bits
   
   I am still working on adding faster methods for non power of 2 bit encodings. A follow-up will address this. Right now this is standalone code (not yet wired in).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org