You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "Jackie-Jiang (via GitHub)" <gi...@apache.org> on 2023/11/12 23:08:22 UTC

[PR] Add a new MV forward index to only store unique MV values [pinot]

Jackie-Jiang opened a new pull request, #11993:
URL: https://github.com/apache/pinot/pull/11993

   Add a new MV dictionary encoded forward index format that only stores the unique MV values. This new index format can significantly reduce the index size when the MV values repeat a lot.
   
   Index layout:
   - Header
   - Map from doc id to MV value id (bit compressed)
   - Start offsets of each MV value in the value buffer (bit compressed)
   - Value buffer (bit compressed)
   
   To enable the new index format, set the index format in the `FieldConfig`:
   - Using old properties field:
   ```
   {
     "name": "myCol",
     "encodingType": "DICTIONARY",
     "properties": {
       "mvForwardIndexFormat": "dedup"
     }
   }
   ```
   - Using new index JSON:
   
   ```
   {
     "name": "myCol",
     "encodingType": "DICTIONARY",
     "indexes": {
       "forward": {
         "mvIndexFormat": "dedup"
       }
     }
   }
   ```
   
   Currently the new index format can be enabled during index creation, or derived column creation. Changing the index format for existing column is not supported yet.
   
   Also fixes a bug on using MV column as derived column input.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #11993:
URL: https://github.com/apache/pinot/pull/11993#issuecomment-1808843105

   @kishoreg Good suggestion! I was actually thinking the same. Currently the compression codec is preserved only for raw index, but I think we should be able to open it for dictionary encoding as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11993:
URL: https://github.com/apache/pinot/pull/11993#issuecomment-1807330531

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11993](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (f63f5e1) into [master](https://app.codecov.io/gh/apache/pinot/commit/b38bd9fb34aa06e53e11f540c1315306d023a9c5?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b38bd9f) will **decrease** coverage by `14.96%`.
   > The diff coverage is `4.25%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #11993       +/-   ##
   =============================================
   - Coverage     61.63%   46.67%   -14.96%     
   - Complexity      207      927      +720     
   =============================================
     Files          2385     1790      -595     
     Lines        129214    93804    -35410     
     Branches      20003    15162     -4841     
   =============================================
   - Hits          79640    43785    -35855     
   - Misses        43773    46900     +3127     
   + Partials       5801     3119     -2682     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.67% <4.25%> (-14.84%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.67% <4.25%> (-14.82%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.67% <4.25%> (-14.96%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.67% <4.25%> (-14.96%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.67% <4.25%> (-0.28%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11993/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...org/apache/pinot/spi/config/table/FieldConfig.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZpZWxkQ29uZmlnLmphdmE=) | `51.56% <ø> (ø)` | |
   | [.../local/segment/index/forward/ForwardIndexType.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2ZvcndhcmQvRm9yd2FyZEluZGV4VHlwZS5qYXZh) | `0.00% <0.00%> (-92.50%)` | :arrow_down: |
   | [...ment/index/forward/ForwardIndexCreatorFactory.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2ZvcndhcmQvRm9yd2FyZEluZGV4Q3JlYXRvckZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-87.50%)` | :arrow_down: |
   | [...gment/index/forward/ForwardIndexReaderFactory.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2ZvcndhcmQvRm9yd2FyZEluZGV4UmVhZGVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-85.72%)` | :arrow_down: |
   | [...he/pinot/segment/spi/index/ForwardIndexConfig.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L0ZvcndhcmRJbmRleENvbmZpZy5qYXZh) | `40.67% <38.46%> (+0.67%)` | :arrow_up: |
   | [...che/pinot/segment/spi/index/reader/Dictionary.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3JlYWRlci9EaWN0aW9uYXJ5LmphdmE=) | `52.83% <25.00%> (-8.15%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (-56.10%)` | :arrow_down: |
   | [...r/impl/fwd/MultiValueDedupForwardIndexCreator.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTXVsdGlWYWx1ZURlZHVwRm9yd2FyZEluZGV4Q3JlYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [.../local/segment/index/readers/StringDictionary.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvU3RyaW5nRGljdGlvbmFyeS5qYXZh) | `0.00% <0.00%> (-97.23%)` | :arrow_down: |
   | [...ocal/segment/readers/PinotSegmentColumnReader.java](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3JlYWRlcnMvUGlub3RTZWdtZW50Q29sdW1uUmVhZGVyLmphdmE=) | `0.00% <0.00%> (-83.34%)` | :arrow_down: |
   | ... and [2 more](https://app.codecov.io/gh/apache/pinot/pull/11993?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | |
   
   ... and [973 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11993/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in [Chrome](https://chrome.google.com/webstore/detail/codecov/gedikamndpbemklijjkncpnolildpbgo) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/codecov/) today!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406973517


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/forward/FixedBitMVEntryDictForwardIndexTest.java:
##########
@@ -0,0 +1,98 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.forward;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Random;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.FixedBitMVEntryDictForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.FixedBitMVEntryDictForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+
+
+public class FixedBitMVEntryDictForwardIndexTest {

Review Comment:
   Added (even though we do not support empty array as MV as of now)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406858182


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/ForwardIndexConfig.java:
##########
@@ -65,25 +70,32 @@ public int getRawIndexWriterVersion() {
     return _rawIndexWriterVersion;
   }
 
+  @Nullable
+  public DictCompressionType getDictCompressionType() {

Review Comment:
   Not sure if this is the right thing, feeling we shouldn't leak abstraction of dictionary to forward index?



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/ForwardIndexConfig.java:
##########
@@ -65,25 +70,32 @@ public int getRawIndexWriterVersion() {
     return _rawIndexWriterVersion;
   }
 
+  @Nullable
+  public DictCompressionType getDictCompressionType() {

Review Comment:
   Not sure if this is the right, feeling we shouldn't leak abstraction of dictionary to forward index?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1393458717


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/FixedBitDedupMVForwardIndexWriter.java:
##########
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.writer.impl;
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.objects.Object2IntMap;
+import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteOrder;
+import org.apache.pinot.segment.local.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.segment.local.io.util.PinotDataBitSet;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+
+
+/**
+ * Bit-compressed dictionary-encoded forward index writer for multi-value columns, where a map from doc id to
+ * multi-value id and unique multi-values are stored.
+ *
+ * Index layout:
+ * - Index header (24 bytes)
+ * - ID buffer (stores the multi-value id for each doc id)
+ * - Offset buffer (stores the start offset of each multi-value, followed by end offset of the last value)
+ * - Value buffer (stores the actual values)
+ *
+ * Header layout:
+ * - Magic marker (4 bytes)
+ * - Version (2 bytes)
+ * - Bits per value (1 byte)
+ * - Bits per id (1 byte)
+ * - Number of unique values (4 bytes)
+ * - Number of total entries (4 bytes)
+ * - Start offset of offset buffer (4 bytes)
+ * - Start offset of value buffer (4 bytes)
+ */
+public class FixedBitDedupMVForwardIndexWriter implements Closeable {
+  public static final int MAGIC_MARKER = 0xdedffded;
+  public static final short VERSION = 1;
+
+  private final Object2IntOpenHashMap<IntArrayList> _valueToIdMap = new Object2IntOpenHashMap<>();
+  private final File _file;
+  private final int _numBitsPerValue;
+  private final IntArrayList _ids;
+
+  public FixedBitDedupMVForwardIndexWriter(File file, int numDocs, int numBitsPerValue) {
+    _file = file;
+    _numBitsPerValue = numBitsPerValue;
+    _ids = new IntArrayList(numDocs);
+  }
+
+  public void putDictIds(int[] dictIds) {
+    _ids.add(_valueToIdMap.computeIntIfAbsent(IntArrayList.wrap(dictIds), k -> _valueToIdMap.size()));

Review Comment:
   perhaps add some comments for this method, as it's the key to translating an entire MV entry to a unique id and doing the dedup part (👍 for this neat implementation)



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/Dictionary.java:
##########
@@ -208,24 +208,48 @@ default void readIntValues(int[] dictIds, int length, int[] outValues) {
     }
   }
 
+  default void readIntValues(int[] dictIds, int length, Integer[] outValues) {

Review Comment:
   are those new methods to allow null values?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/FixedBitDedupMVForwardIndexWriter.java:
##########
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.writer.impl;
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.objects.Object2IntMap;
+import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteOrder;
+import org.apache.pinot.segment.local.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.segment.local.io.util.PinotDataBitSet;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+
+
+/**
+ * Bit-compressed dictionary-encoded forward index writer for multi-value columns, where a map from doc id to
+ * multi-value id and unique multi-values are stored.
+ *
+ * Index layout:
+ * - Index header (24 bytes)
+ * - ID buffer (stores the multi-value id for each doc id)
+ * - Offset buffer (stores the start offset of each multi-value, followed by end offset of the last value)
+ * - Value buffer (stores the actual values)
+ *
+ * Header layout:
+ * - Magic marker (4 bytes)
+ * - Version (2 bytes)
+ * - Bits per value (1 byte)
+ * - Bits per id (1 byte)
+ * - Number of unique values (4 bytes)
+ * - Number of total entries (4 bytes)
+ * - Start offset of offset buffer (4 bytes)
+ * - Start offset of value buffer (4 bytes)
+ */
+public class FixedBitDedupMVForwardIndexWriter implements Closeable {
+  public static final int MAGIC_MARKER = 0xdedffded;
+  public static final short VERSION = 1;
+
+  private final Object2IntOpenHashMap<IntArrayList> _valueToIdMap = new Object2IntOpenHashMap<>();
+  private final File _file;
+  private final int _numBitsPerValue;
+  private final IntArrayList _ids;
+
+  public FixedBitDedupMVForwardIndexWriter(File file, int numDocs, int numBitsPerValue) {
+    _file = file;
+    _numBitsPerValue = numBitsPerValue;
+    _ids = new IntArrayList(numDocs);
+  }
+
+  public void putDictIds(int[] dictIds) {
+    _ids.add(_valueToIdMap.computeIntIfAbsent(IntArrayList.wrap(dictIds), k -> _valueToIdMap.size()));

Review Comment:
   perhaps add some comments for this method, as it's the key to translating an entire MV entry to a unique id and doing the dedup part (👍 for this neat implementation)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406963059


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/ForwardIndexConfig.java:
##########
@@ -65,25 +70,32 @@ public int getRawIndexWriterVersion() {
     return _rawIndexWriterVersion;
   }
 
+  @Nullable
+  public DictCompressionType getDictCompressionType() {

Review Comment:
   Dictionary id has some special properties that we can leverage. Changed it to `DictIdCompressionType` to be more clear



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406982900


##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java:
##########
@@ -119,7 +119,19 @@ public enum IndexType {
   }
 
   public enum CompressionCodec {

Review Comment:
   This is the old format. I separated it in the new `ForwardIndexConfig`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406959636


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/FixedBitMVEntryDictForwardIndexReader.java:
##########
@@ -0,0 +1,155 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.readers.forward;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.pinot.segment.local.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.segment.local.io.util.PinotDataBitSet;
+import org.apache.pinot.segment.local.io.writer.impl.FixedBitMVEntryDictForwardIndexWriter;
+import org.apache.pinot.segment.spi.compression.DictCompressionType;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReaderContext;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Bit-compressed dictionary-encoded forward index reader for multi-value columns, where a second level dictionary
+ * encoding for multi-value entries (instead of individual values within the entry) are maintained within the forward
+ * index.
+ * See {@link FixedBitMVEntryDictForwardIndexWriter} for index layout.
+ */
+public final class FixedBitMVEntryDictForwardIndexReader implements ForwardIndexReader<ForwardIndexReaderContext> {
+  public static final int MAGIC_MARKER = FixedBitMVEntryDictForwardIndexWriter.MAGIC_MARKER;
+  public static final short VERSION = FixedBitMVEntryDictForwardIndexWriter.VERSION;
+  private static final int HEADER_SIZE = 24;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406958490


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -381,37 +368,64 @@ private boolean shouldDisableDictionary(String column, ColumnMetadata existingCo
     return true;
   }
 
-  private boolean shouldChangeCompressionType(String column, SegmentDirectory.Reader segmentReader)
+  private boolean shouldChangeRawCompressionType(String column, SegmentDirectory.Reader segmentReader)
       throws Exception {
-    ColumnMetadata existingColMetadata = _segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
-
     // The compression type for an existing segment can only be determined by reading the forward index header.
+    ColumnMetadata existingColMetadata = _segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+    ChunkCompressionType existingCompressionType;
     try (ForwardIndexReader<?> fwdIndexReader = ForwardIndexType.read(segmentReader, existingColMetadata)) {
-      ChunkCompressionType existingCompressionType = fwdIndexReader.getCompressionType();
+      existingCompressionType = fwdIndexReader.getCompressionType();
       Preconditions.checkState(existingCompressionType != null,
           "Existing compressionType cannot be null for raw forward index column=" + column);
+    }
 
-      // Get the new compression type.
-      ChunkCompressionType newCompressionType = _fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward())
-          .getChunkCompressionType();
+    // Get the new compression type.
+    ChunkCompressionType newCompressionType =
+        _fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward()).getChunkCompressionType();
 
-      // Note that default compression type (PASS_THROUGH for metric and LZ4 for dimension) is not considered if the
-      // compressionType is not explicitly provided in tableConfig. This is to avoid incorrectly rewriting all the
-      // forward indexes during segmentReload when the default compressionType changes.
-      return newCompressionType != null && existingCompressionType != newCompressionType;
+    // Note that default compression type (PASS_THROUGH for metric and LZ4 for dimension) is not considered if the
+    // compressionType is not explicitly provided in tableConfig. This is to avoid incorrectly rewriting all the
+    // forward indexes during segmentReload when the default compressionType changes.
+    return newCompressionType != null && existingCompressionType != newCompressionType;
+  }
+
+  private boolean shouldChangeDictCompressionType(String column, SegmentDirectory.Reader segmentReader)
+      throws Exception {
+    // The compression type for an existing segment can only be determined by reading the forward index header.
+    ColumnMetadata existingColMetadata = _segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+    DictCompressionType existingCompressionType;
+    try (ForwardIndexReader<?> fwdIndexReader = ForwardIndexType.read(segmentReader, existingColMetadata)) {
+      existingCompressionType = fwdIndexReader.getDictCompressionType();
     }
+
+    // Get the new compression type.
+    DictCompressionType newCompressionType =
+        _fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward()).getDictCompressionType();
+    // MV_ENTRY_DICT can only be applied to multi-value columns.
+    if (newCompressionType == DictCompressionType.MV_ENTRY_DICT && existingColMetadata.isSingleValue()) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406738352


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/compression/DictCompressionType.java:
##########
@@ -0,0 +1,23 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.compression;
+
+public enum DictCompressionType {

Review Comment:
   Do you foresee there will be single value dict compression type?
   If so, maybe we can use one bit or a few bits to represent some feature can has builtin method like `isMvDictionary()`.
   
   



##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java:
##########
@@ -119,7 +119,19 @@ public enum IndexType {
   }
 
   public enum CompressionCodec {

Review Comment:
   Shall we separate since raw and dictionary compression has no overlap? 



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -381,37 +368,64 @@ private boolean shouldDisableDictionary(String column, ColumnMetadata existingCo
     return true;
   }
 
-  private boolean shouldChangeCompressionType(String column, SegmentDirectory.Reader segmentReader)
+  private boolean shouldChangeRawCompressionType(String column, SegmentDirectory.Reader segmentReader)
       throws Exception {
-    ColumnMetadata existingColMetadata = _segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
-
     // The compression type for an existing segment can only be determined by reading the forward index header.
+    ColumnMetadata existingColMetadata = _segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+    ChunkCompressionType existingCompressionType;
     try (ForwardIndexReader<?> fwdIndexReader = ForwardIndexType.read(segmentReader, existingColMetadata)) {
-      ChunkCompressionType existingCompressionType = fwdIndexReader.getCompressionType();
+      existingCompressionType = fwdIndexReader.getCompressionType();
       Preconditions.checkState(existingCompressionType != null,
           "Existing compressionType cannot be null for raw forward index column=" + column);
+    }
 
-      // Get the new compression type.
-      ChunkCompressionType newCompressionType = _fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward())
-          .getChunkCompressionType();
+    // Get the new compression type.
+    ChunkCompressionType newCompressionType =
+        _fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward()).getChunkCompressionType();
 
-      // Note that default compression type (PASS_THROUGH for metric and LZ4 for dimension) is not considered if the
-      // compressionType is not explicitly provided in tableConfig. This is to avoid incorrectly rewriting all the
-      // forward indexes during segmentReload when the default compressionType changes.
-      return newCompressionType != null && existingCompressionType != newCompressionType;
+    // Note that default compression type (PASS_THROUGH for metric and LZ4 for dimension) is not considered if the
+    // compressionType is not explicitly provided in tableConfig. This is to avoid incorrectly rewriting all the
+    // forward indexes during segmentReload when the default compressionType changes.
+    return newCompressionType != null && existingCompressionType != newCompressionType;
+  }
+
+  private boolean shouldChangeDictCompressionType(String column, SegmentDirectory.Reader segmentReader)
+      throws Exception {
+    // The compression type for an existing segment can only be determined by reading the forward index header.
+    ColumnMetadata existingColMetadata = _segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+    DictCompressionType existingCompressionType;
+    try (ForwardIndexReader<?> fwdIndexReader = ForwardIndexType.read(segmentReader, existingColMetadata)) {
+      existingCompressionType = fwdIndexReader.getDictCompressionType();
     }
+
+    // Get the new compression type.
+    DictCompressionType newCompressionType =
+        _fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward()).getDictCompressionType();
+    // MV_ENTRY_DICT can only be applied to multi-value columns.
+    if (newCompressionType == DictCompressionType.MV_ENTRY_DICT && existingColMetadata.isSingleValue()) {

Review Comment:
   This if check will be hard to maintain, shall we consider having a static method in `DictCompressionType` like `isMultiValueDictionary(DictCompressionType type)`?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/FixedBitMVEntryDictForwardIndexReader.java:
##########
@@ -0,0 +1,155 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.readers.forward;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.pinot.segment.local.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.segment.local.io.util.PinotDataBitSet;
+import org.apache.pinot.segment.local.io.writer.impl.FixedBitMVEntryDictForwardIndexWriter;
+import org.apache.pinot.segment.spi.compression.DictCompressionType;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReaderContext;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Bit-compressed dictionary-encoded forward index reader for multi-value columns, where a second level dictionary
+ * encoding for multi-value entries (instead of individual values within the entry) are maintained within the forward
+ * index.
+ * See {@link FixedBitMVEntryDictForwardIndexWriter} for index layout.
+ */
+public final class FixedBitMVEntryDictForwardIndexReader implements ForwardIndexReader<ForwardIndexReaderContext> {
+  public static final int MAGIC_MARKER = FixedBitMVEntryDictForwardIndexWriter.MAGIC_MARKER;
+  public static final short VERSION = FixedBitMVEntryDictForwardIndexWriter.VERSION;
+  private static final int HEADER_SIZE = 24;

Review Comment:
   This can also use FixedBitMVEntryDictForwardIndexWriter.HEADER_SIZE?



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/forward/FixedBitMVEntryDictForwardIndexTest.java:
##########
@@ -0,0 +1,98 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.forward;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Random;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.io.writer.impl.FixedBitMVEntryDictForwardIndexWriter;
+import org.apache.pinot.segment.local.segment.index.readers.forward.FixedBitMVEntryDictForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+
+
+public class FixedBitMVEntryDictForwardIndexTest {

Review Comment:
   Can you add a test for all null value or all empty array ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #11993:
URL: https://github.com/apache/pinot/pull/11993


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #11993:
URL: https://github.com/apache/pinot/pull/11993#issuecomment-1876750417

   For coherence, shouldn't be this property part of the dictionary type config instead of forward dictionary config?
   
   Also for coherence as well: When using the older config style, we cannot define a chuck compression and a dictionary compression (which makes sense). Therefore we use the same field. Shouldn't we do the same in the new syntax?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406935559


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/compression/DictCompressionType.java:
##########
@@ -0,0 +1,23 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.compression;
+
+public enum DictCompressionType {

Review Comment:
   I don't see it necessary as of now. We may add it when adding other compression types



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on PR #11993:
URL: https://github.com/apache/pinot/pull/11993#issuecomment-1808793304

   Can we come up with a better property name? instead of format - can we call it encoding or compression codec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a new MV forward index to only store unique MV values [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #11993:
URL: https://github.com/apache/pinot/pull/11993#issuecomment-1876803035

   I've opened https://github.com/apache/pinot/pull/12218 in order to add the option to use `compressionCodec` in ForwardIndexTypeConfig.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org