Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/04 19:58:26 UTC

[GitHub] [incubator-pinot] GSharayu opened a new pull request #6876: Support ZSTD compression codec for raw index

GSharayu opened a new pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876


   When the forward index is not dictionary encoded, we have 2 choices:
   
   store the data as is (RAW)
   store the data Snappy-compressed, using the Snappy compression codec library
   
   This PR adds support for Zstandard (ZSTD) via its Java bindings - https://github.com/luben/zstd-jni
   
   ZSTD gives a good compression ratio, so based on their requirements, users can configure it via the table config on a per-column basis. The default behavior remains unchanged: Snappy for dimension columns and no compression for metric columns.
   
   Issue (#6804)
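   As a sketch of how the per-column setting might look in a table config (the property name below follows the PR at the time of review and the column name is a placeholder, so the final shape may differ):

```json
{
  "fieldConfigList": [
    {
      "name": "myRawColumn",
      "encodingType": "RAW",
      "noDictionaryColumnCompressionCodec": "ZSTANDARD"
    }
  ]
}
```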


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832468828


   > Looks like this library is using JNI. Pinot is going to be dependent on the OS architecture with this feature PR, and it's a backward-incompatible change.
   > 
   > We should definitely discuss this and consider alternative implementations.
   
   @kishoreg A lot of compression algorithms are not natively available in Java since they are written in C/C++. Well-tested pure-Java implementations are unlikely to exist, especially for algorithms not yet as popular as Snappy.
   
   The Apache Commons library has an implementation for ZSTD, but its API is byte-array based rather than direct-byte-buffer based, and it also relies on JNI bindings underneath.
   
   Pretty much all low-level functionality is available in Java via a JNI bridge, and I don't think there is any platform-specific issue. This library is also used in other Java-based projects (e.g., Arrow).
   
   There is nothing backward incompatible about this change; I think you mean forward compatibility. If someone upgrades to a newer version of Pinot, enables ZSTD, and then downgrades, the old release can't read the new segments. We have labeled the PR for release notes and kept the default behavior intact.
   
   Backward compatibility is preserved since we can still read old segments with SNAPPY, and SNAPPY continues to be the default.
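   The byte-array compress/decompress round trip discussed above can be sketched with the JDK's built-in Deflater/Inflater; zstd-jni's Zstd.compress/Zstd.decompress follow a similar shape, but since that library is a JNI binding and is not assumed to be available here, this stand-in uses only the standard library:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative byte-array codec round trip using the JDK's DEFLATE
// implementation. This mirrors the compress/decompress API shape the
// comment describes; it is not the actual Pinot or zstd-jni code.
public class ByteArrayCodec {

  public static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      // deflate() returns the number of compressed bytes written into buf
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  public static byte[] decompress(byte[] input) {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    try {
      while (!inflater.finished()) {
        out.write(buf, 0, inflater.inflate(buf));
      }
    } catch (DataFormatException e) {
      throw new RuntimeException("corrupt compressed input", e);
    }
    inflater.end();
    return out.toByteArray();
  }
}
```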




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626096328



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/util/TableConfigUtilsTest.java
##########
@@ -712,14 +712,25 @@ public void testValidateFieldConfig() {
         .setNoDictionaryColumns(Arrays.asList("myCol1")).build();
     try {
       FieldConfig fieldConfig =
-          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null);
+          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null, null);
       tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
       TableConfigUtils.validate(tableConfig, schema);
       Assert.fail("Should fail since field name is not present in schema");
     } catch (Exception e) {
       Assert.assertEquals(e.getMessage(),
           "Column Name myCol21 defined in field config list must be a valid column defined in the schema");
     }
+
+    tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME).build();
+    try {
+      FieldConfig fieldConfig =
+          new FieldConfig("intCol", FieldConfig.EncodingType.DICTIONARY, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.ZSTANDARD);
+      tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
+      TableConfigUtils.validate(tableConfig, schema);
+      Assert.fail("Should fail since dictionary encoding does not support compression codec zstandard ");
+    } catch (Exception e) {

Review comment:
       The same failure is expected for SNAPPY and PASS_THROUGH, right?






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626849816



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/LuceneRealtimeClusterIntegrationTest.java
##########
@@ -102,7 +102,7 @@ protected String getSortedColumn() {
   @Override
   protected List<FieldConfig> getFieldConfigs() {
     return Collections.singletonList(
-        new FieldConfig(TEXT_COLUMN_NAME, FieldConfig.EncodingType.RAW, FieldConfig.IndexType.TEXT, null));
+        new FieldConfig(TEXT_COLUMN_NAME, FieldConfig.EncodingType.RAW, FieldConfig.IndexType.TEXT, null,null));

Review comment:
       done!

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/util/TableConfigUtilsTest.java
##########
@@ -712,14 +712,36 @@ public void testValidateFieldConfig() {
         .setNoDictionaryColumns(Arrays.asList("myCol1")).build();
     try {
       FieldConfig fieldConfig =
-          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null);
+          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null, null);
       tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
       TableConfigUtils.validate(tableConfig, schema);
       Assert.fail("Should fail since field name is not present in schema");
     } catch (Exception e) {
       Assert.assertEquals(e.getMessage(),
           "Column Name myCol21 defined in field config list must be a valid column defined in the schema");
     }
+
+    tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME).build();
+    try {
+      FieldConfig fieldConfig =
+          new FieldConfig("intCol", FieldConfig.EncodingType.DICTIONARY, null, null, FieldConfig.NoDictionaryColumnCompressionCodec.SNAPPY);
+      tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
+      TableConfigUtils.validate(tableConfig, schema);
+      Assert.fail("Should fail since dictionary encoding does not support compression codec snappy ");

Review comment:
       done!






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626893747



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +263,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {

Review comment:
       (nit) Rename to `extractCompressionCodecConfigsFromTableConfig`






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626110847



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +267,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressorCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(tableConfig.getIndexingConfig(), rawIndexColumns);

Review comment:
       So, if the noDictionaryColumnMap is empty/null, it indicates that no column-name-to-compressionType mapping is set, right? So you are right, the field config map will not be set here.






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626772295



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -51,11 +52,13 @@
   public FieldConfig(@JsonProperty(value = "name", required = true) String name,
       @JsonProperty(value = "encodingType") @Nullable EncodingType encodingType,
       @JsonProperty(value = "indexType") @Nullable IndexType indexType,
-      @JsonProperty(value = "properties") @Nullable Map<String, String> properties) {
+      @JsonProperty(value = "properties") @Nullable Map<String, String> properties,
+      @JsonProperty(value = "noDictionaryColumnCompressionCodec") @Nullable NoDictionaryColumnCompressionCodec noDictionaryColumnCompressionCodec) {

Review comment:
       Shall we simplify the field name, e.g. `compressionCodec`? This long name is slightly hard to configure, and we don't need separate codecs for raw vs dictionary-encoded columns

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/LuceneRealtimeClusterIntegrationTest.java
##########
@@ -102,7 +102,7 @@ protected String getSortedColumn() {
   @Override
   protected List<FieldConfig> getFieldConfigs() {
     return Collections.singletonList(
-        new FieldConfig(TEXT_COLUMN_NAME, FieldConfig.EncodingType.RAW, FieldConfig.IndexType.TEXT, null));
+        new FieldConfig(TEXT_COLUMN_NAME, FieldConfig.EncodingType.RAW, FieldConfig.IndexType.TEXT, null,null));

Review comment:
       Reformat

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/util/TableConfigUtilsTest.java
##########
@@ -712,14 +712,36 @@ public void testValidateFieldConfig() {
         .setNoDictionaryColumns(Arrays.asList("myCol1")).build();
     try {
       FieldConfig fieldConfig =
-          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null);
+          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null, null);
       tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
       TableConfigUtils.validate(tableConfig, schema);
       Assert.fail("Should fail since field name is not present in schema");
     } catch (Exception e) {
       Assert.assertEquals(e.getMessage(),
           "Column Name myCol21 defined in field config list must be a valid column defined in the schema");
     }
+
+    tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME).build();
+    try {
+      FieldConfig fieldConfig =
+          new FieldConfig("intCol", FieldConfig.EncodingType.DICTIONARY, null, null, FieldConfig.NoDictionaryColumnCompressionCodec.SNAPPY);
+      tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
+      TableConfigUtils.validate(tableConfig, schema);
+      Assert.fail("Should fail since dictionary encoding does not support compression codec snappy ");

Review comment:
       (nit) extra space in the end

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -51,11 +52,13 @@
   public FieldConfig(@JsonProperty(value = "name", required = true) String name,
       @JsonProperty(value = "encodingType") @Nullable EncodingType encodingType,
       @JsonProperty(value = "indexType") @Nullable IndexType indexType,
-      @JsonProperty(value = "properties") @Nullable Map<String, String> properties) {
+      @JsonProperty(value = "properties") @Nullable Map<String, String> properties,
+      @JsonProperty(value = "noDictionaryColumnCompressionCodec") @Nullable NoDictionaryColumnCompressionCodec noDictionaryColumnCompressionCodec) {

Review comment:
       Move it in front of `properties` to match the declaration order






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626087902



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.compression;
+
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.segment.spi.compression.ChunkCompressor;
+
+/**
+ * Implementation of {@link ChunkCompressor} using Zstandard(Zstd).
+ */
+public class ZstandardCompressor implements ChunkCompressor {

Review comment:
       (nit) using Zstandard compression algorithm
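       For context, the ChunkCompressor shape being implemented here can be sketched with a pass-through variant; `SimpleChunkCompressor` below is a hypothetical stand-in for illustration, and the real interface in pinot-segment-spi may have a different signature:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the ChunkCompressor SPI discussed in this
// review; the actual Pinot interface may differ.
interface SimpleChunkCompressor {
  // Compresses `in` into `out` and returns the compressed size in bytes.
  int compress(ByteBuffer in, ByteBuffer out);
}

// PASS_THROUGH semantics: copy the input bytes unchanged.
public class PassThroughChunkCompressor implements SimpleChunkCompressor {
  @Override
  public int compress(ByteBuffer in, ByteBuffer out) {
    int size = in.remaining();
    out.put(in);   // copy input as-is; no compression applied
    out.flip();    // prepare the output buffer for reading
    return size;   // "compressed" size equals the input size
  }
}
```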






[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832315766


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6876](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c456fd6) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1c09b7866b4d1c8267847f464e9c4618787b5a15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1c09b78) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   > :exclamation: Current head c456fd6 differs from pull request most recent head c527ca7. Consider uploading reports for the commit c527ca7 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6876/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6876      +/-   ##
   ============================================
   + Coverage     65.48%   65.51%   +0.03%     
     Complexity       12       12              
   ============================================
     Files          1421     1423       +2     
     Lines         69980    70005      +25     
     Branches      10112    10116       +4     
   ============================================
   + Hits          45825    45865      +40     
   + Misses        20874    20858      -16     
   - Partials       3281     3282       +1     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | unittests | `65.51% <100.00%> (+0.03%)` | `12.00 <0.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...t/local/io/compression/ChunkCompressorFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9DaHVua0NvbXByZXNzb3JGYWN0b3J5LmphdmE=) | `50.00% <100.00%> (+10.00%)` | `0.00 <0.00> (ø)` | |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `100.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...nt/local/io/compression/ZstandardDecompressor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmREZWNvbXByZXNzb3IuamF2YQ==) | `100.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...he/pinot/segment/local/utils/TableConfigUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9UYWJsZUNvbmZpZ1V0aWxzLmphdmE=) | `78.47% <100.00%> (+0.06%)` | `0.00 <0.00> (ø)` | |
   | [.../segment/spi/compression/ChunkCompressionType.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NvbXByZXNzaW9uL0NodW5rQ29tcHJlc3Npb25UeXBlLmphdmE=) | `100.00% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...ot/segment/spi/creator/SegmentGeneratorConfig.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvU2VnbWVudEdlbmVyYXRvckNvbmZpZy5qYXZh) | `81.00% <100.00%> (+0.65%)` | `0.00 <0.00> (ø)` | |
   | [...org/apache/pinot/spi/config/table/FieldConfig.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZpZWxkQ29uZmlnLmphdmE=) | `96.66% <100.00%> (+0.51%)` | `0.00 <0.00> (ø)` | |
   | [...lix/core/realtime/PinotRealtimeSegmentManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3JlYWx0aW1lL1Bpbm90UmVhbHRpbWVTZWdtZW50TWFuYWdlci5qYXZh) | `78.46% <0.00%> (-1.03%)` | `0.00% <0.00%> (ø%)` | |
   | [...rg/apache/pinot/broker/routing/RoutingManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9Sb3V0aW5nTWFuYWdlci5qYXZh) | `71.94% <0.00%> (-0.72%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [8 more](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1c09b78...c527ca7](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626095384



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/CompressionCodecQueriesTest.java
##########
@@ -0,0 +1,264 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.queries;
+
+import java.io.File;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.SelectionOnlyOperator;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.index.loader.IndexLoadingConfig;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.ImmutableSegment;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.FieldConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+
+/**
+ * Functional tests for compression type feature.
+ * The tests use three kinds of input data
+ * (1) string
+ * (2) integer
+ * (3) long
+ */
+public class CompressionCodecQueriesTest extends BaseQueriesTest {
+  private static final File INDEX_DIR = new File(FileUtils.getTempDirectory(), "CompressionCodecQueriesTest");
+  private static final String TABLE_NAME = "MyTable";
+  private static final String SEGMENT_NAME = "testSegment";
+
+  private static final String SNAPPY_STRING = "SNAPPY_STRING";
+  private static final String PASS_THROUGH_STRING = "PASS_THROUGH_STRING";
+  private static final String ZSTANDARD_STRING = "ZSTANDARD_STRING";
+
+  private static final String SNAPPY_LONG = "SNAPPY_LONG";
+  private static final String PASS_THROUGH_LONG = "PASS_THROUGH_LONG";
+  private static final String ZSTANDARD_LONG = "ZSTANDARD_LONG";
+
+  private static final String SNAPPY_INTEGER = "SNAPPY_INTEGER";
+  private static final String PASS_THROUGH_INTEGER = "PASS_THROUGH_INTEGER";
+  private static final String ZSTANDARD_INTEGER = "ZSTANDARD_INTEGER";
+
+  private static final List<String> RAW_SNAPPY_INDEX_COLUMNS = Arrays
+      .asList(SNAPPY_STRING, SNAPPY_LONG, SNAPPY_INTEGER);
+
+  private static final List<String> RAW_ZSTANDARD_INDEX_COLUMNS = Arrays
+      .asList(ZSTANDARD_STRING, ZSTANDARD_LONG, ZSTANDARD_INTEGER);
+
+  private static final List<String> RAW_PASS_THROUGH_INDEX_COLUMNS = Arrays
+      .asList(PASS_THROUGH_STRING, PASS_THROUGH_LONG, PASS_THROUGH_INTEGER);
+
+  private final List<GenericRow> _rows = new ArrayList<>();
+
+  private IndexSegment _indexSegment;
+  private List<IndexSegment> _indexSegments;
+  private List<GenericRow> rows;
+
+  @Override
+  protected String getFilter() {
+    return "";
+  }
+
+  @Override
+  protected IndexSegment getIndexSegment() {
+    return _indexSegment;
+  }
+
+  @Override
+  protected List<IndexSegment> getIndexSegments() {
+    return _indexSegments;
+  }
+
+  @BeforeClass
+  public void setUp()
+      throws Exception {
+    FileUtils.deleteQuietly(INDEX_DIR);
+
+    buildSegment();
+    IndexLoadingConfig indexLoadingConfig = new IndexLoadingConfig();
+    Set<String> indexColumns = new HashSet<>();
+    indexColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+
+    indexLoadingConfig.getNoDictionaryColumns().addAll(indexColumns);
+    ImmutableSegment immutableSegment =
+        ImmutableSegmentLoader.load(new File(INDEX_DIR, SEGMENT_NAME), indexLoadingConfig);
+    _indexSegment = immutableSegment;
+    _indexSegments = Arrays.asList(immutableSegment, immutableSegment);
+  }
+
+  @AfterClass
+  public void tearDown() {
+    _indexSegment.destroy();
+    FileUtils.deleteQuietly(INDEX_DIR);
+  }
+
+  private void buildSegment()
+      throws Exception {
+    rows = createTestData();
+
+    List<FieldConfig> fieldConfigs = new ArrayList<>(RAW_SNAPPY_INDEX_COLUMNS.size() + RAW_ZSTANDARD_INDEX_COLUMNS.size() + RAW_PASS_THROUGH_INDEX_COLUMNS.size());
+    for (String indexColumn : RAW_SNAPPY_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.SNAPPY));
+    }
+
+    for (String indexColumn : RAW_ZSTANDARD_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.ZSTANDARD));
+    }
+
+    for (String indexColumn : RAW_PASS_THROUGH_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.PASS_THROUGH));
+    }
+
+    List<String> _noDictionaryColumns = new ArrayList<>();
+    _noDictionaryColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+
+    TableConfig tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME)
+        .setNoDictionaryColumns(_noDictionaryColumns)
+        .setFieldConfigList(fieldConfigs).build();
+    Schema schema = new Schema.SchemaBuilder().setSchemaName(TABLE_NAME)
+        .addSingleValueDimension(SNAPPY_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(PASS_THROUGH_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(ZSTANDARD_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(SNAPPY_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(ZSTANDARD_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(PASS_THROUGH_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(SNAPPY_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(ZSTANDARD_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(PASS_THROUGH_LONG, FieldSpec.DataType.LONG)
+        .build();
+    SegmentGeneratorConfig config = new SegmentGeneratorConfig(tableConfig, schema);
+    config.setOutDir(INDEX_DIR.getPath());
+    config.setTableName(TABLE_NAME);
+    config.setSegmentName(SEGMENT_NAME);
+    SegmentIndexCreationDriverImpl driver = new SegmentIndexCreationDriverImpl();
+    try (RecordReader recordReader = new GenericRowRecordReader(rows)) {
+      driver.init(config, recordReader);
+      driver.build();
+    }
+  }
+
+  private List<GenericRow> createTestData()
+      throws Exception {
+    List<GenericRow> rows = new ArrayList<>();
+
+    //Generate random data
+    int rowLength = 1000;
+    Random random = new Random();
+    String[] tempStringRows = new String[rowLength];
+    Integer[] tempIntRows = new Integer[rowLength];
+    Long[] tempLongRows = new Long[rowLength];
+
+    for (int i = 0; i < rowLength; i++) {
+      tempStringRows[i] = RandomStringUtils.random(random.nextInt(100), true, true);
+      tempIntRows[i] = RandomUtils.nextInt();
+      tempLongRows[i] = RandomUtils.nextLong();
+    }
+
+    for (int i = 0; i < rowLength; i++) {
+      GenericRow row = new GenericRow();
+      row.putValue(SNAPPY_STRING, tempStringRows[i]);
+      row.putValue(ZSTANDARD_STRING, tempStringRows[i]);
+      row.putValue(PASS_THROUGH_STRING, tempStringRows[i]);
+      row.putValue(SNAPPY_INTEGER, tempIntRows[i]);
+      row.putValue(ZSTANDARD_INTEGER, tempIntRows[i]);
+      row.putValue(PASS_THROUGH_INTEGER, tempIntRows[i]);
+      row.putValue(SNAPPY_LONG, tempLongRows[i]);
+      row.putValue(ZSTANDARD_LONG, tempLongRows[i]);
+      row.putValue(PASS_THROUGH_LONG, tempLongRows[i]);
+      rows.add(row);
+    }
+    return rows;
+  }
+
+  /**
+   * Tests for basic compression codec queries.
+   */
+  @Test
+  public void testQueriesWithSnappyCompressionCodec()
+      throws Exception {
+
+    String query =
+        "SELECT SNAPPY_STRING, ZSTANDARD_STRING, PASS_THROUGH_STRING, SNAPPY_INTEGER, ZSTANDARD_INTEGER, PASS_THROUGH_INTEGER, "
+            + "SNAPPY_LONG, ZSTANDARD_LONG, PASS_THROUGH_LONG FROM MyTable LIMIT 1000";
+    ArrayList<Serializable[]> expected = new ArrayList<>();
+

Review comment:
       Can we please add another test query with a filter on one or more of these raw columns?
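Such a follow-up query could look like the sketch below. This is a hypothetical addition (not part of the PR); it reuses the test's column names so that both the Snappy and ZSTD decompression paths get exercised by the filter operator, but the WHERE clause itself is made up.

```java
// Hypothetical filter query for the suggestion above; column names come
// from CompressionCodecQueriesTest, the predicate values are invented.
String filterQuery =
    "SELECT SNAPPY_STRING, ZSTANDARD_INTEGER, ZSTANDARD_LONG FROM MyTable "
        + "WHERE ZSTANDARD_INTEGER >= 0 AND SNAPPY_STRING <> '' LIMIT 100";
```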




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832251598


   > Nit: You may want to note that if someone upgrades and then enables ZSTD for new segments, and for some reason has to roll back their deployment, the segments will not be readable.
   > One way of avoiding this will be to introduce a config to enable the feature, but that could be an overkill. It should be enough to note in the release notes that this is the case for the next release (so, basically mark it for release notes)
   
   Table-level config is there. I mean, it was already there; this PR just reuses the same in FieldConfig (the new model). 
   Yes, let's label it for release notes and also call out the upgrade aspect in the PR description. 
   
   @GSharayu let's also mention that, just like other table-level changes (column renaming, type changing, column dropping, index dropping) which are currently not allowed, changing the compression codec on an existing noDictionary column from Snappy to ZSTD or vice versa will not be supported, since we currently don't have a mechanism for doing this in-place in the segment file. Newly pushed segments will pick up the new codec, and since the codec type is written into the file header, we will be able to read both old and new segments.
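The header mechanism described here can be sketched roughly as follows. The layout and codec table below are illustrative stand-ins, not Pinot's actual on-disk segment format; the point is only that the codec id is persisted at segment creation time, so a reader can pick the matching decompressor even if the table config later changes.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only -- made-up header layout, not Pinot's real format.
String[] codecs = {"PASS_THROUGH", "SNAPPY", "ZSTANDARD"};

// Writer side: record the codec id in the chunk/file header at creation time.
ByteBuffer header = ByteBuffer.allocate(8);
header.putInt(2);  // made-up format version
header.putInt(2);  // index into codecs[]: ZSTANDARD
header.flip();

// Reader side: decode the header and select the decompressor accordingly,
// independently of whatever the current table config says.
int version = header.getInt();
String codec = codecs[header.getInt()];
```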




[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626830017



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -51,11 +52,13 @@
   public FieldConfig(@JsonProperty(value = "name", required = true) String name,
       @JsonProperty(value = "encodingType") @Nullable EncodingType encodingType,
       @JsonProperty(value = "indexType") @Nullable IndexType indexType,
-      @JsonProperty(value = "properties") @Nullable Map<String, String> properties) {
+      @JsonProperty(value = "properties") @Nullable Map<String, String> properties,
+      @JsonProperty(value = "noDictionaryColumnCompressionCodec") @Nullable NoDictionaryColumnCompressionCodec noDictionaryColumnCompressionCodec) {

Review comment:
       `compressionCodec` works, will update!
   






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626085761



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -69,6 +72,10 @@ public FieldConfig(@JsonProperty(value = "name", required = true) String name,
     INVERTED, SORTED, TEXT, FST, H3
   }
 
+  public enum NoDictionaryColumnCompressorCodec {

Review comment:
       (nit) Suggest using NoDictionaryColumnCompressionCodec






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832468828


   > Looks like this library is using JNI. Pinot is going to be dependent on the OS architecture with this feature PR and it's a backward-incompatible change.
   > 
   > We should definitely discuss this and consider alternative implementations
   
   @kishoreg  A lot of compression algorithms are not natively available in Java since they are written in C/C++. Well-tested, pure-Java implementations are unlikely to exist, especially for algorithms not yet as popular as Snappy.
   
   The Apache Commons Compress library has a ZSTD implementation, but its API is byte-array based rather than direct-byte-buffer based, and it relies on JNI bindings underneath as well. 
   
   Pretty much all low-level functionality reaches Java via a JNI bridge, so I don't think there is any platform-specific issue. This library is also used in other Java-based projects (e.g., Arrow).
   
   There is nothing backward incompatible about this change; I think you mean forward compatibility. If someone upgrades to a newer version of Pinot, enables ZSTD, and then downgrades, the old release can't read the new segments. We have labeled the PR for release notes and kept the label intact. 
   
   Backward compatibility is preserved since we can still read old segments with SNAPPY, and SNAPPY continues to be the default.
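The compress/decompress contract these codecs implement can be illustrated with a stdlib stand-in. The sketch below uses JDK zlib (Deflater/Inflater) purely because it needs no third-party jar; Pinot's actual raw-index codecs are snappy-java and zstd-jni, whose JNI-backed APIs follow the same shape of compress(src) producing bytes and decompress(bytes, originalSize) requiring the caller to supply the uncompressed size recorded at write time.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Round-trip a highly repetitive payload so compression visibly shrinks it.
byte[] original = "raw forward index chunk ".repeat(8).getBytes(StandardCharsets.UTF_8);

// Compress: analogous to a ChunkCompressor writing a raw-index chunk.
Deflater deflater = new Deflater();
deflater.setInput(original);
deflater.finish();
byte[] buf = new byte[original.length * 2];
int compressedSize = deflater.deflate(buf);
deflater.end();

// Decompress: the caller must know the original size, which in a raw index
// would have been recorded alongside the chunk at write time.
byte[] restored = new byte[original.length];
int restoredSize;
try {
  Inflater inflater = new Inflater();
  inflater.setInput(buf, 0, compressedSize);
  restoredSize = inflater.inflate(restored);
  inflater.end();
} catch (DataFormatException e) {
  throw new RuntimeException(e);
}
```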




[GitHub] [incubator-pinot] siddharthteotia merged pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia merged pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876


   




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626094689



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkIntegerCompressionSpeed.java
##########
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.TimeUnit;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.segment.local.io.compression.SnappyCompressor;
+import org.apache.pinot.segment.local.io.compression.SnappyDecompressor;
+import org.apache.pinot.segment.local.io.compression.ZstandardCompressor;
+import org.apache.pinot.segment.local.io.compression.ZstandardDecompressor;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+
+
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@Fork(1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)
+@State(Scope.Benchmark)
+// Benchmark to compare Snappy and Zstandard compression/decompression speed on integer data
+public class BenchmarkIntegerCompressionSpeed {

Review comment:
       Same for other benchmark classes






[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-833012603


   > LGTM. @kishoreg Do you still have concerns about this PR? This library seems commonly used in the Maven repository (115 usages)
   
   Yes, the native library comes embedded in the Java jar, and the right platform binary is picked up automatically. The same is true of snappy-java, which we have been using for quite some time, and of this library, which uses zstd-jni.
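As a rough illustration of that mechanism, jars like snappy-java and zstd-jni bundle one native binary per OS/architecture and resolve the matching one at runtime from a classpath resource. The resource path scheme below is invented for the sketch, not the real layout of either library.

```java
// Illustrative only: derive a per-platform resource path the way an
// embedded-native-library jar might. The "/native/..." layout is made up.
String osName = System.getProperty("os.name").toLowerCase();
String os = osName.contains("win") ? "windows"
    : osName.contains("mac") ? "darwin" : "linux";
String arch = System.getProperty("os.arch");
String nativeLibResource = "/native/" + os + "-" + arch + "/zstd-jni";
```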




[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626744351



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +267,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressorCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(tableConfig.getIndexingConfig(), rawIndexColumns);

Review comment:
       updated!






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626130909



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +267,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressorCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(tableConfig.getIndexingConfig(), rawIndexColumns);

Review comment:
       In any case, please make sure to run the end-to-end query execution test in debug mode and ensure that segment generation code is correctly picking up the compression codec config






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626077773



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/CompressionCodecQueriesTest.java
##########
@@ -0,0 +1,264 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.queries;
+
+import java.io.File;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.SelectionOnlyOperator;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.index.loader.IndexLoadingConfig;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.ImmutableSegment;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.FieldConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+
+/**
+ * Functional tests for compression type feature.
+ * The tests use three kinds of input data
+ * (1) string
+ * (2) integer
+ * (3) long
+ */
+public class CompressionCodecQueriesTest extends BaseQueriesTest {
+  private static final File INDEX_DIR = new File(FileUtils.getTempDirectory(), "CompressionCodecQueriesTest");
+  private static final String TABLE_NAME = "MyTable";
+  private static final String SEGMENT_NAME = "testSegment";
+
+  private static final String SNAPPY_STRING = "SNAPPY_STRING";
+  private static final String PASS_THROUGH_STRING = "PASS_THROUGH_STRING";
+  private static final String ZSTANDARD_STRING = "ZSTANDARD_STRING";
+
+  private static final String SNAPPY_LONG = "SNAPPY_LONG";
+  private static final String PASS_THROUGH_LONG = "PASS_THROUGH_LONG";
+  private static final String ZSTANDARD_LONG = "ZSTANDARD_LONG";
+
+  private static final String SNAPPY_INTEGER = "SNAPPY_INTEGER";
+  private static final String PASS_THROUGH_INTEGER = "PASS_THROUGH_INTEGER";
+  private static final String ZSTANDARD_INTEGER = "ZSTANDARD_INTEGER";
+
+  private static final List<String> RAW_SNAPPY_INDEX_COLUMNS = Arrays
+      .asList(SNAPPY_STRING, SNAPPY_LONG, SNAPPY_INTEGER);
+
+  private static final List<String> RAW_ZSTANDARD_INDEX_COLUMNS = Arrays
+      .asList(ZSTANDARD_STRING, ZSTANDARD_LONG, ZSTANDARD_INTEGER);
+
+  private static final List<String> RAW_PASS_THROUGH_INDEX_COLUMNS = Arrays
+      .asList(PASS_THROUGH_STRING, PASS_THROUGH_LONG, PASS_THROUGH_INTEGER);
+
+  private final List<GenericRow> _rows = new ArrayList<>();
+
+  private IndexSegment _indexSegment;
+  private List<IndexSegment> _indexSegments;
+  private List<GenericRow> rows;
+
+  @Override
+  protected String getFilter() {
+    return "";
+  }
+
+  @Override
+  protected IndexSegment getIndexSegment() {
+    return _indexSegment;
+  }
+
+  @Override
+  protected List<IndexSegment> getIndexSegments() {
+    return _indexSegments;
+  }
+
+  @BeforeClass
+  public void setUp()
+      throws Exception {
+    FileUtils.deleteQuietly(INDEX_DIR);
+
+    buildSegment();
+    IndexLoadingConfig indexLoadingConfig = new IndexLoadingConfig();
+    Set<String> indexColumns = new HashSet<>();
+    indexColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+
+    indexLoadingConfig.getNoDictionaryColumns().addAll(indexColumns);
+    ImmutableSegment immutableSegment =
+        ImmutableSegmentLoader.load(new File(INDEX_DIR, SEGMENT_NAME), indexLoadingConfig);
+    _indexSegment = immutableSegment;
+    _indexSegments = Arrays.asList(immutableSegment, immutableSegment);
+  }
+
+  @AfterClass
+  public void tearDown() {
+    _indexSegment.destroy();
+    FileUtils.deleteQuietly(INDEX_DIR);
+  }
+
+  private void buildSegment()
+      throws Exception {
+    rows = createTestData();
+
+    List<FieldConfig> fieldConfigs = new ArrayList<>(RAW_SNAPPY_INDEX_COLUMNS.size() + RAW_ZSTANDARD_INDEX_COLUMNS.size() + RAW_PASS_THROUGH_INDEX_COLUMNS.size());
+    for (String indexColumn : RAW_SNAPPY_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.SNAPPY));
+    }
+
+    for (String indexColumn : RAW_ZSTANDARD_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.ZSTANDARD));
+    }
+
+    for (String indexColumn : RAW_PASS_THROUGH_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.PASS_THROUGH));
+    }
+
+    List<String> _noDictionaryColumns = new ArrayList<>();
+    _noDictionaryColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+
+    TableConfig tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME)
+        .setNoDictionaryColumns(_noDictionaryColumns)
+        .setFieldConfigList(fieldConfigs).build();
+    Schema schema = new Schema.SchemaBuilder().setSchemaName(TABLE_NAME)
+        .addSingleValueDimension(SNAPPY_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(PASS_THROUGH_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(ZSTANDARD_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(SNAPPY_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(ZSTANDARD_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(PASS_THROUGH_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(SNAPPY_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(ZSTANDARD_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(PASS_THROUGH_LONG, FieldSpec.DataType.LONG)
+        .build();
+    SegmentGeneratorConfig config = new SegmentGeneratorConfig(tableConfig, schema);
+    config.setOutDir(INDEX_DIR.getPath());
+    config.setTableName(TABLE_NAME);
+    config.setSegmentName(SEGMENT_NAME);
+    SegmentIndexCreationDriverImpl driver = new SegmentIndexCreationDriverImpl();
+    try (RecordReader recordReader = new GenericRowRecordReader(rows)) {
+      driver.init(config, recordReader);
+      driver.build();
+    }
+  }
+
+  private List<GenericRow> createTestData()
+      throws Exception {
+    List<GenericRow> rows = new ArrayList<>();
+
+    //Generate random data
+    int rowLength = 1000;
+    Random random = new Random();
+    String[] tempStringRows = new String[rowLength];
+    Integer[] tempIntRows = new Integer[rowLength];
+    Long[] tempLongRows = new Long[rowLength];
+
+    for (int i = 0; i < rowLength; i++) {
+      tempStringRows[i] = RandomStringUtils.random(random.nextInt(100), true, true);
+      tempIntRows[i] = RandomUtils.nextInt();
+      tempLongRows[i] = RandomUtils.nextLong();
+    }
+
+    for (int i = 0; i < rowLength; i++) {
+      GenericRow row = new GenericRow();
+      row.putValue(SNAPPY_STRING, tempStringRows[i]);
+      row.putValue(ZSTANDARD_STRING, tempStringRows[i]);
+      row.putValue(PASS_THROUGH_STRING, tempStringRows[i]);
+      row.putValue(SNAPPY_INTEGER, tempIntRows[i]);
+      row.putValue(ZSTANDARD_INTEGER, tempIntRows[i]);
+      row.putValue(PASS_THROUGH_INTEGER, tempIntRows[i]);
+      row.putValue(SNAPPY_LONG, tempLongRows[i]);
+      row.putValue(ZSTANDARD_LONG, tempLongRows[i]);
+      row.putValue(PASS_THROUGH_LONG, tempLongRows[i]);
+      rows.add(row);
+    }
+    return rows;
+  }
+
+  /**
+   * Tests for basic compression codec queries.
+   */
+  @Test
+  public void testQueriesWithSnappyCompressionCodec()
+      throws Exception {
+
+    String query =
+        "SELECT SNAPPY_STRING, ZSTANDARD_STRING, PASS_THROUGH_STRING, SNAPPY_INTEGER, ZSTANDARD_INTEGER, PASS_THROUGH_INTEGER, "
+            + "SNAPPY_LONG, ZSTANDARD_LONG, PASS_THROUGH_LONG FROM MyTable LIMIT 1000";
+    ArrayList<Serializable[]> expected = new ArrayList<>();
+
+    for(GenericRow row: rows) {
+      expected.add(new Serializable[]{
+          String.valueOf(row.getValue(SNAPPY_STRING)), String.valueOf(row.getValue(ZSTANDARD_STRING)), String.valueOf(row.getValue(PASS_THROUGH_STRING)),
+          (Integer) row.getValue(SNAPPY_INTEGER), (Integer) row.getValue(ZSTANDARD_INTEGER), (Integer) row.getValue(PASS_THROUGH_INTEGER),
+          (Long) row.getValue(SNAPPY_LONG), (Long)row.getValue(ZSTANDARD_LONG), (Long) row.getValue(PASS_THROUGH_LONG),
+       });
+    }
+    testSelectQueryHelper(query, expected.size(), expected);
+  }
+
+  /*
+   * Helper methods for tests
+   */
+  private void testSelectQueryHelper(String query, int expectedResultSize, List<Serializable[]> expectedResults)
+      throws Exception {
+    SelectionOnlyOperator operator = getOperatorForPqlQuery(query);

Review comment:
       Use getOperatorForSqlQuery






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626849653



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -51,11 +52,13 @@
   public FieldConfig(@JsonProperty(value = "name", required = true) String name,
       @JsonProperty(value = "encodingType") @Nullable EncodingType encodingType,
       @JsonProperty(value = "indexType") @Nullable IndexType indexType,
-      @JsonProperty(value = "properties") @Nullable Map<String, String> properties) {
+      @JsonProperty(value = "properties") @Nullable Map<String, String> properties,
+      @JsonProperty(value = "noDictionaryColumnCompressionCodec") @Nullable NoDictionaryColumnCompressionCodec noDictionaryColumnCompressionCodec) {

Review comment:
       done!

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -51,11 +52,13 @@
   public FieldConfig(@JsonProperty(value = "name", required = true) String name,
       @JsonProperty(value = "encodingType") @Nullable EncodingType encodingType,
       @JsonProperty(value = "indexType") @Nullable IndexType indexType,
-      @JsonProperty(value = "properties") @Nullable Map<String, String> properties) {
+      @JsonProperty(value = "properties") @Nullable Map<String, String> properties,
+      @JsonProperty(value = "noDictionaryColumnCompressionCodec") @Nullable NoDictionaryColumnCompressionCodec noDictionaryColumnCompressionCodec) {

Review comment:
       done!




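Based on the `FieldConfig` change discussed above, a table config would carry the codec on a per-column basis roughly as follows. This is a hedged sketch: the property name was still being revised during review, so `noDictionaryColumnCompressionCodec` below reflects this revision of the diff, and `myRawColumn` is an illustrative column name.

```json
{
  "fieldConfigList": [
    {
      "name": "myRawColumn",
      "encodingType": "RAW",
      "noDictionaryColumnCompressionCodec": "ZSTANDARD"
    }
  ]
}
```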


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626087784



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardDecompressor.java
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.compression;
+
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.segment.spi.compression.ChunkDecompressor;
+
+/**
+ * Implementation of {@link ChunkDecompressor} using Zstandard(Zstd).

Review comment:
       (nit) using Zstandard compression algorithm




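The `ChunkCompressor`/`ChunkDecompressor` interfaces in the diff operate on `ByteBuffer`s, with zstd-jni supplying the actual `Zstd.compress`/`Zstd.decompress` calls via JNI. Below is a self-contained round-trip sketch of that buffer contract. Since zstd-jni is a third-party JNI dependency, `java.util.zip.Deflater`/`Inflater` stand in for the Zstd calls here; class and method names are illustrative, not Pinot's actual SPI.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of the ChunkCompressor/ChunkDecompressor ByteBuffer contract.
// Deflater/Inflater stand in for the JNI-backed Zstd.compress/Zstd.decompress.
class ChunkCodecSketch {

  // Compress everything remaining in 'in' into 'out'; returns compressed size.
  static int compress(ByteBuffer in, ByteBuffer out) {
    byte[] src = new byte[in.remaining()];
    in.get(src);
    Deflater deflater = new Deflater();
    deflater.setInput(src);
    deflater.finish();
    // Deflate overhead is small; src.length + 64 is a safe upper bound here.
    byte[] buf = new byte[src.length + 64];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buf, total, buf.length - total);
    }
    deflater.end();
    out.put(buf, 0, total);
    out.flip();  // ready for reading, mirroring the chunk writer/reader handoff
    return total;
  }

  // Decompress everything remaining in 'in' into 'out'; returns decompressed size.
  static int decompress(ByteBuffer in, ByteBuffer out) {
    byte[] src = new byte[in.remaining()];
    in.get(src);
    Inflater inflater = new Inflater();
    inflater.setInput(src);
    byte[] buf = new byte[out.remaining()];
    int total;
    try {
      total = inflater.inflate(buf);
    } catch (DataFormatException e) {
      throw new RuntimeException(e);
    }
    inflater.end();
    out.put(buf, 0, total);
    out.flip();
    return total;
  }
}
```

With zstd-jni on the classpath, the two method bodies would reduce to single calls into the native library, which is why the PR's compressor/decompressor classes are so short.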


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626272227



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +266,21 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    Map<String, String> rawIndexColumnsToCompressionTypeMap = new HashMap<>();
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressionCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+          rawIndexColumnsToCompressionTypeMap.put(fieldConfig.getName(), fieldConfig.getNoDictionaryColumnCompressionCodec().name());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(rawIndexColumns, rawIndexColumnsToCompressionTypeMap);

Review comment:
       The current code won't be able to handle compression config set through the old way.
   
   Let's say an existing table T has column1 and column2.
   
   For column1, someone specified the compression config through the old way (the noDictionaryConfig map).
   For column2, someone specified the compression config (SNAPPY, ZSTD) through the new way (FieldConfig).
   
   The current code won't preserve the config of column1, which we need to handle until everything is migrated from the existing way to FieldConfig.
   
   I haven't seen the old way of config being used so far at Li in production, but we don't know whether someone in open source is using it. If they are, this will break things for them.




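The backward-compatibility concern above boils down to merge precedence when building `rawIndexColumnsToCompressionTypeMap`: entries from the legacy noDictionaryConfig map must survive unless a FieldConfig entry overrides the same column. A plain-Java sketch of one such merge (method and map names are hypothetical; the actual `SegmentGeneratorConfig` wiring differs):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: legacy per-column codec config merged with the newer FieldConfig-style
// config, with the FieldConfig entry winning per column. Names are illustrative.
class CompressionConfigMerge {

  static Map<String, String> merge(Map<String, String> legacyNoDictionaryConfig,
                                   Map<String, String> fieldConfigCodecs) {
    // Start from the legacy map so old-style configs are preserved...
    Map<String, String> merged = new HashMap<>(legacyNoDictionaryConfig);
    // ...then let the new-style config override on a per-column basis.
    merged.putAll(fieldConfigCodecs);
    return merged;
  }
}
```

Under this precedence, the column1/column2 scenario from the comment above keeps column1's old-style codec while honoring column2's FieldConfig codec.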


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626095832



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/CompressionCodecQueriesTest.java
##########
@@ -0,0 +1,264 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.queries;
+
+import java.io.File;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.SelectionOnlyOperator;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.index.loader.IndexLoadingConfig;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.ImmutableSegment;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.FieldConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+
+/**
+ * Functional tests for compression type feature.
+ * The tests use three kinds of input data
+ * (1) string
+ * (2) integer
+ * (3) long
+ */
+public class CompressionCodecQueriesTest extends BaseQueriesTest {

Review comment:
       (nit) Suggest renaming to `NoDictionaryCompressionQueriesTest`






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626968162



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +263,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {

Review comment:
       done!






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626081079



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java
##########
@@ -573,6 +573,9 @@ private static void validateFieldConfigList(@Nullable List<FieldConfig> fieldCon
             Preconditions.checkArgument(!noDictionaryColumns.contains(columnName),
                 "FieldConfig encoding type is different from indexingConfig for column: " + columnName);
           }
+          Preconditions.checkArgument(fieldConfig.getNoDictionaryColumnCompressorCodec() == null,
+              "FieldConfig column compression codec is only supported for single value raw encoding type");

Review comment:
       (nit) Also add "Set compression codec to null for dictionary encoding type"




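The validation rule being reviewed, that a per-column compression codec may only accompany RAW (no-dictionary) encoding, can be sketched without Guava's `Preconditions` as follows (class, method, and parameter names are hypothetical, mirroring the FieldConfig fields):

```java
// Sketch of the TableConfigUtils-style guard: reject a per-column compression
// codec unless the column uses RAW encoding; dictionary-encoded columns must
// leave the codec null.
class FieldConfigValidation {

  static void validateCodec(String encodingType, String codec) {
    if (codec != null && !"RAW".equals(encodingType)) {
      throw new IllegalArgumentException(
          "Compression codec is only supported for raw encoding type; "
              + "set compression codec to null for dictionary encoding type");
    }
  }
}
```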


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832486737


   @kishoreg, even the current Snappy library used in Pinot relies on JNI underneath for the actual compress/decompress.




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832224423


   Nit: You may want to note that if someone upgrades, enables ZSTD for new segments, and then for some reason has to roll back their deployment, those segments will not be readable.
   One way of avoiding this would be to introduce a config to enable the feature, but that could be overkill. It should be enough to note this in the release notes for the next release (so, basically, mark it for release notes).




[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626743674



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardDecompressor.java
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.compression;
+
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.segment.spi.compression.ChunkDecompressor;
+
+/**
+ * Implementation of {@link ChunkDecompressor} using Zstandard(Zstd).

Review comment:
       done!

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.compression;
+
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.segment.spi.compression.ChunkCompressor;
+
+/**
+ * Implementation of {@link ChunkCompressor} using Zstandard(Zstd).
+ */
+public class ZstandardCompressor implements ChunkCompressor {

Review comment:
       done!

##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkIntegerCompressionSpeed.java
##########
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.TimeUnit;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.segment.local.io.compression.SnappyCompressor;
+import org.apache.pinot.segment.local.io.compression.SnappyDecompressor;
+import org.apache.pinot.segment.local.io.compression.ZstandardCompressor;
+import org.apache.pinot.segment.local.io.compression.ZstandardDecompressor;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+
+
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@Fork(1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)
+@State(Scope.Benchmark)
+// Test to get memory statistics for snappy and zstandard integer compression techniques
+public class BenchmarkIntegerCompressionSpeed {

Review comment:
       updated!






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626744028



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/CompressionCodecQueriesTest.java
##########
@@ -0,0 +1,264 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.queries;
+
+import java.io.File;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.SelectionOnlyOperator;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.index.loader.IndexLoadingConfig;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.ImmutableSegment;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.FieldConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+
+/**
+ * Functional tests for compression type feature.
+ * The tests use three kinds of input data
+ * (1) string
+ * (2) integer
+ * (3) long
+ */
+public class CompressionCodecQueriesTest extends BaseQueriesTest {
+  private static final File INDEX_DIR = new File(FileUtils.getTempDirectory(), "CompressionCodecQueriesTest");
+  private static final String TABLE_NAME = "MyTable";
+  private static final String SEGMENT_NAME = "testSegment";
+
+  private static final String SNAPPY_STRING = "SNAPPY_STRING";
+  private static final String PASS_THROUGH_STRING = "PASS_THROUGH_STRING";
+  private static final String ZSTANDARD_STRING = "ZSTANDARD_STRING";
+
+  private static final String SNAPPY_LONG = "SNAPPY_LONG";
+  private static final String PASS_THROUGH_LONG = "PASS_THROUGH_LONG";
+  private static final String ZSTANDARD_LONG = "ZSTANDARD_LONG";
+
+  private static final String SNAPPY_INTEGER = "SNAPPY_INTEGER";
+  private static final String PASS_THROUGH_INTEGER = "PASS_THROUGH_INTEGER";
+  private static final String ZSTANDARD_INTEGER = "ZSTANDARD_INTEGER";
+
+  private static final List<String> RAW_SNAPPY_INDEX_COLUMNS = Arrays
+      .asList(SNAPPY_STRING, SNAPPY_LONG, SNAPPY_INTEGER);
+
+  private static final List<String> RAW_ZSTANDARD_INDEX_COLUMNS = Arrays
+      .asList(ZSTANDARD_STRING, ZSTANDARD_LONG, ZSTANDARD_INTEGER);
+
+  private static final List<String> RAW_PASS_THROUGH_INDEX_COLUMNS = Arrays
+      .asList(PASS_THROUGH_STRING, PASS_THROUGH_LONG, PASS_THROUGH_INTEGER);
+
+  private final List<GenericRow> _rows = new ArrayList<>();
+
+  private IndexSegment _indexSegment;
+  private List<IndexSegment> _indexSegments;
+  private List<GenericRow> rows;
+
+  @Override
+  protected String getFilter() {
+    return "";
+  }
+
+  @Override
+  protected IndexSegment getIndexSegment() {
+    return _indexSegment;
+  }
+
+  @Override
+  protected List<IndexSegment> getIndexSegments() {
+    return _indexSegments;
+  }
+
+  @BeforeClass
+  public void setUp()
+      throws Exception {
+    FileUtils.deleteQuietly(INDEX_DIR);
+
+    buildSegment();
+    IndexLoadingConfig indexLoadingConfig = new IndexLoadingConfig();
+    Set<String> indexColumns = new HashSet<>();
+    indexColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+
+    indexLoadingConfig.getNoDictionaryColumns().addAll(indexColumns);
+    ImmutableSegment immutableSegment =
+        ImmutableSegmentLoader.load(new File(INDEX_DIR, SEGMENT_NAME), indexLoadingConfig);
+    _indexSegment = immutableSegment;
+    _indexSegments = Arrays.asList(immutableSegment, immutableSegment);
+  }
+
+  @AfterClass
+  public void tearDown() {
+    _indexSegment.destroy();
+    FileUtils.deleteQuietly(INDEX_DIR);
+  }
+
+  private void buildSegment()
+      throws Exception {
+    rows = createTestData();
+
+    List<FieldConfig> fieldConfigs = new ArrayList<>(RAW_SNAPPY_INDEX_COLUMNS.size() + RAW_ZSTANDARD_INDEX_COLUMNS.size() + RAW_PASS_THROUGH_INDEX_COLUMNS.size());
+    for (String indexColumn : RAW_SNAPPY_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.SNAPPY));
+    }
+
+    for (String indexColumn : RAW_ZSTANDARD_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.ZSTANDARD));
+    }
+
+    for (String indexColumn : RAW_PASS_THROUGH_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.PASS_THROUGH));
+    }
+
+    List<String> _noDictionaryColumns = new ArrayList<>();
+    _noDictionaryColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+
+    TableConfig tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME)
+        .setNoDictionaryColumns(_noDictionaryColumns)
+        .setFieldConfigList(fieldConfigs).build();
+    Schema schema = new Schema.SchemaBuilder().setSchemaName(TABLE_NAME)
+        .addSingleValueDimension(SNAPPY_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(PASS_THROUGH_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(ZSTANDARD_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(SNAPPY_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(ZSTANDARD_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(PASS_THROUGH_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(SNAPPY_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(ZSTANDARD_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(PASS_THROUGH_LONG, FieldSpec.DataType.LONG)
+        .build();
+    SegmentGeneratorConfig config = new SegmentGeneratorConfig(tableConfig, schema);
+    config.setOutDir(INDEX_DIR.getPath());
+    config.setTableName(TABLE_NAME);
+    config.setSegmentName(SEGMENT_NAME);
+    SegmentIndexCreationDriverImpl driver = new SegmentIndexCreationDriverImpl();
+    try (RecordReader recordReader = new GenericRowRecordReader(rows)) {
+      driver.init(config, recordReader);
+      driver.build();
+    }
+  }
+
+  private List<GenericRow> createTestData()
+      throws Exception {
+    List<GenericRow> rows = new ArrayList<>();
+
+    //Generate random data
+    int rowLength = 1000;
+    Random random = new Random();
+    String[] tempStringRows = new String[rowLength];
+    Integer[] tempIntRows = new Integer[rowLength];
+    Long[] tempLongRows = new Long[rowLength];
+
+    for (int i = 0; i < rowLength; i++) {
+      tempStringRows[i] = RandomStringUtils.random(random.nextInt(100), true, true);;
+      tempIntRows[i] = RandomUtils.nextInt();
+      tempLongRows[i] = RandomUtils.nextLong();
+    }
+
+    for (int i = 0; i < rowLength; i++) {
+      GenericRow row = new GenericRow();
+      row.putValue(SNAPPY_STRING, tempStringRows[i]);
+      row.putValue(ZSTANDARD_STRING, tempStringRows[i]);
+      row.putValue(PASS_THROUGH_STRING, tempStringRows[i]);
+      row.putValue(SNAPPY_INTEGER, tempIntRows[i]);
+      row.putValue(ZSTANDARD_INTEGER, tempIntRows[i]);
+      row.putValue(PASS_THROUGH_INTEGER, tempIntRows[i]);
+      row.putValue(SNAPPY_LONG, tempLongRows[i]);
+      row.putValue(ZSTANDARD_LONG, tempLongRows[i]);
+      row.putValue(PASS_THROUGH_LONG, tempLongRows[i]);
+      rows.add(row);
+    }
+    return rows;
+  }
+
+  /**
+   * Tests for basic compression codec queries.
+   */
+  @Test
+  public void testQueriesWithSnappyCompressionCodec()
+      throws Exception {
+
+    String query =
+        "SELECT SNAPPY_STRING, ZSTANDARD_STRING, PASS_THROUGH_STRING, SNAPPY_INTEGER, ZSTANDARD_INTEGER, PASS_THROUGH_INTEGER, "
+            + "SNAPPY_LONG, ZSTANDARD_LONG, PASS_THROUGH_LONG FROM MyTable LIMIT 1000";
+    ArrayList<Serializable[]> expected = new ArrayList<>();
+

Review comment:
       added few test scenarios!

##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/CompressionCodecQueriesTest.java
##########
@@ -0,0 +1,264 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.queries;
+
+import java.io.File;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.SelectionOnlyOperator;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.index.loader.IndexLoadingConfig;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.ImmutableSegment;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.FieldConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+
+/**
+ * Functional tests for compression type feature.
+ * The tests use three kinds of input data
+ * (1) string
+ * (2) integer
+ * (3) long
+ */
+public class CompressionCodecQueriesTest extends BaseQueriesTest {

Review comment:
       updated!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832461035


   Looks like this library is using JNI. Pinot is going to be dependent on the OS architecture with this feature PR, and it's a backward-incompatible change. 
   
   We should definitely discuss this and consider alternative implementations




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626104265



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +267,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressorCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(tableConfig.getIndexingConfig(), rawIndexColumns);

Review comment:
       This doesn't seem right. At line 197 it will check whether the map is null or not. If null, then we won't be able to set ZSTD/SNAPPY coming via FieldConfig (from this function). Right?






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626087479



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardDecompressor.java
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.compression;
+
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.segment.spi.compression.ChunkDecompressor;
+
+/**
+ * Implementation of {@link ChunkDecompressor} using Zstandard(Zstd).
+ */
+public class ZstandardDecompressor implements ChunkDecompressor {
+  @Override
+  public int decompress(ByteBuffer compressedInput, ByteBuffer decompressedOutput)
+      throws IOException {
+    int decompressedSize = Zstd.decompress(decompressedOutput, compressedInput);
+
+    // Make the output ByteBuffer ready for read.
+    decompressedOutput.flip();

Review comment:
       You might want to add one or two more lines on why flip() is necessary. I remember during debugging it was probably not obvious unless you know what Zstd does internally, so adding some more context will be helpful for anyone else who reads/changes this code in the future.
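To illustrate the point for future readers, here is a minimal, self-contained sketch of the ByteBuffer state involved (plain java.nio only; no actual Zstd call is made, the `put` simply stands in for `Zstd.decompress` writing into the output buffer):

```java
import java.nio.ByteBuffer;

public class FlipDemo {
  public static void main(String[] args) {
    // Stand-in for Zstd.decompress(out, in): the callee writes the
    // decompressed bytes into 'out', advancing its position past them.
    ByteBuffer out = ByteBuffer.allocate(16);
    out.put(new byte[]{1, 2, 3, 4});

    // Before flip(): position = 4, limit = 16. A relative read here would
    // return the *unwritten* bytes after the data, not the data itself.
    if (out.position() != 4 || out.limit() != 16) {
      throw new AssertionError("unexpected pre-flip state");
    }

    // flip() sets limit = position and position = 0, so readers see
    // exactly the bytes that were written and nothing else.
    out.flip();
    if (out.position() != 0 || out.limit() != 4) {
      throw new AssertionError("unexpected post-flip state");
    }

    byte[] read = new byte[out.remaining()];
    out.get(read);
    if (read.length != 4 || read[0] != 1) {
      throw new AssertionError("read back wrong bytes");
    }
    System.out.println("ok");
  }
}
```

This is only a sketch of the buffer protocol; the class name and values are illustrative, not part of the PR.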






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626744248



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/util/TableConfigUtilsTest.java
##########
@@ -712,14 +712,25 @@ public void testValidateFieldConfig() {
         .setNoDictionaryColumns(Arrays.asList("myCol1")).build();
     try {
       FieldConfig fieldConfig =
-          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null);
+          new FieldConfig("myCol21", FieldConfig.EncodingType.RAW, FieldConfig.IndexType.FST, null, null);
       tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
       TableConfigUtils.validate(tableConfig, schema);
       Assert.fail("Should fail since field name is not present in schema");
     } catch (Exception e) {
       Assert.assertEquals(e.getMessage(),
           "Column Name myCol21 defined in field config list must be a valid column defined in the schema");
     }
+
+    tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME).build();
+    try {
+      FieldConfig fieldConfig =
+          new FieldConfig("intCol", FieldConfig.EncodingType.DICTIONARY, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.ZSTANDARD);
+      tableConfig.setFieldConfigList(Arrays.asList(fieldConfig));
+      TableConfigUtils.validate(tableConfig, schema);
+      Assert.fail("Should fail since dictionary encoding does not support compression codec zstandard ");
+    } catch (Exception e) {

Review comment:
       Yes, added both test cases.






[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832251598


   > Nit: You may want to note that if someone upgrades and then enables ZSTD for new segments, and for some reason has to roll back their deployment, the segments will not be readable.
   > One way of avoiding this will be to introduce a config to enable the feature, but that could be an overkill. It should be enough to note in the release notes that this is the case for the next release (so, basically mark it for release notes)
   
   Table level config is there. I mean it was already there. Just reused the same in FieldConfig (the new model). 
   Yes, let's label it for release notes and also call the upgrade aspect in PR description. 
   
   @GSharayu let's also mention that, just like other table-level changes (column renaming, type changing, column dropping, index dropping) which are currently not allowed, changing the compression codec on an existing noDictionary column from snappy to zstd or vice versa will not happen, since we currently don't have a mechanism for doing this in-place in the segment file. Newly pushed segments will pick up the new codec, and since the codec type is written into the index buffer header, we will be able to read both old and new segments.




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626094436



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkIntegerCompressionSpeed.java
##########
@@ -0,0 +1,168 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.TimeUnit;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.segment.local.io.compression.SnappyCompressor;
+import org.apache.pinot.segment.local.io.compression.SnappyDecompressor;
+import org.apache.pinot.segment.local.io.compression.ZstandardCompressor;
+import org.apache.pinot.segment.local.io.compression.ZstandardDecompressor;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+
+
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@Fork(1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)
+@State(Scope.Benchmark)
+// Test to get memory statistics for snappy and zstandard integer compression techniques
+public class BenchmarkIntegerCompressionSpeed {

Review comment:
       (nit) suggest naming the class as `BenchmarkNoDictionaryIntegerCompression`






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626128003



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +267,19 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressorCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(tableConfig.getIndexingConfig(), rawIndexColumns);

Review comment:
       I think we should not use the old method here. Let it be there. Just remove the call to it at line 280
   
   Instead, we can probably do the following. Wdyt?
   
   `_rawIndexCreationColumns.add(fieldConfig.getName())` at line 276
   
   `_rawIndexCompressionType.put(fieldConfig.getName(), ChunkCompressionType.valueOf(fieldConfig.getNoDictionaryColumnCompressorCodec()))`






[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626232264



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +266,21 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    Map<String, String> rawIndexColumnsToCompressionTypeMap = new HashMap<>();
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressionCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+          rawIndexColumnsToCompressionTypeMap.put(fieldConfig.getName(), fieldConfig.getNoDictionaryColumnCompressionCodec().name());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(rawIndexColumns, rawIndexColumnsToCompressionTypeMap);

Review comment:
       I don't think this still fixes the problem. At line 200, the following method will be called 
   
   ```
   public void setRawIndexCompressionType(Map<String, ChunkCompressionType> rawIndexCompressionType) {
       _rawIndexCompressionType.clear();
       _rawIndexCompressionType.putAll(rawIndexCompressionType);
     }
   ```
   
   When it is called again at line 281, clear() will be called and the old compression config coming from noDictionaryConfig will be wiped out from its first invocation. 
   
   I think implementing the suggestion in https://github.com/apache/incubator-pinot/pull/6876#discussion_r626128003 is a simple fix. No need to call `setRawIndexColumnCompressionType`() from `extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig`
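The clear()-wipes-earlier-config behavior can be demonstrated with a minimal, self-contained sketch (class and field names are illustrative stand-ins, not the actual Pinot code):

```java
import java.util.HashMap;
import java.util.Map;

public class CompressionConfigDemo {
  // Simplified stand-in for SegmentGeneratorConfig's compression map.
  private final Map<String, String> rawIndexCompressionType = new HashMap<>();

  // Mirrors setRawIndexCompressionType(): clear() discards whatever an
  // earlier caller stored before putting the new entries.
  void setRawIndexCompressionType(Map<String, String> incoming) {
    rawIndexCompressionType.clear();
    rawIndexCompressionType.putAll(incoming);
  }

  // Mirrors the suggested fix: merge entries directly instead of replacing.
  void addRawIndexCompressionType(String column, String codec) {
    rawIndexCompressionType.put(column, codec);
  }

  public static void main(String[] args) {
    CompressionConfigDemo cfg = new CompressionConfigDemo();

    // First invocation: codec coming from noDictionaryConfig.
    cfg.setRawIndexCompressionType(Map.of("colA", "SNAPPY"));
    // Second invocation: codec from FieldConfig wipes out colA's entry.
    cfg.setRawIndexCompressionType(Map.of("colB", "ZSTANDARD"));
    if (cfg.rawIndexCompressionType.containsKey("colA")) {
      throw new AssertionError("colA should have been cleared");
    }

    // With the merge-style fix, both columns' codecs survive.
    cfg.addRawIndexCompressionType("colA", "SNAPPY");
    cfg.addRawIndexCompressionType("colB", "ZSTANDARD");
    if (cfg.rawIndexCompressionType.size() != 2) {
      throw new AssertionError("expected both columns retained");
    }
    System.out.println("ok");
  }
}
```

This only sketches the map semantics under discussion; the real fix operates on `_rawIndexCreationColumns` and `_rawIndexCompressionType` inside SegmentGeneratorConfig.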






[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626744667



##########
File path: pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java
##########
@@ -262,6 +266,21 @@ private void extractH3IndexConfigsFromTableConfig(TableConfig tableConfig) {
     }
   }
 
+  private void extractNoDictionaryColumnCompressionCodecConfigsFromTableConfig(TableConfig tableConfig) {
+    Map<String, String> rawIndexColumnsToCompressionTypeMap = new HashMap<>();
+    List<String> rawIndexColumns = new ArrayList<>();
+    List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+    if (fieldConfigList != null) {
+      for (FieldConfig fieldConfig : fieldConfigList) {
+        if (fieldConfig.getEncodingType() == FieldConfig.EncodingType.RAW && fieldConfig.getNoDictionaryColumnCompressionCodec() != null) {
+          rawIndexColumns.add(fieldConfig.getName());
+          rawIndexColumnsToCompressionTypeMap.put(fieldConfig.getName(), fieldConfig.getNoDictionaryColumnCompressionCodec().name());
+        }
+      }
+    }
+    setRawIndexColumnCompressionType(rawIndexColumns, rawIndexColumnsToCompressionTypeMap);

Review comment:
       updated!






[GitHub] [incubator-pinot] codecov-commenter commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832315766


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6876](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2211117) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1c09b7866b4d1c8267847f464e9c4618787b5a15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1c09b78) will **increase** coverage by `0.02%`.
   > The diff coverage is `100.00%`.
   
   > :exclamation: Current head 2211117 differs from pull request most recent head 9f23736. Consider uploading reports for the commit 9f23736 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6876/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #6876      +/-   ##
   ============================================
   + Coverage     65.48%   65.51%   +0.02%     
     Complexity       12       12              
   ============================================
     Files          1421     1423       +2     
     Lines         69980    70009      +29     
     Branches      10112    10115       +3     
   ============================================
   + Hits          45825    45863      +38     
   + Misses        20874    20868       -6     
   + Partials       3281     3278       -3     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | unittests | `65.51% <100.00%> (+0.02%)` | `12.00 <0.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...t/local/io/compression/ChunkCompressorFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9DaHVua0NvbXByZXNzb3JGYWN0b3J5LmphdmE=) | `50.00% <100.00%> (+10.00%)` | `0.00 <0.00> (ø)` | |
   | [...ment/local/io/compression/ZstandardCompressor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmRDb21wcmVzc29yLmphdmE=) | `100.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...nt/local/io/compression/ZstandardDecompressor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pby9jb21wcmVzc2lvbi9ac3RhbmRhcmREZWNvbXByZXNzb3IuamF2YQ==) | `100.00% <100.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...he/pinot/segment/local/utils/TableConfigUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9UYWJsZUNvbmZpZ1V0aWxzLmphdmE=) | `78.47% <100.00%> (+0.06%)` | `0.00 <0.00> (ø)` | |
   | [.../segment/spi/compression/ChunkCompressionType.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NvbXByZXNzaW9uL0NodW5rQ29tcHJlc3Npb25UeXBlLmphdmE=) | `100.00% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...ot/segment/spi/creator/SegmentGeneratorConfig.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvU2VnbWVudEdlbmVyYXRvckNvbmZpZy5qYXZh) | `82.89% <100.00%> (+2.54%)` | `0.00 <0.00> (ø)` | |
   | [...org/apache/pinot/spi/config/table/FieldConfig.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZpZWxkQ29uZmlnLmphdmE=) | `96.66% <100.00%> (+0.51%)` | `0.00 <0.00> (ø)` | |
   | [.../startree/v2/builder/OffHeapSingleTreeBuilder.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zdGFydHJlZS92Mi9idWlsZGVyL09mZkhlYXBTaW5nbGVUcmVlQnVpbGRlci5qYXZh) | `82.73% <0.00%> (-4.17%)` | `0.00% <0.00%> (ø%)` | |
   | [...lix/core/realtime/PinotRealtimeSegmentManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3JlYWx0aW1lL1Bpbm90UmVhbHRpbWVTZWdtZW50TWFuYWdlci5qYXZh) | `79.48% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [6 more](https://codecov.io/gh/apache/incubator-pinot/pull/6876/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1c09b78...9f23736](https://codecov.io/gh/apache/incubator-pinot/pull/6876?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [incubator-pinot] GSharayu commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
GSharayu commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626743217



##########
File path: pinot-core/src/test/java/org/apache/pinot/queries/CompressionCodecQueriesTest.java
##########
@@ -0,0 +1,264 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.queries;
+
+import java.io.File;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.pinot.core.operator.blocks.IntermediateResultsBlock;
+import org.apache.pinot.core.operator.query.SelectionOnlyOperator;
+import org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader;
+import org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl;
+import org.apache.pinot.segment.local.segment.index.loader.IndexLoadingConfig;
+import org.apache.pinot.segment.local.segment.readers.GenericRowRecordReader;
+import org.apache.pinot.segment.spi.ImmutableSegment;
+import org.apache.pinot.segment.spi.IndexSegment;
+import org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig;
+import org.apache.pinot.spi.config.table.FieldConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.TableType;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.RecordReader;
+import org.apache.pinot.spi.utils.builder.TableConfigBuilder;
+import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+
+/**
+ * Functional tests for compression type feature.
+ * The tests use three kinds of input data
+ * (1) string
+ * (2) integer
+ * (3) long
+ */
+public class CompressionCodecQueriesTest extends BaseQueriesTest {
+  private static final File INDEX_DIR = new File(FileUtils.getTempDirectory(), "CompressionCodecQueriesTest");
+  private static final String TABLE_NAME = "MyTable";
+  private static final String SEGMENT_NAME = "testSegment";
+
+  private static final String SNAPPY_STRING = "SNAPPY_STRING";
+  private static final String PASS_THROUGH_STRING = "PASS_THROUGH_STRING";
+  private static final String ZSTANDARD_STRING = "ZSTANDARD_STRING";
+
+  private static final String SNAPPY_LONG = "SNAPPY_LONG";
+  private static final String PASS_THROUGH_LONG = "PASS_THROUGH_LONG";
+  private static final String ZSTANDARD_LONG = "ZSTANDARD_LONG";
+
+  private static final String SNAPPY_INTEGER = "SNAPPY_INTEGER";
+  private static final String PASS_THROUGH_INTEGER = "PASS_THROUGH_INTEGER";
+  private static final String ZSTANDARD_INTEGER = "ZSTANDARD_INTEGER";
+
+  private static final List<String> RAW_SNAPPY_INDEX_COLUMNS = Arrays
+      .asList(SNAPPY_STRING, SNAPPY_LONG, SNAPPY_INTEGER);
+
+  private static final List<String> RAW_ZSTANDARD_INDEX_COLUMNS = Arrays
+      .asList(ZSTANDARD_STRING, ZSTANDARD_LONG, ZSTANDARD_INTEGER);
+
+  private static final List<String> RAW_PASS_THROUGH_INDEX_COLUMNS = Arrays
+      .asList(PASS_THROUGH_STRING, PASS_THROUGH_LONG, PASS_THROUGH_INTEGER);
+
+  private final List<GenericRow> _rows = new ArrayList<>();
+
+  private IndexSegment _indexSegment;
+  private List<IndexSegment> _indexSegments;
+  private List<GenericRow> rows;
+
+  @Override
+  protected String getFilter() {
+    return "";
+  }
+
+  @Override
+  protected IndexSegment getIndexSegment() {
+    return _indexSegment;
+  }
+
+  @Override
+  protected List<IndexSegment> getIndexSegments() {
+    return _indexSegments;
+  }
+
+  @BeforeClass
+  public void setUp()
+      throws Exception {
+    FileUtils.deleteQuietly(INDEX_DIR);
+
+    buildSegment();
+    IndexLoadingConfig indexLoadingConfig = new IndexLoadingConfig();
+    Set<String> indexColumns = new HashSet<>();
+    indexColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+    indexColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+
+    indexLoadingConfig.getNoDictionaryColumns().addAll(indexColumns);
+    ImmutableSegment immutableSegment =
+        ImmutableSegmentLoader.load(new File(INDEX_DIR, SEGMENT_NAME), indexLoadingConfig);
+    _indexSegment = immutableSegment;
+    _indexSegments = Arrays.asList(immutableSegment, immutableSegment);
+  }
+
+  @AfterClass
+  public void tearDown() {
+    _indexSegment.destroy();
+    FileUtils.deleteQuietly(INDEX_DIR);
+  }
+
+  private void buildSegment()
+      throws Exception {
+    rows = createTestData();
+
+    List<FieldConfig> fieldConfigs = new ArrayList<>(RAW_SNAPPY_INDEX_COLUMNS.size() + RAW_ZSTANDARD_INDEX_COLUMNS.size() + RAW_PASS_THROUGH_INDEX_COLUMNS.size());
+    for (String indexColumn : RAW_SNAPPY_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.SNAPPY));
+    }
+
+    for (String indexColumn : RAW_ZSTANDARD_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.ZSTANDARD));
+    }
+
+    for (String indexColumn : RAW_PASS_THROUGH_INDEX_COLUMNS) {
+      fieldConfigs
+          .add(new FieldConfig(indexColumn, FieldConfig.EncodingType.RAW, null, null, FieldConfig.NoDictionaryColumnCompressorCodec.PASS_THROUGH));
+    }
+
+    List<String> _noDictionaryColumns = new ArrayList<>();
+    _noDictionaryColumns.addAll(RAW_SNAPPY_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_ZSTANDARD_INDEX_COLUMNS);
+    _noDictionaryColumns.addAll(RAW_PASS_THROUGH_INDEX_COLUMNS);
+
+    TableConfig tableConfig = new TableConfigBuilder(TableType.OFFLINE).setTableName(TABLE_NAME)
+        .setNoDictionaryColumns(_noDictionaryColumns)
+        .setFieldConfigList(fieldConfigs).build();
+    Schema schema = new Schema.SchemaBuilder().setSchemaName(TABLE_NAME)
+        .addSingleValueDimension(SNAPPY_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(PASS_THROUGH_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(ZSTANDARD_STRING, FieldSpec.DataType.STRING)
+        .addSingleValueDimension(SNAPPY_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(ZSTANDARD_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(PASS_THROUGH_INTEGER, FieldSpec.DataType.INT)
+        .addSingleValueDimension(SNAPPY_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(ZSTANDARD_LONG, FieldSpec.DataType.LONG)
+        .addSingleValueDimension(PASS_THROUGH_LONG, FieldSpec.DataType.LONG)
+        .build();
+    SegmentGeneratorConfig config = new SegmentGeneratorConfig(tableConfig, schema);
+    config.setOutDir(INDEX_DIR.getPath());
+    config.setTableName(TABLE_NAME);
+    config.setSegmentName(SEGMENT_NAME);
+    SegmentIndexCreationDriverImpl driver = new SegmentIndexCreationDriverImpl();
+    try (RecordReader recordReader = new GenericRowRecordReader(rows)) {
+      driver.init(config, recordReader);
+      driver.build();
+    }
+  }
+
+  private List<GenericRow> createTestData()
+      throws Exception {
+    List<GenericRow> rows = new ArrayList<>();
+
+    //Generate random data
+    int rowLength = 1000;
+    Random random = new Random();
+    String[] tempStringRows = new String[rowLength];
+    Integer[] tempIntRows = new Integer[rowLength];
+    Long[] tempLongRows = new Long[rowLength];
+
+    for (int i = 0; i < rowLength; i++) {
+      tempStringRows[i] = RandomStringUtils.random(random.nextInt(100), true, true);
+      tempIntRows[i] = RandomUtils.nextInt();
+      tempLongRows[i] = RandomUtils.nextLong();
+    }
+
+    for (int i = 0; i < rowLength; i++) {
+      GenericRow row = new GenericRow();
+      row.putValue(SNAPPY_STRING, tempStringRows[i]);
+      row.putValue(ZSTANDARD_STRING, tempStringRows[i]);
+      row.putValue(PASS_THROUGH_STRING, tempStringRows[i]);
+      row.putValue(SNAPPY_INTEGER, tempIntRows[i]);
+      row.putValue(ZSTANDARD_INTEGER, tempIntRows[i]);
+      row.putValue(PASS_THROUGH_INTEGER, tempIntRows[i]);
+      row.putValue(SNAPPY_LONG, tempLongRows[i]);
+      row.putValue(ZSTANDARD_LONG, tempLongRows[i]);
+      row.putValue(PASS_THROUGH_LONG, tempLongRows[i]);
+      rows.add(row);
+    }
+    return rows;
+  }
+
+  /**
+   * Tests for basic compression codec queries.
+   */
+  @Test
+  public void testQueriesWithSnappyCompressionCodec()
+      throws Exception {
+
+    String query =
+        "SELECT SNAPPY_STRING, ZSTANDARD_STRING, PASS_THROUGH_STRING, SNAPPY_INTEGER, ZSTANDARD_INTEGER, PASS_THROUGH_INTEGER, "
+            + "SNAPPY_LONG, ZSTANDARD_LONG, PASS_THROUGH_LONG FROM MyTable LIMIT 1000";
+    ArrayList<Serializable[]> expected = new ArrayList<>();
+
+    for(GenericRow row: rows) {
+      expected.add(new Serializable[]{
+          String.valueOf(row.getValue(SNAPPY_STRING)), String.valueOf(row.getValue(ZSTANDARD_STRING)), String.valueOf(row.getValue(PASS_THROUGH_STRING)),
+          (Integer) row.getValue(SNAPPY_INTEGER), (Integer) row.getValue(ZSTANDARD_INTEGER), (Integer) row.getValue(PASS_THROUGH_INTEGER),
+          (Long) row.getValue(SNAPPY_LONG), (Long)row.getValue(ZSTANDARD_LONG), (Long) row.getValue(PASS_THROUGH_LONG),
+       });
+    }
+    testSelectQueryHelper(query, expected.size(), expected);
+  }
+
+  /*
+   * Helper methods for tests
+   */
+  private void testSelectQueryHelper(String query, int expectedResultSize, List<Serializable[]> expectedResults)
+      throws Exception {
+    SelectionOnlyOperator operator = getOperatorForPqlQuery(query);

Review comment:
       done!

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java
##########
@@ -573,6 +573,9 @@ private static void validateFieldConfigList(@Nullable List<FieldConfig> fieldCon
             Preconditions.checkArgument(!noDictionaryColumns.contains(columnName),
                 "FieldConfig encoding type is different from indexingConfig for column: " + columnName);
           }
+          Preconditions.checkArgument(fieldConfig.getNoDictionaryColumnCompressorCodec() == null,
+              "FieldConfig column compression codec is only supported for single value raw encoding type");

Review comment:
       done!

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -69,6 +72,10 @@ public FieldConfig(@JsonProperty(value = "name", required = true) String name,
     INVERTED, SORTED, TEXT, FST, H3
   }
 
+  public enum NoDictionaryColumnCompressorCodec {

Review comment:
       done!
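
   For reference, the per-column codec introduced by this enum would be set through the table config's `fieldConfigList`. A rough sketch of what such an entry might look like (the key name `compressionCodec` is inferred from the FieldConfig constructor in this PR and may differ in the final API):

   ```json
   {
     "fieldConfigList": [
       {
         "name": "ZSTANDARD_STRING",
         "encodingType": "RAW",
         "compressionCodec": "ZSTANDARD"
       }
     ]
   }
   ```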

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardDecompressor.java
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.io.compression;
+
+import com.github.luben.zstd.Zstd;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import org.apache.pinot.segment.spi.compression.ChunkDecompressor;
+
+/**
+ * Implementation of {@link ChunkDecompressor} using Zstandard(Zstd).
+ */
+public class ZstandardDecompressor implements ChunkDecompressor {
+  @Override
+  public int decompress(ByteBuffer compressedInput, ByteBuffer decompressedOutput)
+      throws IOException {
+    int decompressedSize = Zstd.decompress(decompressedOutput, compressedInput);
+
+    // Make the output ByteBuffer ready for read.
+    decompressedOutput.flip();

Review comment:
       updated!
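
   The flip-before-read convention discussed above can be illustrated with a self-contained sketch. It uses `java.util.zip`'s Deflater/Inflater as a stdlib stand-in for Zstd (zstd-jni is a third-party dependency), so only the ByteBuffer handling, not the codec, mirrors the PR:

   ```java
   import java.nio.ByteBuffer;
   import java.util.zip.DataFormatException;
   import java.util.zip.Deflater;
   import java.util.zip.Inflater;

   public class FlipPatternSketch {

     // Compress input into a ByteBuffer, then flip() it so position=0 and
     // limit=bytesWritten, i.e. the buffer is ready for a subsequent read.
     static ByteBuffer compress(byte[] input) {
       Deflater deflater = new Deflater();
       deflater.setInput(input);
       deflater.finish();
       // +64 gives headroom for deflate overhead on small inputs.
       ByteBuffer out = ByteBuffer.allocate(input.length + 64);
       byte[] buf = new byte[256];
       while (!deflater.finished()) {
         int n = deflater.deflate(buf);
         out.put(buf, 0, n);
       }
       deflater.end();
       out.flip(); // make the output buffer ready for read
       return out;
     }

     static byte[] decompress(ByteBuffer compressed, int originalSize)
         throws DataFormatException {
       Inflater inflater = new Inflater();
       byte[] in = new byte[compressed.remaining()];
       compressed.get(in);
       inflater.setInput(in);
       byte[] out = new byte[originalSize];
       int n = inflater.inflate(out);
       inflater.end();
       if (n != originalSize) {
         throw new DataFormatException("decompressed size mismatch");
       }
       return out;
     }

     public static void main(String[] args) throws Exception {
       byte[] data = "hello hello hello hello".getBytes();
       ByteBuffer compressed = compress(data);
       byte[] roundTrip = decompress(compressed, data.length);
       System.out.println(java.util.Arrays.equals(roundTrip, data)); // prints: true
     }
   }
   ```

   Without the flip(), a caller reading from the output buffer would start at the write position and see no remaining bytes, which is exactly the bug class the review comment is guarding against.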






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-832486737


   @kishoreg, even the current Snappy library being used relies on JNI underneath




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#discussion_r626085761



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -69,6 +72,10 @@ public FieldConfig(@JsonProperty(value = "name", required = true) String name,
     INVERTED, SORTED, TEXT, FST, H3
   }
 
+  public enum NoDictionaryColumnCompressorCodec {

Review comment:
       (nit) Suggest naming NoDictionaryColumnCompressionCodec






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #6876: Support ZSTD compression codec for raw index

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #6876:
URL: https://github.com/apache/incubator-pinot/pull/6876#issuecomment-833012603


   > LGTM. @kishoreg Do you still have concern on this PR? This library seems commonly used in the maven repository (115 usages)
   
   Yes, the platform-specific native library comes embedded in the Java library, and the right one is picked up automatically at runtime. The same holds for snappy-java, which we have been using for quite some time, and for this PR's dependency, zstd-jni

