You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "somandal (via GitHub)" <gi...@apache.org> on 2023/02/10 00:09:05 UTC

[GitHub] [pinot] somandal opened a new pull request, #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

somandal opened a new pull request, #10260:
URL: https://github.com/apache/pinot/pull/10260

   This PR relaxes the current constraint that a dictionary and inverted index must exist to allow disabling the forward index.  The following changes have been made as part of this PR:
   
   - Relax the constraint in the `TableConfigUtils` and other places such as segment creator and default index handler
   - Update the reload code path to handle toggling the dictionary for forward index disabled columns
   - Update the `NoOpForwardIndexCreator` to work for both dictionary based and non-dictionary based columns
   - Add tests
   
   Adding a new index on a column with forward index disabled along with inverted index or dictionary disabled will result in an error now in reload since the inverted index + dictionary are needed to generate a temporary forward index.
   
   The reload code path needs some significant updates in the `ForwardIndexHandler` to support this change and allow toggling the dictionary for forward index disabled columns. The approach taken to support this can be summarized as follows:
   
   - Forward index -> No Forward Index
       - Sorted columns are ignored (no-op) [current code path]
       - Existing column has dictionary
           - Dictionary being removed
               - Regenerate temporary forward index to be raw based (post updateIndex cleanup will remove it)
               - Remove dictionary
               - Update segment metadata to indicate dictionary disabled
               - Remove all dictionary indexes: inverted index and fst (as they need dictionary) and range (needs to be regenerated and is possible to regenerate on this path)
           - Dictionary being kept [current code path]
               - Remove forward index (post updateIndex cleanup)
       - Existing column doesn't have dictionary
           - Dictionary remains disabled
               - Remove forward index (post updateIndex cleanup)
           - Dictionary to be enabled [current code path]
               - Regenerate temporary forward index to be dictionary based (post updateIndex cleanup will remove it)
               - Update segment metadata to indicate dictionary disabled
               - Remove all dictionary indexes: inverted index and fst (they shouldn't even exist as they need dictionary) and range (needs to be regenerated and is possible to regenerate on this path)
   - No Forward Index -> Forward Index
       - Sorted columns are ignored (no-op) [current code path]
       - If either dictionary or inverted index is disabled on the existing column, log a warning and treat this as a no-op
       - Otherwise regenerate the forward index if dictionary enabled or disabled [current code path]
   - No Forward Index -> No Forward Index (this is not getting toggled but maybe something else is)
       - Existing column has dictionary
           - Dictionary being removed
               - Inverted index exists
                   - Temporary forward index can be regenerated as raw format from dictionary + inverted index
                   - Update segment metadata to indicate dictionary disabled
                   - Remove all dictionary indexes: inverted index and fst (as they need dictionary)  and range (needs to be regenerated and is possible to regenerate on this path)
               - Inverted Index does not exist
                   - Cannot regenerate the forward index in raw format: SHOULD FAIL if RANGE INDEX exists since range index needs to be regenerated
                   - Just remove dictionary
                   - Update segment metadata to indicate dictionary disabled
                   - Remove all dictionary indexes: inverted index and fst (as they need dictionary) and range (needs to be regenerated and cannot be regenerate on this path)
           - Dictionary being kept [current code path]
               - no-op
       - Existing column doesn't have dictionary
           - Dictionary remains removed
               - no-op
           - Dictionary being enabled
               - Cannot rebuild dictionary if forward index is not available. Throw an error
   
   cc @siddharthteotia @Jackie-Jiang @vvivekiyer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10260:
URL: https://github.com/apache/pinot/pull/10260#issuecomment-1425075161

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10260](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f80db45) into [master](https://codecov.io/gh/apache/pinot/commit/63c1e504db3b241314899a12d251b39d2604bdfe?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (63c1e50) will **decrease** coverage by `19.08%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10260       +/-   ##
   =============================================
   - Coverage     32.77%   13.69%   -19.08%     
   + Complexity      202      178       -24     
   =============================================
     Files          2017     1962       -55     
     Lines        109187   106829     -2358     
     Branches      16605    16344      -261     
   =============================================
   - Hits          35782    14632    -21150     
   - Misses        70296    91047    +20751     
   + Partials       3109     1150     -1959     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration2 | `?` | |
   | unittests2 | `13.69% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ment/creator/impl/DefaultIndexCreatorProvider.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9EZWZhdWx0SW5kZXhDcmVhdG9yUHJvdmlkZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <ø> (ø)` | |
   | [...ment/creator/impl/fwd/NoOpForwardIndexCreator.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9md2QvTm9PcEZvcndhcmRJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...t/local/segment/index/loader/BaseIndexHandler.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9CYXNlSW5kZXhIYW5kbGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ocal/segment/index/loader/ForwardIndexHandler.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Gb3J3YXJkSW5kZXhIYW5kbGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...he/pinot/segment/local/utils/TableConfigUtils.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9UYWJsZUNvbmZpZ1V0aWxzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...src/main/java/org/apache/pinot/sql/FilterKind.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcWwvRmlsdGVyS2luZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ain/java/org/apache/pinot/common/CustomObject.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vQ3VzdG9tT2JqZWN0LmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...n/java/org/apache/pinot/core/data/table/Table.java](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1RhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [704 more](https://codecov.io/gh/apache/pinot/pull/10260?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1115190123


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -608,119 +568,125 @@ public void indexRow(GenericRow row)
               columnValueToIndex = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
             }
           }
-          switch (forwardIndexCreator.getValueType()) {
-            case INT:
-              forwardIndexCreator.putInt((int) columnValueToIndex);
-              break;
-            case LONG:
-              forwardIndexCreator.putLong((long) columnValueToIndex);
-              break;
-            case FLOAT:
-              forwardIndexCreator.putFloat((float) columnValueToIndex);
-              break;
-            case DOUBLE:
-              forwardIndexCreator.putDouble((double) columnValueToIndex);
-              break;
-            case BIG_DECIMAL:
-              forwardIndexCreator.putBigDecimal((BigDecimal) columnValueToIndex);
-              break;
-            case STRING:
-              forwardIndexCreator.putString((String) columnValueToIndex);
-              break;
-            case BYTES:
-              forwardIndexCreator.putBytes((byte[]) columnValueToIndex);
-              break;
-            case JSON:
-              if (columnValueToIndex instanceof String) {
+          if (forwardIndexCreator != null) {
+            switch (forwardIndexCreator.getValueType()) {
+              case INT:
+                forwardIndexCreator.putInt((int) columnValueToIndex);
+                break;
+              case LONG:
+                forwardIndexCreator.putLong((long) columnValueToIndex);
+                break;
+              case FLOAT:
+                forwardIndexCreator.putFloat((float) columnValueToIndex);
+                break;
+              case DOUBLE:
+                forwardIndexCreator.putDouble((double) columnValueToIndex);
+                break;
+              case BIG_DECIMAL:
+                forwardIndexCreator.putBigDecimal((BigDecimal) columnValueToIndex);
+                break;
+              case STRING:
                 forwardIndexCreator.putString((String) columnValueToIndex);
-              } else if (columnValueToIndex instanceof byte[]) {
+                break;
+              case BYTES:
                 forwardIndexCreator.putBytes((byte[]) columnValueToIndex);
-              }
-              break;
-            default:
-              throw new IllegalStateException();
+                break;
+              case JSON:
+                if (columnValueToIndex instanceof String) {
+                  forwardIndexCreator.putString((String) columnValueToIndex);
+                } else if (columnValueToIndex instanceof byte[]) {
+                  forwardIndexCreator.putBytes((byte[]) columnValueToIndex);
+                }
+                break;
+              default:
+                throw new IllegalStateException();
+            }
           }
         }
       } else {
         if (dictionaryCreator != null) {
           //dictionary encoded
           int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-          forwardIndexCreator.putDictIdMV(dictIds);
+          if (forwardIndexCreator != null) {
+            forwardIndexCreator.putDictIdMV(dictIds);
+          }
           DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
           if (invertedIndexCreator != null) {
             invertedIndexCreator.add(dictIds, dictIds.length);
           }
         } else {
           // for text index on raw columns, check the config to determine if actual raw value should
           // be stored or not
-          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
-            Object value = _columnProperties.get(columnName).get(FieldConfig.TEXT_INDEX_RAW_VALUE);
-            if (value == null) {
-              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
-            }
-            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
-              columnValueToIndex = new String[]{String.valueOf(value)};
-            } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
-              columnValueToIndex = new byte[][]{String.valueOf(value).getBytes(UTF_8)};
-            } else {
-              throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
-            }
-          }
-          Object[] values = (Object[]) columnValueToIndex;
-          int length = values.length;
-          switch (forwardIndexCreator.getValueType()) {
-            case INT:
-              int[] ints = new int[length];
-              for (int i = 0; i < length; i++) {
-                ints[i] = (Integer) values[i];
-              }
-              forwardIndexCreator.putIntMV(ints);
-              break;
-            case LONG:
-              long[] longs = new long[length];
-              for (int i = 0; i < length; i++) {
-                longs[i] = (Long) values[i];
+          if (forwardIndexCreator != null) {
+            if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {

Review Comment:
   merging these won't quite work, I need this same `forwardIndexCreator != null` check for the next part where I set the values in the forward index, that's why I kept this check outside.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1115197723


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -93,12 +94,10 @@ public class ForwardIndexHandler extends BaseIndexHandler {
   private final Schema _schema;
 
   protected enum Operation {

Review Comment:
   Sure, let me pick this up as a future fix (along with the changes to remove some of the tests that aren't really needed)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1103353591


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/NoOpForwardIndexCreator.java:
##########
@@ -19,12 +19,13 @@
 package org.apache.pinot.segment.local.segment.creator.impl.fwd;
 
 import java.io.IOException;
+import java.math.BigDecimal;
 import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
 import org.apache.pinot.spi.data.FieldSpec;
 
 
 /**
- * Forward index creator for dictionary-encoded single and multi-value columns with forward index disabled.
+ * Forward index creator for dictionary-encoded and raw single and multi-value columns with forward index disabled.
  * This is a no-op.
  */
 public class NoOpForwardIndexCreator implements ForwardIndexCreator {

Review Comment:
   Suggest getting rid of this class and handle disabled forward index explicitly instead of still dumping the values into the no-op creator for both readability and performance concern



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on PR #10260:
URL: https://github.com/apache/pinot/pull/10260#issuecomment-1428544993

   @Jackie-Jiang I addressed your comments, can you take a look when you get a chance? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10260:
URL: https://github.com/apache/pinot/pull/10260#issuecomment-1426505319

   > We have a scenario where the dictionary can be disabled for a forward index disabled column. For such scenarios we don't handle anything on reload today. Are you saying that even for toggling whether the dictionary is enabled or disabled we should just treat reload as a no-op rather than throwing error? What about indexes like the range index that need to be modified when the column has the dictionary enabled or disabled?
   
   Disable dictionary should be independent of disabling forward index. Currently it is no-op because we want to ensure data can be recovered. With the new change, we should allow no dictionary + no forward index. Note that disable dictionary + enable inverted index is invalid config, and we should reject it during table config validation.
   When both forward index and dictionary are disabled, no further index modification is allowed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1107376842


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -93,8 +93,10 @@ public class ForwardIndexHandler extends BaseIndexHandler {
   private final Schema _schema;
 
   protected enum Operation {
-    DISABLE_FORWARD_INDEX_FOR_DICT_COLUMN,
-    DISABLE_FORWARD_INDEX_FOR_RAW_COLUMN,
+    DISABLE_FORWARD_INDEX_FOR_DICT_OR_RAW_COLUMN,

Review Comment:
   Let me look at the code and get back to you about the feasibility of doing the above in steps



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on PR #10260:
URL: https://github.com/apache/pinot/pull/10260#issuecomment-1426488746

   > I feel we are over-complicating this problem. When inverted index does not exist, disabling forward index means only allowing using the index for the column (text, FST). No scan is allowed for the column, and there is no way to recover the column values or create other indexes. On the forward index handler side we can simply remove the forward index without worrying about creating other indexes.
   
   Yes I agree, for inverted index there isn't much to do and this PR doesn't do any special handling for it except relaxing the constraint. The main complexity here comes from handling changes to the dictionary.
   
   We have a scenario where the dictionary can be disabled for a forward index disabled column. For such scenarios we don't handle anything on reload today. Are you saying that even for toggling whether the dictionary is enabled or disabled we should just treat reload as a no-op rather than throwing error? What about indexes like the range index that need to be modified when the column has the dictionary enabled or disabled?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1115197451


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -110,86 +109,81 @@ public ForwardIndexHandler(SegmentDirectory segmentDirectory, IndexLoadingConfig
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader)
       throws Exception {
-    Map<String, Operation> columnOperationMap = computeOperation(segmentReader);
-    return !columnOperationMap.isEmpty();
+    Map<String, List<Operation>> columnOperationsMap = computeOperations(segmentReader);
+    return !columnOperationsMap.isEmpty();
   }
 
   @Override
   public void updateIndices(SegmentDirectory.Writer segmentWriter, IndexCreatorProvider indexCreatorProvider)
       throws Exception {
-    Map<String, Operation> columnOperationMap = computeOperation(segmentWriter);
-    if (columnOperationMap.isEmpty()) {
+    Map<String, List<Operation>> columnOperationsMap = computeOperations(segmentWriter);
+    if (columnOperationsMap.isEmpty()) {
       return;
     }
 
-    for (Map.Entry<String, Operation> entry : columnOperationMap.entrySet()) {
+    for (Map.Entry<String, List<Operation>> entry : columnOperationsMap.entrySet()) {
       String column = entry.getKey();
-      Operation operation = entry.getValue();
-
-      switch (operation) {
-        case DISABLE_FORWARD_INDEX_FOR_DICT_COLUMN: {
-          // Deletion of the forward index will be handled outside the index handler to ensure that other index
-          // handlers that need the forward index to construct their own indexes will have it available.
-          // The existing forward index must be in dictionary format for this to be a no-op.
-          _tmpForwardIndexColumns.add(column);
-          break;
-        }
-        case DISABLE_FORWARD_INDEX_FOR_RAW_COLUMN: {
-          // The forward index has been disabled for a column which has a noDictionary based forward index. A dictionary
-          // and inverted index need to be created before we can delete the forward index. We create a dictionary here,
-          // but let the InvertedIndexHandler handle the creation of the inverted index. We create a temporary
-          // forward index here which is dictionary based and allow the post deletion step handle the actual deletion
-          // of the forward index.
-          createDictBasedForwardIndex(column, segmentWriter, indexCreatorProvider);
-          if (!segmentWriter.hasIndexFor(column, ColumnIndexType.FORWARD_INDEX)) {
-            throw new IOException(String.format("Temporary forward index was not created for column: %s", column));
-          }
-          _tmpForwardIndexColumns.add(column);
-          break;
-        }
-        case ENABLE_FORWARD_INDEX_FOR_DICT_COLUMN: {
-          createForwardIndexIfNeeded(segmentWriter, column, indexCreatorProvider, false);
-          if (!segmentWriter.hasIndexFor(column, ColumnIndexType.DICTIONARY)) {
-            throw new IOException(
-                String.format("Dictionary should still exist after rebuilding forward index for dictionary column: %s",
-                    column));
-          }
-          break;
-        }
-        case ENABLE_FORWARD_INDEX_FOR_RAW_COLUMN: {
-          createForwardIndexIfNeeded(segmentWriter, column, indexCreatorProvider, false);
-          if (segmentWriter.hasIndexFor(column, ColumnIndexType.DICTIONARY)) {
-            throw new IOException(
-                String.format("Dictionary should not exist after rebuilding forward index for raw column: %s", column));
-          }
-          break;
+      List<Operation> operations = entry.getValue();
+
+      for (Operation operation : operations) {
+        switch (operation) {
+          case DISABLE_FORWARD_INDEX:
+            // Deletion of the forward index will be handled outside the index handler to ensure that other index
+            // handlers that need the forward index to construct their own indexes will have it available.
+            _tmpForwardIndexColumns.add(column);
+            break;
+          case ENABLE_FORWARD_INDEX:
+            ColumnMetadata columnMetadata = createForwardIndexIfNeeded(segmentWriter, column, indexCreatorProvider,
+                false);
+            if (columnMetadata.hasDictionary()) {
+              if (!segmentWriter.hasIndexFor(column, ColumnIndexType.DICTIONARY)) {
+                throw new IOException(String.format("Dictionary should still exist after rebuilding forward index "

Review Comment:
   done



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/DefaultIndexCreatorProvider.java:
##########
@@ -76,6 +75,10 @@ public ForwardIndexCreator newForwardIndexCreator(IndexCreationContext.Forward c
       throws Exception {
     if (!context.hasDictionary()) {
       // Dictionary disabled columns
+      if (context.forwardIndexDisabled()) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1103366670


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/NoOpForwardIndexCreator.java:
##########
@@ -19,12 +19,13 @@
 package org.apache.pinot.segment.local.segment.creator.impl.fwd;
 
 import java.io.IOException;
+import java.math.BigDecimal;
 import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
 import org.apache.pinot.spi.data.FieldSpec;
 
 
 /**
- * Forward index creator for dictionary-encoded single and multi-value columns with forward index disabled.
+ * Forward index creator for dictionary-encoded and raw single and multi-value columns with forward index disabled.
  * This is a no-op.
  */
 public class NoOpForwardIndexCreator implements ForwardIndexCreator {

Review Comment:
   Sure Jackie, but let me handle removing this as a separate change rather than mixing this into this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #10260:
URL: https://github.com/apache/pinot/pull/10260


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1106502542


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -93,8 +93,10 @@ public class ForwardIndexHandler extends BaseIndexHandler {
   private final Schema _schema;
 
   protected enum Operation {
-    DISABLE_FORWARD_INDEX_FOR_DICT_COLUMN,
-    DISABLE_FORWARD_INDEX_FOR_RAW_COLUMN,
+    DISABLE_FORWARD_INDEX_FOR_DICT_OR_RAW_COLUMN,

Review Comment:
   This part is very complicated. Are we able to simplify the operations allowed?
   I feel we can simplify it by making changes in steps:
   1. Check if we need to enable/disable dictionary (switch between dictionary encoding and raw forward index)
   2. Check if we need to enable/disable the forward index
   3. Check if we need to change the raw index compression type



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1115123268


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/DefaultIndexCreatorProvider.java:
##########
@@ -76,6 +75,10 @@ public ForwardIndexCreator newForwardIndexCreator(IndexCreationContext.Forward c
       throws Exception {
     if (!context.hasDictionary()) {
       // Dictionary disabled columns
+      if (context.forwardIndexDisabled()) {

Review Comment:
   (minor) Suggest moving the check to the caller side to be consistent with other index creators. Basically when invoking this method, forward index should be created



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -93,12 +94,10 @@ public class ForwardIndexHandler extends BaseIndexHandler {
   private final Schema _schema;
 
   protected enum Operation {

Review Comment:
   (optional) Ideally we can remove the `Operation` enum, and update the index in steps. Can be handled separately



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -608,119 +568,125 @@ public void indexRow(GenericRow row)
               columnValueToIndex = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
             }
           }
-          switch (forwardIndexCreator.getValueType()) {
-            case INT:
-              forwardIndexCreator.putInt((int) columnValueToIndex);
-              break;
-            case LONG:
-              forwardIndexCreator.putLong((long) columnValueToIndex);
-              break;
-            case FLOAT:
-              forwardIndexCreator.putFloat((float) columnValueToIndex);
-              break;
-            case DOUBLE:
-              forwardIndexCreator.putDouble((double) columnValueToIndex);
-              break;
-            case BIG_DECIMAL:
-              forwardIndexCreator.putBigDecimal((BigDecimal) columnValueToIndex);
-              break;
-            case STRING:
-              forwardIndexCreator.putString((String) columnValueToIndex);
-              break;
-            case BYTES:
-              forwardIndexCreator.putBytes((byte[]) columnValueToIndex);
-              break;
-            case JSON:
-              if (columnValueToIndex instanceof String) {
+          if (forwardIndexCreator != null) {
+            switch (forwardIndexCreator.getValueType()) {
+              case INT:
+                forwardIndexCreator.putInt((int) columnValueToIndex);
+                break;
+              case LONG:
+                forwardIndexCreator.putLong((long) columnValueToIndex);
+                break;
+              case FLOAT:
+                forwardIndexCreator.putFloat((float) columnValueToIndex);
+                break;
+              case DOUBLE:
+                forwardIndexCreator.putDouble((double) columnValueToIndex);
+                break;
+              case BIG_DECIMAL:
+                forwardIndexCreator.putBigDecimal((BigDecimal) columnValueToIndex);
+                break;
+              case STRING:
                 forwardIndexCreator.putString((String) columnValueToIndex);
-              } else if (columnValueToIndex instanceof byte[]) {
+                break;
+              case BYTES:
                 forwardIndexCreator.putBytes((byte[]) columnValueToIndex);
-              }
-              break;
-            default:
-              throw new IllegalStateException();
+                break;
+              case JSON:
+                if (columnValueToIndex instanceof String) {
+                  forwardIndexCreator.putString((String) columnValueToIndex);
+                } else if (columnValueToIndex instanceof byte[]) {
+                  forwardIndexCreator.putBytes((byte[]) columnValueToIndex);
+                }
+                break;
+              default:
+                throw new IllegalStateException();
+            }
           }
         }
       } else {
         if (dictionaryCreator != null) {
           //dictionary encoded
           int[] dictIds = dictionaryCreator.indexOfMV(columnValueToIndex);
-          forwardIndexCreator.putDictIdMV(dictIds);
+          if (forwardIndexCreator != null) {
+            forwardIndexCreator.putDictIdMV(dictIds);
+          }
           DictionaryBasedInvertedIndexCreator invertedIndexCreator = _invertedIndexCreatorMap.get(columnName);
           if (invertedIndexCreator != null) {
             invertedIndexCreator.add(dictIds, dictIds.length);
           }
         } else {
           // for text index on raw columns, check the config to determine if actual raw value should
           // be stored or not
-          if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
-            Object value = _columnProperties.get(columnName).get(FieldConfig.TEXT_INDEX_RAW_VALUE);
-            if (value == null) {
-              value = FieldConfig.TEXT_INDEX_DEFAULT_RAW_VALUE;
-            }
-            if (forwardIndexCreator.getValueType().getStoredType() == DataType.STRING) {
-              columnValueToIndex = new String[]{String.valueOf(value)};
-            } else if (forwardIndexCreator.getValueType().getStoredType() == DataType.BYTES) {
-              columnValueToIndex = new byte[][]{String.valueOf(value).getBytes(UTF_8)};
-            } else {
-              throw new RuntimeException("Text Index is only supported for STRING and BYTES stored type");
-            }
-          }
-          Object[] values = (Object[]) columnValueToIndex;
-          int length = values.length;
-          switch (forwardIndexCreator.getValueType()) {
-            case INT:
-              int[] ints = new int[length];
-              for (int i = 0; i < length; i++) {
-                ints[i] = (Integer) values[i];
-              }
-              forwardIndexCreator.putIntMV(ints);
-              break;
-            case LONG:
-              long[] longs = new long[length];
-              for (int i = 0; i < length; i++) {
-                longs[i] = (Long) values[i];
+          if (forwardIndexCreator != null) {
+            if (textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {

Review Comment:
   (minor) Merge these 2 if conditions
   ```suggestion
             if (forwardIndexCreator != null && textIndexCreator != null && !shouldStoreRawValueForTextIndex(columnName)) {
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -110,86 +109,81 @@ public ForwardIndexHandler(SegmentDirectory segmentDirectory, IndexLoadingConfig
   @Override
   public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader)
       throws Exception {
-    Map<String, Operation> columnOperationMap = computeOperation(segmentReader);
-    return !columnOperationMap.isEmpty();
+    Map<String, List<Operation>> columnOperationsMap = computeOperations(segmentReader);
+    return !columnOperationsMap.isEmpty();
   }
 
   @Override
   public void updateIndices(SegmentDirectory.Writer segmentWriter, IndexCreatorProvider indexCreatorProvider)
       throws Exception {
-    Map<String, Operation> columnOperationMap = computeOperation(segmentWriter);
-    if (columnOperationMap.isEmpty()) {
+    Map<String, List<Operation>> columnOperationsMap = computeOperations(segmentWriter);
+    if (columnOperationsMap.isEmpty()) {
       return;
     }
 
-    for (Map.Entry<String, Operation> entry : columnOperationMap.entrySet()) {
+    for (Map.Entry<String, List<Operation>> entry : columnOperationsMap.entrySet()) {
       String column = entry.getKey();
-      Operation operation = entry.getValue();
-
-      switch (operation) {
-        case DISABLE_FORWARD_INDEX_FOR_DICT_COLUMN: {
-          // Deletion of the forward index will be handled outside the index handler to ensure that other index
-          // handlers that need the forward index to construct their own indexes will have it available.
-          // The existing forward index must be in dictionary format for this to be a no-op.
-          _tmpForwardIndexColumns.add(column);
-          break;
-        }
-        case DISABLE_FORWARD_INDEX_FOR_RAW_COLUMN: {
-          // The forward index has been disabled for a column which has a noDictionary based forward index. A dictionary
-          // and inverted index need to be created before we can delete the forward index. We create a dictionary here,
-          // but let the InvertedIndexHandler handle the creation of the inverted index. We create a temporary
-          // forward index here which is dictionary based and allow the post deletion step handle the actual deletion
-          // of the forward index.
-          createDictBasedForwardIndex(column, segmentWriter, indexCreatorProvider);
-          if (!segmentWriter.hasIndexFor(column, ColumnIndexType.FORWARD_INDEX)) {
-            throw new IOException(String.format("Temporary forward index was not created for column: %s", column));
-          }
-          _tmpForwardIndexColumns.add(column);
-          break;
-        }
-        case ENABLE_FORWARD_INDEX_FOR_DICT_COLUMN: {
-          createForwardIndexIfNeeded(segmentWriter, column, indexCreatorProvider, false);
-          if (!segmentWriter.hasIndexFor(column, ColumnIndexType.DICTIONARY)) {
-            throw new IOException(
-                String.format("Dictionary should still exist after rebuilding forward index for dictionary column: %s",
-                    column));
-          }
-          break;
-        }
-        case ENABLE_FORWARD_INDEX_FOR_RAW_COLUMN: {
-          createForwardIndexIfNeeded(segmentWriter, column, indexCreatorProvider, false);
-          if (segmentWriter.hasIndexFor(column, ColumnIndexType.DICTIONARY)) {
-            throw new IOException(
-                String.format("Dictionary should not exist after rebuilding forward index for raw column: %s", column));
-          }
-          break;
+      List<Operation> operations = entry.getValue();
+
+      for (Operation operation : operations) {
+        switch (operation) {
+          case DISABLE_FORWARD_INDEX:
+            // Deletion of the forward index will be handled outside the index handler to ensure that other index
+            // handlers that need the forward index to construct their own indexes will have it available.
+            _tmpForwardIndexColumns.add(column);
+            break;
+          case ENABLE_FORWARD_INDEX:
+            ColumnMetadata columnMetadata = createForwardIndexIfNeeded(segmentWriter, column, indexCreatorProvider,
+                false);
+            if (columnMetadata.hasDictionary()) {
+              if (!segmentWriter.hasIndexFor(column, ColumnIndexType.DICTIONARY)) {
+                throw new IOException(String.format("Dictionary should still exist after rebuilding forward index "

Review Comment:
   Not introduced in this PR, but suggest changing it to `IllegalStateException` since it is not really `IOException`. Same for other places where the operation is illegal



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #10260:
URL: https://github.com/apache/pinot/pull/10260#discussion_r1113613195


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/NoOpForwardIndexCreator.java:
##########
@@ -19,12 +19,13 @@
 package org.apache.pinot.segment.local.segment.creator.impl.fwd;
 
 import java.io.IOException;
+import java.math.BigDecimal;
 import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
 import org.apache.pinot.spi.data.FieldSpec;
 
 
 /**
- * Forward index creator for dictionary-encoded single and multi-value columns with forward index disabled.
+ * Forward index creator for dictionary-encoded and raw single and multi-value columns with forward index disabled.
  * This is a no-op.
  */
 public class NoOpForwardIndexCreator implements ForwardIndexCreator {

Review Comment:
   @Jackie-Jiang I've removed this as part of this PR itself



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on PR #10260:
URL: https://github.com/apache/pinot/pull/10260#issuecomment-1426595563

   > > We have a scenario where the dictionary can be disabled for a forward index disabled column. For such scenarios we don't handle anything on reload today. Are you saying that even for toggling whether the dictionary is enabled or disabled we should just treat reload as a no-op rather than throwing error? What about indexes like the range index that need to be modified when the column has the dictionary enabled or disabled?
   > 
   > Disable dictionary should be independent of disabling forward index. Currently it is no-op because we want to ensure data can be recovered. With the new change, we should allow no dictionary + no forward index. Note that disable dictionary + enable inverted index is invalid config, and we should reject it during table config validation. When both forward index and dictionary are disabled, no further index modification is allowed.
   
   Sure that makes sense. So if I'm understanding correctly I should make the following changes:
   
   - When dictionary is disabled for a forward index disabled column we just remove the dictionary, update the segment metadata and remove the related indexes. If range index is present we can throw an error since we shouldn't try to regenerate it under any circumstances.
   - If someone tries to enable the dictionary on a column without forward index, again we should avoid any regeneration business and just throw an error on reload.
   - Add validations that inverted index and FST are also disabled if dictionary is disabled on a forward index disabled column (if such a compatibility check doesn't already exist for all columns)
   
   Let me know if the above sounds good or not. I'll keep all code paths marked as [current code path] in the list in the main comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on pull request #10260: Relax the constraint to have a dictionary and inverted index when forward index is disabled

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on PR #10260:
URL: https://github.com/apache/pinot/pull/10260#issuecomment-1441159975

   > LGTM with minor comments.
   > 
   > Thanks for adding the detailed tests, but it will be good if we can merge some similar tests to keep the test smaller and with the same coverage. It should also help reduce the CI run time. Currently the total lines of test for the forward index handler is probably close to 5K lines, and is quite hard to manage. This can be addressed separately
   
   Thanks for the detailed review @Jackie-Jiang ! I've addressed most of your comments. Great point about the test explosion. Let me spend some time going over the tests in more detail and come up with that list which can be removed. I'll address this in a follow up (along with changes to remove that Operation enum).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org