You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/05/03 16:26:29 UTC

[GitHub] [pinot] gortiz opened a new pull request, #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

gortiz opened a new pull request, #10715:
URL: https://github.com/apache/pinot/pull/10715

   This is a #10713 bugfix. As explained there, the problem occurs when user specifies that the column should not have a dictionary but that column is also included as a sorted column.
   
   In that case, pre `index-spi` we just ignored the `noDictionaryColumns`, but the newer code honored the config.
   An alternative is to read the dictionary even if the config says it should be disabled.
   But I think is more difficult to make mistakes if we actually introduce this (probably undesired) special case when reading the config in the old syntax.
   
   Therefore this PR modifies the deserializer used by DictionaryIndexType to add the following logic:
   - If the user disabled the dictionary using the older config BUT set the same column as the sorted column, the DictionaryConfig for that column is considered enabled.
   - If the user disabled the dictionary using the new syntax BUT set the same column as a sorted column, an error is thrown saying that the sorted column must be dictionary encoded.
   
   By doing that we preserve backward compatibility while we encourage people to use a coherent TableConfig once they migrate to the new configuration syntax


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10715:
URL: https://github.com/apache/pinot/pull/10715#discussion_r1203169463


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java:
##########
@@ -45,6 +45,18 @@ R createIndexReader(SegmentDirectory.Reader segmentReader, FieldIndexConfigs fie
     protected abstract R createIndexReader(PinotDataBuffer dataBuffer, ColumnMetadata metadata, C indexConfig)
         throws IOException, IndexReaderConstraintException;
 
+    /**
+     * Sometimes the index configuration indicates that the index should be disabled but the reader actually contains
+     * a buffer for the index type.
+     *
+     * By default, the buffer has priority over the configuration, so in case we have a buffer we would create an index

Review Comment:
   Sorted column config is actually quite tricky. It is used when creating the segment (on the minion side, or when sealing the consuming segment), but once the segment is created, it is no longer honored, and we rely on the metadata to determine whether a column is sorted. There could be multiple columns sorted, but they might not be configured as sorted column.
   When a column is sorted, but configured as no-dictionary column, we choose to ignore the no-dictionary config because it is almost always more efficient to use dictionary encoding for sorted column. The most common scenario would be user added a new column to an existing table, and wants it to be no-dictionary. For the existing segments, Pinot will backfill default value for the new column, so it contains only one value thus being sorted. In such case, we don't want to use no-dictionary because that won't be efficient.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10715:
URL: https://github.com/apache/pinot/pull/10715#issuecomment-1533454111

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10715?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10715](https://app.codecov.io/gh/apache/pinot/pull/10715?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (66ef233) into [master](https://app.codecov.io/gh/apache/pinot/commit/cad764dcbe8793f7fb1cbec37051c27a753e4f30?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cad764d) will **decrease** coverage by `0.01%`.
   > The diff coverage is `83.01%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #10715      +/-   ##
   ============================================
   - Coverage     70.32%   70.32%   -0.01%     
   - Complexity     6429     6443      +14     
   ============================================
     Files          2112     2112              
     Lines        114056   114102      +46     
     Branches      17226    17239      +13     
   ============================================
   + Hits          80213    80240      +27     
   - Misses        28244    28249       +5     
   - Partials       5599     5613      +14     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `24.33% <0.00%> (-0.02%)` | :arrow_down: |
   | integration2 | `24.08% <0.00%> (+0.05%)` | :arrow_up: |
   | unittests1 | `67.83% <83.01%> (+<0.01%)` | :arrow_up: |
   | unittests2 | `13.81% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10715?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...pinot/segment/spi/index/DictionaryIndexConfig.java](https://app.codecov.io/gh/apache/pinot/pull/10715?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L0RpY3Rpb25hcnlJbmRleENvbmZpZy5qYXZh) | `66.66% <0.00%> (ø)` | |
   | [...org/apache/pinot/spi/config/table/IndexConfig.java](https://app.codecov.io/gh/apache/pinot/pull/10715?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0luZGV4Q29uZmlnLmphdmE=) | `62.50% <33.33%> (-37.50%)` | :arrow_down: |
   | [.../segment/index/dictionary/DictionaryIndexType.java](https://app.codecov.io/gh/apache/pinot/pull/10715?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2RpY3Rpb25hcnkvRGljdGlvbmFyeUluZGV4VHlwZS5qYXZh) | `92.35% <95.34%> (+0.69%)` | :arrow_up: |
   
   ... and [35 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/10715/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10715:
URL: https://github.com/apache/pinot/pull/10715#discussion_r1185741967


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java:
##########
@@ -45,6 +45,18 @@ R createIndexReader(SegmentDirectory.Reader segmentReader, FieldIndexConfigs fie
     protected abstract R createIndexReader(PinotDataBuffer dataBuffer, ColumnMetadata metadata, C indexConfig)
         throws IOException, IndexReaderConstraintException;
 
+    /**
+     * Sometimes the index configuration indicates that the index should be disabled but the reader actually contains
+     * a buffer for the index type.
+     *
+     * By default, the buffer has priority over the configuration, so in case we have a buffer we would create an index

Review Comment:
   But do we want to keep the previous behavior or do we want to be resilient in cases where the table config says one thing and the metainfo says otherwise? If we are going to decide what to do in the general case (in IndexType) based on what PhysicalColumnIndexContainer did, we should offer the two options (use config or use metainfo) and let the caller decide.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 commented on a diff in pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 commented on code in PR #10715:
URL: https://github.com/apache/pinot/pull/10715#discussion_r1185759408


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java:
##########
@@ -45,6 +45,18 @@ R createIndexReader(SegmentDirectory.Reader segmentReader, FieldIndexConfigs fie
     protected abstract R createIndexReader(PinotDataBuffer dataBuffer, ColumnMetadata metadata, C indexConfig)
         throws IOException, IndexReaderConstraintException;
 
+    /**
+     * Sometimes the index configuration indicates that the index should be disabled but the reader actually contains
+     * a buffer for the index type.
+     *
+     * By default, the buffer has priority over the configuration, so in case we have a buffer we would create an index

Review Comment:
   In typical pinot server loading a segment case, I honestly don't see why it would be a problem. When servers come up, If there's a mismatch b/w table config and local segment metadata, the segment is always preprocessed as per the latest config before loading. So post preprocessing, the segment metadata should always be the source of truth. i.e. you're right that at the IndexReader level, we can always give preference to buffer over table config.
   
   Although I'd let @Jackie-Jiang  / @npawar / @klsince share their thoughts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 commented on a diff in pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 commented on code in PR #10715:
URL: https://github.com/apache/pinot/pull/10715#discussion_r1185722776


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java:
##########
@@ -45,6 +45,18 @@ R createIndexReader(SegmentDirectory.Reader segmentReader, FieldIndexConfigs fie
     protected abstract R createIndexReader(PinotDataBuffer dataBuffer, ColumnMetadata metadata, C indexConfig)
         throws IOException, IndexReaderConstraintException;
 
+    /**
+     * Sometimes the index configuration indicates that the index should be disabled but the reader actually contains
+     * a buffer for the index type.
+     *
+     * By default, the buffer has priority over the configuration, so in case we have a buffer we would create an index

Review Comment:
   Don't think this is the case. Originally, (https://github.com/apache/pinot/pull/10184/files#diff-271b081664e0c34f39b5ee282f33dc95bcc6cc45e94daa0aa89e8b6bc53213d6L82-L89)
   most index types would be config > buffer except a few (like dictionary).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #10715:
URL: https://github.com/apache/pinot/pull/10715#issuecomment-1577694878

   #10851 should be the correct way to fix this problem


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz closed pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz closed pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…
URL: https://github.com/apache/pinot/pull/10715


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10715:
URL: https://github.com/apache/pinot/pull/10715#discussion_r1185818858


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java:
##########
@@ -45,6 +45,18 @@ R createIndexReader(SegmentDirectory.Reader segmentReader, FieldIndexConfigs fie
     protected abstract R createIndexReader(PinotDataBuffer dataBuffer, ColumnMetadata metadata, C indexConfig)
         throws IOException, IndexReaderConstraintException;
 
+    /**
+     * Sometimes the index configuration indicates that the index should be disabled but the reader actually contains
+     * a buffer for the index type.
+     *
+     * By default, the buffer has priority over the configuration, so in case we have a buffer we would create an index

Review Comment:
   What about this approach? Now the decision between prioritizing buffers or not is delegated to the caller. By default indexes will honor the config, but dictionary and null value will not.
   
   I've also found that PhysicalColumnIndexContainer does not load range index when the column is sorted, while RangeIndexType does. I'm going to leave it as it is right now. I don't think we should add that condition into RangeIndexType and I would prefer to do not add a special case in PhysicalColumnIndexContainer. I assume the reason why PhysicalColumnIndexContainer did that was because it wasn't actually necessary to use the range index in that situation, but I guess that should be a responsibility of the planer.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 commented on pull request #10715: Fix #10713 by considering tableConfig.indexingConfig.sortedColumns as…

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 commented on PR #10715:
URL: https://github.com/apache/pinot/pull/10715#issuecomment-1534489360

   @gortiz The issue described in the issue is about when the segment metadata says the column is sorted / has inv index / has fst index. The changes you've made only looks for sorted columns in current table config?
   
   https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java#L362 for reference


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org