You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "jadami10 (via GitHub)" <gi...@apache.org> on 2024/02/22 00:52:03 UTC

[PR] require noDictionaryColumns with aggregationConfigs [pinot]

jadami10 opened a new pull request, #12464:
URL: https://github.com/apache/pinot/pull/12464

   This is a `bugfix` to ensure someone does not incorrectly set up ingestion aggregation. Currently, `MutableSegmentImpl` will just disable aggregation if you do not have `noDictionaryColumns` for all `columnName` in `aggregationConfigs`. This caused an issue in one of our clusters because all of the aggregations showed up as `0`.
   
   This will not break existing tables as validation is only done when the table is created or updated. But even if anyone was using these configs incorrectly and runs into this error message, they can just delete the configs since they're doing nothing anyway.
   
   This was tested on an internal cluster where I verified:
   - you cannot remove existing aggregation columns from noDictionaryConfigs
   - you cannot add new aggregation columns without adding them to noDictionaryConfigs
   - you cannot make a new table without all aggregation columns in noDictionaryConfigs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] require noDictionaryColumns with aggregationConfigs [pinot]

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on code in PR #12464:
URL: https://github.com/apache/pinot/pull/12464#discussion_r1501147536


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -490,6 +493,19 @@ public static void validateIngestionConfig(TableConfig tableConfig, @Nullable Sc
           Preconditions.checkState(new HashSet<>(schema.getMetricNames()).equals(aggregationColumns),
               "all metric columns must be aggregated");
         }
+
+        // This is required by MutableSegmentImpl.enableMetricsAggregationIfPossible().
+        // That code will disable ingestion aggregation if all metrics aren't noDictionaryColumns.
+        // But if you do that after the table is already created, all future aggregations will
+        // just be the default value.
+        Map<String, FieldIndexConfigs> fieldIndexConfigsMap = FieldIndexConfigsUtil.createIndexConfigsByColName(

Review Comment:
   great suggestion, thank you. I think is should be `.isDisabled()`. Or at least that's what tests say.
   
   There's honestly a lot of places we might want to clean this up. The invertedIndex check is based solely on the `noDictionaryColumn` config for example. I'm surprised more people aren't running into this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] require noDictionaryColumns with aggregationConfigs [pinot]

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on PR #12464:
URL: https://github.com/apache/pinot/pull/12464#issuecomment-1960737917

   @Jackie-Jiang, good call on using `FieldIndexConfigsUtil`. That definitely caught some other cases like `noDictionaryConfigs` that we use as well. I retested creating new tables and updating existing tables to ensure they can't be created/updated. Existing tables are unaffected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] require noDictionaryColumns with aggregationConfigs [pinot]

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #12464:
URL: https://github.com/apache/pinot/pull/12464#discussion_r1500325084


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -490,6 +493,19 @@ public static void validateIngestionConfig(TableConfig tableConfig, @Nullable Sc
           Preconditions.checkState(new HashSet<>(schema.getMetricNames()).equals(aggregationColumns),
               "all metric columns must be aggregated");
         }
+
+        // This is required by MutableSegmentImpl.enableMetricsAggregationIfPossible().
+        // That code will disable ingestion aggregation if all metrics aren't noDictionaryColumns.
+        // But if you do that after the table is already created, all future aggregations will
+        // just be the default value.
+        Map<String, FieldIndexConfigs> fieldIndexConfigsMap = FieldIndexConfigsUtil.createIndexConfigsByColName(

Review Comment:
   As far as I know there is none. But in this case we can directly call `StandardIndexes.dictionary().getConfig(tableConfig, schema)`.
   
   Specifically we could change this code to do something like:
   
   ```java
          Map<String, DictionaryIndexConfig> configPerCol = StandardIndexes.dictionary().getConfig(tableConfig, schema);
           aggregationColumns.forEach(column -> {
             DictionaryIndexConfig dictConfig = configPerCol.get(column);
             Preconditions.checkState(dictConfig != null && dictConfig.isEnabled(),
               "Aggregated column: %s must be a no-dictionary column", column);
             });
           }
   ```
   
   `FieldIndexConfigsUtil.createIndexConfigsByColName` is designed to be used when we need to calculate the indexes for all columns and it includes a deserializer in order to optionally alter the way index configs are read (which is necessary in order to use IndexLoadingConfig in tests)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] require noDictionaryColumns with aggregationConfigs [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12464:
URL: https://github.com/apache/pinot/pull/12464#discussion_r1500266732


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -490,6 +493,19 @@ public static void validateIngestionConfig(TableConfig tableConfig, @Nullable Sc
           Preconditions.checkState(new HashSet<>(schema.getMetricNames()).equals(aggregationColumns),
               "all metric columns must be aggregated");
         }
+
+        // This is required by MutableSegmentImpl.enableMetricsAggregationIfPossible().
+        // That code will disable ingestion aggregation if all metrics aren't noDictionaryColumns.
+        // But if you do that after the table is already created, all future aggregations will
+        // just be the default value.
+        Map<String, FieldIndexConfigs> fieldIndexConfigsMap = FieldIndexConfigsUtil.createIndexConfigsByColName(

Review Comment:
   @gortiz Is there a way to only extract the `DictionaryIndexConfig` for all columns? Here we just need to find all the no-dictionary columns.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] require noDictionaryColumns with aggregationConfigs [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12464:
URL: https://github.com/apache/pinot/pull/12464#issuecomment-1958495576

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12464?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `5 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`e161b78`)](https://app.codecov.io/gh/apache/pinot/commit/e161b786f5e4426fcfee1d788d7def46b1d8c24b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.73% compared to head [(`e9b601d`)](https://app.codecov.io/gh/apache/pinot/pull/12464?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 0.00%.
   > Report is 1 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12464?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...he/pinot/segment/local/utils/TableConfigUtils.java](https://app.codecov.io/gh/apache/pinot/pull/12464?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9UYWJsZUNvbmZpZ1V0aWxzLmphdmE=) | 0.00% | [5 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12464?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12464       +/-   ##
   =============================================
   - Coverage     61.73%    0.00%   -61.74%     
   =============================================
     Files          2436     2360       -76     
     Lines        133185   129442     -3743     
     Branches      20629    20071      -558     
   =============================================
   - Hits          82227        0    -82227     
   - Misses        44911   129442    +84531     
   + Partials       6047        0     -6047     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-0.01%)` | :arrow_down: |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.62%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.60%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.74%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12464/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12464?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] require noDictionaryColumns with aggregationConfigs [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12464:
URL: https://github.com/apache/pinot/pull/12464


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org