Posted to commits@pinot.apache.org by "soumitra-st (via GitHub)" <gi...@apache.org> on 2023/11/16 21:47:39 UTC

[PR] Add a check to enable size based threshold for realtime tables [pinot]

soumitra-st opened a new pull request, #12016:
URL: https://github.com/apache/pinot/pull/12016

   As per the [Pinot documentation](https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime#controlling-number-of-rows-in-consuming-segment), the row threshold must be set to 0 to enable the segment-size-based threshold. This PR adds a check during table creation: if `realtime.segment.flush.threshold.segment.size` is set, `realtime.segment.flush.threshold.rows` must be set to 0.
   
   Added a test, and also ran the curl command below to confirm that the API fails when the table config has an invalid combination:
   
   % curl -X POST http://localhost:9000/tables -H 'accept: application/json' -H 'Content-Type: application/json' -d @/Users/soumitra/pinot-tutorial/transcript/transcript-table-realtime.json
   {"code":400,"error":"Could not create StreamConfig using the streamConfig map: Invalid config: realtime.segment.flush.threshold.rows=1000000, it must be 0 for size based segment to work."}
   
   `backward-incompat`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12016:
URL: https://github.com/apache/pinot/pull/12016




Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12016:
URL: https://github.com/apache/pinot/pull/12016#discussion_r1397982032


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -157,7 +157,8 @@ public static void validate(TableConfig tableConfig, @Nullable Schema schema, @N
           // Validate that StreamConfig can be created
           streamConfig = new StreamConfig(tableConfig.getTableName(), streamConfigMap);
         } catch (Exception e) {
-          throw new IllegalStateException("Could not create StreamConfig using the streamConfig map", e);
+          throw new IllegalStateException("Could not create StreamConfig using the streamConfig map: " + e.getMessage(),

Review Comment:
   Since the exception is already included as the cause, do not add its message again.
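
As a rough illustration of the point in this review comment, the sketch below wraps an exception as the cause and recovers the inner message from the chain, without concatenating it into the outer message. The class and method names (`ChainedExceptionSketch`, `wrap`, `rootMessage`) are hypothetical and are not part of Pinot; only `Throwable.getCause()` and `getMessage()` from the JDK are used.

```java
// Illustrative sketch only; these names are hypothetical, not Pinot code.
final class ChainedExceptionSketch {

  // Wrap the original failure as the cause; the outer message stays short.
  static IllegalStateException wrap(Exception cause) {
    return new IllegalStateException("Could not create StreamConfig using the streamConfig map", cause);
  }

  // Walk the cause chain and return the message of the innermost exception.
  static String rootMessage(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current.getMessage();
  }
}
```

With this shape, a caller that needs the detail can still reach it through the cause chain (or a stack trace), so the outer message does not have to repeat it.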



##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamConfig.java:
##########
@@ -172,7 +172,7 @@ public StreamConfig(String tableNameWithType, Map<String, String> streamConfigMa
 
     _flushThresholdRows = extractFlushThresholdRows(streamConfigMap);
     _flushThresholdTimeMillis = extractFlushThresholdTimeMillis(streamConfigMap);
-    _flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap);
+    _flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap, _flushThresholdRows);

Review Comment:
   Do not add the check within the constructor, because it will break existing table configs. For new checks we want to enforce, add them to `TableConfigUtils.validate()`.
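
A minimal sketch of what a validation-time check could look like, kept outside any constructor so that already-stored configs can still be loaded. The class `FlushThresholdCheckSketch` and the method `validateFlushThresholds` are hypothetical names for illustration and are not the code merged in this PR. The two property keys are taken from the PR description; treating a missing rows value as acceptable is an assumption of this sketch.

```java
import java.util.Map;

// Illustrative sketch only; class and method names are hypothetical, not Pinot code.
final class FlushThresholdCheckSketch {
  // Property keys as quoted in the PR description.
  static final String ROWS_KEY = "realtime.segment.flush.threshold.rows";
  static final String SIZE_KEY = "realtime.segment.flush.threshold.segment.size";

  // Intended to be called from a table-creation validation path, not from a constructor.
  static void validateFlushThresholds(Map<String, String> streamConfigMap) {
    if (!streamConfigMap.containsKey(SIZE_KEY)) {
      return; // Size-based threshold not requested; nothing to check.
    }
    String rows = streamConfigMap.get(ROWS_KEY);
    // Assumption: a missing rows value is left alone; only an explicit non-zero value is rejected.
    // A non-numeric rows value would surface as a NumberFormatException from parseLong.
    if (rows != null && Long.parseLong(rows.trim()) != 0) {
      throw new IllegalStateException(
          "Invalid config: " + ROWS_KEY + "=" + rows + ", it must be 0 for size based segment to work.");
    }
  }
}
```

Because the check lives in a separate method, code that only constructs a config object from an existing map is unaffected; only the path that explicitly calls the validation rejects the combination.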





Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12016:
URL: https://github.com/apache/pinot/pull/12016#issuecomment-1815439243

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `18 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`1ab9e62`)](https://app.codecov.io/gh/apache/pinot/commit/1ab9e62b6d1908dda598eaaff9f6a2d8cc5a7b65?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.63% compared to head [(`237faac`)](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 27.59%.
   > Report is 3 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...e/data/manager/realtime/IngestionDelayTracker.java](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvSW5nZXN0aW9uRGVsYXlUcmFja2VyLmphdmE=) | 0.00% | [13 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...java/org/apache/pinot/spi/stream/StreamConfig.java](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvc3RyZWFtL1N0cmVhbUNvbmZpZy5qYXZh) | 0.00% | [4 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   | [...he/pinot/segment/local/utils/TableConfigUtils.java](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9UYWJsZUNvbmZpZ1V0aWxzLmphdmE=) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12016       +/-   ##
   =============================================
   - Coverage     61.63%   27.59%   -34.05%     
   + Complexity     1151      207      -944     
   =============================================
     Files          2385     2385               
     Lines        129271   129278        +7     
     Branches      20016    20017        +1     
   =============================================
   - Hits          79682    35679    -44003     
   - Misses        43792    90907    +47115     
   + Partials       5797     2692     -3105     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.00%)` | :arrow_down: |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.51%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.03%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.49%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.05%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.05%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (+0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   




Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12016:
URL: https://github.com/apache/pinot/pull/12016#discussion_r1396526847


##########
pinot-core/src/test/java/org/apache/pinot/core/realtime/stream/StreamConfigTest.java:
##########
@@ -279,6 +285,16 @@ public void testStreamConfigValidations() {
     streamConfig = new StreamConfig(tableName, streamConfigMap);
     assertEquals(streamConfig.getFlushThresholdSegmentSizeBytes(),
         StreamConfig.DEFAULT_FLUSH_THRESHOLD_SEGMENT_SIZE_BYTES);
+
+    // If size based threshold is set, then rows must be 0
+    streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_ROWS, "1000000");
+    streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_SEGMENT_SIZE, "100M");
+    try {
+      new StreamConfig(tableName, streamConfigMap);
+      fail("Invalid config: flush threshold rows must be 0, when flush threshold size is set.");
+    } catch (Exception e) {
+      // Expected
+    }

Review Comment:
   I am not sure this is what the document intended to mean.
   What I read was:
   ```
   You can pick the appropriate value for segment size and number of hours in the table config, and set the number of rows to zero. Note that you don't have to pick values exactly as given in each of these combinations (they are calculated guesses anyway).
   ```
   It wouldn't throw an exception if the number of rows is set to a non-zero value.
   
   My concern is that this will cause issues during a cluster upgrade: if any user has both configurations set, the table will be unable to load and will end up in an error state.
   


