Posted to commits@pinot.apache.org by "soumitra-st (via GitHub)" <gi...@apache.org> on 2023/11/16 21:47:39 UTC
[PR] Add a check to enable size based threshold for realtime tables [pinot]
soumitra-st opened a new pull request, #12016:
URL: https://github.com/apache/pinot/pull/12016
As per the [Pinot documentation](https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime#controlling-number-of-rows-in-consuming-segment), the row threshold must be set to 0 to enable the segment-size-based threshold. This PR adds a check during table creation: if `realtime.segment.flush.threshold.segment.size` is set, `realtime.segment.flush.threshold.rows` must be set to 0.
Added a test, and also ran the curl command below to confirm that the API fails when the table config has an invalid combination:
% curl -X POST http://localhost:9000/tables -H 'accept: application/json' -H 'Content-Type: application/json' -d @/Users/soumitra/pinot-tutorial/transcript/transcript-table-realtime.json
{"code":400,"error":"Could not create StreamConfig using the streamConfig map: Invalid config: realtime.segment.flush.threshold.rows=1000000, it must be 0 for size based segment to work."}
`backward-incompat`
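For readers skimming the thread, the rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: the class and method names are hypothetical, the two property names are the ones quoted in the description, and the choice to leave a missing rows key alone is a simplification made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Pinot.
class FlushThresholdCheck {
  // Property names as quoted in the PR description.
  static final String ROWS_KEY = "realtime.segment.flush.threshold.rows";
  static final String SIZE_KEY = "realtime.segment.flush.threshold.segment.size";

  /**
   * Rejects an explicitly configured non-zero row threshold when a segment size is configured.
   * Assumption for this sketch: a missing rows key is not treated as an error.
   */
  static void checkSizeBasedThreshold(Map<String, String> streamConfigMap) {
    String size = streamConfigMap.get(SIZE_KEY);
    if (size == null) {
      return; // size-based threshold not requested, nothing to check
    }
    String rows = streamConfigMap.get(ROWS_KEY);
    if (rows != null && Integer.parseInt(rows.trim()) != 0) {
      throw new IllegalStateException(
          "Invalid config: " + ROWS_KEY + "=" + rows + ", it must be 0 for size based segment to work.");
    }
  }

  public static void main(String[] args) {
    Map<String, String> streamConfigMap = new HashMap<>();
    streamConfigMap.put(SIZE_KEY, "100M");
    streamConfigMap.put(ROWS_KEY, "0");
    checkSizeBasedThreshold(streamConfigMap); // accepted: rows is 0

    streamConfigMap.put(ROWS_KEY, "1000000");
    try {
      checkSizeBasedThreshold(streamConfigMap);
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

The sketch checks only for the presence of the size key and does not parse values such as `100M`; how the size string is interpreted is left to the real implementation.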
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12016:
URL: https://github.com/apache/pinot/pull/12016
Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12016:
URL: https://github.com/apache/pinot/pull/12016#discussion_r1397982032
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java:
##########
@@ -157,7 +157,8 @@ public static void validate(TableConfig tableConfig, @Nullable Schema schema, @N
// Validate that StreamConfig can be created
streamConfig = new StreamConfig(tableConfig.getTableName(), streamConfigMap);
} catch (Exception e) {
- throw new IllegalStateException("Could not create StreamConfig using the streamConfig map", e);
+ throw new IllegalStateException("Could not create StreamConfig using the streamConfig map: " + e.getMessage(),
Review Comment:
Since the exception is already included as the cause, do not add its message again.
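A small sketch of the point in this comment, using only standard Java exception chaining: when the original exception is passed as the cause, its message stays reachable through `getCause()` without being concatenated into the outer message. The class name, the `wrap` helper, and the inner message are made up for illustration. Whether a given log line or API response surfaces the cause's message depends on how the caller renders the exception.

```java
// Hypothetical illustration class; not part of Pinot.
class ExceptionChainingDemo {
  /** Wraps the original exception as the cause, leaving the outer message unchanged. */
  static IllegalStateException wrap(Exception cause) {
    return new IllegalStateException("Could not create StreamConfig using the streamConfig map", cause);
  }

  public static void main(String[] args) {
    Exception root = new IllegalArgumentException("rows must be 0"); // made-up inner message
    IllegalStateException wrapped = wrap(root);

    // The outer message carries only the wrapper text.
    System.out.println(wrapped.getMessage());
    // The inner message is still available through the cause.
    System.out.println(wrapped.getCause().getMessage());
  }
}
```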
##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamConfig.java:
##########
@@ -172,7 +172,7 @@ public StreamConfig(String tableNameWithType, Map<String, String> streamConfigMa
_flushThresholdRows = extractFlushThresholdRows(streamConfigMap);
_flushThresholdTimeMillis = extractFlushThresholdTimeMillis(streamConfigMap);
- _flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap);
+ _flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap, _flushThresholdRows);
Review Comment:
Do not add the check within the constructor because it will break existing table configs. New checks we want to enforce should be added to `TableConfigUtils.validate()`.
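The separation suggested here might look roughly like the sketch below: the constructor only parses, so configs that already exist keep loading, while a separate validation step enforces the new rule. Both classes are hypothetical stand-ins, not `StreamConfig` or `TableConfigUtils`, and the default of 0 for a missing rows value is an assumption made for this sketch rather than Pinot's actual default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a config object; its constructor never rejects the rows/size combination.
class LenientStreamSettings {
  final int flushThresholdRows;
  final String flushThresholdSegmentSize; // raw value, null when unset

  LenientStreamSettings(Map<String, String> streamConfigMap) {
    // Assumption for this sketch: a missing rows value is read as 0.
    flushThresholdRows =
        Integer.parseInt(streamConfigMap.getOrDefault("realtime.segment.flush.threshold.rows", "0"));
    flushThresholdSegmentSize = streamConfigMap.get("realtime.segment.flush.threshold.segment.size");
  }
}

// Hypothetical stand-in for a validation entry point used on table creation or update.
class TableValidationSketch {
  static void validate(LenientStreamSettings settings) {
    if (settings.flushThresholdSegmentSize != null && settings.flushThresholdRows != 0) {
      throw new IllegalStateException("Invalid config: realtime.segment.flush.threshold.rows="
          + settings.flushThresholdRows + ", it must be 0 for size based segment to work.");
    }
  }

  public static void main(String[] args) {
    Map<String, String> existing = new HashMap<>();
    existing.put("realtime.segment.flush.threshold.rows", "1000000");
    existing.put("realtime.segment.flush.threshold.segment.size", "100M");

    // Loading an existing config still works because the constructor does not enforce the rule.
    LenientStreamSettings settings = new LenientStreamSettings(existing);
    try {
      validate(settings); // only the explicit validation step rejects the combination
    } catch (IllegalStateException e) {
      System.out.println("Validation failed: " + e.getMessage());
    }
  }
}
```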
Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12016:
URL: https://github.com/apache/pinot/pull/12016#issuecomment-1815439243
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
Attention: `18 lines` in your changes are missing coverage. Please review.
> Comparison is base [(`1ab9e62`)](https://app.codecov.io/gh/apache/pinot/commit/1ab9e62b6d1908dda598eaaff9f6a2d8cc5a7b65?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.63% compared to head [(`237faac`)](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 27.59%.
> Report is 3 commits behind head on master.
| [Files](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
|---|---|---|
| [...e/data/manager/realtime/IngestionDelayTracker.java](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvSW5nZXN0aW9uRGVsYXlUcmFja2VyLmphdmE=) | 0.00% | [13 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...java/org/apache/pinot/spi/stream/StreamConfig.java](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvc3RyZWFtL1N0cmVhbUNvbmZpZy5qYXZh) | 0.00% | [4 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
| [...he/pinot/segment/local/utils/TableConfigUtils.java](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9UYWJsZUNvbmZpZ1V0aWxzLmphdmE=) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
<details><summary>Additional details and impacted files</summary>
```diff
@@ Coverage Diff @@
## master #12016 +/- ##
=============================================
- Coverage 61.63% 27.59% -34.05%
+ Complexity 1151 207 -944
=============================================
Files 2385 2385
Lines 129271 129278 +7
Branches 20016 20017 +1
=============================================
- Hits 79682 35679 -44003
- Misses 43792 90907 +47115
+ Partials 5797 2692 -3105
```
| [Flag](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
| [integration](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
| [integration1](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (ø)` | |
| [integration2](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (ø)` | |
| [java-11](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.00%)` | :arrow_down: |
| [java-21](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `<0.01% <0.00%> (-61.51%)` | :arrow_down: |
| [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.03%)` | :arrow_down: |
| [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `0.00% <0.00%> (-61.49%)` | :arrow_down: |
| [temurin](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.05%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (-34.05%)` | :arrow_down: |
| [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12016/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `27.59% <0.00%> (+0.01%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
</details>
[:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
Re: [PR] Add a check to enable size based threshold for realtime tables [pinot]
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12016:
URL: https://github.com/apache/pinot/pull/12016#discussion_r1396526847
##########
pinot-core/src/test/java/org/apache/pinot/core/realtime/stream/StreamConfigTest.java:
##########
@@ -279,6 +285,16 @@ public void testStreamConfigValidations() {
streamConfig = new StreamConfig(tableName, streamConfigMap);
assertEquals(streamConfig.getFlushThresholdSegmentSizeBytes(),
StreamConfig.DEFAULT_FLUSH_THRESHOLD_SEGMENT_SIZE_BYTES);
+
+ // If size based threshold is set, then rows must be 0
+ streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_ROWS, "1000000");
+ streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_SEGMENT_SIZE, "100M");
+ try {
+ new StreamConfig(tableName, streamConfigMap);
+ fail("Invalid config: flush threshold rows must be 0, when flush threshold size is set.");
+ } catch (Exception e) {
+ // Expected
+ }
Review Comment:
I am not sure this is what the document intended to mean.
What I read was:
```
You can pick the appropriate value for segment size and number of hours in the table config, and set the number of rows to zero. Note that you don't have to pick values exactly as given in each of these combinations (they are calculated guesses anyway).
```
As I read it, setting the number of rows to a non-zero value would not throw an exception.
My concern is that this will cause issues during a cluster upgrade: if any user has both configurations set, the table will fail to load and end up in an error state.