You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/07/06 00:34:07 UTC

[GitHub] [pinot] npawar opened a new pull request, #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

npawar opened a new pull request, #9020:
URL: https://github.com/apache/pinot/pull/9020

   Recently saw an exception in RealtimeValidationManager, wherein the partitionGroupSmallestOffset was returned as null (when trying to fix a segment which was marked OFFLINE). As a result, the periodic task encounters NPE and exits.
   In this PR
   1. Handling the null value gracefully
   2. Adding a metric for catching errors in PeriodicTasks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a diff in pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
npawar commented on code in PR #9020:
URL: https://github.com/apache/pinot/pull/9020#discussion_r914318161


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTask.java:
##########
@@ -116,6 +116,8 @@ protected void processTables(List<String> tableNamesWithType, Properties periodi
         processTable(tableNameWithType, context);
       } catch (Exception e) {
         LOGGER.error("Caught exception while processing table: {} in task: {}", tableNameWithType, _taskName, e);
+        _controllerMetrics.addMeteredTableValue(tableNameWithType + "." + _taskName,
+            ControllerMeter.PERIODIC_TASK_ERROR, 1L);

Review Comment:
   actually it'll be resolved as `<rawTableName>_<OFFLINE|REALTIME>.<periodicTaskName>`. Have set the expression in the yml accordingly 
   ```
   name=\"pinot.controller.(\\w+)_(\\w+).(\\w+).periodicTaskError\
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #9020:
URL: https://github.com/apache/pinot/pull/9020#issuecomment-1175656247

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/9020?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#9020](https://codecov.io/gh/apache/pinot/pull/9020?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4af3088) into [master](https://codecov.io/gh/apache/pinot/commit/de16a0a35d93f3fe0393df563015e7dd1298ceb9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (de16a0a) will **decrease** coverage by `54.71%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #9020       +/-   ##
   =============================================
   - Coverage     70.08%   15.36%   -54.72%     
   + Complexity     4957      170     -4787     
   =============================================
     Files          1827     1780       -47     
     Lines         96064    94008     -2056     
     Branches      14356    14125      -231     
   =============================================
   - Hits          67327    14447    -52880     
   - Misses        24092    78531    +54439     
   + Partials       4645     1030     -3615     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `15.36% <0.00%> (+0.02%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/9020?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...g/apache/pinot/common/metrics/ControllerMeter.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyTWV0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...elix/core/periodictask/ControllerPeriodicTask.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3BlcmlvZGljdGFzay9Db250cm9sbGVyUGVyaW9kaWNUYXNrLmphdmE=) | `66.07% <0.00%> (-1.21%)` | :arrow_down: |
   | [.../core/realtime/PinotLLCRealtimeSegmentManager.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3JlYWx0aW1lL1Bpbm90TExDUmVhbHRpbWVTZWdtZW50TWFuYWdlci5qYXZh) | `65.70% <0.00%> (-12.08%)` | :arrow_down: |
   | [...src/main/java/org/apache/pinot/sql/FilterKind.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcWwvRmlsdGVyS2luZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ain/java/org/apache/pinot/core/data/table/Key.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL0tleS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/data/table/Record.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1JlY29yZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/util/GroupByUtils.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS91dGlsL0dyb3VwQnlVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/spi/trace/BaseRecording.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdHJhY2UvQmFzZVJlY29yZGluZy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/spi/trace/NoOpRecording.java](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdHJhY2UvTm9PcFJlY29yZGluZy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1383 more](https://codecov.io/gh/apache/pinot/pull/9020/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/9020?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/9020?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [de16a0a...4af3088](https://codecov.io/gh/apache/pinot/pull/9020?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar merged pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
npawar merged PR #9020:
URL: https://github.com/apache/pinot/pull/9020


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
walterddr commented on code in PR #9020:
URL: https://github.com/apache/pinot/pull/9020#discussion_r914366824


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTask.java:
##########
@@ -116,6 +116,8 @@ protected void processTables(List<String> tableNamesWithType, Properties periodi
         processTable(tableNameWithType, context);
       } catch (Exception e) {
         LOGGER.error("Caught exception while processing table: {} in task: {}", tableNameWithType, _taskName, e);
+        _controllerMetrics.addMeteredTableValue(tableNameWithType + "." + _taskName,
+            ControllerMeter.PERIODIC_TASK_ERROR, 1L);

Review Comment:
   ok looks like `(\\w+)_(\\w+)` does the resolution to right most (e.g. if there's underscore in table name it will make the second `(\\w+)` the smallest matching group. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
walterddr commented on code in PR #9020:
URL: https://github.com/apache/pinot/pull/9020#discussion_r914366824


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTask.java:
##########
@@ -116,6 +116,8 @@ protected void processTables(List<String> tableNamesWithType, Properties periodi
         processTable(tableNameWithType, context);
       } catch (Exception e) {
         LOGGER.error("Caught exception while processing table: {} in task: {}", tableNameWithType, _taskName, e);
+        _controllerMetrics.addMeteredTableValue(tableNameWithType + "." + _taskName,
+            ControllerMeter.PERIODIC_TASK_ERROR, 1L);

Review Comment:
   ok looks like `(\\w+)_(\\w+)` does the resolution to right most (e.g. if there's underscores in table name, it will make the first `(\\w+)` largest possible and second `(\\w+)` the smallest matching group). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
walterddr commented on code in PR #9020:
URL: https://github.com/apache/pinot/pull/9020#discussion_r914311851


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/periodictask/ControllerPeriodicTask.java:
##########
@@ -116,6 +116,8 @@ protected void processTables(List<String> tableNamesWithType, Properties periodi
         processTable(tableNameWithType, context);
       } catch (Exception e) {
         LOGGER.error("Caught exception while processing table: {} in task: {}", tableNameWithType, _taskName, e);
+        _controllerMetrics.addMeteredTableValue(tableNameWithType + "." + _taskName,
+            ControllerMeter.PERIODIC_TASK_ERROR, 1L);

Review Comment:
   assuming this will be resolved as `<table_name>.<OFFLINE|REALTIME>.<TaskName>`? (e.g. tableNameWithType here is dot-separated)



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1215,12 +1215,18 @@ private void createNewConsumingSegment(TableConfig tableConfig, PartitionLevelSt
     StreamPartitionMsgOffset partitionGroupSmallestOffset =
         getPartitionGroupSmallestOffset(streamConfig, partitionGroupId);
 
-    // Start offset must be higher than the start offset of the stream
-    if (partitionGroupSmallestOffset.compareTo(startOffset) > 0) {
-      LOGGER.error("Data lost from offset: {} to: {} for partition: {} of table: {}", startOffset,
-          partitionGroupSmallestOffset, partitionGroupId, tableConfig.getTableName());
+    if (partitionGroupSmallestOffset != null) {
+      // Start offset must be higher than the start offset of the stream
+      if (partitionGroupSmallestOffset.compareTo(startOffset) > 0) {

Review Comment:
   assuming startOffset is never null yes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on a diff in pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
npawar commented on code in PR #9020:
URL: https://github.com/apache/pinot/pull/9020#discussion_r914317759


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1215,12 +1215,18 @@ private void createNewConsumingSegment(TableConfig tableConfig, PartitionLevelSt
     StreamPartitionMsgOffset partitionGroupSmallestOffset =
         getPartitionGroupSmallestOffset(streamConfig, partitionGroupId);
 
-    // Start offset must be higher than the start offset of the stream
-    if (partitionGroupSmallestOffset.compareTo(startOffset) > 0) {
-      LOGGER.error("Data lost from offset: {} to: {} for partition: {} of table: {}", startOffset,
-          partitionGroupSmallestOffset, partitionGroupId, tableConfig.getTableName());
+    if (partitionGroupSmallestOffset != null) {
+      // Start offset must be higher than the start offset of the stream
+      if (partitionGroupSmallestOffset.compareTo(startOffset) > 0) {

Review Comment:
   correct. that's coming from the startOffset value set in segmentZkMetadata



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on a diff in pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
KKcorps commented on code in PR #9020:
URL: https://github.com/apache/pinot/pull/9020#discussion_r914526561


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -1215,12 +1215,18 @@ private void createNewConsumingSegment(TableConfig tableConfig, PartitionLevelSt
     StreamPartitionMsgOffset partitionGroupSmallestOffset =
         getPartitionGroupSmallestOffset(streamConfig, partitionGroupId);
 
-    // Start offset must be higher than the start offset of the stream
-    if (partitionGroupSmallestOffset.compareTo(startOffset) > 0) {
-      LOGGER.error("Data lost from offset: {} to: {} for partition: {} of table: {}", startOffset,
-          partitionGroupSmallestOffset, partitionGroupId, tableConfig.getTableName());
+    if (partitionGroupSmallestOffset != null) {
+      // Start offset must be higher than the start offset of the stream
+      if (partitionGroupSmallestOffset.compareTo(startOffset) > 0) {

Review Comment:
   This is fine but what I am more concerned about is why it comes null in the first place. We definitely need this check but will need to dig a bit deeper as well for future PRs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on pull request #9020: Null check for partitionGroupSmallestOffset and metric for failure in RealtimeSegmentValidationManager

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on PR #9020:
URL: https://github.com/apache/pinot/pull/9020#issuecomment-1176485634

   What was the stream type of the table in which this error happened? Could be a bug in retrieving the smallest offset in one of the stream implementations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org