You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "sajjad-moradi (via GitHub)" <gi...@apache.org> on 2023/07/25 00:57:14 UTC

[GitHub] [pinot] sajjad-moradi opened a new pull request, #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

sajjad-moradi opened a new pull request, #11166:
URL: https://github.com/apache/pinot/pull/11166

   ## Description
   
   This PR fixes the issue described in #11165 by moving any stream interactions out of the OFFLINE to CONSUMING transition handling.
   
   ## Testing Done
   Ran an integration test locally and simulated an exception during stream consumer creation: the segment went to OFFLINE state. Then kicked off `RealtimeSegmentValidationManager` job and verified that a new consuming segment was created, and consumption began successfully.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1648834252

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11166](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (a735cff) into [master](https://app.codecov.io/gh/apache/pinot/commit/ea493a253297a3e086f264787fb849e99ceb0bef?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (ea493a2) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #11166     +/-   ##
   =========================================
     Coverage    0.11%    0.11%             
   =========================================
     Files        2212     2158     -54     
     Lines      118625   116118   -2507     
     Branches    17948    17642    -306     
   =========================================
     Hits          137      137             
   + Misses     118468   115961   -2507     
     Partials       20       20             
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...manager/realtime/LLRealtimeSegmentDataManager.java](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvTExSZWFsdGltZVNlZ21lbnREYXRhTWFuYWdlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   
   ... and [63 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11166/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi merged pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi merged PR #11166:
URL: https://github.com/apache/pinot/pull/11166


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1650607286

   I am worried that we may not be able to test all combinations of the case if we post a stopConsume message. Does it still work if Helix has not yet gotten around to set externalview to ERROR state?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1650289291

   > Does it work if we directly send stop consume message when it throws exception in the constructor?
   
   That's a great idea. I actually verified that it works as expected:
   Before the problematic segment goes to ERROR state, the IS is marked as OFFLINE for the segment. For a very short period of time, the EV is in ERROR state till Helix kicks off `ERROR -> OFFLINE` transition for that segment. Then it becomes OFFLINE. After that kicking off `RealtimeSegmentValidationManager` creates a new consuming segment for the partition.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11166:
URL: https://github.com/apache/pinot/pull/11166#discussion_r1272916639


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -234,7 +234,7 @@ public void deleteSegmentFile() {
   private final ServerMetrics _serverMetrics;
   private final PartitionUpsertMetadataManager _partitionUpsertMetadataManager;
   private final BooleanSupplier _isReadyToConsumeData;
-  private final MutableSegmentImpl _realtimeSegment;
+  private MutableSegmentImpl _realtimeSegment;

Review Comment:
   (MAJOR) I think this can cause NPE when querying the segment if the segment is not set up before state message returns



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors

Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1650714624

   > I am worried that we may not be able to test all combinations of the case if we post a stopConsume message. Does it still work if Helix has not yet gotten around to set externalview to ERROR state?
   
   I believe because we've marked IS as OFFLINE, EV will eventually be OFFLINE.
   There are two possibilities here [since before throwing exception in handling `OFFLINE->CONSUMING` transition, controller has marked IS as OFFLINE]:
   - EV goes to ERROR and then Helix tries to match the current state (ERROR) to ideal state (OFFLINE)
   - Helix tries to match IS and EV before EV goes to ERROR. In this case both IS and EV are OFFLINE, so helix doesn't do anything.
   
   In both scenarios, the EV ends up being OFFLINE for the segment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org