You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "sajjad-moradi (via GitHub)" <gi...@apache.org> on 2023/07/25 00:57:14 UTC
[GitHub] [pinot] sajjad-moradi opened a new pull request, #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
sajjad-moradi opened a new pull request, #11166:
URL: https://github.com/apache/pinot/pull/11166
## Description
This PR fixes the issue described in #11165 by moving any stream interactions out of the OFFLINE to CONSUMING transition handling.
## Testing Done
Ran an integration test locally and simulated an exception during stream consumer creation: the segment went to OFFLINE state. Then kicked off `RealtimeSegmentValidationManager` job and verified that a new consuming segment was created, and consumption began successfully.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1648834252
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
> Merging [#11166](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (a735cff) into [master](https://app.codecov.io/gh/apache/pinot/commit/ea493a253297a3e086f264787fb849e99ceb0bef?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (ea493a2) will **increase** coverage by `0.00%`.
> The diff coverage is `0.00%`.
```diff
@@ Coverage Diff @@
## master #11166 +/- ##
=========================================
Coverage 0.11% 0.11%
=========================================
Files 2212 2158 -54
Lines 118625 116118 -2507
Branches 17948 17642 -306
=========================================
Hits 137 137
+ Misses 118468 115961 -2507
Partials 20 20
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1temurin11 | `?` | |
| integration1temurin17 | `?` | |
| integration1temurin20 | `?` | |
| integration2temurin11 | `?` | |
| integration2temurin17 | `?` | |
| integration2temurin20 | `?` | |
| unittests1temurin11 | `?` | |
| unittests1temurin17 | `?` | |
| unittests1temurin20 | `?` | |
| unittests2temurin11 | `?` | |
| unittests2temurin17 | `?` | |
| unittests2temurin20 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Files Changed](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [...manager/realtime/LLRealtimeSegmentDataManager.java](https://app.codecov.io/gh/apache/pinot/pull/11166?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvTExSZWFsdGltZVNlZ21lbnREYXRhTWFuYWdlci5qYXZh) | `0.00% <0.00%> (ø)` | |
... and [63 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11166/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] sajjad-moradi merged pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi merged PR #11166:
URL: https://github.com/apache/pinot/pull/11166
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] mcvsubbu commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1650607286
I am worried that we may not be able to test all combinations of the case if we post a stopConsume message. Does it still work if Helix has not yet gotten around to set externalview to ERROR state?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] sajjad-moradi commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1650289291
> Does it work if we directly send stop consume message when it throws exception in the constructor?
That's a great idea. I actually verified that it works as expected:
Before the problematic segment goes to ERROR state, the IS is marked as OFFLINE for the segment. For a very short period of time, the EV is in ERROR state till Helix kicks off `ERROR -> OFFLINE` transition for that segment. Then it becomes OFFLINE. After that kicking off `RealtimeSegmentValidationManager` creates a new consuming segment for the partition.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11166:
URL: https://github.com/apache/pinot/pull/11166#discussion_r1272916639
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -234,7 +234,7 @@ public void deleteSegmentFile() {
private final ServerMetrics _serverMetrics;
private final PartitionUpsertMetadataManager _partitionUpsertMetadataManager;
private final BooleanSupplier _isReadyToConsumeData;
- private final MutableSegmentImpl _realtimeSegment;
+ private MutableSegmentImpl _realtimeSegment;
Review Comment:
(MAJOR) I think this can cause NPE when querying the segment if the segment is not set up before state message returns
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] sajjad-moradi commented on pull request #11166: Fix the issue of consuming segment entering ERROR state due to stream connection errors
Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi commented on PR #11166:
URL: https://github.com/apache/pinot/pull/11166#issuecomment-1650714624
> I am worried that we may not be able to test all combinations of the case if we post a stopConsume message. Does it still work if Helix has not yet gotten around to set externalview to ERROR state?
I believe because we've marked IS as OFFLINE, EV will eventually be OFFLINE.
There are two possibilities here [since before throwing exception in handling `OFFLINE->CONSUMING` transition, controller has marked IS as OFFLINE]:
- EV goes to ERROR and then Helix tries to match the current state (ERROR) to ideal state (OFFLINE)
- Helix tries to match IS and EV before EV goes to ERROR. In this case both IS and EV are OFFLINE, so helix doesn't do anything.
In both scenarios, the EV ends up being OFFLINE for the segment.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org