Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/06 23:15:29 UTC

[GitHub] [pinot] jmint-stripe opened a new issue #7874: Realtime consumption halted if segment state transition fails

jmint-stripe opened a new issue #7874:
URL: https://github.com/apache/pinot/issues/7874


   We observed some realtime ingestion lag on one of our Pinot clusters. After some investigation we determined that the lag was happening on a subset of the partitions for the Kafka stream we were ingesting from.
   
   Analyzing the logs showed that the cause was a temporary ZooKeeper connection issue, which triggered a cascade of `InterruptedException`s; these in turn caused some segment state transitions from `OFFLINE` to `CONSUMING` to fail.
   
   Some relevant log messages:
   
   ```
   2021/11/30 01:55:15.334 WARN [ZKHelixManager] [HelixTaskExecutor-message_handle_STATE_TRANSITION] zkClient to [redacted] is not connected, wait for 10000ms.
   ```
   
   ```
   Exception while executing a state transition task [redacted segment name]
       ...
       Caused by: java.lang.RuntimeException: InterruptedException when acquiring the partitionConsumerSemaphore for segment: [redacted segment name]
   ```
   
   ```
   2021/11/30 01:55:15.334 ERROR [HelixTask] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Exception after executing a message, msgId: 76da755d-4ae3-4d61-84e6-11a946f6bffcorg.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
       org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
               at org.apache.helix.manager.zk.zookeeper.ZkClient.acquireEventLock(ZkClient.java:1142)
       ...
   ```
   
   The end result was that consumption stopped for the partitions whose segments had failed state transitions.
   
   To get the servers consuming those partitions again, we had to restart the servers hosting those segments. We would expect Pinot to recover on its own and resume consuming once the ZooKeeper connection is restored.
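   
   A simplified Java sketch of the failure mode described above, for illustration only: all class and method names are hypothetical and this is not Pinot's actual code. An `OFFLINE` to `CONSUMING` handler blocks on a per-partition semaphore; if the Helix task thread is interrupted while waiting, `Semaphore.acquire()` throws `InterruptedException`, which this sketch wraps in a `RuntimeException` carrying the same message as the log above. Because the failed attempt never takes the permit, re-running the transition later succeeds in the sketch, which is why recovery without a restart is a reasonable expectation.
   
   ```java
   import java.util.concurrent.Semaphore;
   
   // Hypothetical, simplified model of a server-side OFFLINE -> CONSUMING handler.
   // Not Pinot's real implementation; names are illustrative only.
   public class PartitionConsumerSketch {
     // One permit: at most one consuming segment per stream partition on this server.
     private final Semaphore partitionConsumerSemaphore = new Semaphore(1);
   
     // Blocks until the partition's permit is available, then "starts consuming".
     public void startConsuming(String segmentName) {
       try {
         partitionConsumerSemaphore.acquire();
       } catch (InterruptedException e) {
         // Preserve the interrupt status, then fail the state transition.
         Thread.currentThread().interrupt();
         throw new RuntimeException(
             "InterruptedException when acquiring the partitionConsumerSemaphore for segment: "
                 + segmentName, e);
       }
       // Real code would create the stream consumer and start fetching here.
     }
   
     public int availablePermits() {
       return partitionConsumerSemaphore.availablePermits();
     }
   
     public static void main(String[] args) {
       PartitionConsumerSketch partition = new PartitionConsumerSketch();
   
       // Simulate the Helix task thread being interrupted during a ZooKeeper disconnect.
       Thread.currentThread().interrupt();
       try {
         partition.startConsuming("hypothetical_segment");
       } catch (RuntimeException e) {
         System.out.println("Transition failed: " + e.getMessage());
       }
       Thread.interrupted(); // clear the flag, as if the connection came back
   
       // The failed attempt never took the permit, so a retry can succeed.
       System.out.println("Permits after failure: " + partition.availablePermits()); // 1
       partition.startConsuming("hypothetical_segment");
       System.out.println("Permits after retry: " + partition.availablePermits()); // 0
     }
   }
   ```
   
   The interrupt status is restored before rethrowing so that code further up the thread's stack can still observe the interruption; whether Pinot itself does this is not stated here.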


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi commented on issue #7874: Realtime consumption halted if segment state transition fails

Posted by GitBox <gi...@apache.org>.
sajjad-moradi commented on issue #7874:
URL: https://github.com/apache/pinot/issues/7874#issuecomment-990091639


   PR for the fix: https://github.com/apache/pinot/pull/7886




[GitHub] [pinot] sajjad-moradi removed a comment on issue #7874: Realtime consumption halted if segment state transition fails

Posted by GitBox <gi...@apache.org>.
sajjad-moradi removed a comment on issue #7874:
URL: https://github.com/apache/pinot/issues/7874#issuecomment-990091639


   PR for the fix: https://github.com/apache/pinot/pull/7886




[GitHub] [pinot] sajjad-moradi commented on issue #7874: Realtime consumption halted if segment state transition fails

Posted by GitBox <gi...@apache.org>.
sajjad-moradi commented on issue #7874:
URL: https://github.com/apache/pinot/issues/7874#issuecomment-989597928


   Ideally, calling the segment _reset_ endpoint on the controller should fix the problem, and you shouldn't need to restart the server. I just looked into the code for consuming segments and found the issue that prevents segment reset from doing its job. I'll create a PR for the fix soon.
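   
   A heavily simplified Java model of the intended recovery path, for illustration only (hypothetical names; this is neither Pinot's nor Helix's API): a failed `OFFLINE` to `CONSUMING` transition leaves the segment in `ERROR`, and a reset moves it from `ERROR` back to `OFFLINE` and then re-attempts `OFFLINE` to `CONSUMING`. The bug mentioned above is something in the real code that keeps this path from working; the sketch shows only the path itself.
   
   ```java
   import java.util.function.BooleanSupplier;
   
   // Hypothetical, simplified model of one segment replica's state on a server.
   public class SegmentResetSketch {
     public enum State { OFFLINE, CONSUMING, ERROR }
   
     private State state = State.OFFLINE;
   
     // OFFLINE -> CONSUMING. `canStart` stands in for acquiring resources
     // (e.g. a partition semaphore), which may fail transiently.
     public void offlineToConsuming(BooleanSupplier canStart) {
       if (state != State.OFFLINE) {
         throw new IllegalStateException("Expected OFFLINE but was " + state);
       }
       state = canStart.getAsBoolean() ? State.CONSUMING : State.ERROR;
     }
   
     // Reset: ERROR -> OFFLINE, then re-attempt OFFLINE -> CONSUMING.
     public void reset(BooleanSupplier canStart) {
       if (state == State.ERROR) {
         state = State.OFFLINE;
         offlineToConsuming(canStart);
       }
     }
   
     public State state() {
       return state;
     }
   
     public static void main(String[] args) {
       SegmentResetSketch segment = new SegmentResetSketch();
       segment.offlineToConsuming(() -> false); // transient ZooKeeper trouble
       System.out.println(segment.state());     // ERROR
       segment.reset(() -> true);               // connection restored
       System.out.println(segment.state());     // CONSUMING
     }
   }
   ```
   
   In this sketch, reset re-runs the full transition from `OFFLINE` rather than resuming a half-finished one, so nothing from the failed attempt is reused.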




[GitHub] [pinot] mapshen edited a comment on issue #7874: Realtime consumption halted if segment state transition fails

Posted by GitBox <gi...@apache.org>.
mapshen edited a comment on issue #7874:
URL: https://github.com/apache/pinot/issues/7874#issuecomment-989277508


   +1 on this. We experienced this issue as well when there was a ZooKeeper issue a couple of weeks ago. We tried to reset/reload the segment, but it went back into the bad state.




[GitHub] [pinot] mapshen commented on issue #7874: Realtime consumption halted if segment state transition fails

Posted by GitBox <gi...@apache.org>.
mapshen commented on issue #7874:
URL: https://github.com/apache/pinot/issues/7874#issuecomment-989277508


   +1 on this. We experienced this issue as well when there was a ZooKeeper issue a couple of weeks ago. We tried to reset/reload the segment, but it went back into the bad state.

