Posted to notifications@pekko.apache.org by "nvollmar (via GitHub)" <gi...@apache.org> on 2024/03/07 08:52:31 UTC

[I] TcpDnsClient cannot recover if registration on TcpConnection times out [incubator-pekko]

nvollmar opened a new issue, #1182:
URL: https://github.com/apache/incubator-pekko/issues/1182

   I uncovered this while investigating cluster issues in our nightly deployment test. Since we started using a low-power CPU governor at night, we have been seeing problems with the Pekko cluster forming during the nightly deployment.
   
   I've tracked it down to the `TcpDnsClient` / `TcpConnection` initialization timing out, leaving it in a state it cannot recover from and never responding to any requests.
   
   The `TcpOutgoingConnection` connects and responds with a `Tcp.Connected` message to the `TcpDnsClient`, which in turn registers itself on the connection:
   https://github.com/apache/incubator-pekko/blob/46e60a61fbabce5e3f36a408bfa3d1fb249eef44/actor/src/main/scala/org/apache/pekko/io/dns/internal/TcpDnsClient.scala#L52
   
   
   If that registration message arrives too late, the `TcpOutgoingConnection` stops itself, and `TcpDnsClient` neither detects nor handles this case:
   
   https://github.com/apache/incubator-pekko/blob/46e60a61fbabce5e3f36a408bfa3d1fb249eef44/actor/src/main/scala/org/apache/pekko/io/TcpConnection.scala#L104
   
   This is a very unusual case, but it happens on almost every deployment for one or two pods when the system is in low-power mode.
   
   Proposed fix: `TcpDnsClient` must watch the connection and fail on termination to re-initialize (it is already handled by a backoff supervisor)
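
   A minimal sketch of what the proposed fix could look like, using the classic actor API. It is not the actual `TcpDnsClient` code: `WatchingTcpClient` and `ConnectionTerminatedException` are hypothetical names introduced only for illustration, and the DNS request handling is left out. The idea is to put a death watch on the connection actor before registering, so that a connection that stops itself still produces a `Terminated` message the client can react to.

   ```scala
   import java.net.InetSocketAddress

   import org.apache.pekko.actor.{ Actor, ActorRef, Terminated }
   import org.apache.pekko.io.{ IO, Tcp }

   // Hypothetical exception type, introduced only for this sketch.
   final class ConnectionTerminatedException(message: String) extends RuntimeException(message)

   // Hypothetical, simplified client; NOT the actual TcpDnsClient source.
   class WatchingTcpClient(remote: InetSocketAddress) extends Actor {
     import context.system // implicit ActorSystem needed by IO(Tcp)

     override def preStart(): Unit =
       IO(Tcp) ! Tcp.Connect(remote)

     override def receive: Receive = connecting

     private def connecting: Receive = {
       case Tcp.CommandFailed(_: Tcp.Connect) =>
         throw new ConnectionTerminatedException(s"Failed to connect to $remote")

       case _: Tcp.Connected =>
         val connection = sender()
         // Watch first: if the connection actor has already stopped,
         // watching it still results in a Terminated message.
         context.watch(connection)
         connection ! Tcp.Register(self)
         context.become(ready(connection))
     }

     private def ready(connection: ActorRef): Receive = {
       case Terminated(`connection`) =>
         // Fail this actor so that its supervisor can re-initialize it.
         throw new ConnectionTerminatedException(s"Connection to $remote terminated")

       // DNS request/response handling is omitted in this sketch.
     }
   }
   ```

   Throwing from `receive` hands the decision to the supervisor, which matches the assumption above that the client already runs under a backoff supervisor.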
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] TcpDnsClient cannot recover if registration on TcpConnection times out [pekko]

Posted by "nvollmar (via GitHub)" <gi...@apache.org>.
nvollmar commented on issue #1182:
URL: https://github.com/apache/pekko/issues/1182#issuecomment-2097411693

   @pjfanning Since we have run into this a couple of times now, I'd like to backport the fix to 1.0


Re: [I] TcpDnsClient cannot recover if registration on TcpConnection times out [pekko]

Posted by "nvollmar (via GitHub)" <gi...@apache.org>.
nvollmar commented on issue #1182:
URL: https://github.com/apache/pekko/issues/1182#issuecomment-2098175636

   Sure, will do


Re: [I] TcpDnsClient cannot recover if registration on TcpConnection times out [incubator-pekko]

Posted by "nvollmar (via GitHub)" <gi...@apache.org>.
nvollmar commented on issue #1182:
URL: https://github.com/apache/incubator-pekko/issues/1182#issuecomment-1983528490

   The `TcpDnsClient` didn't receive anything, as the `TcpOutgoingConnection` just does `context.stop(self)` in case of a timeout. The client is then effectively dead and can't recover without restarting the actor system.


Re: [I] TcpDnsClient cannot recover if registration on TcpConnection times out [pekko]

Posted by "pjfanning (via GitHub)" <gi...@apache.org>.
pjfanning commented on issue #1182:
URL: https://github.com/apache/pekko/issues/1182#issuecomment-2097909039

   @nvollmar sure - could you create a cherry-pick PR that targets the 1.0.x branch and add that new PR to the 1.0.3 milestone?


Re: [I] TcpDnsClient cannot recover if registration on TcpConnection times out [incubator-pekko]

Posted by "nvollmar (via GitHub)" <gi...@apache.org>.
nvollmar closed issue #1182: TcpDnsClient cannot recover if registration on TcpConnection times out
URL: https://github.com/apache/incubator-pekko/issues/1182


Re: [I] TcpDnsClient cannot recover if registration on TcpConnection times out [incubator-pekko]

Posted by "Roiocam (via GitHub)" <gi...@apache.org>.
Roiocam commented on issue #1182:
URL: https://github.com/apache/incubator-pekko/issues/1182#issuecomment-1983480480

   The message was sent to the connection immediately, but you observed that the message was late. 
   
   I noticed that `TcpOutgoingConnection` can reply to `TcpDnsClient` after an exception (in `postStop`). Is it possible that the connection's termination response is not received by `TcpDnsClient` in time because the low-power CPU schedules threads less actively?

