Posted to user@ignite.apache.org by Matt Nohelty <no...@gmail.com> on 2019/07/31 14:54:42 UTC

What happens when a client gets disconnected

Sorry for the long delay in responding to this issue.  I will work on
replicating this issue in a more controlled test environment and try to
grab thread dumps from there.

In a previous post you mentioned that the blocking in this thread dump
should only happen when a data node is affected, which is usually a server
node, and you also said that near cache consistency is observed
continuously.  If we have near caching enabled, does that mean clients
become data nodes?  If that's the case, does that explain why we are seeing
blocking when a client crashes or hangs?

Assuming this is related to near caching, is there any configuration to
adjust this behavior to give us availability over perfect consistency?
Having a failure on one client ripple across the entire system and
effectively take down all other clients of that cluster is a major problem.
We obviously want to avoid problems like an OOM error or a big GC pause in
the client application, but if these things happen we need to be able to
absorb them gracefully and limit the blast radius to just that client
node.

Re: What happens when a client gets disconnected

Posted by Andrei Aleksandrov <ae...@gmail.com>.
Hi,

I think you should provide the full client and server logs, the
configuration files and, if possible, a reproducer for the case where the
client node with near cache was able to crash the whole cluster.

This looks like it could be a real issue, and the best way forward would be
to raise a JIRA ticket for it after analyzing the provided data.

BR,
Andrei

On 2019/07/31 14:54:42, Matt Nohelty <n....@gmail.com> wrote:
> Sorry for the long delay in responding to this issue. I will work on
> replicating this issue in a more controlled test environment and try to
> grab thread dumps from there.
>
> In a previous post you mentioned that the blocking in this thread dump
> should only happen when a data node is affected which is usually a server
> node and you also said that near cache consistency is observed
> continuously. If we have near caching enabled, does that mean clients
> become data nodes? If that's the case, does that explain why we are seeing
> blocking when a client crashes or hangs?
>
> Assuming this is related to near caching, is there any configuration to
> adjust this behavior to give us availability over perfect consistency?
> Having a failure on one client ripple across the entire system and
> effectively take down all other clients of that cluster is a major problem.
> We obviously want to avoid problems like an OOM error or a big GC pause in
> the client application but if these things happen we need to be able to
> absorb these gracefully and limit the blast radius to just that client
> node.