Posted to solr-user@lucene.apache.org by Avishai Ish-Shalom <av...@fewbytes.com> on 2014/03/06 21:10:44 UTC

hung threads and CLOSE_WAIT sockets

Hi,

We've had a strange mishap with a SolrCloud cluster (version 4.5.1) where
we observed high search latency. The problem developed over several hours,
to the point where the entire cluster stopped responding properly.

After investigation we found that the number of threads (both Solr and
Jetty) gradually rose over several hours until it hit the maximum allowed,
at which point the cluster stopped responding properly. After restarting
several nodes the number of threads dropped and the cluster started
responding again.
We examined nodes that were not restarted and found a high number of
CLOSE_WAIT sockets held by the Solr process; these sockets were using a
random local port and remote port 8983, meaning they were outgoing
connections. A thread dump did not show a large number of Solr threads,
and we were unable to determine which thread(s) were holding these sockets.
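
(For reference, a thread dump that includes lock ownership can be captured
either with jstack against the Solr PID or from inside the JVM via the
standard java.lang.management API; a minimal sketch, class name purely
illustrative:)

    // Minimal sketch: capture a thread dump from inside the JVM using the
    // standard java.lang.management API (roughly the same information as
    // running jstack <pid> against the process).
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadDumper {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            System.out.println("live threads: " + mx.getThreadCount());
            // dumpAllThreads(lockedMonitors, lockedSynchronizers) also reports
            // which monitors/locks each thread holds or is waiting on.
            for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                System.out.print(info);
            }
        }
    }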

Has anyone else encountered such a situation?

Regards,
Avishai

Re: hung threads and CLOSE_WAIT sockets

Posted by Mark Miller <ma...@gmail.com>.
On Mar 7, 2014, at 3:11 AM, Avishai Ish-Shalom <av...@fewbytes.com> wrote:

> SOLR-5216 


Yes, that is the one.

- Mark

http://about.me/markrmiller


Re: hung threads and CLOSE_WAIT sockets

Posted by Avishai Ish-Shalom <av...@fewbytes.com>.
SOLR-5216 ?


On Fri, Mar 7, 2014 at 12:13 AM, Mark Miller <ma...@gmail.com> wrote:

> It sounds like the distributed update deadlock issue.
>
> It's fixed in 4.6.1 and 4.7.
>
> - Mark
>
> http://about.me/markrmiller

Re: hung threads and CLOSE_WAIT sockets

Posted by Mark Miller <ma...@gmail.com>.
It sounds like the distributed update deadlock issue.

It’s fixed in 4.6.1 and 4.7.

- Mark

http://about.me/markrmiller
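
For a concrete picture of that failure mode, here is a minimal,
self-contained Java sketch of the same general shape (an analogy only, not
the actual code path fixed in SOLR-5216): a bounded pool whose tasks block
on results that can only be produced by other tasks on the same pool.

    // Illustrative analogy of the deadlock shape, not Solr code: a bounded
    // pool whose tasks forward work to the same pool and block on the result.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PoolDeadlockSketch {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(2); // "max threads"
            for (int i = 0; i < 2; i++) {
                pool.submit(() -> {
                    // Like a node forwarding an update to a replica and waiting
                    // for the response while both share a bounded thread pool.
                    Future<String> forwarded = pool.submit(() -> "ack");
                    return forwarded.get(); // blocks once the pool is saturated
                });
            }
            // With both workers parked in get(), the forwarded tasks can never
            // run, so nothing completes and the pool is wedged for good.
        }
    }

Once every worker is blocked waiting on work that has nowhere to run, the
process behaves much like the report above: requests stop completing even
though no individual thread appears to be doing anything wrong.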
