
Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2015/04/24 02:17:29 UTC

16K threads used up, Solr 4.10 doing nothing.

A client had a Solr instance doing absolutely nothing for a month.
Literally a test system that was idle. When they finally tried to do
something, they couldn't. That Solr process had over 16K threads
running. No indexing, no querying, nothing going on, nada.

They investigated and found that Solr couldn't connect to
ZooKeeper and had a zillion threads (well, actually about 16K, which
was their limit) with the stack trace at the end.
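The pile-up was spotted with jstack. As an in-process alternative, here is a minimal sketch that counts live threads whose stack contains a given frame, for example ConnectionManager.waitForConnected from the trace below. The class and method names (StuckThreadCounter, countThreadsIn) are hypothetical; only Thread.getAllStackTraces() and StackTraceElement are standard Java APIs.

```java
import java.util.Map;

// Hypothetical helper: counts live threads currently inside a given method.
public class StuckThreadCounter {

    static int countThreadsIn(String className, String methodName) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = countThreadsIn(
                "org.apache.solr.common.cloud.ConnectionManager", "waitForConnected");
        System.out.println("threads in waitForConnected: " + n);
    }
}
```

Run outside Solr, this reports 0; inside a JVM in the state described here, the count would be expected to track the number of stuck threads.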

Admittedly the client had a weird situation where Solr couldn't talk
to ZooKeeper, and admittedly Solr can't do much if it can't talk to
ZK.

Even so, this seems odd. I'm also a bit worried about what will happen
if they fix the reason Solr couldn't talk to ZK (maybe it's a firewall
issue? plug the cable back in?) and all those threads suddenly get to
do their thing. Not to mention any effects on other processes.
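One general way to keep repeated disconnect events from stacking up blocked threads, and therefore from releasing a burst of them when the connection returns, is to coalesce reconnect attempts so at most one is in flight. This is a minimal sketch under that assumption; ReconnectGuard and onDisconnected are hypothetical names, not Solr's ConnectionManager API, and nothing here claims this is how Solr handles it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: at most one reconnect attempt runs at a time.
public class ReconnectGuard {
    private final AtomicBoolean reconnecting = new AtomicBoolean(false);
    private final AtomicInteger attemptsStarted = new AtomicInteger();

    // Returns true if this call started a reconnect, false if one was already running.
    public boolean onDisconnected(Runnable reconnect) {
        if (!reconnecting.compareAndSet(false, true)) {
            return false; // an attempt is already in progress; don't add another waiter
        }
        attemptsStarted.incrementAndGet();
        Thread t = new Thread(() -> {
            try {
                reconnect.run();
            } finally {
                reconnecting.set(false); // allow a later disconnect to retry
            }
        }, "reconnect");
        t.setDaemon(true);
        t.start();
        return true;
    }

    public int attemptsStarted() {
        return attemptsStarted.get();
    }
}
```

Even if a thousand disconnect events arrive while the first attempt is still blocked, only one thread is created.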

Anyway, if this is worth a JIRA, I can create one if there isn't one already.

Solr 4.10

Here's the stack trace:

"main-EventThread" daemon prio=10 tid=0x000000000a38d000 nid=0xeb51 in Object.wait() [0x00007e8c15c89000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:215)
       - locked <0x00000003c5306fc0> (a org.apache.solr.common.cloud.ConnectionManager)
       at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:138)
       at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:56)
       at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:132)
       at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
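The trace shows a thread in TIMED_WAITING inside Object.wait(), called from ConnectionManager.waitForConnected. For readers unfamiliar with that state, here is a minimal sketch of the general "timed wait on a monitor until a flag flips" pattern. ConnectionState is a hypothetical class, not Solr's ConnectionManager, and it says nothing about why the waiters accumulated.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a timed wait-until-connected on an object monitor.
public class ConnectionState {
    private boolean connected = false;

    public synchronized void setConnected(boolean value) {
        connected = value;
        notifyAll(); // wake every thread waiting on this monitor
    }

    // Waits up to timeoutMs for connected == true and returns the final state.
    public synchronized boolean waitForConnected(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!connected) { // loop guards against spurious wakeups
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // timed out, still disconnected
            }
            try {
                wait(remainingMs); // releases the monitor; thread shows as TIMED_WAITING
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return connected;
            }
        }
        return true;
    }
}
```

Each caller occupies a thread for up to the timeout, so if callers keep arriving faster than they time out or succeed, the live thread count grows.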

Erick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: 16K threads used up, Solr 4.10 doing nothing.

Posted by Erick Erickson <er...@gmail.com>.

I'm pretty sure they hadn't. And the stack traces in the jstack output
(excerpt above) aren't indicative of the facet code.

The critical bit here is that the server was just sitting around for a
long time, completely and totally idle (dev system so it's only used
intermittently).

Erick

On Sat, Apr 25, 2015 at 2:24 AM, Dmitry Kan <dm...@gmail.com> wrote:
> Hi Erick,
>
> Do you know whether the client used facet.threads, even once?
>
> There's a bug in Solr with threaded faceting that makes use of an unlimited
> number of threads.
>
> Regards,
> Dmitry

Re: 16K threads used up, Solr 4.10 doing nothing.

Posted by Dmitry Kan <dm...@gmail.com>.

Hi Erick,

Do you know whether the client used facet.threads, even once?

There's a bug in Solr with threaded faceting that makes use of an unlimited
number of threads.
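For contrast with unbounded thread creation, here is a minimal sketch of capping concurrency with a fixed-size pool from java.util.concurrent. BoundedFacetPool, runWithBoundedPool, and the placeholder per-field work are hypothetical. This is not Solr's faceting code and not a fix for the bug mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: per-field tasks share a pool capped at maxThreads threads.
public class BoundedFacetPool {

    // Runs one task per field and returns the largest number of threads the pool held.
    static int runWithBoundedPool(List<String> fields, int maxThreads) {
        // Same configuration as Executors.newFixedThreadPool(maxThreads).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String field : fields) {
                // Placeholder work standing in for counting one field's facets.
                Callable<Integer> task = () -> field.length();
                results.add(pool.submit(task)); // extra tasks queue instead of spawning threads
            }
            for (Future<Integer> f : results) {
                try {
                    f.get();
                } catch (InterruptedException | ExecutionException e) {
                    if (e instanceof InterruptedException) {
                        Thread.currentThread().interrupt();
                    }
                    throw new IllegalStateException(e);
                }
            }
            return pool.getLargestPoolSize();
        } finally {
            pool.shutdown();
        }
    }
}
```

With 1000 fields and maxThreads = 4, the pool never grows beyond 4 threads. The remaining tasks wait in the queue.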

Regards,
Dmitry