Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2012/12/12 18:52:23 UTC

SolrCloud: CloudSolrServer Zookeeper disconnects and re-connects with heavy memory usage consumption.

Hello everyone.

I have developed a standalone WebApp with a custom API that dispatches
queries to SolrCloud using the CloudSolrServer implementation. I'm
testing with a single ZooKeeper instance installed on an Amazon instance.
The Solr servers are deployed on two Amazon instances, and one more
instance contains the custom search API engine I mentioned above. I'm
using a ZooKeeper *zkConnectTimeout* and *zkClientTimeout* of *30000 ms*.


With that setup I've noticed that everything works fine with
CloudSolrServer, but I frequently see logging traces like the following:


2012-12-12 17:35:41,932 30486688 [http-bio-8080-exec-7-SendThread(amazon-dns:9000)] INFO  org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 67044ms for sessionid 0x13b8a4218720055, closing socket connection and attempting reconnect

2012-12-12 17:35:41,996 30486752 [http-bio-8080-exec-8-SendThread(amazon-dns:9000)] INFO  org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 67301ms for sessionid 0x13b8a4218720052, closing socket connection and attempting reconnect

2012-12-12 17:35:42,077 30522458 [pool-1-thread-1-SendThread(amazon-dns:9000)] INFO  org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 67299ms for sessionid 0x13b8a4218720053, closing socket connection and attempting reconnect

2012-12-12 17:35:42,286 30487042 [http-bio-8080-exec-7-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  - Watcher org.apache.solr.common.cloud.ConnectionManager@20c5f562name:ZooKeeperConnection Watcher:amazon-dns:9000 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
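One detail in these traces: the three timeouts carry three different session IDs (0x13b8a4218720055, 0x13b8a4218720052, 0x13b8a4218720053) and were logged by SendThreads whose names begin with three different thread names (http-bio-8080-exec-7, http-bio-8080-exec-8, pool-1-thread-1), so at that moment the WebApp held three separate ZooKeeper sessions. As far as I understand SolrJ 4.x, each CloudSolrServer instance opens its own ZooKeeper connection, so if the intent is to share a single instance, that count may be worth checking. The helper below is a hypothetical sketch (the class and method names are mine) for counting distinct session IDs in a larger log:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SessionIdScan {
    // Matches the hex session id printed in the ClientCnxn timeout message.
    private static final Pattern SESSION = Pattern.compile("sessionid (0x[0-9a-f]+)");

    /** Returns the distinct session ids found in the given lines, in first-seen order. */
    static Set<String> distinctSessions(List<String> lines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = SESSION.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Fragments of the three timeout messages quoted above.
        List<String> lines = List.of(
                "heard from server in 67044ms for sessionid 0x13b8a4218720055, closing",
                "heard from server in 67301ms for sessionid 0x13b8a4218720052, closing",
                "heard from server in 67299ms for sessionid 0x13b8a4218720053, closing");
        Set<String> ids = distinctSessions(lines);
        System.out.println(ids.size() + " distinct sessions: " + ids);
    }
}
```

A LinkedHashSet keeps the IDs in the order they first appear, which makes it easy to line them up with the thread names in the log.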



The message is clear: nothing has been heard from the server in 67 seconds.
That's strange, because the ZooKeeper Amazon instance is up and the ZooKeeper
service is running. A connection problem would also be very unusual, because
communication between Amazon instances is assumed to be always available.
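The reported silence (about 67 s) is more than twice the configured 30000 ms zkClientTimeout. As I understand the ZooKeeper client, it normally declares a silent connection dead somewhat before the negotiated session timeout elapses, so a gap this much larger is at least consistent with the client JVM itself not running for a while, for example during a long garbage collection pause, which the SolrCloud FAQ quoted in the follow-up lists as one cause of false session timeouts. That is an interpretation, not something confirmed here. A minimal sketch of the arithmetic, with hypothetical names, assuming the log format shown above:

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdleGapCheck {
    // Captures N from the "have not heard from server in <N>ms" message.
    private static final Pattern IDLE = Pattern.compile("have not heard from server in (\\d+)ms");

    /** Reported silence minus the configured timeout, or empty if the line has no such message. */
    static OptionalLong excessOverTimeout(String logLine, long configuredTimeoutMs) {
        Matcher m = IDLE.matcher(logLine);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(1)) - configuredTimeoutMs);
    }

    public static void main(String[] args) {
        String line = "Client session timed out, have not heard from server in 67044ms"
                + " for sessionid 0x13b8a4218720055";
        long zkClientTimeoutMs = 30_000; // value stated in the message above
        excessOverTimeout(line, zkClientTimeoutMs)
                .ifPresent(ms -> System.out.println(ms + " ms beyond zkClientTimeout"));
    }
}
```

For the first trace this prints 37044 ms, i.e. the client went quiet for roughly 37 seconds longer than the whole configured timeout.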

After that, I start seeing logging traces like these:


2012-12-12 17:37:15,501 30580257 [http-bio-8080-exec-7-EventThread] INFO  org.apache.solr.common.cloud.ZkStateReader  - Updating cluster state from ZooKeeper...

2012-12-12 17:37:15,510 30615891 [pool-1-thread-2-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  - Waiting for client to connect to ZooKeeper

2012-12-12 17:37:15,512 30580268 [http-bio-8080-exec-7-EventThread] INFO  org.apache.solr.common.cloud.DefaultConnectionStrategy  - Reconnected to ZooKeeper

2012-12-12 17:37:15,541 30580297 [http-bio-8080-exec-7-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  - Connected:true

2012-12-12 17:37:15,541 30580297 [http-bio-8080-exec-7-EventThread] INFO  org.apache.zookeeper.ClientCnxn  - EventThread shut down
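To put a number on how long the WebApp was affected, the wall-clock gap between the "state:Disconnected" event and "Reconnected to ZooKeeper" on the same http-bio-8080-exec-7-EventThread can be computed directly. The sketch below is hypothetical (names are mine) and assumes the "yyyy-MM-dd HH:mm:ss,SSS" timestamp layout seen in the traces:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OutageWindow {
    // Timestamp layout used at the start of each log line above.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Elapsed time between two log timestamps. */
    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TS), LocalDateTime.parse(to, LOG_TS));
    }

    public static void main(String[] args) {
        // Disconnected event vs. "Reconnected to ZooKeeper" on the same EventThread.
        Duration outage = between("2012-12-12 17:35:42,286", "2012-12-12 17:37:15,512");
        System.out.println(outage.toMillis() + " ms disconnected");
    }
}
```

This gives 93226 ms, about a minute and a half. The second number on those two lines, which looks like elapsed milliseconds, differs by the same amount (30580268 - 30487042 = 93226), so the two clocks agree.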



But the *big problem* is that when this
disconnect-reconnect-disconnect-reconnect cycle happens, the WebApp
seems to block (it looks like the CloudSolrServer ZooKeeper state update
is blocking) while search queries keep arriving. As a result, memory
usage keeps growing and the search engine WebApp module becomes almost
unresponsive. It seems that this kind of ZooKeeper state update both
consumes a lot of memory and blocks.
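When every incoming query waits on a blocked call while new ones keep arriving, each waiting request plausibly holds its memory until the call returns, which would match the growth described above. A common general-purpose mitigation, not something proposed in this thread, is to bound how many requests may wait at once and reject the overflow quickly. The sketch assumes the WebApp dispatches each search to a ThreadPoolExecutor; the pool and queue sizes are illustrative, and the CountDownLatch is only a stand-in for a stalled CloudSolrServer call:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedDispatch {
    /** Fixed number of workers plus a bounded queue; overflow is rejected, not buffered. */
    static ThreadPoolExecutor newBoundedPool(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Submits n tasks that all block until the "backend" recovers; returns how many were rejected. */
    static int simulateStall(int workers, int queueCapacity, int n) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(workers, queueCapacity);
        CountDownLatch backendStalled = new CountDownLatch(1); // stand-in for a blocked search call
        int rejected = 0;
        for (int i = 0; i < n; i++) {
            try {
                pool.execute(() -> {
                    try {
                        backendStalled.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // in a real WebApp this would become a fast error response
            }
        }
        backendStalled.countDown(); // simulate the reconnect: let waiting tasks finish
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulateStall(2, 3, 10) + " of 10 requests rejected");
    }
}
```

With 2 workers and room for 3 queued requests, 5 of the 10 requests are rejected instead of piling up; the trade-off is that callers see errors during the stall rather than slow responses.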

Has anyone experienced behavior like this? Any tips or suggestions?


Thank you very much in advance for your help.

Regards,

-- 

- Luis Cappa

Re: SolrCloud: CloudSolrServer Zookeeper disconnects and re-connects with heavy memory usage consumption.

Posted by Luis Cappa Banda <lu...@gmail.com>.
I've read the following in the SolrCloud FAQ:

"*Q:* I'm seeing lots of session timeout exceptions - what to do?

   *A:* Try raising the ZooKeeper <http://wiki.apache.org/solr/ZooKeeper>
   session timeout by editing solr.xml - see the zkClientTimeout attribute.
   The minimum session timeout is 2 times your ZooKeeper defined tickTime.
   The maximum is 20 times the tickTime. The default tickTime is 2 seconds.
   You should avoid raising this for no good reason, but it should be high
   enough that you don't see a lot of false session timeouts due to load,
   network lag, or garbage collection pauses. Some environments might need
   to go as high as 30-60 seconds."


Any suggestion or recommendation? What about increasing tickTime to 10
seconds with zkClientTimeout = 30 seconds?
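Applying the bounds from the FAQ quoted above: a requested session timeout inside [2 × tickTime, 20 × tickTime] is kept, and anything outside is moved to the nearest bound. The sketch below assumes exactly that rule, assumes the server's minSessionTimeout/maxSessionTimeout settings are left at their defaults, and assumes the current tickTime is the 2-second default (it isn't stated above); the method name is hypothetical:

```java
public class SessionTimeoutBounds {
    /**
     * Clamps a requested session timeout to [2 * tickTime, 20 * tickTime],
     * assuming minSessionTimeout/maxSessionTimeout are not set explicitly.
     */
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        // Current setup: default tickTime (2 s), zkClientTimeout = 30 s.
        System.out.println(negotiatedTimeoutMs(30_000, 2_000));  // 30000 (allowed range 4000..40000)
        // Proposed: tickTime = 10 s, zkClientTimeout = 30 s.
        System.out.println(negotiatedTimeoutMs(30_000, 10_000)); // 30000 (allowed range 20000..200000)
        // A 60 s request would be capped at 40 s with the default tickTime.
        System.out.println(negotiatedTimeoutMs(60_000, 2_000));  // 40000
    }
}
```

Under these assumptions, a 30-second zkClientTimeout already fits within the default range, so raising tickTime to 10 seconds would mainly move the floor (to 20 s) and the ceiling (to 200 s) rather than change the 30-second value itself.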




-- 

- Luis Cappa