Posted to solr-user@lucene.apache.org by "Sunil.Srinivasan" <Su...@target.com> on 2015/06/17 17:04:47 UTC

Solr/ZK issues

Hi Folks,

We are seeing the following in the logs on our Solr nodes, after which the nodes go through multiple full GCs and eventually run out of heap. We came across this ticket, https://issues.apache.org/jira/browse/SOLR-7338, and are wondering whether it is the cause. We are currently on 4.10.0.

INFO  - 2015-06-17 08:06:28.163; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@422f41e9 name:ZooKeeperConnection Watcher:got event WatchedEvent state:Expired type:None path:null path:null type:None
INFO  - 2015-06-17 08:06:28.163; org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
INFO  - 2015-06-17 08:06:28.166; org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired - starting a new one...
INFO  - 2015-06-17 08:06:28.171; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper
INFO  - 2015-06-17 08:06:28.177; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@422f41e9 name:ZooKeeperConnection Watcher: got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO  - 2015-06-17 08:06:28.177; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO  - 2015-06-17 08:06:28.178; org.apache.solr.common.cloud.ConnectionManager$1; Connection with ZooKeeper reestablished.
INFO  - 2015-06-17 08:06:28.178; org.apache.solr.common.cloud.DefaultConnectionStrategy; Reconnected to ZooKeeper
INFO  - 2015-06-17 08:06:28.179; org.apache.solr.common.cloud.ConnectionManager; Connected:true
WARN  - 2015-06-17 08:06:28.179; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=category coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=category_shadow coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=rules_shadow coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=rules coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=catalog_shadow coreNodeName=core_node2
WARN  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=catalog coreNodeName=core_node2
INFO  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; publishing core=category state=down collection=category
INFO  - 2015-06-17 08:06:28.180; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; publishing core=category_shadow state=down collection=category_shadow
INFO  - 2015-06-17 08:06:28.186; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; publishing core=rules_shadow state=down collection=rules_shadow
INFO  - 2015-06-17 08:06:28.189; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; publishing core=rules state=down collection=rules
INFO  - 2015-06-17 08:06:28.191; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; publishing core=catalog_shadow state=down collection=catalog_shadow
INFO  - 2015-06-17 08:06:28.193; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; publishing core=catalog state=down collection=catalog
INFO  - 2015-06-17 08:06:28.194; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO  - 2015-06-17 08:06:28.198; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
WARN  - 2015-06-17 08:07:51.188; org.apache.solr.cloud.ZkController;
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/rules_shadow/leader_elect/shard1/election
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:290)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:287)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:287)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:363)
        at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
ERROR - 2015-06-17 08:07:51.190; org.apache.solr.common.SolrException; There was a problem finding the leader in zk:java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:307)
        at org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:304)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:304)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:928)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:914)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1514)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:386)
        at org.apache.solr.cloud.ZkController.access$000(ZkController.java:89)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:237)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)

INFO  - 2015-06-17 08:07:51.220; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO  - 2015-06-17 08:07:51.240; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO  - 2015-06-17 08:07:51.258; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO  - 2015-06-17 08:07:51.274; org.apache.solr.cloud.ZkController; Replica core_node2 NOT in leader-initiated recovery, need to wait for leader to see down state.
INFO  - 2015-06-17 08:07:51.284; org.apache.solr.cloud.ElectionContext; canceling election /overseer_elect/election/93424944611198761-<<<>>>>:8080_solr-n_0000000286
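
One thing that stands out in the excerpt above: nothing is logged between 08:06:28.198 and 08:07:51.188, a silence of roughly 83 seconds. That might line up with a long GC pause, though that is only a guess. Below is a minimal Python sketch for finding such gaps in a log file. The function name find_gaps and the 10-second threshold are made up for illustration, and the regex assumes the "LEVEL - yyyy-MM-dd HH:mm:ss.SSS;" line prefix seen in these logs.

```python
import re
from datetime import datetime

# Matches the timestamp in lines such as
# "INFO  - 2015-06-17 08:06:28.198; org.apache.solr.cloud.ZkController; ..."
TS_RE = re.compile(r"^[A-Z]+\s+-\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});")


def find_gaps(lines, threshold_seconds=10.0):
    """Return (previous_ts, current_ts, gap_seconds) for each silent gap above the threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # stack-trace frames and blank lines carry no timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_gaps.py solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for start, end, seconds in find_gaps(handle):
            print(f"{seconds:8.1f}s of silence between {start} and {end}")
```

Comparing the reported gaps against the GC log timestamps on the same node should show whether the pauses and the ZooKeeper session expirations coincide.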


Any pointers here?

Thanks,
Sunil