Posted to dev@lucene.apache.org by "Shawn Heisey (JIRA)" <ji...@apache.org> on 2013/08/09 17:21:48 UTC

[jira] [Commented] (SOLR-5129) If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later

    [ https://issues.apache.org/jira/browse/SOLR-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734891#comment-13734891 ] 

Shawn Heisey commented on SOLR-5129:
------------------------------------

Full report from user on mailing list:

We have 10 Solr4 nodes (5 shards with replication factor 2) and three zookeeper instances. When we bring up the 10 Solr4 nodes while all zookeeper instances are down, we see this exception in the Solr4 logs (which makes sense):

{noformat}
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
862352 [main-SendThread(d136274-003.dc.gs.com:2181)] WARN  org.apache.zookeeper.ClientCnxn  – Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
{noformat}

When we bring up all zookeeper instances, the above exception stops appearing, we see these messages in the log, and after that the log stops moving:

{noformat}
INFO  - 2013-08-09 15:48:41.447; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
998962 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  – Watcher org.apache.solr.common.cloud.ConnectionManager@203727c5 name:ZooKeeperConnection Watcher:zk1.test.com:2181,zk2.test.com:2181,qa-zk3.test.com:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO  - 2013-08-09 15:48:41.528; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
999043 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  – Client->ZooKeeper status change trigger but we are already closed
{noformat}

At this point, we cannot load the admin page or query any of the Solr nodes until we restart the entire cloud; after that, everything works. So we must add checks to make sure that a quorum of zookeeper instances (N/2 + 1) is up before we bring up any Solr nodes.
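
The check the reporter asks for could be sketched in Java as a pre-start guard. Before launching Solr, it probes each server in the zkHost string and proceeds only if a majority answers. A majority here is N/2 + 1 with integer division, so 2 of the 3 servers in this report. This is a minimal sketch and is not part of Solr. The class and method names are hypothetical. It assumes the servers accept ZooKeeper's `ruok` four-letter command on the client port.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-start guard; not part of Solr or ZooKeeper.
final class ZkQuorumCheck {

    // Smallest majority of an ensemble of n servers: n/2 + 1 (integer division).
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Splits a zkHost string such as "zk1:2181,zk2:2181/solr" into "host:port"
    // entries, dropping an optional chroot suffix that starts at the first '/'.
    static String[] servers(String zkHost) {
        String hostList = zkHost.split("/", 2)[0];
        return hostList.split(",");
    }

    // Sends the four-letter command "ruok"; a running server replies "imok".
    // Any I/O failure (refused, unknown host, timeout, short reply) counts as down.
    static boolean isServing(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[4];
            int read = in.readNBytes(reply, 0, reply.length);
            return read == 4 && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return false;
        }
    }

    // True if at least a majority of the listed servers answer "imok".
    static boolean quorumUp(String zkHost, int timeoutMs) {
        String[] entries = servers(zkHost);
        int serving = 0;
        for (String raw : entries) {
            String entry = raw.trim();
            int colon = entry.lastIndexOf(':');
            // Fall back to 2181, ZooKeeper's default client port, if none is given.
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0 ? 2181 : Integer.parseInt(entry.substring(colon + 1));
            if (isServing(host, port, timeoutMs)) {
                serving++;
            }
        }
        return serving >= quorumSize(entries.length);
    }

    public static void main(String[] args) {
        // Example default taken from the log above; pass the real zkHost as args[0].
        String zkHost = args.length > 0 ? args[0]
                : "zk1.test.com:2181,zk2.test.com:2181,zk3.test.com:2181";
        if (!quorumUp(zkHost, 2000)) {
            System.err.println("ZooKeeper quorum not reachable; not starting Solr");
            System.exit(1);
        }
        System.out.println("ZooKeeper quorum reachable");
    }
}
```

A startup script could run this guard in a loop and start Solr only after it exits with status 0. An `imok` reply only means that the individual server is running without error. According to the ZooKeeper admin guide, it does not guarantee that the server has joined a quorum. A stricter check could parse the `Mode:` line of the `srvr` command instead. On ZooKeeper 3.5.3 and later, every four-letter command other than `srvr` must be enabled through `4lw.commands.whitelist`.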

                
> If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later
> --------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5129
>                 URL: https://issues.apache.org/jira/browse/SOLR-5129
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.4
>            Reporter: Shawn Heisey
>             Fix For: 4.5, 5.0
>
>
> Summary of report from user on mailing list:
> If zookeeper is down when you start Solr nodes, they will not function correctly, even if you later start zookeeper.  While zookeeper is down, the log shows connection failures as expected.  When zookeeper comes back, the log shows:
> INFO  - 2013-08-09 15:48:41.528; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
> At that point, Solr (admin UI and all other functions) does not work, and won't work until it is restarted.
