Posted to solr-user@lucene.apache.org by Ryan Cutter <ry...@gmail.com> on 2014/09/29 17:24:23 UTC

Overseer cannot talk to ZK

Solr 4.7.2 went down during a period of little activity.  Wondering if
anyone has an idea about what's going on, thanks!

INFO  - 2014-09-26 15:35:00.152; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: null state: Disconnected type None

then eventually:

WARN  - 2014-09-26 15:35:00.377; org.apache.solr.cloud.OverseerCollectionProcessor; Overseer cannot talk to ZK

and:

WARN  - 2014-09-26 15:35:00.454; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Solr cannot talk to ZK, exiting Overseer main queue loop
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:257)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:254)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:254)
        at org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89)
        at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:411)
        at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:391)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:173)
        at java.lang.Thread.run(Thread.java:662)

as well as:

ERROR - 2014-09-26 15:35:21.025; org.apache.solr.common.SolrException; There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:934)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:898)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1422)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:370)
        at org.apache.solr.cloud.ZkController.access$000(ZkController.java:87)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:222)
        at org.apache.solr.common.cloud.ConnectionManager$1$1.run(ConnectionManager.java:166)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:274)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:271)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:271)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:912)
        ... 6 more

In ZK's log at about the same time:

2014-09-26 15:35:00,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@334] - Expiring session 0x144e09d7b910008, timeout of 30000ms exceeded
2014-09-26 15:35:00,000 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x144e09d7b910007, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)

as well as:

2014-09-26 15:35:00,534 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@617] - Got user-level KeeperException when processing sessionid:0x144e09d7b91000a type:delete cxid:0x8 zxid:0xbdd txntype:-1 reqpath:n/a Error Path:/overseer_elect/leader Error:KeeperErrorCode = NoNode for /overseer_elect/leader
2014-09-26 15:35:00,572 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@617] - Got user-level KeeperException when processing sessionid:0x144e09d7b91000a type:create cxid:0xd zxid:0xbdf txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer

Re: Overseer cannot talk to ZK

Posted by Ryan Cutter <ry...@gmail.com>.
Sorry, I believe this can be disregarded.  There were changes made to
system time that likely caused this state.  Apologies, Ryan
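
For anyone who lands here with the same symptoms, one plausible way a system time change can produce "timeout of 30000ms exceeded" is a timeout measured against the wall clock: if the clock is moved forward, the interval since the last heartbeat appears longer than it really was. The sketch below only illustrates that idea. It is not ZooKeeper's or Solr's actual code; SessionTracker, FakeClock and their methods are hypothetical names made up for this example, and it uses Java 8 lambdas for brevity.

```java
import java.util.function.LongSupplier;

public class SessionClockSketch {

    /** Hypothetical tracker: a session expires when no heartbeat is seen within timeoutMs. */
    static final class SessionTracker {
        private final LongSupplier clockMs;
        private final long timeoutMs;
        private long lastHeartbeatMs;

        SessionTracker(LongSupplier clockMs, long timeoutMs) {
            this.clockMs = clockMs;
            this.timeoutMs = timeoutMs;
            this.lastHeartbeatMs = clockMs.getAsLong();
        }

        void heartbeat() {
            lastHeartbeatMs = clockMs.getAsLong();
        }

        boolean isExpired() {
            return clockMs.getAsLong() - lastHeartbeatMs > timeoutMs;
        }
    }

    /** Hypothetical fake clock so the example is deterministic. */
    static final class FakeClock {
        long wallMs = 1_000_000L;   // stands in for System.currentTimeMillis()
        long monotonicMs = 0L;      // stands in for System.nanoTime() / 1_000_000

        /** Real time passing: both clocks move. */
        void advance(long ms) {
            wallMs += ms;
            monotonicMs += ms;
        }

        /** An administrator or time daemon changing the system time: only the wall clock moves. */
        void adjustWallClock(long ms) {
            wallMs += ms;
        }
    }

    public static void main(String[] args) {
        FakeClock clock = new FakeClock();
        SessionTracker wall = new SessionTracker(() -> clock.wallMs, 30_000L);
        SessionTracker mono = new SessionTracker(() -> clock.monotonicMs, 30_000L);

        // Five seconds pass and the client heartbeats normally.
        clock.advance(5_000L);
        wall.heartbeat();
        mono.heartbeat();

        // The system time is moved forward by one minute; no real time has passed.
        clock.adjustWallClock(60_000L);

        System.out.println("wall-clock tracker expired: " + wall.isExpired()); // true
        System.out.println("monotonic tracker expired: " + mono.isExpired());  // false
    }
}
```

The tracker driven by the wall clock reports the session as expired right after the adjustment, while the one driven by a monotonic source does not, which is consistent with the cluster going down during a period of little activity.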
