You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by "Christine Poerschke (BLOOMBERG/ LONDON)" <cp...@bloomberg.net> on 2013/12/23 18:49:07 UTC

indexing delay due to zookeeper election

Hello.

The behaviour we observed was that a zookeeper election took about 2s plus 1.5s for reading the zoo_data snapshot. During this time solr tried to establish connections to any zookeeper in the ensemble but only succeeded once a leader was elected *and* that leader was done reading the snapshot. Solr document updates were slowed down during this time window.

Is this expected to happen during and shortly after elections, that is zookeeper closing existing connections, rejecting new connections and thus stalling solr updates?

Other than avoiding zookeeper elections, are there ways of reducing their impact on solr?

Thanks,

Christine


zookeeper log extract

08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
08:18:56,918 [FileSnap.java:83] Reading snapshot ...
...
08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection from ...
08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client ... (no session established for client)
...
08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...


solr log extract

08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from server sessionid ... likely server has closed socket, closing socket connection and attempting reconnect
08:18:55,068 [ConnectionManager.java:72] Watcher org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent state:Disconnected type:None path:null path:null type:None
08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
...
08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ... 
08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from server sessionid ... likely server has closed socket, closing socket connection and attempting reconnect
...
08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:57,641 [ClientCnxn.java:849] Socket connection established to ... 
08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:58,353 [ClientCnxn.java:849] Socket connection established to ... 
08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on server ... sessionid = ..., negotiated timeout = ...
08:18:58,751 ... [ConnectionManager.java:72] Watcher
org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection
Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent state:SyncConnected type:None path:null path:null type:None


Re: indexing delay due to zookeeper election

Posted by Mark Miller <ma...@gmail.com>.
Interesting stuff! This is expected but not really something I have thought
about yet.

Can you file a JIRA issue? I think we want to try and tackle this with code.

We currently reject updates when we lose our connection to ZooKeeper. We
are pretty strict about this. I think you could reasonably be less strict
(eg not start rejecting updates for a few seconds).

- Mark


On Mon, Dec 23, 2013 at 12:49 PM, Christine Poerschke (BLOOMBERG/ LONDON) <
cpoerschke@bloomberg.net> wrote:

> Hello.
>
> The behaviour we observed was that a zookeeper election took about 2s plus
> 1.5s for reading the zoo_data snapshot. During this time solr tried to
> establish connections to any zookeeper in the ensemble but only succeeded
> once a leader was elected *and* that leader was done reading the snapshot.
> Solr document updates were slowed down during this time window.
>
> Is this expected to happen during and shortly after elections, that is
> zookeeper closing existing connections, rejecting new connections and thus
> stalling solr updates?
>
> Other than avoiding zookeeper elections, are there ways of reducing their
> impact on solr?
>
> Thanks,
>
> Christine
>
>
> zookeeper log extract
>
> 08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
> 08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
> 08:18:56,918 [FileSnap.java:83] Reading snapshot ...
> ...
> 08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection
> from ...
> 08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session
> 0x0 due to java.io.IOException: ZooKeeperServer not running
> 08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client
> ... (no session established for client)
> ...
> 08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...
>
>
> solr log extract
>
> 08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from
> server sessionid ... likely server has closed socket, closing socket
> connection and attempting reconnect
> 08:18:55,068 [ConnectionManager.java:72] Watcher
> org.apache.solr.common.cloud.ConnectionManager@...
> name:ZooKeeperConnection Watcher:host1:port1,host2:port2,host3:port3,...
> got event WatchedEvent state:Disconnected type:None path:null path:null
> type:None
> 08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
> ...
> 08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from
> server sessionid ... likely server has closed socket, closing socket
> connection and attempting reconnect
> ...
> 08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:57,641 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,353 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on
> server ... sessionid = ..., negotiated timeout = ...
> 08:18:58,751 ... [ConnectionManager.java:72] Watcher
> org.apache.solr.common.cloud.ConnectionManager@...
> name:ZooKeeperConnection
> Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent
> state:SyncConnected type:None path:null path:null type:None
>
>


-- 
- Mark