You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Elodie Sannier <el...@kelkoo.fr> on 2013/12/09 16:39:47 UTC
SolrCloud 4.6.0 - leader election issue
Hello,
I am using SolrCloud 4.6.0 with two shards, two replicas by shard and with two collections.
collection fr_blue:
- shard1 -> server-01 (replica1), server-01 (replica2)
- shard2 -> server-02 (replica1), server-02 (replica2)
collection fr_green:
- shard1 -> server-01 (replica1), server-01 (replica2)
- shard2 -> server-02 (replica1), server-02 (replica2)
If I start the four solr instances without a delay between each start, it is not possible to connect to them and it is not possible to acces to the Solr Admin page.
If I get the clusterstate.json with zkCli, the statuses are:
- "active" for the leaders of the first collection
- "recoverying"for the other replicas of the first collection
- "down" for all replicas of the second collection (no leader)
The logs loop on the following messages :
server-01:
2013-12-09 14:41:28,634 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010063, packet:: clientPath:null serverPath:null finished:false header:: 568,4 replyHeader:: 568,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard1,F response::
2013-12-09 14:41:28,635 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010064, packet:: clientPath:null serverPath:null finished:false header:: 372,4 replyHeader:: 372,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard2,F response::
server-02:
2013-12-09 14:41:51,381 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1014,4 replyHeader:: 1014,483813,0 request:: '/s6fr/overseer_elect/leader,F response:: #7b226964223a2239303837313832313732343837363839342d6463312d76742d6465762d78656e2d30362d766d2d30362e6465762e6463312e6b656c6b6f6f2e6e65743a383038375f736561726368736f6c726e6f646566722d6e5f30303030303030303634227d,s{483632,483632,1386599789203,1386599789203,0,0,0,90871821724876894,104,0,483632}
2013-12-09 14:41:51,383 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1015,8 replyHeader:: 1015,483813,0 request:: '/s6fr/overseer/queue,F response:: v{}
2013-12-09 14:41:51,385 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1016,8 replyHeader:: 1016,483813,0 request:: '/s6fr/overseer/queue-work,F response:: v{}
After 10 minutes, there is a WARN message, a leader is found for the second collection and it is possible to connect to the solr instances:
2013-12-06 21:17:57,635 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:212 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
2013-12-06 21:27:58,719 [coreLoadExecutor-4-thread-2] WARN org.apache.solr.update.PeerSync:handleResponse:322 - PeerSync: core=fr_green url=http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr exception talking to http://dc1-vt-dev-xen-06-vm-06.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/, failed
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: incref on a closed log: tlog{file=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/tlog/tlog.0000000000000000001 refcount=1}
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
2013-12-06 21:27:58,730 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.SyncStrategy:syncReplicas:134 - Leader's attempt to sync with shard failed, moving to the next candidate
2013-12-06 21:27:58,732 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ShardLeaderElectionContext:runLeaderProcess:224 - Sync was not a success but no one else is active! I am the leader
2013-12-06 21:27:58,736 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ShardLeaderElectionContext:runLeaderProcess:251 - I am the new leader: http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/ shard1
Is it a bug with the leader election ?
This problem does not occur :
- with the version 4.5.1.
- or if I start the four solr instances with a delay between them (about 15 seconds).
- or if I configure only one collection
- or if I have only one replica by shard
Elodie Sannier
--
Kelkoo
*Elodie Sannier *Software engineer
*E*elodie.sannier@kelkoo.fr <ma...@kelkoo.fr>
*Y!Messenger* kelkooelodies
*T* +33 (0)4 56 09 07 55 *M*
*A* 4/6 Rue des Méridiens 38130 Echirolles
RE: SolrCloud 4.6.0 - leader election issue
Posted by Markus Jelsma <ma...@openindex.io>.
I can confirm i've seen this issue as well on trunk, a very recent build.
-----Original message-----
> From:Elodie Sannier <el...@kelkoo.fr>
> Sent: Monday 9th December 2013 16:43
> To: solr-user@lucene.apache.org
> Cc: search5team@lists.kelkoo.com
> Subject: SolrCloud 4.6.0 - leader election issue
>
> Hello,
>
> I am using SolrCloud 4.6.0 with two shards, two replicas by shard and with two collections.
>
> collection fr_blue:
> - shard1 -> server-01 (replica1), server-01 (replica2)
> - shard2 -> server-02 (replica1), server-02 (replica2)
>
> collection fr_green:
> - shard1 -> server-01 (replica1), server-01 (replica2)
> - shard2 -> server-02 (replica1), server-02 (replica2)
>
> If I start the four solr instances without a delay between each start, it is not possible to connect to them and it is not possible to acces to the Solr Admin page.
>
> If I get the clusterstate.json with zkCli, the statuses are:
> - "active" for the leaders of the first collection
> - "recoverying"for the other replicas of the first collection
> - "down" for all replicas of the second collection (no leader)
>
> The logs loop on the following messages :
>
> server-01:
> 2013-12-09 14:41:28,634 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010063, packet:: clientPath:null serverPath:null finished:false header:: 568,4 replyHeader:: 568,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard1,F response::
> 2013-12-09 14:41:28,635 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010064, packet:: clientPath:null serverPath:null finished:false header:: 372,4 replyHeader:: 372,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard2,F response::
>
> server-02:
> 2013-12-09 14:41:51,381 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1014,4 replyHeader:: 1014,483813,0 request:: '/s6fr/overseer_elect/leader,F response:: #7b226964223a2239303837313832313732343837363839342d6463312d76742d6465762d78656e2d30362d766d2d30362e6465762e6463312e6b656c6b6f6f2e6e65743a383038375f736561726368736f6c726e6f646566722d6e5f30303030303030303634227d,s{483632,483632,1386599789203,1386599789203,0,0,0,90871821724876894,104,0,483632}
> 2013-12-09 14:41:51,383 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1015,8 replyHeader:: 1015,483813,0 request:: '/s6fr/overseer/queue,F response:: v{}
> 2013-12-09 14:41:51,385 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1016,8 replyHeader:: 1016,483813,0 request:: '/s6fr/overseer/queue-work,F response:: v{}
>
> After 10 minutes, there is a WARN message, a leader is found for the second collection and it is possible to connect to the solr instances:
>
> 2013-12-06 21:17:57,635 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:212 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
> 2013-12-06 21:27:58,719 [coreLoadExecutor-4-thread-2] WARN org.apache.solr.update.PeerSync:handleResponse:322 - PeerSync: core=fr_green url=http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr exception talking to http://dc1-vt-dev-xen-06-vm-06.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/, failed
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: incref on a closed log: tlog{file=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/tlog/tlog.0000000000000000001 refcount=1}
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)
> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> 2013-12-06 21:27:58,730 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.SyncStrategy:syncReplicas:134 - Leader's attempt to sync with shard failed, moving to the next candidate
> 2013-12-06 21:27:58,732 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ShardLeaderElectionContext:runLeaderProcess:224 - Sync was not a success but no one else is active! I am the leader
> 2013-12-06 21:27:58,736 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.ShardLeaderElectionContext:runLeaderProcess:251 - I am the new leader: http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/ shard1
>
> Is it a bug with the leader election ?
>
> This problem does not occur :
> - with the version 4.5.1.
> - or if I start the four solr instances with a delay between them (about 15 seconds).
> - or if I configure only one collection
> - or if I have only one replica by shard
>
> Elodie Sannier
>
> --
> Kelkoo
>
> *Elodie Sannier *Software engineer
>
> *E*elodie.sannier@kelkoo.fr <ma...@kelkoo.fr>
> *Y!Messenger* kelkooelodies
> *T* +33 (0)4 56 09 07 55 *M*
> *A* 4/6 Rue des Méridiens 38130 Echirolles
>
>
>
>