Posted to solr-user@lucene.apache.org by zahra121 <z....@gmail.com> on 2018/02/27 08:36:56 UTC

Changing Leadership in SolrCloud

Suppose I have a node which is a leader in SolrCloud.

When I block this leader's SolrCloud and ZooKeeper ports with the command
"firewall-cmd --remove-port=<SolrPort>/tcp --permanent", the leader does not
change automatically and its status remains active in the Solr admin UI.

Thus, I decided to change the leader manually. I tried the REBALANCELEADERS
and ADDROLE commands in SolrCloud; however, the leader did not change!

How can I manually change the leader when the firewall blocks access to the
SolrCloud ports?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Changing Leadership in SolrCloud

Posted by Zahra Aminolroaya <z....@gmail.com>.
The leader status is active. My main question is how I can change the
leader in SolrCloud.




Re: Changing Leadership in SolrCloud

Posted by Amin Raeiszadeh <am...@gmail.com>.
I don't understand your problem clearly, but the Solr admin UI has some bugs.
To check the state of your cloud nodes, use the CLUSTERSTATUS command:

/admin/collections?action=CLUSTERSTATUS

In some cases your command has been carried out but you can't see the
result in the admin UI.
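
A minimal sketch of how the CLUSTERSTATUS response could be checked from a
script instead of the admin UI. It assumes the usual JSON layout of the
response (cluster -> collections -> shards -> replicas, plus a "live_nodes"
list); the host name, collection name and the sample data are hypothetical
and only loosely modelled on this thread. As far as I know, a replica's
"state" is only meaningful while its node is listed in live_nodes, so the
sketch reports both.

```python
from urllib.parse import urlencode


def clusterstatus_url(base_url, collection=None):
    """Build a Collections API CLUSTERSTATUS URL that asks for JSON."""
    params = {"action": "CLUSTERSTATUS", "wt": "json"}
    if collection:
        params["collection"] = collection
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def find_leaders(status):
    """Map (collection, shard) -> (node_name, state, node_is_live) per leader."""
    cluster = status.get("cluster", {})
    live = set(cluster.get("live_nodes", []))
    leaders = {}
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica in shard.get("replicas", {}).values():
                # The leader flag is reported as the string "true".
                if str(replica.get("leader", "false")).lower() == "true":
                    node = replica.get("node_name")
                    leaders[(coll_name, shard_name)] = (
                        node, replica.get("state"), node in live)
    return leaders


# Hypothetical, already-parsed response: node1 is still recorded as an
# active leader, but it is missing from live_nodes.
SAMPLE = {
    "cluster": {
        "collections": {
            "collectionA": {
                "shards": {
                    "shard2": {
                        "state": "active",
                        "replicas": {
                            "core_node1": {
                                "node_name": "node1:4239_solr",
                                "state": "active",
                                "leader": "true",
                            },
                            "core_node2": {
                                "node_name": "node2:4239_solr",
                                "state": "active",
                            },
                        },
                    }
                }
            }
        },
        "live_nodes": ["node2:4239_solr", "node3:4239_solr"],
    }
}

if __name__ == "__main__":
    print(clusterstatus_url("http://node2:4239/solr", "collectionA"))
    for key, value in find_leaders(SAMPLE).items():
        print(key, value)
```

In real use, the URL would be fetched from a node that is still reachable
and the decoded JSON body passed to find_leaders().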

On Tue, Feb 27, 2018 at 12:49 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 2/27/2018 1:36 AM, zahra121 wrote:
>
>> Suppose I have a node which is a leader in SolrCloud.
>>
>> When I block this leader's SolrCloud and Zookeeper ports by the command
>> "firewall-cmd --remove-port=<SolrPort>/tcp --permanent", the leader does
>> not
>> change automatically and this leader status remains active in solr admin
>> UI.
>>
>> Thus, I decided to change the leader manually. I tried REBALANCELEADERS
>> and
>> ADDROLE commands in solrCloud, however the leader did not change!
>>
>
> I am not completely familiar with how SolrCloud handles down servers, but
> I don't think it proactively does any kind of "ping" to make sure they're
> still up.  Probably you would need to send a request that SolrCloud tries
> to send to the down server, so that the cluster can notice that Solr is
> down and change the clusterstate.
>
> ZK should be a lot more responsive to changes like that, because it DOES
> use a ping-like mechanism to see if servers are up.  Solr's admin UI does
> not have any visibility into which ZK server is the leader, though -- so
> you can't see the results of blocking a ZK server unless you look at the ZK
> log.
>
> Thanks,
> Shawn
>
>

Re: Changing Leadership in SolrCloud

Posted by Zahra Aminolroaya <z....@gmail.com>.
Thanks, Shawn, for the reply. When I try to add a document to Solr I get the
"no route to host" exception. This means that SolrCloud is aware of the
blocked ports; however, ZooKeeper does not automatically change the leader!




Re: Changing Leadership in SolrCloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/27/2018 1:36 AM, zahra121 wrote:
> Suppose I have a node which is a leader in SolrCloud.
>
> When I block this leader's SolrCloud and Zookeeper ports by the command
> "firewall-cmd --remove-port=<SolrPort>/tcp --permanent", the leader does not
> change automatically and this leader status remains active in solr admin UI.
>
> Thus, I decided to change the leader manually. I tried REBALANCELEADERS and
> ADDROLE commands in solrCloud, however the leader did not change!

I am not completely familiar with how SolrCloud handles down servers, 
but I don't think it proactively does any kind of "ping" to make sure 
they're still up.  Probably you would need to send a request that 
SolrCloud tries to send to the down server, so that the cluster can 
notice that Solr is down and change the clusterstate.

ZK should be a lot more responsive to changes like that, because it DOES 
use a ping-like mechanism to see if servers are up.  Solr's admin UI 
does not have any visibility into which ZK server is the leader, though 
-- so you can't see the results of blocking a ZK server unless you look 
at the ZK log.

Thanks,
Shawn


Re: Changing Leadership in SolrCloud

Posted by Zahra Aminolroaya <z....@gmail.com>.
Dear Mr. Shalin,

Yes. I mean "state" in Cluster State API and UI.

Let me explain in detail what happened over the previous days:

Suppose I have Collection A distributed across node 1 (the leader), node 2
and node 3.

I used the following command to block access to the Solr and ZooKeeper
ports of node 1 (the ports are 2888/3888/2181 and 4239):

firewall-cmd --remove-port=<node1Port>/tcp --permanent

The state of node 1 is still "active", and leader is "true" in the response
of the Cluster State API.

The Solr log of node 1 looks like this:


org.apache.solr.common.SolrException: ClusterState says we are the leader (<node1IP>:4239/solr/collectionA_shard2_replica1), but locally we don't think so. Request came from <node2IP>:4239/solr/collectionA_shard4_replica3/
	at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:658)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:418)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:346)
	at ......

The error in the Solr log of node 2 is:

forwarding update to <node1IP>:4239/solr/collection A_shard5_replica1/
failed - retrying ... retries: 24 add{,id=121,commitWithin=1000}
params:update.chain=add-unknown-fields-to-the-schema&update.distrib=TOLEADER&distrib.from=node2:4239/solr/collection
A_shard2_replica2/
rsp:503:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at node1IP:4239/solr/collection A_shard5_replica1: Service
Unavailable

The error in the Solr log of node 3 is like the one on node 2.

------------------------------------------------------------------------------------------------

Unfortunately, today I found that node 4 and node 5, from collections B and
C, went down. The errors in the logs looked like this:

2018-03-01 00:26:46.133 ERROR (zkCallback-4-thread-28-processing-n:node4IP:4239_solr-EventThread) [   ] o.a.s.c.ZkController :org.apache.solr.common.SolrException: There was a problem making a request to the leader
	at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1551)
	at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:476)
	at org.apache.solr.cloud.ZkController.access$500(ZkController.java:121)
	at org.apache.solr.cloud.ZkController$1.command(ZkController.java:338)
	at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
	at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
	at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

and 

Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/Collection B/state.json
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
	at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
	at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
	at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1096)
	... 39 more


I think these errors are related to blocking the ports of node 1.

I would appreciate your help.

Regards,
Zahra










Re: Changing Leadership in SolrCloud

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
When you say it is active, I presume you mean the "state" as returned by
the Cluster Status API or as shown on the UI. But is it still the leader?
Are you sure the firewall rules are correct? Do you see disconnected or
session expiry exceptions in the leader logs?

On Wed, Feb 28, 2018 at 12:21 PM, Zahra Aminolroaya <z.aminolroaya@gmail.com
> wrote:

> Thanks Shalin. our "zkClientTimeout" is 30000, so the leader should be
> changed by now; However, the previous leader is still active.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: Changing Leadership in SolrCloud

Posted by Zahra Aminolroaya <z....@gmail.com>.
Thanks, Shalin. Our "zkClientTimeout" is 30000, so the leader should have
changed by now; however, the previous leader is still active.




Re: Changing Leadership in SolrCloud

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
When you block communication between ZooKeeper and the leader, the ZK
client inside Solr will disconnect and its session will expire after the
session timeout. At this point a new leader should be elected
automatically. The default timeout is 30 seconds. You should be able to see
the value in the solr.xml property named "zkClientTimeout".
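
A small sketch of reading that value programmatically, in case it helps.
It assumes solr.xml carries the setting as an <int name="zkClientTimeout">
element under <solrcloud>, possibly written as a ${property:default}
placeholder; the sample XML below is an illustration based on that
assumption, not a copy of anyone's configuration. Note that a system
property passed at startup could override the default shown in the file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical solr.xml fragment used only for illustration.
SAMPLE_SOLR_XML = """<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>"""

# Matches ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"^\$\{([^:}]+)(?::([^}]*))?\}$")


def zk_client_timeout_ms(solr_xml_text, system_properties=None):
    """Return the zkClientTimeout in milliseconds, or None if it is not set.

    system_properties stands in for -D options given to the JVM; a value
    found there takes precedence over the default inside the placeholder.
    """
    props = system_properties or {}
    root = ET.fromstring(solr_xml_text)
    node = root.find("./solrcloud/int[@name='zkClientTimeout']")
    if node is None or node.text is None:
        return None
    raw = node.text.strip()
    match = PLACEHOLDER.match(raw)
    if match:
        name, default = match.group(1), match.group(2)
        raw = props.get(name, default)
    return int(raw) if raw not in (None, "") else None


if __name__ == "__main__":
    print(zk_client_timeout_ms(SAMPLE_SOLR_XML))
    print(zk_client_timeout_ms(SAMPLE_SOLR_XML, {"zkClientTimeout": "15000"}))
```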

On Tue, Feb 27, 2018 at 2:06 PM, zahra121 <z....@gmail.com> wrote:

> Suppose I have a node which is a leader in SolrCloud.
>
> When I block this leader's SolrCloud and Zookeeper ports by the command
> "firewall-cmd --remove-port=<SolrPort>/tcp --permanent", the leader does
> not
> change automatically and this leader status remains active in solr admin
> UI.
>
> Thus, I decided to change the leader manually. I tried REBALANCELEADERS and
> ADDROLE commands in solrCloud, however the leader did not change!
>
> How can I manually change the leader if the firewall blocks the SolrCloud
> ports from being listened?
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: Changing Leadership in SolrCloud

Posted by Florian Gleixner <fl...@redflo.de>.
On 27.02.2018 09:36, zahra121 wrote:
> Suppose I have a node which is a leader in SolrCloud.
> 
> When I block this leader's SolrCloud and Zookeeper ports by the command
> "firewall-cmd --remove-port=<SolrPort>/tcp --permanent", the leader does not
> change automatically and this leader status remains active in solr admin UI.
> 
> Thus, I decided to change the leader manually. I tried REBALANCELEADERS and
> ADDROLE commands in solrCloud, however the leader did not change!
> 
> How can I manually change the leader if the firewall blocks the SolrCloud
> ports from being listened?
> 

From the manpage of firewall-cmd:

---------------------------------
--permanent

        The permanent option --permanent can be used to set options
permanently. These changes are not effective immediately, only after
service restart/reload or system reboot. Without the --permanent option,
a change will only be part of the runtime configuration.
---------------------------------

So you should also apply it to the runtime configuration. Also note that
this is a stateful firewall, so rules basically apply to connection
establishment. Changes in firewall rules probably do not interrupt
running connections. You should check that with tcpdump.
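
To illustrate the runtime/permanent distinction, here is a rough sketch
that only builds the firewall-cmd invocations for both configurations and
prints them; it does not execute anything. The port list is taken from
this thread, the helper name is made up, and the sketch assumes the ports
were previously opened with --add-port in the default zone.

```python
import shlex


def block_port_commands(ports, protocol="tcp"):
    """Return firewall-cmd argument lists that close each port in both the
    runtime and the permanent configuration (hypothetical helper)."""
    commands = []
    for port in ports:
        spec = f"--remove-port={port}/{protocol}"
        # Without --permanent the change affects the running firewall only.
        commands.append(["firewall-cmd", spec])
        # With --permanent the change is stored and used after a reload.
        commands.append(["firewall-cmd", "--permanent", spec])
    return commands


if __name__ == "__main__":
    # Ports mentioned in this thread: ZooKeeper 2181/2888/3888, Solr 4239.
    for cmd in block_port_commands([2181, 2888, 3888, 4239]):
        print(shlex.join(cmd))
```

Even with both commands applied, connections that were already established
may keep working, so the tcpdump check is still worthwhile.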