Posted to solr-user@lucene.apache.org by Rich Mayfield <ma...@gmail.com> on 2014/04/15 16:58:16 UTC

clusterstate.json does not reflect current state of down versus active

Solr 4.7.1

I am trying to orchestrate a fast restart of a SolrCloud (4.7.1). I was
hoping that clusterstate.json would reflect the up/down state of each
core as well as whether or not a given core was leader.

clusterstate.json is not kept up to date with what I see going on in my
logs though - I see the leader election process play out. I would expect
that "state" would show "down" immediately for replicas on the node that I
have shut down.

Eventually, after about 30 minutes, all of the leader election processes
complete and clusterstate.json gets updated to the true state for each
replica.

Why does it take so long for clusterstate.json to reflect the correct
state? Is there a better way to determine the state of the system?

(In my case, each node has upwards of 1,000 1-shard collections. There are
two nodes in the cluster - each collection has 2 replicas.)
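
For concreteness, the check I want boils down to something like the sketch
below. It assumes the usual 4.x clusterstate.json layout (collection ->
shards -> replicas), and the two-replica sample here is made up:

```python
import json

def replica_states(clusterstate):
    """Flatten clusterstate.json into (collection, shard, replica, state, is_leader) tuples.

    Assumes the usual 4.x layout: {collection: {"shards": {shard: {"replicas": {...}}}}}.
    """
    out = []
    for coll_name, coll in clusterstate.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for rep_name, rep in shard.get("replicas", {}).items():
                out.append((coll_name, shard_name, rep_name,
                            rep.get("state"),
                            rep.get("leader") == "true"))
    return out

# Made-up sample with one 1-shard collection and 2 replicas, as in our setup:
sample = json.loads("""{
  "collection1": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"state": "active", "leader": "true", "node_name": "nodeA:8983_solr"},
          "core_node2": {"state": "down", "node_name": "nodeB:8983_solr"}
        }
      }
    }
  }
}""")

for row in replica_states(sample):
    print(row)
```

The problem is that the "state" values this returns stay stale for a long
time after a node goes down.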

Thanks much.
rich

Re: clusterstate.json does not reflect current state of down versus active

Posted by Rich Mayfield <ma...@gmail.com>.
Shawn Heisey-4 wrote
> I can envision two issues for you to file in Jira.  The first would be
> an Improvement issue, the second would be a Bug:
> 
> * SolrCloud: Add API to move leader off a Solr instance
> * SolrCloud: LotsOfCollections takes a long time to stabilize

I've created:
* SOLR-5990 - SolrCloud with LotsOfCores does not come up fully
* SOLR-5991 - SolrCloud: Add API to move leader off a Solr instance



--
View this message in context: http://lucene.472066.n3.nabble.com/clusterstate-json-does-not-reflect-current-state-of-down-versus-active-tp4131266p4131588.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: clusterstate.json does not reflect current state of down versus active

Posted by Mark Miller <ma...@gmail.com>.
bq.  before any of Solr gets to do its shutdown sequence
Yeah, this is kind of an open issue. There might be a JIRA for it, but I cannot remember. What we really need is an explicit shutdown call that can be made before stopping jetty so that it’s done gracefully.

-- 
Mark Miller
about.me/markrmiller

On April 16, 2014 at 2:54:15 PM, Daniel Collins (danwcollins@gmail.com) wrote:

We actually have a similar scenario: we have 64 cores per machine, and even
that sometimes has issues when we shut down all cores at once. We did start
to write a "force election for Shard X" tool, but it was harder than we
expected; it's still on our to-do list.

Some context, we run 256 shards spread over 4 machines, and several Solr  
instances per machine (16 cores per instance, 4 instances per machine).  
Our machines regularly go down for maintenance, and shutting down the Solr  
core closes the HTTP interface (at Jetty level) before any of Solr gets to  
do its shutdown sequence: publishing as down, election, etc. Since we run  
an NRT system, that causes all kinds of backlogs in the indexing pipeline  
whilst Solr queues up indexing requests waiting for a valid leader...  
Hence the need for an API to move leadership off the instance, *before* we  
begin shutdown.  

Any insight would be appreciated, we are happy to contribute this back if  
we can get it working!  


On 16 April 2014 15:49, Shawn Heisey <so...@elyograg.org> wrote:  

> On 4/16/2014 8:02 AM, Rich Mayfield wrote:  
> > However there doesn’t appear to be a way to force leadership to/from a  
> > particular replica.  
>  
> I would have expected that doing a core reload on the current leader  
> would force an election and move the leader, but on my 4.2.1 SolrCloud  
> (the only version I have running at the moment) that does not appear to  
> be happening. IMHO we need a way to force a leader change on a shard.  
> An API for "move all leaders currently on this Solr instance" would  
> actually be a very useful feature.  
>  
> I can envision two issues for you to file in Jira. The first would be  
> an Improvement issue, the second would be a Bug:  
>  
> * SolrCloud: Add API to move leader off a Solr instance  
> * SolrCloud: LotsOfCollections takes a long time to stabilize  
>  
> If we can get a dev who specializes in SolrCloud to respond, perhaps  
> they'll have a recommendation about whether these are sensible issues,  
> and if not, what they'd recommend.  
>  
> Thanks,  
> Shawn  
>  
>  

Re: clusterstate.json does not reflect current state of down versus active

Posted by Daniel Collins <da...@gmail.com>.
We actually have a similar scenario: we have 64 cores per machine, and even
that sometimes has issues when we shut down all cores at once. We did start
to write a "force election for Shard X" tool, but it was harder than we
expected; it's still on our to-do list.

Some context, we run 256 shards spread over 4 machines, and several Solr
instances per machine (16 cores per instance, 4 instances per machine).
 Our machines regularly go down for maintenance, and shutting down the Solr
core closes the HTTP interface (at Jetty level) before any of Solr gets to
do its shutdown sequence: publishing as down, election, etc.  Since we run
an NRT system, that causes all kinds of backlogs in the indexing pipeline
whilst Solr queues up indexing requests waiting for a valid leader...
 Hence the need for an API to move leadership off the instance, *before* we
begin shutdown.

Any insight would be appreciated, we are happy to contribute this back if
we can get it working!


On 16 April 2014 15:49, Shawn Heisey <so...@elyograg.org> wrote:

> On 4/16/2014 8:02 AM, Rich Mayfield wrote:
> > However there doesn’t appear to be a way to force leadership to/from a
> > particular replica.
>
> I would have expected that doing a core reload on the current leader
> would force an election and move the leader, but on my 4.2.1 SolrCloud
> (the only version I have running at the moment) that does not appear to
> be happening.  IMHO we need a way to force a leader change on a shard.
> An API for "move all leaders currently on this Solr instance" would
> actually be a very useful feature.
>
> I can envision two issues for you to file in Jira.  The first would be
> an Improvement issue, the second would be a Bug:
>
> * SolrCloud: Add API to move leader off a Solr instance
> * SolrCloud: LotsOfCollections takes a long time to stabilize
>
> If we can get a dev who specializes in SolrCloud to respond, perhaps
> they'll have a recommendation about whether these are sensible issues,
> and if not, what they'd recommend.
>
> Thanks,
> Shawn
>
>

Re: clusterstate.json does not reflect current state of down versus active

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/16/2014 8:02 AM, Rich Mayfield wrote:
> However there doesn’t appear to be a way to force leadership to/from a
> particular replica.

I would have expected that doing a core reload on the current leader
would force an election and move the leader, but on my 4.2.1 SolrCloud
(the only version I have running at the moment) that does not appear to
be happening.  IMHO we need a way to force a leader change on a shard.
An API for "move all leaders currently on this Solr instance" would
actually be a very useful feature.

I can envision two issues for you to file in Jira.  The first would be
an Improvement issue, the second would be a Bug:

* SolrCloud: Add API to move leader off a Solr instance
* SolrCloud: LotsOfCollections takes a long time to stabilize

If we can get a dev who specializes in SolrCloud to respond, perhaps
they'll have a recommendation about whether these are sensible issues,
and if not, what they'd recommend.

Thanks,
Shawn


Re: clusterstate.json does not reflect current state of down versus active

Posted by Rich Mayfield <ma...@gmail.com>.
Shawn Heisey-4 wrote
> What are you trying to achieve with your restart?  Can you just reload
> the collections one by one instead?

We restart when we update a handler, schema, or solrconfig for our cores.

I’ve tried just shutting down both nodes, updating both, and restarting
both. With 1,000 replicas, though, both nodes take a while to spin up each
replica, figure out its state relative to SolrCloud, and spend a lot of time
trying to talk to one another. Inevitably something fails, retries in 2
seconds, then 4, then 8, and soon the retries go out to 512 seconds. It
doesn’t seem that SolrCloud can handle a restart with this many cores without
some careful orchestration.

I've tried the relatively foolproof/safe approach of:

1) Unload all cores from node A (thus forcing leadership to node B)
2) Shut down, update, and restart node A
3) Re-create all cores in node A as replicas
4) Repeat 1-3 but for node B
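
Steps 1 and 3 are just CoreAdmin calls in a loop; a rough sketch of how I
drive it is below. The hostnames and core names are placeholders, and the
actual HTTP GETs are left as a comment:

```python
# Sketch of the unload/re-create loop using CoreAdmin UNLOAD and CREATE.
# Host and core names are placeholders; wire in a real HTTP client where noted.
from urllib.parse import urlencode

def unload_url(host, core):
    # UNLOAD drops the core from the node; leadership moves to the other replica.
    return "http://%s/solr/admin/cores?%s" % (
        host, urlencode({"action": "UNLOAD", "core": core}))

def create_url(host, core, collection, shard="shard1"):
    # CREATE re-adds the core as a replica of the existing 1-shard collection.
    return "http://%s/solr/admin/cores?%s" % (
        host, urlencode({"action": "CREATE", "name": core,
                         "collection": collection, "shard": shard}))

cores = ["collection%d_replica1" % i for i in range(1, 4)]  # stand-in names
for core in cores:
    print(unload_url("nodeA:8983", core))
    # ... issue the HTTP GET here; later, after restart, GET create_url(...)
```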

The thing is, creating the cores takes a long time - a couple seconds per
core. Keep in mind that nothing is going on while doing this - no new
content to synchronize and no searches are being performed. But even with
2-3 seconds per core we're talking about a fairly long process to cycle
through both sets of 1,000 replicas.

When I do the above, clusterstate.json appears to be kept up to date and
reflects the replicas that have been created. I would expect this given we’re
talking about whether or not the replica exists.

What I was then trying to do is find a way to update both nodes without
going through the full unload/re-create process. Avoiding the leader
election process seemed to be the key to a faster restart.

What I was hoping to achieve was:

1) Shift leadership to all replicas on node B
2) Shut down, update, and restart node A.
3) Repeat 1-2 but swap A/B

However there doesn’t appear to be a way to force leadership to/from a
particular replica.

My next approach was to merely shut down a node and wait for the other node to
pick up all leaders by fetching clusterstate.json.

1) Shut down node A
2) Wait for leader election process to play out (leaders shift to node B)
3) Update and restart A
4) Repeat 1-3 but swap A/B
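
The wait loop in step 2 is itself trivial; something like the sketch below,
where `fetch_clusterstate` is a stand-in for the GET against
/solr/zookeeper?path=%2Fclusterstate.json&detail=true (the stub here has
leadership already on node B):

```python
import time

def wait_for_leaders_on(node_name, fetch_clusterstate, timeout=300, poll=5):
    """Poll until every shard's leader lives on `node_name`, or time out.

    `fetch_clusterstate` stands in for an HTTP GET of clusterstate.json,
    returned as a parsed dict in the usual 4.x layout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        state = fetch_clusterstate()
        leaders = [
            rep.get("node_name")
            for coll in state.values()
            for shard in coll.get("shards", {}).values()
            for rep in shard.get("replicas", {}).values()
            if rep.get("leader") == "true"
        ]
        if leaders and all(n == node_name for n in leaders):
            return True
        time.sleep(poll)
    return False

# Stub fetcher standing in for the HTTP call, with leaders already on node B:
def fake_fetch():
    return {"collection1": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "down", "node_name": "nodeA:8983_solr"},
        "core_node2": {"state": "active", "leader": "true",
                       "node_name": "nodeB:8983_solr"}}}}}}

print(wait_for_leaders_on("nodeB:8983_solr", fake_fetch, timeout=1, poll=0))
```

The trouble is that against a real cluster this loop spins for ~30 minutes,
because clusterstate.json lags so far behind the actual elections.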

With step 2 though, clusterstate.json doesn’t seem to update and reflect the
leader election process that I can see play out in the log. I use
http://solrhost/solr/zookeeper?path=%2Fclusterstate.json&detail=true to get
clusterstate.json. In the end, this isn’t that much better or faster than my
first approach (unload and create) because the leader election process still
takes a couple seconds per replica.

So basically three issues - and maybe I need to focus on the “right”
problem:

1) Pulling the plug on SolrCloud and restarting with ~1,000 cores is iffy -
many collections never start
2) There’s no way to force election off of or to a node for an orchestrated
restart
3) clusterstate.json doesn’t appear to be updated (frequently) when it comes
to capturing leadership



--
View this message in context: http://lucene.472066.n3.nabble.com/clusterstate-json-does-not-reflect-current-state-of-down-versus-active-tp4131266p4131470.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: clusterstate.json does not reflect current state of down versus active

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/15/2014 8:58 AM, Rich Mayfield wrote:
> I am trying to orchestrate a fast restart of a SolrCloud (4.7.1). I was
> hoping that clusterstate.json would reflect the up/down state of each
> core as well as whether or not a given core was leader.
>
> clusterstate.json is not kept up to date with what I see going on in my
> logs though - I see the leader election process play out. I would expect
> that "state" would show "down" immediately for replicas on the node that I
> have shut down.
>
> Eventually, after about 30 minutes, all of the leader election processes
> complete and clusterstate.json gets updated to the true state for each
> replica.
>
> Why does it take so long for clusterstate.json to reflect the correct
> state? Is there a better way to determine the state of the system?
>
> (In my case, each node has upwards of 1,000 1-shard collections. There are
> two nodes in the cluster - each collection has 2 replicas.)

First, I'll admit that my experience with SolrCloud is not as extensive
as my experience with non-cloud installs.  I do have a SolrCloud (4.2.1)
install, but it's the smallest possible redundant setup -- three
servers, two run Solr and Zookeeper, the third runs Zookeeper only.

What are you trying to achieve with your restart?  Can you just reload
the collections one by one instead?
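
Roughly, that would just be the Collections API RELOAD action in a loop,
something like this sketch (host and collection names are placeholders):

```python
# Sketch: reload collections one by one via the Collections API.
# Host and collection names are placeholders; plug in a real HTTP GET where noted.
from urllib.parse import urlencode

def reload_url(host, collection):
    # RELOAD re-reads solrconfig/schema etc. without restarting the node.
    return "http://%s/solr/admin/collections?%s" % (
        host, urlencode({"action": "RELOAD", "name": collection}))

for name in ["collection1", "collection2"]:  # stand-in list of 1,000 names
    print(reload_url("localhost:8983", name))
    # ... issue the HTTP GET here and check the response status
```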

Assuming that reloading isn't going to work for some reason (rebooting
for OS updates is one possibility), we need to determine why it takes so
long for a node to stabilize.

Here's a bunch of info about performance problems with Solr.  I wrote
it, so we can discuss it in depth if you like:

http://wiki.apache.org/solr/SolrPerformanceProblems

I have three possible suspicions for the root of your problem.  It is
likely to be one of them, but it could be a combination of any or all of
them.  Because this happens at startup, I don't think it's likely that
you're dealing with a GC problem caused by a very large heap.

1) The system is replaying 1000 transaction logs (possibly large, one for
each core) at startup, and also possibly initiating index recovery using
replication.
2) You don't have enough RAM to cache your index effectively.
3) Your java heap is too small.

If your zookeeper ensemble does not use separate disks from your Solr
data (or separate servers), there could be an issue with zookeeper
client timeouts that's completely separate from any other problems.

I haven't addressed the fact that your cluster state doesn't update
quickly.  This might be a bug, but if we can deal with the slow
startup/stabilization first, then we can see whether there's anything
left to deal with on the cluster state.

Thanks,
Shawn