Posted to solr-user@lucene.apache.org by Patrick Recchia <pa...@gmail.com> on 2018/07/23 10:27:57 UTC

/state.json vs /clusterstate.json

As far as I knew until today, the state of a solr cluster used to be
stored in a zk entry /clusterstate.json, but since solr 5.0 it is stored
per collection in /collections/<xxx>/state.json.

We are having issues with our cluster, and I noticed today that
for most of the collections there is a state.json entry at
/collections/<xxxx>/state.json

but for some of them there is no state.json entry.
On the other hand, there is a /clusterstate.json, which I would not have
expected.

What is going on?
Who decides where the state of a collection is written to?
Can I force it somehow?

I ask because, as far as I understand, we're facing the 'few hundred
collections' issue I read about some time ago.

Let me explain:

Just a few figures:
- we currently have 103 collections
- most of them have 40 shards and 2 replicas each
Which brings us to approx 800 replicas in total.

Now, we had found references somewhere on the net saying that the 'number
of collections' of a solr cluster should remain within the 'few hundreds'
range because of performance issues, since every collection would point to
the same zk entry.
That comment seemed to apply to solr 4, though.

But now we have reached 800 nodes. That shouldn't be a problem if they
cluster in groups of 80 nodes at a time (1 collection),
but it is definitely an issue if they all point to a single zk node.

Thanks in advance for any hint about where to look

Patrick

Re: /state.json vs /clusterstate.json

Posted by Erick Erickson <er...@gmail.com>.
Oh boy. These bits are awkward during the transition but I think you'll be OK.

bq. "but for some of them there is no entry state.json"

This is a bit concerning. Are the entries in clusterstate.json for
valid replicas that are _not_ in the associated state.json?

There is a cluster property "legacyCloud". When true (the default
before 7.0), when Solr finds replicas lying around on disk it
reconstructs the znode from the information in the associated
core.properties file in the replica's directory. In, you guessed it,
clusterstate.json. So if you, say, shut down a Solr node with replicas
on it, then deleted the collection and then brought the Solr node
back up, those replicas would re-appear in clusterstate.json.

So if you have live replicas in clusterstate.json but _not_ in
state.json then somehow you have to get them to the right place. If
they are not live, they can be safely deleted from clusterstate.json.
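
To see which replicas appear only in clusterstate.json, something along these lines might help. It is only a sketch: it assumes you have already dumped the znodes to JSON and parsed them, and that both files use the usual collection -> "shards" -> shard -> "replicas" layout. The function names are made up for illustration, and it says nothing about whether a replica is actually live.

```python
def replicas_by_collection(state):
    """Map collection name -> set of (shard, replica) names in a parsed state dict."""
    found = {}
    for coll, body in state.items():
        names = set()
        for shard, shard_body in body.get("shards", {}).items():
            for replica in shard_body.get("replicas", {}):
                names.add((shard, replica))
        found[coll] = names
    return found


def only_in_clusterstate(clusterstate, per_collection_states):
    """Return replicas listed in clusterstate.json but in no state.json.

    clusterstate: parsed /clusterstate.json (assumed layout, see above)
    per_collection_states: list of parsed /collections/<name>/state.json docs
    """
    legacy = replicas_by_collection(clusterstate)
    current = {}
    for doc in per_collection_states:
        for coll, names in replicas_by_collection(doc).items():
            current.setdefault(coll, set()).update(names)
    result = {}
    for coll, names in legacy.items():
        missing = names - current.get(coll, set())
        if missing:
            result[coll] = sorted(missing)
    return result
```

Anything this reports still has to be checked by hand against what is really running before you move or delete it.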

What you really want, though, is to not have anything in
clusterstate.json. So here's what I'd do:

> Create a different ZK ensemble, maybe a single node on your local box. Specifically _not_ connected to your prod system.
> See what happens if you issue the MIGRATESTATEFORMAT command to that isolated ZK node. Does the result conform to your prod system? If so, you can run it on your prod system.
> Once you're happy with the individual state.json files, go ahead and migrate those to prod.
> Set the legacyCloud property to false.
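
For the MIGRATESTATEFORMAT and legacyCloud steps, the Collections API requests would look roughly like what the sketch below builds. The base URL and collection name are placeholders, the helper names are hypothetical, and you should check the parameters against the reference guide for your Solr version before sending anything. The code only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, **params):
    """Build a Collections API URL under base_url (e.g. http://host:8983/solr)."""
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def migrate_state_format_url(base_url, collection):
    """URL for migrating one collection out of clusterstate.json."""
    return collections_api_url(
        base_url, action="MIGRATESTATEFORMAT", collection=collection
    )


def disable_legacy_cloud_url(base_url):
    """URL for setting the legacyCloud cluster property to false."""
    return collections_api_url(
        base_url, action="CLUSTERPROP", name="legacyCloud", val="false"
    )


if __name__ == "__main__":
    # Placeholder host: point this at the isolated test instance, not prod.
    base = "http://localhost:8983/solr"
    print(migrate_state_format_url(base, "mycollection"))
    print(disable_legacy_cloud_url(base))
```

Printing the URLs first and issuing them by hand against the test ensemble keeps the dry run well away from prod.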

NOTES:
1> you need to have an empty clusterstate.json file, one that just
consists of {}.
2> you can use the "bin/solr zk" series of commands to overwrite
individual znodes. Or zkcli, whichever you find easiest.
3> I'd _really_ recommend backing things up first!
4> There are visual ZK node editor tools out there; if this gets
really complex, it would probably be worth investing in one.
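
For note 1, a quick sanity check on whatever you pull down from /clusterstate.json could look like this. It is a sketch that assumes you have the raw znode content as bytes or a string; the function name is made up.

```python
import json


def is_empty_clusterstate(raw):
    """True if the raw znode content parses to an empty JSON object, i.e. {}."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    try:
        return json.loads(text) == {}
    except json.JSONDecodeError:
        # Empty or malformed content is not the same as {}.
        return False
```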

Best,
Erick
