Posted to solr-user@lucene.apache.org by Tim Chen <Ti...@sbs.com.au> on 2016/06/29 07:05:22 UTC

Solr Cloud 2nd Server Recover Stuck

Hi,

I need some help please.

I am running Solr Cloud 4.10.4, with a ZooKeeper ensemble.

Server A running Solr Cloud + ZooKeeper
Server B running Solr Cloud + ZooKeeper
Server C running ZooKeeper only.

For some reason Server B crashed and all of its data was lost. I have cleaned it up, deleted all the existing collection index files, and started the Solr service fresh.

For a collection that has only 1 shard, Server B has managed to create the core and replicate it from Server A:
       SolrCore [collection1] Solr index directory 'xxxxxxxx/collection1/data/index' doesn't exist. Creating new index...

For a collection that has 2 shards, Server B doesn't seem to be doing anything. The collection was originally configured with 2 shards and a replication factor of 2.

Here is the clusterstate.json from ZooKeeper.
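
For reference, the same data can also be viewed in the Admin UI under Cloud > Tree, or dumped with the zkcli.sh script that ships with Solr; the script path and ZooKeeper address below are just illustrative:

       example/scripts/cloud-scripts/zkcli.sh -zkhost 10.1.11.70:2181 -cmd get /clusterstate.json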

Collection collection1 has only 1 shard.
Collection cr_dev has 2 shards; each shard has one replica on Server A, and one that was on Server B.
Server A: 10.1.11.70
Server B: 10.2.11.244

Is it because "autoCreated" is missing from collection cr_dev? How do I set this? API call?

"collection1":{
    "shards":{"shard1":{
        "range":"80000000-7fffffff",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"collection1",
            "node_name":"10.1.11.70:8983_solr",
            "base_url":"http://10.1.11.70:8983/solr",
            "leader":"true"},
          "core_node2":{
            "state":"active",
            "core":"collection1",
            "node_name":"10.2.11.244:8983_solr",
            "base_url":"http://10.2.11.244:8983/solr"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"1",
    "autoAddReplicas":"false",
    "autoCreated":"true"},
  "cr_dev":{
    "shards":{
      "shard1":{
        "range":"80000000-ffffffff",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"cr_dev_shard1_replica1",
            "node_name":"10.1.11.70:8983_solr",
            "base_url":"http://10.1.11.70:8983/solr",
            "leader":"true"},
          "core_node4":{
            "state":"down",
            "core":"cr_dev_shard1_replica2",
            "node_name":"10.2.11.244:8983_solr",
            "base_url":"http://10.2.11.244:8983/solr"}}},
      "shard2":{
        "range":"0-7fffffff",
        "state":"active",
        "replicas":{
          "core_node2":{
            "state":"active",
            "core":"cr_dev_shard2_replica1",
            "node_name":"10.1.11.70:8983_solr",
            "base_url":"http://10.1.11.70:8983/solr",
            "leader":"true"},
          "core_node3":{
            "state":"down",
            "core":"cr_dev_shard2_replica2",
            "node_name":"10.2.11.244:8983_solr",
            "base_url":"http://10.2.11.244:8983/solr"}}}},
    "maxShardsPerNode":"2",
    "router":{"name":"compositeId"},
    "replicationFactor":"2",
    "autoAddReplicas":"false"},

Many thanks,
Tim



Re: Solr Cloud 2nd Server Recover Stuck

Posted by Erick Erickson <er...@gmail.com>.
NP, glad it worked!

On Wed, Jun 29, 2016 at 10:33 PM, Tim Chen <Ti...@sbs.com.au> wrote:
> Hi Erick,
>
> I have followed your instructions: added the new replicas and deleted the old ones - works great!
>
> Everything back to normal now.
>
> Thanks mate!
>
> Cheers,
> Tim
>

RE: Solr Cloud 2nd Server Recover Stuck

Posted by Tim Chen <Ti...@sbs.com.au>.
Hi Erick,

I have followed your instructions: added the new replicas and deleted the old ones - works great!

Everything back to normal now.

Thanks mate!

Cheers,
Tim


Re: Solr Cloud 2nd Server Recover Stuck

Posted by Erick Erickson <er...@gmail.com>.
I'm assuming that 10.1.11.70 is server A here.

What this _looks_ like is that you deleted the entire
core directories here:
cr_dev_shard1_replica2
cr_dev_shard2_replica2

but not
collection1

on server B. This is a little inconsistent, but I think the collection1
core naming was a little weird with the default collection in 4.10...

Anyway, if this is true then there'll be no
core.properties
file in those cr_dev_* directories.
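
For reference, a core.properties for one of those replicas would normally look something like this (values taken from your clusterstate; the exact set of properties can vary):

    name=cr_dev_shard1_replica2
    collection=cr_dev
    shard=shard1
    coreNodeName=core_node4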

So, ZooKeeper has a record of there being
such a replica, but it's not present on your server B.
To ZooKeeper, since the replica hasn't registered
itself, it still looks like the machine is just down.

So here's what I'd try:
Well, first I'd back up server A's index directories...

Use the Collections API ADDREPLICA command to
add a replica on Server B for each shard, using the "node"
parameter.
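
Untested, but with the names from your clusterstate the calls would look something like this (one per shard):

http://10.1.11.70:8983/solr/admin/collections?action=ADDREPLICA&collection=cr_dev&shard=shard1&node=10.2.11.244:8983_solr
http://10.1.11.70:8983/solr/admin/collections?action=ADDREPLICA&collection=cr_dev&shard=shard2&node=10.2.11.244:8983_solr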

That should churn for a while but eventually create a replica
and sync it with the leader. Once that's done, use the DELETEREPLICA
command to force ZooKeeper to remove the traces of the original replicas
on server B.
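
Again untested, but given the core_node names currently registered for server B, that would be roughly:

http://10.1.11.70:8983/solr/admin/collections?action=DELETEREPLICA&collection=cr_dev&shard=shard1&replica=core_node4
http://10.1.11.70:8983/solr/admin/collections?action=DELETEREPLICA&collection=cr_dev&shard=shard2&replica=core_node3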

Best,
Erick
