
Posted to solr-user@lucene.apache.org by Oliver Schrenk <ol...@gmail.com> on 2014/03/11 16:25:25 UTC

replica reports recovery_failed but is considered the leader

Hi,

After an unsuccessful indexing run on a SolrCloud cluster with four machines, during which we saw a lot of errors that we are still investigating, we found the cluster in a weird state:

    {"collection_v1":{
        "shards":{
          "shard1":{
            "range":"80000000-bfffffff",
            "state":"active",
            "replicas":{
              "core_node1":{
                "state":"recovery_failed",
                "base_url":"http://solr-host9:7070/solr",
                "core":"elmar_v1_shard1_replica1",
                "node_name":"solr-host9:7070_solr",
                "leader":"true"},
              "core_node2":{
                "state":"active",
                "base_url":"http://solr-host8:7070/solr",
                "core":"elmar_v1_shard1_replica2",
                "node_name":"solr-host8:7070_solr"}}},

        ...

        "maxShardsPerNode":"2",
        "router":{"name":"compositeId"},
        "replicationFactor":"2"}}
    }


From my point of view it doesn't make sense that core_node1 is the leader of shard1 when it can't even be recovered. With the other machine working fine, why is core_node2 not the leader? Am I wrong in my assumption? In the same vein, how can I manually set the leader?
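
To make the inconsistency easy to spot across all shards, here is a minimal sketch that walks a cluster state document and lists every replica that is flagged as leader while its state is anything other than "active". It only reads JSON and does not talk to Solr or ZooKeeper. The function name find_inconsistent_leaders and the SAMPLE constant are hypothetical, and SAMPLE is a trimmed copy of the state shown above (one shard, only the fields the check needs), not the full cluster state.

```python
import json


def find_inconsistent_leaders(cluster_state):
    """Return (collection, shard, replica, state) for each replica that is
    marked as leader but whose state is not "active"."""
    problems = []
    for collection, coll_data in cluster_state.items():
        for shard, shard_data in coll_data.get("shards", {}).items():
            for replica, info in shard_data.get("replicas", {}).items():
                # In the state above the leader flag is the string "true".
                is_leader = info.get("leader") == "true"
                if is_leader and info.get("state") != "active":
                    problems.append(
                        (collection, shard, replica, info.get("state"))
                    )
    return problems


# Trimmed copy of the state from this post (assumption: other shards omitted).
SAMPLE = json.loads("""
{"collection_v1": {
    "shards": {
      "shard1": {
        "state": "active",
        "replicas": {
          "core_node1": {"state": "recovery_failed", "leader": "true"},
          "core_node2": {"state": "active"}}}}}}
""")

if __name__ == "__main__":
    for collection, shard, replica, state in find_inconsistent_leaders(SAMPLE):
        print(f"{collection}/{shard}: leader {replica} is in state {state}")
```

Running the same function over the full cluster state would show whether shard1 is the only shard affected.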

Regards
Oliver

Re: replica reports recovery_failed but is considered the leader

Posted by Oliver Schrenk <ol...@gmail.com>.

Solr 4.7

On 11 Mar 2014, at 16:43, Erick Erickson <er...@gmail.com> wrote:

> What version of Solr? There's been quite a bit of work
> between the various 4.x versions.
> 
> Erick

Re: replica reports recovery_failed but is considered the leader

Posted by Erick Erickson <er...@gmail.com>.

What version of Solr? There's been quite a bit of work
between the various 4.x versions.

Erick
