Posted to solr-user@lucene.apache.org by Oliver Schrenk <ol...@gmail.com> on 2014/03/11 16:25:25 UTC
replica reports recovery_failed but is considered the leader
Hi,
After an unsuccessful indexing run on a SolrCloud cluster with four machines, where we experienced a lot of errors that we are still investigating, we found the cluster in a weird state. This is an excerpt of the cluster state:
{"collection_v1":{
    "shards":{
      "shard1":{
        "range":"80000000-bfffffff",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"recovery_failed",
            "base_url":"http://solr-host9:7070/solr",
            "core":"elmar_v1_shard1_replica1",
            "node_name":"solr-host9:7070_solr",
            "leader":"true"},
          "core_node2":{
            "state":"active",
            "base_url":"http://solr-host8:7070/solr",
            "core":"elmar_v1_shard1_replica2",
            "node_name":"solr-host8:7070_solr"}}},
      ...
    "maxShardsPerNode":"2",
    "router":{"name":"compositeId"},
    "replicationFactor":"2"}}
}
From my point of view it doesn't make sense that core_node1 is the leader of shard1 when it can't even be recovered. With the other machine working fine, why is core_node2 not the leader? Am I wrong in my assumption? In the same vein, how can I manually set the leader?
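To make the condition concrete, here is a minimal sketch in Python of how one might flag it, assuming the cluster state is available as JSON in the layout shown above. The function name find_inconsistent_leaders and the trimmed SAMPLE are illustrative only and are not part of Solr.

```python
import json


def find_inconsistent_leaders(cluster_state):
    """Return (collection, shard, replica, state) for each replica that is
    marked as leader while its state is anything other than "active"."""
    problems = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                # In the excerpt above the leader flag is the string "true".
                is_leader = replica.get("leader") == "true"
                state = replica.get("state")
                if is_leader and state != "active":
                    problems.append((coll_name, shard_name, replica_name, state))
    return problems


# Trimmed copy of the excerpt above (only shard1, elided parts left out).
SAMPLE = json.loads("""
{"collection_v1": {
  "shards": {
    "shard1": {
      "range": "80000000-bfffffff",
      "state": "active",
      "replicas": {
        "core_node1": {
          "state": "recovery_failed",
          "node_name": "solr-host9:7070_solr",
          "leader": "true"},
        "core_node2": {
          "state": "active",
          "node_name": "solr-host8:7070_solr"}}}}}}
""")

if __name__ == "__main__":
    for coll, shard, replica, state in find_inconsistent_leaders(SAMPLE):
        print(f"{coll}/{shard}: leader {replica} is in state {state}")
```

This only reports the mismatch; it does not change which replica is the leader.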
Regards
Oliver
Re: replica reports recovery_failed but is considered the leader
Posted by Oliver Schrenk <ol...@gmail.com>.
Solr 4.7
On 11 Mar 2014, at 16:43, Erick Erickson <er...@gmail.com> wrote:
> What version of Solr? There's been quite a bit of work
> between the various 4.x versions.
>
> Erick
Re: replica reports recovery_failed but is considered the leader
Posted by Erick Erickson <er...@gmail.com>.
What version of Solr? There's been quite a bit of work
between the various 4.x versions.
Erick