Posted to solr-user@lucene.apache.org by Shai Erera <se...@gmail.com> on 2015/03/24 08:36:01 UTC

maxReplicasPerNode

Hi

I saw that we can define maxShardsPerNode when creating a collection, but I
don't see that I can set something similar for replicas. My scenario is the
following:

   - I set up one Solr node
   - Create collection with numShards=1 and replicationFactor=2
   - Hopefully, one replica is created on that node
   - When I bring up the second Solr node, the second replica will be
   created

What I see is that both replicas are created on the first node, and when I
bring up the second Solr node, none of the replicas are moved.

I know that I can "move" one replica by calling ADDREPLICA on node2, then
DELETEREPLICA on node1, but I was wondering if there's an automated way to
do that.
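
For reference, the manual "move" described above amounts to two Collections API calls issued in order. The following is only a minimal sketch that builds the request URLs and sends nothing; the host names, the collection name `mycoll`, the replica name `core_node1` and the helper function names are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL; base_url is assumed to end in /solr."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{base_url}/admin/collections?{urlencode(query)}"


def move_replica_urls(base_url, collection, shard, old_replica, new_node):
    """Return the two calls in order: add on the new node, then delete the old replica."""
    add = collections_api_url(
        base_url, "ADDREPLICA", collection=collection, shard=shard, node=new_node
    )
    delete = collections_api_url(
        base_url, "DELETEREPLICA", collection=collection, shard=shard, replica=old_replica
    )
    return [add, delete]


if __name__ == "__main__":
    # Hypothetical names: node2 is the new node, core_node1 is the replica to drop.
    for url in move_replica_urls(
        "http://node1:8983/solr", "mycoll", "shard1", "core_node1", "node2:8983_solr"
    ):
        print(url)
```

It is probably safest to issue the DELETEREPLICA call only after the new replica is reported as active, so the shard is never left with fewer copies than before.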

I've also considered creating the collection with replicationFactor=1, so
that when the second node comes up it looks for shards w/ only one replica
and assigns itself as the second replica. But that means I have to own that
piece of logic; if Solr already does that, that's better.

Also, from what I understand, if I create a collection w/ rf=2 and there
are two nodes, then each node is assigned a replica. If one of the nodes
goes down, and a 3rd node comes up, it will be assigned a replica -- is
that correct?

Another related question, if there are two replicas on node1 and node2, and
node2 goes down -- will node1 be assigned the second replica as well?

If this is explained somewhere, I'd appreciate a pointer.

Shai

Re: maxReplicasPerNode

Posted by Shai Erera <se...@gmail.com>.
Thanks guys, this makes sense I guess, from Solr's side.

Perhaps we can have a new Collections API like REDIRECTREPLICA or
something, that will redirect a replica to the new node.
This API can simply do ADDREPLICA on the new node, and DELETEREPLICA for the
replica on the node that doesn't exist anymore.

I guess I need to implement that for my use case now (I know that if a node
goes down, it won't ever come back up again - there will be a new node
replacing it), so I'll see how it plays out and if it works well, I'll open
a JIRA issue. In my case, when the new node comes up, it can check the
cluster's status, and if it detects an orphaned replica, it will add
itself as a new replica and delete the orphaned one.
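
A rough sketch of the detection step, assuming the cluster status has already been fetched with the CLUSTERSTATUS action and parsed from JSON into a dict. The function name and the sample data are hypothetical; the sample is hand-written to resemble the response layout and is not real output.

```python
def find_orphaned_replicas(cluster_status):
    """Return (collection, shard, replica, node) for replicas whose node is not live."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    orphans = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                node = replica.get("node_name")
                if node not in live_nodes:
                    orphans.append((coll_name, shard_name, replica_name, node))
    return orphans


# Hand-written sample shaped like a CLUSTERSTATUS response (assumed names).
SAMPLE_STATUS = {
    "cluster": {
        "collections": {
            "mycoll": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"node_name": "node1:8983_solr"},
                            "core_node2": {"node_name": "node2:8983_solr"},
                        }
                    }
                }
            }
        },
        # node2 is gone and node3 has replaced it.
        "live_nodes": ["node1:8983_solr", "node3:8983_solr"],
    }
}

if __name__ == "__main__":
    for orphan in find_orphaned_replicas(SAMPLE_STATUS):
        print(orphan)
```

Each tuple returned identifies a replica the new node could replace with ADDREPLICA for that shard, followed by DELETEREPLICA for the orphaned replica name.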

Let me know if you see a problem with how I intend to address that.

Shai

On Tue, Mar 24, 2015 at 6:01 PM, Anshum Gupta <an...@anshumgupta.net>
wrote:

> Yes, it applies to both. Solr wouldn't auto-add replicas in either of those
> cases (or any other case) to meet the rf specified at create time.

Re: maxReplicasPerNode

Posted by Anshum Gupta <an...@anshumgupta.net>.
Yes, it applies to both. Solr wouldn't auto-add replicas in either of those
cases (or any other case) to meet the rf specified at create time.

On Tue, Mar 24, 2015 at 2:22 AM, Shai Erera <se...@gmail.com> wrote:

> Thanks Anshum,
>
> > About #3, in line with my answer to the previous question, Solr wouldn't
> > auto-add a Replica to meet the replication factor when a node goes down.
>
> Just to make sure the answer applies to both these cases:
>
>    1. There are two replicas on node1 and node2. Solr won't add a replica
>    to node1 when node2 goes down.
>    2. The collection was created with rf=2, and Solr created replicas on
>    node1 and node2. If node2 goes down and a node3 comes up instead, will
>    it be assigned a replica, or does Solr not do that either?
>
> In short, is there any scenario where Solr would auto-add replicas (aside
> from running on HDFS) to meet the 'rf' setting, or is ensuring that RF is
> met my responsibility once the collection has been created?
>
> Shai



-- 
Anshum Gupta

Re: maxReplicasPerNode

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/24/2015 3:22 AM, Shai Erera wrote:
>>> If this is explained somewhere, I'd appreciate if you can give me a
>>> pointer.

I don't think it's explained anywhere, so that's a gap in the
documentation.

One problem with automatic replica addition in response to cluster
problems is that there is no mechanism (currently, at least) to indicate
that a node disappearance is intentional and temporary, and no way to
configure a minimum time interval before taking automatic action.  It
would be necessary to have these mechanisms before any kind of automatic
repair ability could be implemented.
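
To illustrate the two missing pieces, here is a minimal sketch of how an external script (not Solr itself, which has no such mechanism according to this thread) might apply a grace period and a maintenance list before acting. The function name, the timestamps and the node names are all hypothetical.

```python
def nodes_needing_repair(last_seen, now, grace_seconds, in_maintenance=()):
    """Return nodes that have been gone longer than the grace period.

    last_seen maps node name -> timestamp (seconds) at which it was last live.
    Nodes listed in in_maintenance are treated as intentionally down and skipped.
    """
    skip = set(in_maintenance)
    return sorted(
        node
        for node, seen in last_seen.items()
        if node not in skip and now - seen > grace_seconds
    )


if __name__ == "__main__":
    seen = {"node1": 1000, "node2": 400, "node3": 100}
    # node3 is marked as down for planned maintenance, so only node2 qualifies.
    print(nodes_needing_repair(seen, now=1000, grace_seconds=300, in_maintenance=["node3"]))
```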

Thanks,
Shawn


Re: maxReplicasPerNode

Posted by Shai Erera <se...@gmail.com>.
Thanks Anshum,

> About #3, in line with my answer to the previous question, Solr wouldn't
> auto-add a Replica to meet the replication factor when a node goes down.

Just to make sure the answer applies to both these cases:

   1. There are two replicas on node1 and node2. Solr won't add a replica
   to node1 when node2 goes down.
   2. The collection was created with rf=2, and Solr created replicas on
   node1 and node2. If node2 goes down and a node3 comes up instead, will
   it be assigned a replica, or does Solr not do that either?

In short, is there any scenario where Solr would auto-add replicas (aside
from running on HDFS) to meet the 'rf' setting, or is ensuring that RF is
met my responsibility once the collection has been created?

Shai


Re: maxReplicasPerNode

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Shai,

As of now, all replicas for a collection are created to meet the specified
replication factor at the time of collection creation. There's no way to
defer that until more nodes are up. Your best bet is to have the nodes
already up before you CREATE the collection, or to create the collection
with a lower replication factor and then use ADDREPLICA.
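
As a sketch of the second option, the sequence would be one CREATE call with replicationFactor=1, followed later by one ADDREPLICA call per shard once the extra node is up. The code below only builds the URLs. The host names, the collection name and the function name are assumptions, it assumes the default shard names shard1..shardN, and a real CREATE call will usually need further parameters such as the config set name.

```python
from urllib.parse import urlencode


def create_then_grow_urls(base_url, name, num_shards, new_node):
    """CREATE with replicationFactor=1, then one ADDREPLICA per shard on new_node."""

    def url(action, **params):
        return f"{base_url}/admin/collections?{urlencode({'action': action, **params})}"

    calls = [url("CREATE", name=name, numShards=num_shards, replicationFactor=1)]
    # Assumes the default shard names shard1..shardN.
    for i in range(1, num_shards + 1):
        calls.append(url("ADDREPLICA", collection=name, shard=f"shard{i}", node=new_node))
    return calls


if __name__ == "__main__":
    for call in create_then_grow_urls("http://node1:8983/solr", "mycoll", 1, "node2:8983_solr"):
        print(call)
```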

About auto-addition of replicas, that's kind of supported when using a shared
file system (HDFS) to host the index. It doesn't truly fit your use case,
i.e. it doesn't consider the intended replication factor but only
brings up a Replica in case all replicas for a node are down, so that
SolrCloud continues to be usable. It also doesn't auto-remove the replica when
the old node comes back up. You can read more about this in the
"Automatically Add Replicas in SolrCloud" section here:
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

About #3, in line with my answer to the previous question, Solr wouldn't
auto-add a Replica to meet the replication factor when a node goes down.





-- 
Anshum Gupta