Posted to solr-user@lucene.apache.org by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov> on 2018/06/15 17:08:05 UTC

sharding and placement of replicas

If I start with a collection X on two nodes with one shard and two replicas (for redundancy, in case a node goes down): a node on host1 has X_shard1_replica1 and a node on host2 has X_shard1_replica2. When I try SPLITSHARD, I generally get X_shard1_0_replica1, X_shard1_1_replica1 and X_shard1_0_replica0 all on the node on host1, with X_shard1_1_replica0 sitting alone on the node on host2. If host1 were to go down at this point, shard1_0 would be unavailable.

I realize I do have the option to ADDREPLICA, creating X_shard1_0_replica2 on the node on host2, and then to DELETEREPLICA for X_shard1_0_replica0, but I don't see the logic behind requiring this extra step. Of the half dozen times I have experimented with SPLITSHARD (starting with one shard and two replicas on separate nodes), it has always put three out of four of the new cores on the same node.
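
For reference, the extra step amounts to something like the following pair of Collections API calls (the node name and replica name here are placeholders; the real values would come from CLUSTERSTATUS):

solr/admin/collections?action=ADDREPLICA&collection=X&shard=shard1_0&node=host2:8983_solr
solr/admin/collections?action=DELETEREPLICA&collection=X&shard=shard1_0&replica=core_node7

i.e. first create the missing copy of shard1_0 on host2, then drop the redundant copy on host1.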

Is there a way either of specifying placement or of giving hints that replicas ought to be separated?

I am currently running Solr6.6.0, if that is relevant.

RE: sharding and placement of replicas

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov>.
I am still wondering whether anyone has ever seen any examples of this actually working (has anyone ever seen an example of SPLITSHARD on a two-node SolrCloud placing the replicas of each shard on different hosts from the other replicas of the same shard)?


Anyone?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov> 
Sent: Friday, August 10, 2018 12:54 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Note that I usually create collections with commands which contain (for example)

solr/admin/collections?action=CREATE&name=collectest&collection.configName=collectest&numShards=1&replicationFactor=1&createNodeSet=

I give one node in the createNodeSet and then ADDREPLICA to the other node.

In case this were related, I now tried it a different way, using a command which contains

solr/admin/collections?action=CREATE&name=collectest5&collection.configName=collectest&numShards=1&replicationFactor=2&createNodeSet=

I gave both nodes in the createNodeSet in this case. It created one replica on each node (each node being on a different host at the same port). This is what I would consider the expected behavior (refraining from putting two replicas of the same shard on the same node).

After this I ran a command including

solr/admin/collections?action=SPLITSHARD&collection=collectest5&shard=shard1&indent=on&async=test20180810h

The result was still the same: one of the four new cores was on one node, and the other three were all together on the node from which I issued this command (including two replicas of the same shard on the same node).





I am wondering whether there are any examples of this actually working (any examples of SPLITSHARD placing the replicas of each shard on different hosts from the other replicas of the same shard).


-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] [mailto:craig.oakley@nih.gov] 
Sent: Thursday, August 09, 2018 5:08 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Okay, I've tried again with two nodes running Solr7.4 on different hosts.

Before SPLITSHARD, collectest2_shard1_replica_n1 was on the host nosqltest22, and collectest2_shard1_replica_n3 was on the host nosqltest11.

After running SPLITSHARD (on the nosqltest22 node), only collectest2_shard1_0_replica0 was added to nosqltest11; nosqltest22 became the location for collectest2_shard1_0_replica_n5, collectest2_shard1_1_replica_n6 and collectest2_shard1_1_replica0 (and so if nosqltest22 were to be down, shard1_1 would not be available).


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, July 31, 2018 5:16 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: sharding and placement of replicas

Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and
both ended up on the same Solr instance (same port), that would be
unexpected.

The problem is that this would have been a bug clear back in the Solr
4.x days, so the fact that you say you saw it on 6.6 is surprising.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov> wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement.  Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy.  Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
>

RE: sharding and placement of replicas

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov>.
Note that I usually create collections with commands which contain (for example)

solr/admin/collections?action=CREATE&name=collectest&collection.configName=collectest&numShards=1&replicationFactor=1&createNodeSet=

I give one node in the createNodeSet and then ADDREPLICA to the other node.

In case this were related, I now tried it a different way, using a command which contains

solr/admin/collections?action=CREATE&name=collectest5&collection.configName=collectest&numShards=1&replicationFactor=2&createNodeSet=

I gave both nodes in the createNodeSet in this case. It created one replica on each node (each node being on a different host at the same port). This is what I would consider the expected behavior (refraining from putting two replicas of the same shard on the same node).
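
(With placeholder node names, since the actual createNodeSet values are trimmed above, the fully specified form of that command would look roughly like

solr/admin/collections?action=CREATE&name=collectest5&collection.configName=collectest&numShards=1&replicationFactor=2&createNodeSet=host1:8983_solr,host2:8983_solr

where each createNodeSet entry is a node name exactly as it appears in live_nodes.)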

After this I ran a command including

solr/admin/collections?action=SPLITSHARD&collection=collectest5&shard=shard1&indent=on&async=test20180810h
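
(Because the split was submitted with async=test20180810h, completion has to be checked separately, with something like

solr/admin/collections?action=REQUESTSTATUS&requestid=test20180810h

which returns the state of the stored async response.)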

The result was still the same: one of the four new cores was on one node, and the other three were all together on the node from which I issued this command (including two replicas of the same shard on the same node).





I am wondering whether there are any examples of this actually working (any examples of SPLITSHARD placing the replicas of each shard on different hosts from the other replicas of the same shard).


-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] [mailto:craig.oakley@nih.gov] 
Sent: Thursday, August 09, 2018 5:08 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Okay, I've tried again with two nodes running Solr7.4 on different hosts.

Before SPLITSHARD, collectest2_shard1_replica_n1 was on the host nosqltest22, and collectest2_shard1_replica_n3 was on the host nosqltest11.

After running SPLITSHARD (on the nosqltest22 node), only collectest2_shard1_0_replica0 was added to nosqltest11; nosqltest22 became the location for collectest2_shard1_0_replica_n5, collectest2_shard1_1_replica_n6 and collectest2_shard1_1_replica0 (and so if nosqltest22 were to be down, shard1_1 would not be available).


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, July 31, 2018 5:16 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: sharding and placement of replicas

Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and
both ended up on the same Solr instance (same port), that would be
unexpected.

The problem is that this would have been a bug clear back in the Solr
4.x days, so the fact that you say you saw it on 6.6 is surprising.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov> wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement.  Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy.  Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
>

RE: sharding and placement of replicas

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov>.
Okay, I've tried again with two nodes running Solr7.4 on different hosts.

Before SPLITSHARD, collectest2_shard1_replica_n1 was on the host nosqltest22, and collectest2_shard1_replica_n3 was on the host nosqltest11.

After running SPLITSHARD (on the nosqltest22 node), only collectest2_shard1_0_replica0 was added to nosqltest11; nosqltest22 became the location for collectest2_shard1_0_replica_n5, collectest2_shard1_1_replica_n6 and collectest2_shard1_1_replica0 (and so if nosqltest22 were to be down, shard1_1 would not be available).


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, July 31, 2018 5:16 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: sharding and placement of replicas

Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and
both ended up on the same Solr instance (same port), that would be
unexpected.

The problem is that this would have been a bug clear back in the Solr
4.x days, so the fact that you say you saw it on 6.6 is surprising.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov> wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement.  Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy.  Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
>

Re: sharding and placement of replicas

Posted by Erick Erickson <er...@gmail.com>.
Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and
both ended up on the same Solr instance (same port), that would be
unexpected.

The problem is that this would have been a bug clear back in the Solr
4.x days, so the fact that you say you saw it on 6.6 is surprising.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov> wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement.  Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy.  Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
>

RE: sharding and placement of replicas

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov>.
In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Tuesday, July 31, 2018 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: sharding and placement of replicas

On 7/27/2018 8:26 PM, Erick Erickson wrote:
> Yes with some fiddling as far as "placement rules", start here:
> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> The idea (IIUC) is that you provide a "snitch" that identifies what
> "rack" the Solr instance is on and can define placement rules that
> define "don't put more than one thingy on the same rack". "Thingy"
> here is replica, shard, whatever as defined by other placement rules.

I'd like to see an improvement in Solr's behavior when nothing has been
configured in auto-scaling or rule-based replica placement.  Configuring
those things is certainly an option, but I think we can do better even
without that config.

I believe that Solr already has some default intelligence that keeps
multiple replicas from ending up on the same *node* when possible ... I
would like this to also be aware of *hosts*.

Craig hasn't yet indicated whether there is more than one node per host,
so I don't know whether the behavior he's seeing should be considered a bug.

If somebody gives one machine multiple names/addresses and uses
different hostnames in their SolrCloud config for one actual host, then
it wouldn't be able to do any better than it does now, but if there are
matches in the hostname part of different entries in live_nodes, then I
think the improvement might be relatively easy.  Not saying that I know
what to do, but somebody who is familiar with the Collections API code
can probably do it.

Thanks,
Shawn


Re: sharding and placement of replicas

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/27/2018 8:26 PM, Erick Erickson wrote:
> Yes with some fiddling as far as "placement rules", start here:
> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> The idea (IIUC) is that you provide a "snitch" that identifies what
> "rack" the Solr instance is on and can define placement rules that
> define "don't put more than one thingy on the same rack". "Thingy"
> here is replica, shard, whatever as defined by other placement rules.

I'd like to see an improvement in Solr's behavior when nothing has been
configured in auto-scaling or rule-based replica placement.  Configuring
those things is certainly an option, but I think we can do better even
without that config.

I believe that Solr already has some default intelligence that keeps
multiple replicas from ending up on the same *node* when possible ... I
would like this to also be aware of *hosts*.

Craig hasn't yet indicated whether there is more than one node per host,
so I don't know whether the behavior he's seeing should be considered a bug.

If somebody gives one machine multiple names/addresses and uses
different hostnames in their SolrCloud config for one actual host, then
it wouldn't be able to do any better than it does now, but if there are
matches in the hostname part of different entries in live_nodes, then I
think the improvement might be relatively easy.  Not saying that I know
what to do, but somebody who is familiar with the Collections API code
can probably do it.

Thanks,
Shawn


Re: sharding and placement of replicas

Posted by Erick Erickson <er...@gmail.com>.
bq. Could SolrCloud avoid putting multiple replicas of the same shard
on the same host when there are multiple nodes per host?

Yes with some fiddling as far as "placement rules", start here:
https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html

The idea (IIUC) is that you provide a "snitch" that identifies what
"rack" the Solr instance is on and can define placement rules that
define "don't put more than one thingy on the same rack". "Thingy"
here is replica, shard, whatever as defined by other placement rules.
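
For example (an untested sketch, adapting the examples on that page
to "keep replicas of a shard on separate hosts"), a collection can be
created with a rule such as

solr/admin/collections?action=CREATE&name=collectest&collection.configName=collectest&numShards=1&replicationFactor=2&rule=shard:*,replica:<2,host:*

i.e. "for any given shard, fewer than two replicas per host". The
host/node/port tags come from the default snitch; a custom snitch is
only needed for information Solr can't discover on its own (racks,
zones, etc.).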

NOTE: pay particular attention to which version of Solr you're using
as I think this is changing pretty rapidly as part of the autoscaling
(7x) work.
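
For 7x, as a rough sketch, the equivalent is a cluster-wide
autoscaling policy, e.g.:

curl -X POST -H 'Content-type:application/json' http://localhost:8983/api/cluster/autoscaling -d '{
  "set-cluster-policy": [
    {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
  ]
}'

which says no node may hold more than one replica of any given shard;
check the autoscaling section of the ref guide for your exact release,
since the attribute names have been changing.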


Best,
Erick



On Fri, Jul 27, 2018 at 7:34 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 7/25/2018 3:49 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
>> I end up with four cores instead of two, as expected. The problem is that three of the four cores (col_shard1_0_replica_n5, col_shard1_0_replica0 and col_shard1_1_replica_n6) are *all on hostname1*. Only col_shard1_1_replica0 was placed on hostname2.
> <snip>
>> My question is: How can I tell Solr "avoid putting two replicas of the same shard on the same node"?
>
> Somehow I missed that there were three cores on host1 when you first
> described the problem.  Looking back, I see that you did have that
> information there.  I was more focused on the fact that host2 only had
> one core.  My apologies for not reading closely enough.
>
> Is this collection using compositeId or implicit?  I think it would have
> to be compositeId for a split to work correctly.  I wouldn't expect
> split to be supported on a collection with the implicit router.
>
> Are you running one Solr node per host?  If you have multiple Solr nodes
> (instances) on one host, Solr will have no idea that this is the case --
> the entire node identifier (including host name, port, and context path)
> is compared to distinguish nodes from each other.  The assumption in
> SolrCloud's internals is that each node is completely separate from
> every other node.  Running multiple nodes per host is only recommended
> when the heap requirements are *very* high, and in that situation,
> making sure that replicas are distributed properly will require extra
> effort.  For most installations, it is strongly recommended to only have
> one Solr node per physical host.
>
> If you are only running one Solr node per host, then the way it's
> behaving for you is certainly not the design intent, and sounds like a
> bug in SPLITSHARD.  Solr should try very hard to not place multiple
> replicas of one shard on the same *node*.
>
> A side question for devs that know about SolrCloud internals:  Could
> SolrCloud avoid putting multiple replicas of the same shard on the same
> host when there are multiple nodes per host?  It seems to me that it
> would not be supremely difficult to have SolrCloud detect a match in the
> host name and use that information to prefer nodes on different hosts
> when possible.  I am thinking about creating an issue for this enhancement.
>
> Thanks,
> Shawn
>

Re: sharding and placement of replicas

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/25/2018 3:49 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> I end up with four cores instead of two, as expected. The problem is that three of the four cores (col_shard1_0_replica_n5, col_shard1_0_replica0 and col_shard1_1_replica_n6) are *all on hostname1*. Only col_shard1_1_replica0 was placed on hostname2.
<snip>
> My question is: How can I tell Solr "avoid putting two replicas of the same shard on the same node"?

Somehow I missed that there were three cores on host1 when you first
described the problem.  Looking back, I see that you did have that
information there.  I was more focused on the fact that host2 only had
one core.  My apologies for not reading closely enough.

Is this collection using compositeId or implicit?  I think it would have
to be compositeId for a split to work correctly.  I wouldn't expect
split to be supported on a collection with the implicit router.

Are you running one Solr node per host?  If you have multiple Solr nodes
(instances) on one host, Solr will have no idea that this is the case --
the entire node identifier (including host name, port, and context path)
is compared to distinguish nodes from each other.  The assumption in
SolrCloud's internals is that each node is completely separate from
every other node.  Running multiple nodes per host is only recommended
when the heap requirements are *very* high, and in that situation,
making sure that replicas are distributed properly will require extra
effort.  For most installations, it is strongly recommended to only have
one Solr node per physical host.
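
(To make that concrete: the identifiers being compared are the node
names registered under /live_nodes in ZooKeeper, which look roughly
like

  host1:8983_solr
  host1:8984_solr
  host2:8983_solr

so two JVMs on host1 are, as far as SolrCloud is concerned, just as
distinct from each other as they are from the node on host2.)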

If you are only running one Solr node per host, then the way it's
behaving for you is certainly not the design intent, and sounds like a
bug in SPLITSHARD.  Solr should try very hard to not place multiple
replicas of one shard on the same *node*.

A side question for devs that know about SolrCloud internals:  Could
SolrCloud avoid putting multiple replicas of the same shard on the same
host when there are multiple nodes per host?  It seems to me that it
would not be supremely difficult to have SolrCloud detect a match in the
host name and use that information to prefer nodes on different hosts
when possible.  I am thinking about creating an issue for this enhancement.

Thanks,
Shawn


RE: sharding and placement of replicas

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov>.
I just now tried it with Solr7.4 and am getting the same symptoms as I describe below.

The symptoms I am seeing are quite different from what I gather Shawn Heisey understood them to be, so I will describe them again.

Let us assume that we start with a SolrCloud of two nodes: one at hostname1:9999 and the other at hostname2:9999

Let us assume that we have a one-shard collection with two replicas. One of the replicas is on the node at hostname1:9999 (with the core col_shard1_replica_n1) and the other on the node at hostname2:9999 (with the core col_shard1_replica_n3)

Then I run SPLITSHARD

I end up with four cores instead of two, as expected. The problem is that three of the four cores (col_shard1_0_replica_n5, col_shard1_0_replica0 and col_shard1_1_replica_n6) are *all on hostname1*. Only col_shard1_1_replica0 was placed on hostname2.
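
(Placement like this is easy to confirm with the Collections API, e.g. solr/admin/collections?action=CLUSTERSTATUS&collection=col , which lists every replica of every shard along with its node_name and base_url.)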

Prior to the SPLITSHARD, if hostname1 becomes temporarily unavailable, the SolrCloud can still be used: hostname2 has all the data.

After the SPLITSHARD, if hostname1 becomes temporarily unavailable, the SolrCloud does not have any access to the data in shard1_0

Granted, I could add a replica of shard1_0 onto hostname2 and then drop one of the extraneous shard1_0 replicas which are on hostname1, but I don't see the logic in requiring such additional steps every time.



My question is: How can I tell Solr "avoid putting two replicas of the same shard on the same node"?



-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Tuesday, June 19, 2018 2:20 PM
To: solr-user@lucene.apache.org
Subject: Re: sharding and placement of replicas

On 6/15/2018 11:08 AM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> If I start with a collection X on two nodes with one shard and two replicas (for redundancy, in case a node goes down): a node on host1 has X_shard1_replica1 and a node on host2 has X_shard1_replica2. When I try SPLITSHARD, I generally get X_shard1_0_replica1, X_shard1_1_replica1 and X_shard1_0_replica0 all on the node on host1, with X_shard1_1_replica0 sitting alone on the node on host2. If host1 were to go down at this point, shard1_0 would be unavailable.

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-splitshard

That documentation says "The new shards will have as many replicas as
the original shard."  That tells me that what you're seeing is not
matching the *intent* of the SPLITSHARD feature.  The fact that you get
*one* of the new shards but not the other is suspicious.  I'm wondering
if maybe Solr tried to create it but had a problem doing so.  Can you
check for errors in the solr logfile on host2?

If there's nothing about your environment that would cause a failure to
create the replica, then it might be a bug.

> Is there a way either of specifying placement or of giving hints that replicas ought to be separated?

It shouldn't be necessary to give Solr any parameters for that.  All
nodes where the shard exists should get copies of the new shards when
you split it.

> I am currently running Solr6.6.0, if that is relevant.

If this is a provable and reproducible bug, and it's still a problem in
the current stable branch (next release from that will be 7.4.0), then
it will definitely be fixed.  If it's only a problem in 6.x, then I
can't guarantee that it will be fixed.  That's because the 6.x line is
in maintenance mode, which means that there's a very high bar for
changes.  In most cases, only changes that meet one of these criteria
are made in maintenance mode:

 * Fixes a security bug.
 * Fixes a MAJOR bug with no workaround.
 * Fix is a very trivial code change and not likely to introduce new bugs.

Of those criteria, generally only the first two are likely to prompt an
actual new software release.  If enough changes of the third type
accumulate, that might prompt a new release.

My personal opinion:  If this is a general problem in 6.x, it should be
fixed there.  Because there is a workaround, it would not be cause for
an immediate new release.

Thanks,
Shawn


Re: sharding and placement of replicas

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/15/2018 11:08 AM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> If I start with a collection X on two nodes with one shard and two replicas (for redundancy, in case a node goes down): a node on host1 has X_shard1_replica1 and a node on host2 has X_shard1_replica2. When I try SPLITSHARD, I generally get X_shard1_0_replica1, X_shard1_1_replica1 and X_shard1_0_replica0 all on the node on host1, with X_shard1_1_replica0 sitting alone on the node on host2. If host1 were to go down at this point, shard1_0 would be unavailable.

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-splitshard

That documentation says "The new shards will have as many replicas as
the original shard."  That tells me that what you're seeing is not
matching the *intent* of the SPLITSHARD feature.  The fact that you get
*one* of the new shards but not the other is suspicious.  I'm wondering
if maybe Solr tried to create it but had a problem doing so.  Can you
check for errors in the solr logfile on host2?
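
For example, something along the lines of

  grep -i -E "error|exception" /path/to/solr/logs/solr.log

on host2 around the time of the split; the log path here is a
placeholder and depends on how Solr was installed and configured.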

If there's nothing about your environment that would cause a failure to
create the replica, then it might be a bug.

> Is there a way either of specifying placement or of giving hints that replicas ought to be separated?

It shouldn't be necessary to give Solr any parameters for that.  All
nodes where the shard exists should get copies of the new shards when
you split it.

> I am currently running Solr6.6.0, if that is relevant.

If this is a provable and reproducible bug, and it's still a problem in
the current stable branch (next release from that will be 7.4.0), then
it will definitely be fixed.  If it's only a problem in 6.x, then I
can't guarantee that it will be fixed.  That's because the 6.x line is
in maintenance mode, which means that there's a very high bar for
changes.  In most cases, only changes that meet one of these criteria
are made in maintenance mode:

 * Fixes a security bug.
 * Fixes a MAJOR bug with no workaround.
 * Fix is a very trivial code change and not likely to introduce new bugs.

Of those criteria, generally only the first two are likely to prompt an
actual new software release.  If enough changes of the third type
accumulate, that might prompt a new release.

My personal opinion:  If this is a general problem in 6.x, it should be
fixed there.  Because there is a workaround, it would not be cause for
an immediate new release.

Thanks,
Shawn