Posted to solr-user@lucene.apache.org by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov> on 2018/09/19 20:52:27 UTC

RE: sharding and placement of replicas

I am still wondering whether anyone has ever seen any examples of this actually working (has anyone ever seen an example of SPLITSHARD on a two-node SolrCloud placing the replicas of each shard on different hosts from the other replicas of the same shard)?


Anyone?
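
(For reference, the rule-based replica placement Erick mentions further down in this thread does let one force host-awareness at collection-creation time. A minimal sketch, assuming the "host" tag behaves the way the 6.6 reference guide describes and using a throwaway collection name, would be a CREATE command which contains

solr/admin/collections?action=CREATE&name=collectest6&collection.configName=collectest&numShards=1&replicationFactor=2&rule=shard:*,replica:<2,host:*

where the rule shard:*,replica:<2,host:* is intended to keep fewer than two replicas of any one shard on the same host. That is a workaround, though, not an answer to whether SPLITSHARD ever spreads the replicas out on its own.)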

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov> 
Sent: Friday, August 10, 2018 12:54 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Note that I usually create collections with commands which contain (for example)

solr/admin/collections?action=CREATE&name=collectest&collection.configName=collectest&numShards=1&replicationFactor=1&createNodeSet=

I give one node in the createNodeSet and then ADDREPLICA to the other node.
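
The ADDREPLICA step is a command along these lines (the node value here is only a placeholder for whatever the second host is registered as in live_nodes):

solr/admin/collections?action=ADDREPLICA&collection=collectest&shard=shard1&node=nosqltest11:8983_solr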

In case this were related, I now tried it a different way, using a command which contains

solr/admin/collections?action=CREATE&name=collectest5&collection.configName=collectest&numShards=1&replicationFactor=2&createNodeSet=

I gave both nodes in the createNodeSet in this case. It created one replica on each node (each node being on a different host at the same port). This is what I would consider the expected behavior (refraining from putting two replicas of the same shard on the same node).

After this I ran a command including

solr/admin/collections?action=SPLITSHARD&collection=collectest5&shard=shard1&indent=on&async=test20180810h

The result was still the same: one of the four new replicas was on one node and the other three were all together on the node from which I issued this command (including putting two replicas of the same shard on the same node).
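
(For anyone trying to reproduce this: since the SPLITSHARD was submitted with async=, the completion and the resulting layout can be checked with commands containing

solr/admin/collections?action=REQUESTSTATUS&requestid=test20180810h

solr/admin/collections?action=CLUSTERSTATUS&collection=collectest5

REQUESTSTATUS reports whether the split has finished, and the CLUSTERSTATUS output includes a node_name for every replica, which shows which replicas ended up together.)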





I am wondering whether there are any examples of this actually working (any examples of SPLITSHARD occasionally placing the replicas of each shard on different hosts from the other replicas of the same shard).


-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] [mailto:craig.oakley@nih.gov] 
Sent: Thursday, August 09, 2018 5:08 PM
To: solr-user@lucene.apache.org
Subject: RE: sharding and placement of replicas

Okay, I've tried again with two nodes running Solr7.4 on different hosts.

Before SPLITSHARD, collectest2_shard1_replica_n1 was on the host nosqltest22, and collectest2_shard1_replica_n3 was on the host nosqltest11.

After running SPLITSHARD (on the nosqltest22 node), only collectest2_shard1_0_replica0 was added to nosqltest11; nosqltest22 became the location for collectest2_shard1_0_replica_n5 and collectest2_shard1_1_replica_n6 and collectest2_shard1_1_replica0 (and so if nosqltest22 were to be down, shard1_1 would not be available).
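
The manual workaround for that layout, as far as I can tell, would be to move one of the co-located replicas by hand, for example with a command containing

solr/admin/collections?action=MOVEREPLICA&collection=collectest2&shard=shard1_1&sourceNode=<nosqltest22 entry from live_nodes>&targetNode=<nosqltest11 entry from live_nodes>

(or the equivalent ADDREPLICA on nosqltest11 followed by a DELETEREPLICA of the extra copy on nosqltest22; the node values above are placeholders rather than the real entries from live_nodes). That keeps shard1_1 available if nosqltest22 goes down, but it is a repair after the fact rather than SPLITSHARD placing the replicas correctly in the first place.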


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, July 31, 2018 5:16 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: sharding and placement of replicas

Right, two JVMs on the same physical host with different ports are
"different Solrs" by default. If you had two replicas per shard and
both were on the same Solr instance (same port), that would be
unexpected.

Problem is that this would have been a bug clear back in the Solr 4.x
days, so the fact that you say you saw it on 6.6 would be unexpected.

Of course if you have three replicas and two instances, I'd absolutely
expect that two replicas would be on one of them for each shard.

Best,
Erick

On Tue, Jul 31, 2018 at 12:24 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
<cr...@nih.gov> wrote:
> In my case, when trying on Solr7.4 (in response to Shawn Heisey's 6/19/18 comment "If this is a provable and reproducible bug, and it's still a problem in the current stable branch"), I had only installed Solr7.4 on one host, and so I was testing with two nodes on the same host (different port numbers). I had previously had the same symptom when the two nodes were on different hosts, but that was with Solr6.6 -- I can try it again with Solr7.4 with two hosts and report back.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Tuesday, July 31, 2018 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: sharding and placement of replicas
>
> On 7/27/2018 8:26 PM, Erick Erickson wrote:
>> Yes with some fiddling as far as "placement rules", start here:
>> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>>
>> The idea (IIUC) is that you provide a "snitch" that identifies what
>> "rack" the Solr instance is on and can define placement rules that
>> define "don't put more than one thingy on the same rack". "Thingy"
>> here is replica, shard, whatever as defined by other placement rules.
>
> I'd like to see an improvement in Solr's behavior when nothing has been
> configured in auto-scaling or rule-based replica placement.  Configuring
> those things is certainly an option, but I think we can do better even
> without that config.
>
> I believe that Solr already has some default intelligence that keeps
> multiple replicas from ending up on the same *node* when possible ... I
> would like this to also be aware of *hosts*.
>
> Craig hasn't yet indicated whether there is more than one node per host,
> so I don't know whether the behavior he's seeing should be considered a bug.
>
> If somebody gives one machine multiple names/addresses and uses
> different hostnames in their SolrCloud config for one actual host, then
> it wouldn't be able to do any better than it does now, but if there are
> matches in the hostname part of different entries in live_nodes, then I
> think the improvement might be relatively easy.  Not saying that I know
> what to do, but somebody who is familiar with the Collections API code
> can probably do it.
>
> Thanks,
> Shawn
>