Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2016/02/24 13:53:51 UTC

Shard splitting and replica placement strategy

Hi

I wanted to try out the (relatively) new replica placement strategy and see
how it plays with shard splitting. So I set up a 4-node cluster and created
a collection with 1 shard and 2 replicas (each created on a different node).
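
For reference, I created the collection with a plain Collections API call
along these lines (host, collection and config names here are just
placeholders for what I actually used):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=2&collection.configName=myconf'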

When I issued a SPLITSHARD command (without any rules set on the
collection), the split finished successfully and the state of the cluster
was:

n1: s1_r1 (INACTIVE), s1_0_r1, s1_1_r1
n2: s1_r2 (INACTIVE), s1_0_r2
n3: s1_1_r2
n4: empty

So far, this is as expected: since the shard split occurred on n1, the two
sub-shards were created there, and Solr then filled in the missing replicas
on nodes 2 and 3. Also, the source shard s1 was set to INACTIVE and I did
not delete it (in the test).
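
For completeness, the split itself was just the standard Collections API
SPLITSHARD call, roughly as follows (host and names are illustrative again;
s1 above is only my shorthand for the actual shard name):

  curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1'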

Then I tried the same thing, curious whether, if I set the right rule, one
of the sub-shards' replicas would move to the 4th node, so that I'd end up
with a "balanced" cluster. So I created the collection with the rule
"shard:**,replica:<2,node:*", which, per the ref guide, should leave me with
no more than one replica per shard on any node. Per my understanding, I
should end up with either 2 nodes each holding one replica of each shard,
3 nodes holding a mixture of replicas, or 4 nodes each holding exactly one
replica.
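
The rule was passed at collection creation time via the 'rule' parameter,
along these lines (same placeholder names as before, with the rule value
URL-encoded as needed):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=2&rule=shard:**,replica:<2,node:*'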

However, while observing the cluster status I noticed that the two created
sub-shards are marked ACTIVE and are the leaders, while the two other
replicas are marked DOWN. Turning on INFO logging, I found this:

Caused by: java.lang.NullPointerException
  at org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:168)
  at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:130)
  at org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:252)
  at org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:203)
  at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:174)
  at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:135)
  at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:211)
  at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:179)
  at org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2204)
  at org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1212)

I also tried the rule "replica:<2,node:*", which yielded the same NPE. I'm
running 5.4.1 and couldn't find whether this was already fixed in
5.5.0/master. So the question is -- is this a bug, or did I misconfigure the
rule?

And as a side question, is there any rule I can configure so that the split
shards are distributed evenly across the cluster? Or will SPLITSHARD
currently always create the new sub-shards on the origin node, leaving it my
responsibility to move them elsewhere?

Shai

Re: Shard splitting and replica placement strategy

Posted by Shai Erera <se...@gmail.com>.
Opened https://issues.apache.org/jira/browse/SOLR-8728 with a test which
reproduces the exception.

Re: Shard splitting and replica placement strategy

Posted by Shai Erera <se...@gmail.com>.
Thanks Noble, I'll try to reproduce in a test then. Does the rule I've set
sound right to you though?

Re: Shard splitting and replica placement strategy

Posted by Noble Paul <no...@gmail.com>.
Whatever it is, there should be no NPE. Could be a bug.

-- 
-----------------------------------------------------
Noble Paul
