Posted to issues@lucene.apache.org by "Yonik Seeley (Jira)" <ji...@apache.org> on 2019/11/04 16:35:00 UTC

[jira] [Comment Edited] (SOLR-13884) Concurrent collection creation leads to unbalanced cluster

    [ https://issues.apache.org/jira/browse/SOLR-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966805#comment-16966805 ] 

Yonik Seeley edited comment on SOLR-13884 at 11/4/19 4:34 PM:
--------------------------------------------------------------

OK, I updated the test to reproduce another serious bug with replica placement and concurrent collection creation.
When collection-level policies are used, and the cluster is currently unbalanced, it's relatively easy to get into a situation where multiple replicas are assigned to the exact same node.  In the wild, I've actually seen all 5 replicas of a single shard be assigned to the same node, and I've been able to reproduce that with my test case.
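One way to check a cluster for this condition is to count, per shard, how many replicas each node hosts. The Python sketch below works on a dict shaped like a Collections API CLUSTERSTATUS response (cluster -> collections -> shards -> replicas -> node_name). The function name and the sample node names are hypothetical. It only reports co-located replicas and does not fix placement.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def colocated_replicas(status: Dict[str, Any]) -> List[Tuple[str, str, str, int]]:
    """Return (collection, shard, node, count) for every node that hosts more
    than one replica of the same shard in a CLUSTERSTATUS-style dict.
    (Hypothetical helper, not part of Solr.)"""
    found = []
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in sorted(collections.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            # Count replicas of this one shard per node.
            per_node = Counter(
                replica["node_name"] for replica in shard.get("replicas", {}).values()
            )
            for node, count in sorted(per_node.items()):
                if count > 1:
                    found.append((coll_name, shard_name, node, count))
    return found


if __name__ == "__main__":
    # Hypothetical sample: c2/shard1 has both replicas on the same node.
    sample = {"cluster": {"collections": {
        "c1": {"shards": {"shard1": {"replicas": {
            "core_node2": {"node_name": "n1:8983_solr"},
            "core_node4": {"node_name": "n2:8983_solr"}}}}},
        "c2": {"shards": {"shard1": {"replicas": {
            "core_node2": {"node_name": "n1:8983_solr"},
            "core_node4": {"node_name": "n1:8983_solr"}}}}},
    }}}
    print(colocated_replicas(sample))
```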

The test case is currently set up to reproduce the simplest case I could manage. We start off with just 2 nodes, create a single replica on one node, then do 2 collection create commands concurrently (each with 1 shard and replicationFactor=2).  Pretty much 100% of the time, 1 shard will end up with both replicas on the same node.  This does not happen if the creations are done serially.  It also doesn't happen if there is an identical cluster-level policy specified.
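A rough sketch of the same steps driven from outside the test framework through the Collections API CREATE action. Every specific here is a hypothetical assumption and is not taken from the actual test: the base URL, the collection names, the node name passed in, and a collection-level policy named "p1" that is assumed to be already defined through the autoscaling API. The initial single replica is modeled as its own 1-shard, 1-replica collection pinned to one node with createNodeSet. The fetch argument is injectable so the request sequence can be checked without a live cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8983/solr"  # hypothetical cluster address


def create_url(name: str, num_shards: int, replication_factor: int,
               policy: Optional[str] = None,
               create_node_set: Optional[str] = None) -> str:
    """Build a Collections API CREATE request URL."""
    params: Dict[str, str] = {
        "action": "CREATE",
        "name": name,
        "numShards": str(num_shards),
        "replicationFactor": str(replication_factor),
    }
    if policy is not None:
        params["policy"] = policy  # collection-level policy name (hypothetical "p1")
    if create_node_set is not None:
        params["createNodeSet"] = create_node_set  # restrict placement to these nodes
    return f"{BASE_URL}/admin/collections?{urlencode(params)}"


def http_get(url: str) -> bytes:
    with urlopen(url) as resp:
        return resp.read()


def reproduce(node: str, fetch: Callable[[str], bytes] = http_get) -> List[bytes]:
    # Step 1: unbalance the cluster by placing a single replica on one node
    # (assumed here to be a separate 1x1 collection named "seed").
    fetch(create_url("seed", 1, 1, create_node_set=node))
    # Step 2: fire two CREATE commands concurrently, each 1 shard x 2 replicas.
    urls = [create_url(f"c{i}", 1, 2, policy="p1") for i in (1, 2)]
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(fetch, urls))
```

Afterwards, action=CLUSTERSTATUS shows where each replica of c1 and c2 landed; the problem appears as one shard with both replicas on the same node_name. Because this is a race, a single run is not guaranteed to hit it.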


> Concurrent collection creation leads to unbalanced cluster
> ----------------------------------------------------------
>
>                 Key: SOLR-13884
>                 URL: https://issues.apache.org/jira/browse/SOLR-13884
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Yonik Seeley
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When multiple collection creations are done concurrently, the cluster can end up very unbalanced, with many (or most) replicas going to a small set of nodes.
> This was observed on both 8.2 and master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
