Posted to issues@lucene.apache.org by "Ilan Ginzburg (Jira)" <ji...@apache.org> on 2020/05/29 06:28:00 UTC

[jira] [Comment Edited] (SOLR-14347) Autoscaling placement wrong when concurrent replica placements are calculated

    [ https://issues.apache.org/jira/browse/SOLR-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119116#comment-17119116 ] 

Ilan Ginzburg edited comment on SOLR-14347 at 5/29/20, 6:27 AM:
----------------------------------------------------------------

[~ab] I’ve created PR [https://github.com/apache/lucene-solr/pull/1542] that I believe solves the issue identified above.

The fix has two parts:
 # The obvious one: in {{PolicyHelper.getReplicaLocations()}}, the new (post-placement-computation) Session is returned with the {{SessionWrapper}} so that the next use sees the assignments from that computation rather than the initial state.
 # A less obvious one: in the same method, the way the current (orig) session is copied to create the new Session is changed so that it no longer validates collections against Zookeeper. That validation stripped from the new session everything that had not yet made it to Zookeeper, so in-progress assignments were not visible. {{Policy.Session.cloneToNewSession()}} contains the code doing the copy (I tried to keep it as close to the original behavior as possible). A simplified sketch of both parts follows below.

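For illustration only, here is a minimal, hypothetical sketch of the two ideas; these are stand-in classes, not the real {{PolicyHelper.SessionWrapper}} / {{Policy.Session}} API:

{code:java}
// Hypothetical stand-ins, not the actual Solr classes.
import java.util.ArrayList;
import java.util.List;

class PlacementSession {                      // stand-in for Policy.Session
  final List<String> assignments;             // placements computed so far

  PlacementSession(List<String> assignments) {
    this.assignments = assignments;
  }

  // Part 2: copy the in-memory state as-is; do NOT re-validate collections
  // against Zookeeper, which would drop assignments not yet persisted there.
  PlacementSession cloneToNewSession() {
    return new PlacementSession(new ArrayList<>(assignments));
  }
}

class SessionWrapper {                        // stand-in for PolicyHelper.SessionWrapper
  private PlacementSession session;

  SessionWrapper(PlacementSession initial) {
    this.session = initial;
  }

  PlacementSession get() {
    return session;
  }

  // Part 1: after computing placements, hand back the *post-computation* session
  // so the next placement request plans against the updated state, not the
  // initial one.
  void returnSession(PlacementSession postComputation) {
    this.session = postComputation;
  }
}
{code}
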
A multithreaded collection creation test (JMeter with 40 threads looping through creating single-shard, single-replica collections) led to a balanced 3-node cluster. Before the fix there were severe imbalances (in some runs a single node received all the replicas).
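
The JMeter scenario can be approximated with plain SolrJ; a rough sketch, assuming a cluster reachable through Zookeeper at {{localhost:2181}} (collection names, loop counts and the {{_default}} configset are made up for the example):

{code:java}
import java.util.Collections;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ConcurrentCreateLoad {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:2181"), Optional.empty()).build()) {
      ExecutorService pool = Executors.newFixedThreadPool(40);
      for (int t = 0; t < 40; t++) {
        final int thread = t;
        pool.submit(() -> {
          for (int i = 0; i < 10; i++) {
            try {
              // one shard, one replica per collection
              CollectionAdminRequest
                  .createCollection("coll_" + thread + "_" + i, "_default", 1, 1)
                  .process(client);
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(30, TimeUnit.MINUTES);
    }
  }
}
{code}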

After this fix gets merged (as well as [https://github.com/apache/lucene-solr/pull/1504] from SOLR-14462, which deals with Session creation and caching), I believe Session management could benefit from some refactoring and simplification.



> Autoscaling placement wrong when concurrent replica placements are calculated
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-14347
>                 URL: https://issues.apache.org/jira/browse/SOLR-14347
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 8.5
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Critical
>             Fix For: 8.6
>
>         Attachments: SOLR-14347.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  * create a cluster of a few nodes (tested with 7 nodes)
>  * define per-collection policies that distribute replicas exclusively on different nodes per policy
>  * concurrently create a few collections, each using a different policy
>  * resulting replica placement will be seriously wrong, causing many policy violations
> Running the same scenario but instead creating collections sequentially results in no violations.
> I suspect this is caused by an incorrect locking level for all collection operations (as defined in {{CollectionParams.CollectionAction}}) that create new replica placements - i.e. CREATE, ADDREPLICA, MOVEREPLICA, DELETENODE, REPLACENODE, SPLITSHARD, RESTORE, REINDEXCOLLECTION. All of these operations use the policy engine to create new replica placements, and as a result they change the cluster state. However, currently these operations are locked (in {{OverseerCollectionMessageHandler.lockTask}}) using {{LockLevel.COLLECTION}}. In practice this means that the lock is held only for the particular collection that is being modified.
> A straightforward fix for this issue is to change the locking level to CLUSTER (and I confirm this fixes the scenario described above). However, this effectively serializes all collection operations listed above, which will result in general slow-down of all collection operations.
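>
> A rough illustration (not the actual {{OverseerCollectionMessageHandler}} code) of what the two lock levels imply for these operations:
>
> {code:java}
> // Hypothetical sketch: COLLECTION-level locking lets two CREATEs of different
> // collections compute placements concurrently from the same stale view of the
> // cluster; CLUSTER-level locking serializes every placement-changing operation.
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.locks.ReentrantLock;
>
> class TaskLocks {
>   private final ConcurrentHashMap<String, ReentrantLock> perCollection =
>       new ConcurrentHashMap<>();
>   private final ReentrantLock cluster = new ReentrantLock();
>
>   // COLLECTION level: one lock per collection name.
>   ReentrantLock collectionLock(String collection) {
>     return perCollection.computeIfAbsent(collection, c -> new ReentrantLock());
>   }
>
>   // CLUSTER level: a single lock shared by all placement-changing operations.
>   ReentrantLock clusterLock() {
>     return cluster;
>   }
> }
> {code}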



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org