Posted to issues@lucene.apache.org by "Mike Drob (Jira)" <ji...@apache.org> on 2021/01/20 22:48:00 UTC

[jira] [Created] (SOLR-15093) Heavy lock contention during collection creation

Mike Drob created SOLR-15093:
--------------------------------

             Summary: Heavy lock contention during collection creation
                 Key: SOLR-15093
                 URL: https://issues.apache.org/jira/browse/SOLR-15093
             Project: Solr
          Issue Type: Task
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Mike Drob


While doing some lock analysis, I found quite a bit of contention on {{ZkStateReader$LazyCollectionRef.get(boolean)}} during heavy collection creation. I ran a sample workload that created as many collections as it could in 10 minutes, and threads were blocked in this method for about 1:30 of that time, which is a pretty significant portion.

A few representative stack traces:

{noformat}
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String, boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String)
org.apache.solr.cloud.ZkController.checkIfCoreNodeNameAlreadyExists(CoreDescriptor)
org.apache.solr.core.CoreContainer.create(String, Path, Map, boolean)
{noformat}

And another:

{noformat}
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String, boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String)
org.apache.solr.common.cloud.ZkStateReader.getCollection(String)
org.apache.solr.cloud.ZkController.publish(CoreDescriptor, Replica$State, boolean, boolean)
org.apache.solr.cloud.ZkController.preRegister(CoreDescriptor, boolean)
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreDescriptor, boolean, boolean)
org.apache.solr.core.CoreContainer.create(String, Path, Map, boolean)
{noformat}

And one more:

{noformat}
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String, boolean)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(String)
org.apache.solr.common.cloud.ZkStateReader.registerDocCollectionWatcher(String, DocCollectionWatcher)
org.apache.solr.common.cloud.ZkStateReader.waitForState(String, long, TimeUnit, Predicate)
org.apache.solr.cloud.ZkController.checkStateInZk(CoreDescriptor)
org.apache.solr.cloud.ZkController.preRegister(CoreDescriptor, boolean)
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreDescriptor, boolean, boolean)
org.apache.solr.core.CoreContainer.create(String, Path, Map, boolean)
{noformat}

It looks like part of the problem is that we never allow ourselves to use the cache, so each call ends up being a full fetch out to ZK. We do have optimizations there that compare the stat and the version, but it still appears to be relatively heavyweight.
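
To make the caching point concrete, here is a minimal sketch of the general pattern, not the actual Solr code. The names {{LazyRefSketch}}, {{fetcher}}, and {{cacheNanos}} are hypothetical, and the {{Supplier}} stands in for the remote read from ZK. It assumes a reference whose {{get(boolean)}} is synchronized: when every caller passes {{false}}, each call pays for a fetch while holding the monitor, so concurrent callers queue up behind it.

```java
import java.util.function.Supplier;

/**
 * Hypothetical, simplified stand-in for a lazily loaded collection reference.
 * This is an illustration of the pattern only, not the Solr implementation.
 */
final class LazyRefSketch<T> {
    private final Supplier<T> fetcher; // stands in for the expensive remote read
    private final long cacheNanos;     // how long a cached value may be reused
    private T cached;
    private boolean loaded = false;    // false until the first successful fetch
    private long lastFetchNanos;

    LazyRefSketch(Supplier<T> fetcher, long cacheNanos) {
        this.fetcher = fetcher;
        this.cacheNanos = cacheNanos;
    }

    /**
     * Returns the value, fetching it again unless the caller allows a cached
     * copy and that copy is still fresh. The whole method holds the monitor,
     * so a slow fetch blocks every other caller of this reference.
     */
    synchronized T get(boolean allowCached) {
        boolean stale = !loaded || System.nanoTime() - lastFetchNanos > cacheNanos;
        if (!allowCached || stale) {
            cached = fetcher.get();
            lastFetchNanos = System.nanoTime();
            loaded = true;
        }
        return cached;
    }
}
```

In this sketch, repeated calls to {{get(true)}} within the cache window reuse the stored value and hold the lock only briefly, while {{get(false)}} always invokes the fetcher. Whether callers on the core creation path can tolerate a slightly stale value is a separate question that this sketch does not answer.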

cc: [~noble.paul], you might find this interesting. 


