Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2016/04/20 22:29:25 UTC

[jira] [Comment Edited] (SOLR-9014) Audit all usages of ClusterState methods which may make calls to ZK via the lazy collection reference

    [ https://issues.apache.org/jira/browse/SOLR-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250642#comment-15250642 ] 

Shalin Shekhar Mangar edited comment on SOLR-9014 at 4/20/16 8:28 PM:
----------------------------------------------------------------------

This is turning out to be interesting. We have to revisit a few assumptions:

# The Overseer has wait loops in many places that poll until a certain condition becomes true. The earlier assumption was that updateClusterState was expensive and therefore it was better to wait until you see the state. But now that we have lazy collections and a collection-specific forceUpdateCollection, the wait loop is actually just as expensive because it ends up reading the collection state from ZooKeeper, sometimes as frequently as every 100ms. We should return the resolved reference from ZkStateReader#forceUpdateCollection and use it in such places.
# ClusterState#getCollections was supposed to be lightweight, i.e. it just read and returned the names of known collections from the local cached state. This was changed in SOLR-6629 to resolve the reference, which means that it ends up going to ZK for each non-watched collection. So API calls like LIST, downnode, etc. have become much more expensive. It is better to start returning a List<DocCollection> from this method instead.
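
To illustrate the first point, here is a minimal, self-contained sketch. It does not use the real Solr classes: FakeZk, externalUpdate and both poll methods are hypothetical names, and each call to read() merely stands in for one ZooKeeper read. It assumes the wasteful pattern is "force an update, discard the result, then resolve the lazy reference again to check the condition".

```java
class WaitLoopSketch {

    /** Hypothetical stand-in for ZooKeeper-backed collection state (not a Solr class). */
    static final class FakeZk {
        int version = 0; // simulated collection state
        int reads = 0;   // number of simulated ZooKeeper reads

        // Simulates some other node changing the collection state.
        void externalUpdate() { version++; }

        // Simulates one read of the collection state from ZooKeeper.
        int read() { reads++; return version; }
    }

    // Assumed current pattern: the forced update discards what it read, and the
    // condition check resolves the lazy reference again (two reads per iteration).
    static int pollWithSeparateRead(FakeZk zk, int targetVersion, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            zk.externalUpdate();
            zk.read();              // forced update, result thrown away
            int state = zk.read();  // lazy reference resolves again
            if (state >= targetVersion) break;
        }
        return zk.reads;
    }

    // Suggested pattern: the forced update returns the resolved state, and the
    // loop checks that value directly (one read per iteration).
    static int pollWithReturnedState(FakeZk zk, int targetVersion, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            zk.externalUpdate();
            int state = zk.read();  // forced update returns what it read
            if (state >= targetVersion) break;
        }
        return zk.reads;
    }

    public static void main(String[] args) {
        System.out.println("separate read:  " + pollWithSeparateRead(new FakeZk(), 3, 10));
        System.out.println("returned state: " + pollWithReturnedState(new FakeZk(), 3, 10));
    }
}
```

In this toy model the condition becomes true on the third iteration, so the first loop performs twice as many simulated reads as the second. The real savings would depend on how the actual wait loops are written.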


was (Author: shalinmangar):
This is turning out to be interesting. We have to revisit a few assumptions:

# The Overseer has wait loops in many places that poll until a certain condition becomes true. The earlier assumption was that updateClusterState was expensive and therefore it was better to wait until you see the state. But now that we have lazy collections and a collection-specific forceUpdateCollection, the wait loop is actually more expensive because it ends up reading the collection state from ZooKeeper, sometimes as frequently as every 100ms.
# ClusterState#getCollections was supposed to be lightweight, i.e. it just read and returned the names of known collections from the local cached state. This was changed in SOLR-6629 to resolve the reference, which means that it ends up going to ZK for each non-watched collection. So API calls like LIST, downnode, etc. have become much more expensive. It is better to start returning a List<DocCollection> from this method instead.

> Audit all usages of ClusterState methods which may make calls to ZK via the lazy collection reference
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9014
>                 URL: https://issues.apache.org/jira/browse/SOLR-9014
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>             Fix For: master, 6.1
>
>
> ClusterState has a bunch of methods such as getSlice and getReplica which internally call getCollectionOrNull, which in turn ends up making a call to ZK via the lazy collection reference. Many classes use these methods even though a DocCollection object is already available. In such cases, multiple redundant calls to ZooKeeper can happen if the collection is not watched locally. This is especially true for Overseer classes, which operate on all collections.
> We should audit all usages of these methods and replace them with calls to appropriate DocCollection methods.
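
The redundancy described in the issue can be sketched as follows. This is a toy model, not Solr code: LazyClusterView and ResolvedCollection are hypothetical stand-ins for ClusterState and DocCollection, and simulatedZkReads only counts how often the lazy reference would be resolved for a collection that is not watched locally.

```java
import java.util.HashMap;
import java.util.Map;

class SliceLookupSketch {

    /** Hypothetical stand-in for an already resolved collection object. */
    static final class ResolvedCollection {
        private final Map<String, String> sliceStates;
        ResolvedCollection(Map<String, String> sliceStates) { this.sliceStates = sliceStates; }
        String sliceState(String slice) { return sliceStates.get(slice); }
    }

    /** Hypothetical cluster-level view that resolves a lazy reference on every lookup. */
    static final class LazyClusterView {
        int simulatedZkReads = 0;
        private final Map<String, Map<String, String>> stored = new HashMap<>();

        void put(String collection, Map<String, String> sliceStates) {
            stored.put(collection, sliceStates);
        }

        // Each resolution stands in for one ZooKeeper read.
        ResolvedCollection resolve(String collection) {
            simulatedZkReads++;
            return new ResolvedCollection(stored.get(collection));
        }

        // Convenience lookup that hides a resolution behind every call.
        String sliceState(String collection, String slice) {
            return resolve(collection).sliceState(slice);
        }
    }

    static LazyClusterView sampleView() {
        Map<String, String> slices = new HashMap<>();
        slices.put("shard1", "active");
        slices.put("shard2", "active");
        LazyClusterView view = new LazyClusterView();
        view.put("c1", slices);
        return view;
    }

    // Every lookup goes through the cluster-level view: one read per lookup.
    static int readsViaClusterView(String[] slices) {
        LazyClusterView view = sampleView();
        for (String s : slices) view.sliceState("c1", s);
        return view.simulatedZkReads;
    }

    // Resolve once, then use the collection object directly: one read in total.
    static int readsViaResolvedCollection(String[] slices) {
        LazyClusterView view = sampleView();
        ResolvedCollection coll = view.resolve("c1");
        for (String s : slices) coll.sliceState(s);
        return view.simulatedZkReads;
    }
}
```

The second method mirrors the proposed audit: once a caller already holds the resolved collection, further lookups should go through that object rather than back through the cluster-level view.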


