
Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/05/19 17:34:12 UTC

[jira] [Commented] (SOLR-8973) TX-frenzy on Zookeeper when collection is put to use

    [ https://issues.apache.org/jira/browse/SOLR-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291599#comment-15291599 ] 

ASF subversion and git services commented on SOLR-8973:
-------------------------------------------------------

Commit b663e5bcad9974f2d80c16b85c862407a38290e0 in lucene-solr's branch refs/heads/branch_6_0 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b663e5b ]

SOLR-8973: Zookeeper frenzy when a core is first created.

For branch_6_0: Modified ZkStateReaderTest to use ZkStateReader.updateClusterState() instead of .forceUpdateCollection(), since SOLR-8745 will land in 6.1.


> TX-frenzy on Zookeeper when collection is put to use
> ----------------------------------------------------
>
>                 Key: SOLR-8973
>                 URL: https://issues.apache.org/jira/browse/SOLR-8973
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 6.0
>            Reporter: Janmejay Singh
>            Assignee: Scott Blum
>              Labels: collections, patch-available, solrcloud, zookeeper
>             Fix For: 5.5.1, 5.6, 6.0.1, 6.1, master (7.0)
>
>         Attachments: SOLR-8973-ZkStateReader.patch, SOLR-8973.patch, SOLR-8973.patch, SOLR-8973.patch
>
>
> This is caused by a distributed data race. Core creation happens at a time when the collection is not yet visible to the node. In this case a fallback code path is used, which dereferences the collection state lazily (on demand) instead of setting a watch and keeping it cached locally.
> As a result, as requests to the core mount, the node issues ZK fetches for the collection state in proportion. On a large SolrCloud cluster, this generates several Gbps of TX traffic on the ZK nodes. This causes indexing throughput to floor, in addition to running the ZK nodes out of network bandwidth.
> On smaller SolrCloud clusters it's hard to run into, because the probability of this race materializing is lower.
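
The difference between the two code paths described above can be sketched roughly as follows. This is a minimal illustration only, not Solr's actual implementation: FakeStateStore, LazyRef and WatchedRef are hypothetical names, and the store merely counts reads instead of talking to ZooKeeper. It shows why a lazily dereferenced collection state costs one remote read per request, while a watched and cached state costs one read up front plus one per change notification.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LazyVsCachedState {
    /** Hypothetical stand-in for ZooKeeper: counts every remote read. */
    static class FakeStateStore {
        final AtomicLong reads = new AtomicLong();

        String fetch(String collection) {
            reads.incrementAndGet();
            return "state-of-" + collection;
        }
    }

    /** Fallback path (assumed shape): every lookup goes to the remote store. */
    static class LazyRef {
        private final FakeStateStore store;
        private final String collection;

        LazyRef(FakeStateStore store, String collection) {
            this.store = store;
            this.collection = collection;
        }

        String get() {
            return store.fetch(collection);
        }
    }

    /** Watched path (assumed shape): read once, serve from cache until notified. */
    static class WatchedRef {
        private final FakeStateStore store;
        private final String collection;
        private volatile String cached;

        WatchedRef(FakeStateStore store, String collection) {
            this.store = store;
            this.collection = collection;
            this.cached = store.fetch(collection);
        }

        String get() {
            return cached;
        }

        /** Would be invoked by a watch callback when the state changes. */
        void onChange() {
            cached = store.fetch(collection);
        }
    }

    /** Simulates the given number of requests and returns the remote read count. */
    static long readsFor(int requests, boolean watched) {
        FakeStateStore store = new FakeStateStore();
        if (watched) {
            WatchedRef ref = new WatchedRef(store, "c1");
            for (int i = 0; i < requests; i++) {
                ref.get();
            }
        } else {
            LazyRef ref = new LazyRef(store, "c1");
            for (int i = 0; i < requests; i++) {
                ref.get();
            }
        }
        return store.reads.get();
    }

    public static void main(String[] args) {
        System.out.println("lazy reads:    " + readsFor(1000, false));
        System.out.println("watched reads: " + readsFor(1000, true));
    }
}
```

In this toy model the lazy path's read count grows linearly with request volume, which is consistent with the traffic pattern the report describes.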


