Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2019/07/29 22:53:00 UTC

[jira] [Updated] (SOLR-13660) AbstractFullDistribZkTestBase.waitForActiveReplicaCount is broken

     [ https://issues.apache.org/jira/browse/SOLR-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-13660:
----------------------------
    Attachment: SOLR-13660.patch
        Status: Open  (was: Open)



Although this method is not used directly in many Solr tests that subclass {{AbstractFullDistribZkTestBase}}, it is used by other methods in {{AbstractFullDistribZkTestBase}} -- including when creating the {{DEFAULT_COLLECTION}}.

Because of the esoteric way {{AbstractFullDistribZkTestBase}} initializes its collections (and jetty instances), almost every replica created starts in recovery -- so as a result of this bug, subclasses may frequently see their test methods being invoked before the expected number of shards/replicas are active.

In at least one case (TestCloudSchemaless) this has led to test failures (ultimately due to requests timing out when trying to add documents) as a result of test client operations competing with multiple concurrent replica recoveries on CPU-constrained Jenkins machines.

----

The attached patch:

* fixes {{waitForActiveReplicaCount(...)}} to check that the replicas are active
* deprecates and updates the javadocs of {{getTotalReplicas(...)}} to make it clear that this method doesn't care about the status of the replica.
** this method was formerly used by {{waitForActiveReplicaCount(...)}}
* also makes some related fixes to {{createJettys(...)}}:
** adds some comments clarifying how this method initializes the shards vs adding the replicas
** improves the initial slice count check to use existing helper methods which also verify the slices are active
*** this doesn't really affect the correctness of the method given how the collection is used at this point, but helps simplify the code.
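
To illustrate the difference between counting all replicas and counting only the active ones, here is a minimal standalone sketch. It is not the code from the patch and does not use Solr's real classes: {{ReplicaCountSketch}}, its {{Replica}} and {{State}} types, and the {{totalReplicas}} / {{activeReplicas}} helpers are all hypothetical stand-ins invented for this example, and the {{>=}} comparison is an assumption about how a caller might use the counts.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only.
// These are NOT Solr's real classes or the code from the attached patch.
class ReplicaCountSketch {

  enum State { ACTIVE, RECOVERING, DOWN }

  static final class Replica {
    final String name;
    final State state;

    Replica(String name, State state) {
      this.name = name;
      this.state = state;
    }
  }

  /** Counts every replica regardless of its state. */
  static int totalReplicas(List<Replica> replicas) {
    return replicas.size();
  }

  /** Counts only the replicas whose state is ACTIVE. */
  static int activeReplicas(List<Replica> replicas) {
    int count = 0;
    for (Replica r : replicas) {
      if (r.state == State.ACTIVE) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // One active replica and two still recovering, as might happen right after startup.
    List<Replica> replicas = Arrays.asList(
        new Replica("core_node1", State.ACTIVE),
        new Replica("core_node2", State.RECOVERING),
        new Replica("core_node3", State.RECOVERING));
    int expected = 3;

    // A check based on the total count is satisfied even though two replicas are recovering.
    System.out.println("total-based check passes:  " + (totalReplicas(replicas) >= expected));
    // A check based on the active count is not satisfied until recovery finishes.
    System.out.println("active-based check passes: " + (activeReplicas(replicas) >= expected));
  }
}
```

With replicas that start in recovery, a wait condition built on the total count can return immediately, which matches the symptom described above of test methods running while recoveries are still in progress.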




> AbstractFullDistribZkTestBase.waitForActiveReplicaCount is broken
> -----------------------------------------------------------------
>
>                 Key: SOLR-13660
>                 URL: https://issues.apache.org/jira/browse/SOLR-13660
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13660.patch
>
>
> {{AbstractFullDistribZkTestBase.waitForActiveReplicaCount(...)}} is broken, and does not actually check that the replicas are active.


