Posted to issues@lucene.apache.org by "Colvin Cowie (Jira)" <ji...@apache.org> on 2020/05/20 17:35:00 UTC

[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

    [ https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112472#comment-17112472 ] 

Colvin Cowie commented on SOLR-14503:
-------------------------------------

I see {{ZkFailoverTest}} was added for SOLR-5129, but because it only does {{Thread.sleep(5000);}} with {{waitForZk}} set to 60, it doesn't stop the zk server for long enough to exceed either the configured timeout or the unconfigured DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds.
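
To make the timing argument concrete, here is a minimal sketch of the comparison. The class {{ZkOutageCheck}} and its method are hypothetical helpers written only for this illustration and are not part of Solr or the test framework; the 5, 30 and 60 second values are the ones mentioned above.

```java
import java.time.Duration;

// Hypothetical helper, not part of Solr: checks whether a simulated
// ZooKeeper outage lasts long enough to trip a given connect timeout.
final class ZkOutageCheck {
    // The unconfigured default mentioned above (30 seconds).
    static final Duration DEFAULT_CONNECT_TIMEOUT = Duration.ofSeconds(30);

    // True only when the outage is strictly longer than the timeout.
    static boolean outageExceeds(Duration outage, Duration timeout) {
        return outage.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Duration outage = Duration.ofSeconds(5);      // Thread.sleep(5000) in the test
        Duration waitForZk = Duration.ofSeconds(60);  // value configured by the test

        // Both comparisons are false, so the test cannot tell which timeout is in effect.
        System.out.println("exceeds configured waitForZk: " + outageExceeds(outage, waitForZk));
        System.out.println("exceeds default timeout: " + outageExceeds(outage, DEFAULT_CONNECT_TIMEOUT));
    }
}
```

Under these assumptions, an outage would need to last between 30 and 60 seconds to show that the configured value, rather than the default, is being applied.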

I've tried modifying the test to cover both a successful start and the configured timeout being exceeded, but I can't quite get both cases to work together: the server seems to be dead by the time the second test starts. I'm not familiar enough with the way these tests are written to know the right way to write them.

If I simply duplicate the existing test method so that there are two test cases doing the same thing, it also fails, so the failure is not specific to the case that I'm adding. [^flawed-test.patch]

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> -----------------------------------------------------------
>
>                 Key: SOLR-14503
>                 URL: https://issues.apache.org/jira/browse/SOLR-14503
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Colvin Cowie
>            Priority: Minor
>         Attachments: SOLR-14503.patch, flawed-test.patch
>
>
> When starting Solr in cloud mode, if ZooKeeper is not available within 30 seconds, then core container initialization fails and the node will not recover when ZooKeeper becomes available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't quite do so for two reasons:
>  # [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297] it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds is used even when you specify a different waitForZk value
>  # bin/solr contains a script fragment to set -DwaitForZk from the SOLR_WAIT_FOR_ZK environment variable [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but there is no corresponding assignment in bin/solr.cmd, even though SOLR_WAIT_FOR_ZK appears in solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.
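
The first point in the description can be illustrated with a small stand-in class. {{StubZkClient}} below is hypothetical and is not the source of {{SolrZkClient}}; it only mirrors the two constructor signatures quoted above, and it assumes that {{waitForZk}} is given in seconds and converted to milliseconds by the caller.

```java
// Hypothetical stand-in for illustration only; this is not SolrZkClient's source.
final class StubZkClient {
    // 30 seconds, the default quoted in the issue description.
    static final int DEFAULT_CLIENT_CONNECT_TIMEOUT = 30_000;

    final String zkServerAddress;
    final int zkClientTimeout;
    final int zkClientConnectTimeout;

    // Two-argument form: the connect timeout silently falls back to the default.
    StubZkClient(String zkServerAddress, int zkClientTimeout) {
        this(zkServerAddress, zkClientTimeout, DEFAULT_CLIENT_CONNECT_TIMEOUT);
    }

    // Three-argument form: the caller controls the connect timeout.
    StubZkClient(String zkServerAddress, int zkClientTimeout, int zkClientConnectTimeout) {
        this.zkServerAddress = zkServerAddress;
        this.zkClientTimeout = zkClientTimeout;
        this.zkClientConnectTimeout = zkClientConnectTimeout;
    }

    public static void main(String[] args) {
        // Assumption: waitForZk=60 is in seconds and converted to milliseconds.
        int waitForZkMillis = 60 * 1000;

        StubZkClient twoArg = new StubZkClient("localhost:2181", waitForZkMillis);
        StubZkClient threeArg = new StubZkClient("localhost:2181", waitForZkMillis, waitForZkMillis);

        System.out.println("two-arg connect timeout: " + twoArg.zkClientConnectTimeout);
        System.out.println("three-arg connect timeout: " + threeArg.zkClientConnectTimeout);
    }
}
```

In this sketch the two-argument call ends up with a 30000 ms connect timeout regardless of the configured value, which matches the behaviour reported for the call in SolrDispatchFilter.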


