You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2019/04/04 09:03:00 UTC
[jira] [Comment Edited] (SOLR-13339) ZkController.preClose() prevent recovery, indexfetcher being kicked off after SolrCores already closed

    [ https://issues.apache.org/jira/browse/SOLR-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809638#comment-16809638 ] 

Cao Manh Dat edited comment on SOLR-13339 at 4/4/19 9:02 AM:
-------------------------------------------------------------

Hi [~tomasflobbe], Thanks a lot for your review.
{quote} 
I think this makes sense to try to move some of the zk shutdown logic to the beginning of the shutdown process, but I'm not sure of all the ramifications of this change without more testing. For example, we'll be removing the live_node, publishing DOWN state and closing the electionContexts, however, elections are not cancelled. Will this cause periods with leaderless shards/no Overseer?
{quote}
I agree that calling close {{electionContexts}} at {{preClose()}} is not safe without enough tests. Besides that, two additional calls here is (just prevent recovery/indexFetcher being triggered)
* close {{terms}} watchers: which I think is safe, since we are shutting down
* stopReplication: which I think is safe, since we are shutting down

Additional call like {{removeEphemeralLiveNode}} and {{publishNodeAsDown}} should not counted since they only get moved from CoreContainer.shutdown()
{quote}
Regarding the patch, one thing I don't like is that the preClose() seems to be mandatory now, and that breaks the contract of Closeable in ZkController. Maybe close() should call preClose() if it hasn't been called yet?
{quote}
+1


was (Author: caomanhdat):
Hi [~tomasflobbe], Thanks a lot for your review.
{quote} 
I think this makes sense to try to move some of the zk shutdown logic to the beginning of the shutdown process, but I'm not sure of all the ramifications of this change without more testing. For example, we'll be removing the live_node, publishing DOWN state and closing the electionContexts, however, elections are not cancelled. Will this cause periods with leaderless shards/no Overseer?
{quote}
I agree that calling close {{electionContexts}} at {{preClose()}} is not safe without enough tests. Besides that, two additional calls here is (just prevent recovery/indexFetcher being triggered)
* close {{terms}} watchers: which I think is safe, since we are shutting down
* stopReplication: which I think is safe, since we are shutting down

{quote}
Regarding the patch, one thing I don't like is that the preClose() seems to be mandatory now, and that breaks the contract of Closeable in ZkController. Maybe close() should call preClose() if it hasn't been called yet?
{quote}
+1

> ZkController.preClose() prevent recovery, indexfetcher being kicked off after SolrCores already closed
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13339
>                 URL: https://issues.apache.org/jira/browse/SOLR-13339
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-13339.patch
>
>
> Right now, recovery can be kicked off when SolrCore is already closed or duing closing a SolrCore. It will be more safer to close all event listeners that can kick off recovery or indexfetcher (replicateFromLeader) before we close SolrCores.
> I think it will fix the problem of SOLR-13276 test failures recently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org