Posted to dev@lucene.apache.org by "ludovic Boutros (JIRA)" <ji...@apache.org> on 2014/05/21 18:22:37 UTC

[jira] [Updated] (SOLR-6086) Replica active during Warming

     [ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ludovic Boutros updated SOLR-6086:
----------------------------------

    Attachment: SOLR-6086.patch

I checked the differences in the logs and in the code.

The problem occurs when:
- a node is restarted
- Peer Sync fails (no "/get" handler, for instance; should it become mandatory?)
- the node is already synced (nothing to replicate)

or:

- a node is restarted and it is the leader (I do not know whether this only happens with a lone leader...)
- the node is already synced (nothing to replicate)

For the first case,

I think this is a side effect of the modification in SOLR-4965. 

If Peer Sync is successful, the code calls an explicit commit, with a comment which says:

{code:title=RecoveryStrategy.java|borderStyle=solid}
            // force open a new searcher
            core.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
{code}

This is not the case if Peer Sync fails.
Just adding this line is enough to correct this issue.
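
To make the first case easier to follow, here is a toy model of the flow described above. It is only a sketch: FakeCore, recover() and the "patched" flag are hypothetical names invented for illustration, not Solr classes and not the real RecoveryStrategy logic. It assumes that a real replication opens a searcher on the new index, and that the replica is published as active at the end of recovery in every case.

```java
// Toy model only: FakeCore and recover() are hypothetical stand-ins,
// not Solr classes and not the real RecoveryStrategy code.
final class RecoverySketch {

    /** Minimal stand-in for a core: tracks searcher and published state. */
    static final class FakeCore {
        boolean newSearcherOpened = false;
        boolean publishedActive = false;

        /** Stands in for the explicit commit that forces a new searcher. */
        void commitAndOpenSearcher() {
            newSearcherOpened = true;
        }
    }

    /**
     * Simulates one recovery after a restart.
     *
     * @param peerSyncOk         whether Peer Sync succeeded
     * @param nothingToReplicate whether the node is already in sync
     * @param patched            whether the extra commit is present
     */
    static FakeCore recover(boolean peerSyncOk, boolean nothingToReplicate, boolean patched) {
        FakeCore core = new FakeCore();
        if (peerSyncOk) {
            // Existing behaviour: a successful Peer Sync is followed by an explicit commit.
            core.commitAndOpenSearcher();
        } else {
            if (!nothingToReplicate) {
                // Assumption: a real replication installs a new index and opens a searcher on it.
                core.newSearcherOpened = true;
            }
            if (patched) {
                // The added line: force a new searcher on this path as well.
                core.commitAndOpenSearcher();
            }
        }
        // Assumption: the replica is published as active once recovery finishes.
        core.publishedActive = true;
        return core;
    }

    public static void main(String[] args) {
        FakeCore before = recover(false, true, false);
        FakeCore after = recover(false, true, true);
        System.out.println("unpatched: active=" + before.publishedActive
                + ", newSearcher=" + before.newSearcherOpened);
        System.out.println("patched:   active=" + after.publishedActive
                + ", newSearcher=" + after.newSearcherOpened);
    }
}
```

In this model, the unpatched run with a failed Peer Sync and nothing to replicate ends up active without a new searcher having been forced, which is the situation described above.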

Here is a patch with a test which reproduces the problem, together with the correction (to be applied to branch 4x).

I am working on the second case.

> Replica active during Warming
> -----------------------------
>
>                 Key: SOLR-6086
>                 URL: https://issues.apache.org/jira/browse/SOLR-6086
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.6.1, 4.8.1
>            Reporter: ludovic Boutros
>         Attachments: SOLR-6086.patch
>
>
> At least with Solr 4.6.1, replicas are considered active during the warming process.
> This means that if you restart a replica or create a new one, queries will
> be sent to this replica and will hang until the end of the warming
> process (if cold searchers are not used).
> You can no longer add or restart a node silently.
> I think that the fact that the replica is active is not a bad thing.
> But the HttpShardHandler and CloudSolrServer classes should take the warming process into account.
> Currently, I have developed a very simple new component which checks that a searcher is registered.
> I am also developing custom HttpShardHandler and CloudSolrServer classes which will check the warming process in addition to the ACTIVE status in the cluster state.
> This seems to be more of a workaround than a solution, but that's all I can do in this version.
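
As a rough illustration of the workaround idea in the description above (checking the warming process in addition to the ACTIVE status), the selection rule could look something like the sketch below. Everything in it is hypothetical: the Replica class, the State enum and the searcherRegistered flag are invented for the example and are not the actual HttpShardHandler, CloudSolrServer or cluster state API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: none of these types are Solr classes.
final class WarmReplicaFilter {

    enum State { ACTIVE, RECOVERING, DOWN }

    /** Invented view of a replica: its published state plus a warming flag. */
    static final class Replica {
        final String name;
        final State state;
        final boolean searcherRegistered; // assumed to become true once warming is done

        Replica(String name, State state, boolean searcherRegistered) {
            this.name = name;
            this.state = state;
            this.searcherRegistered = searcherRegistered;
        }
    }

    /** Keeps only replicas that are ACTIVE and already have a registered searcher. */
    static List<String> queryable(List<Replica> replicas) {
        List<String> names = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.state == State.ACTIVE && r.searcherRegistered) {
                names.add(r.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("replica1", State.ACTIVE, true),
                new Replica("replica2", State.ACTIVE, false),   // active but still warming
                new Replica("replica3", State.RECOVERING, false));
        System.out.println(queryable(replicas));
    }
}
```

With a rule like this, a replica that is ACTIVE but still warming would simply be skipped when choosing where to send a query, instead of the query hanging on it.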


