Posted to dev@lucene.apache.org by "ludovic Boutros (JIRA)" <ji...@apache.org> on 2014/05/21 18:24:38 UTC
[jira] [Comment Edited] (SOLR-6086) Replica active during Warming
[ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004856#comment-14004856 ]
ludovic Boutros edited comment on SOLR-6086 at 5/21/14 4:23 PM:
----------------------------------------------------------------
I checked the differences in the logs and in the code.
The problem occurs when:
- a node is restarted
- Peer Sync fails (no "/get" handler, for instance; should it become mandatory?)
- the node is already synced (nothing to replicate)
or:
- a node is restarted and it is the leader (I do not know whether this only happens with a lone leader...)
- the node is already synced (nothing to replicate)
For the first case,
I think this is a side effect of the modification in SOLR-4965.
If Peer Sync is successful, the code calls an explicit commit, preceded by this comment:
{code:title=RecoveryStrategy.java|borderStyle=solid}
// force open a new searcher
core.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
{code}
This commit is not called when Peer Sync fails.
Just adding this line in that case is enough to correct the issue.
Here is a patch with a test that reproduces the problem, along with the correction (to be applied to the 4x branch).
I am working on the second case.
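To illustrate the first case, here is a minimal, self-contained toy model in Java. It is not Solr code: ToyCore, commitAndOpenSearcher and recover are hypothetical names introduced for this sketch, and the only behaviour modelled is the one described above (the explicit commit forces a new searcher to open, and it is only called when Peer Sync succeeds). The assumption that the replica is published as active at the end of recovery, whether or not a searcher was opened, is taken from the issue description.

```java
/** Toy model of the first case; all names are hypothetical, not Solr's API. */
final class RecoverySketch {

    /** Stand-in for a core: tracks only whether a searcher has been opened. */
    static final class ToyCore {
        private boolean searcherRegistered = false;

        /** Models the effect of the explicit commit: a new searcher is opened. */
        void commitAndOpenSearcher() {
            searcherRegistered = true;
        }

        boolean hasRegisteredSearcher() {
            return searcherRegistered;
        }
    }

    /**
     * Runs a simplified recovery for a node that is already synced.
     * Returns true if a searcher is registered at the point where the
     * replica would be published as active (assumed to happen in every case).
     */
    static boolean recover(ToyCore core, boolean peerSyncSucceeded, boolean applyFix) {
        if (peerSyncSucceeded) {
            // Existing behaviour: force open a new searcher.
            core.commitAndOpenSearcher();
        } else {
            // Peer Sync failed and there is nothing to replicate.
            // Without the fix, no commit is issued on this path.
            if (applyFix) {
                core.commitAndOpenSearcher();
            }
        }
        // The replica is considered active from here on.
        return core.hasRegisteredSearcher();
    }

    public static void main(String[] args) {
        System.out.println("peer sync ok:            " + recover(new ToyCore(), true, false));
        System.out.println("peer sync failed:        " + recover(new ToyCore(), false, false));
        System.out.println("peer sync failed + fix:  " + recover(new ToyCore(), false, true));
    }
}
```

In this model, only the middle call leaves the replica active without a registered searcher, which is the situation the added commit is meant to remove.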
> Replica active during Warming
> -----------------------------
>
> Key: SOLR-6086
> URL: https://issues.apache.org/jira/browse/SOLR-6086
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.6.1, 4.8.1
> Reporter: ludovic Boutros
> Attachments: SOLR-6086.patch
>
>
> At least with Solr 4.6.1, replicas are considered active during the warming process.
> This means that if you restart a replica or create a new one, queries will
> be sent to this replica and they will hang until the end of the warming
> process (if cold searchers are not used).
> You can no longer add or restart a node silently.
> I think that the fact that the replica is active is not a bad thing.
> But the HttpShardHandler and CloudSolrServer classes should take the warming process into account.
> Currently, I have developed a new, very simple component which checks that a searcher is registered.
> I am also developing custom HttpShardHandler and CloudSolrServer classes which will check the warming process in addition to the ACTIVE status in the cluster state.
> This seems to be more of a workaround than a solution, but that's all I can do in this version.