Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2016/11/01 14:57:58 UTC

[jira] [Comment Edited] (SOLR-9706) fetchIndex blocks incoming queries when issued on a replica in SolrCloud

    [ https://issues.apache.org/jira/browse/SOLR-9706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625624#comment-15625624 ] 

Erick Erickson edited comment on SOLR-9706 at 11/1/16 2:57 PM:
---------------------------------------------------------------

M/S replication does not show this behavior. That's why I wondered if it's deliberate or just an accident of coding.

Given that it only happens in SolrCloud, and the node should be in recovery when the logic for a fetchindex kicks in and thus not receive any queries, if it's accidental then it could easily have been there from day 1. One could even argue that this is correct in the "normal" case.

This scenario is one in which an explicit fetchindex is submitted while the search cluster is actively serving queries, thus something of an edge case.

The idea of passing a parameter to override this behavior assumes that it's deliberate. Changing the code so that _explicit_ fetchindex commands in cloud mode don't block incoming queries would be fine too.
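
For concreteness, here is a rough sketch of what the external process in this scenario might send to each search replica. It assumes the request goes to the replication handler with command=fetchindex and a masterUrl parameter; the host names, core names, and the FetchIndexUrls class are made up for illustration and are not part of Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class FetchIndexUrls {

    /**
     * Builds the URL that asks one replica to pull the index from a source core.
     * replicaCoreUrl and sourceReplicationUrl are supplied by the caller;
     * nothing here discovers them from the cluster.
     */
    static String fetchIndexUrl(String replicaCoreUrl, String sourceReplicationUrl) {
        try {
            return replicaCoreUrl + "/replication?command=fetchindex&masterUrl="
                    + URLEncoder.encode(sourceReplicationUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds one fetch URL per search replica, all pulling from the same source. */
    static List<String> fetchIndexUrls(List<String> replicaCoreUrls, String sourceReplicationUrl) {
        List<String> urls = new ArrayList<>();
        for (String replica : replicaCoreUrls) {
            urls.add(fetchIndexUrl(replica, sourceReplicationUrl));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical hosts and core names, for illustration only.
        List<String> replicas = new ArrayList<>();
        replicas.add("http://search1:8983/solr/c1");
        replicas.add("http://search2:8983/solr/c1");
        for (String url : fetchIndexUrls(replicas, "http://idx1:8983/solr/c1/replication")) {
            System.out.println(url);
        }
    }
}
```

An external scheduler would issue a GET against each of these URLs on whatever period suits the indexing rate; in cloud mode, queries arriving at that replica then wait until the fetch finishes.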


was (Author: erickerickson):
M/S replication does not show this behavior. That's why I wondered if it's deliberate or just an accident of coding.

Given that it only happens in SolrCloud, and the node should be in recovery and thus not receive any queries, if it's accidental then it would go unnoticed. One could even argue that this is correct in the "normal" case.

This scenario is one in which an explicit fetchindex is submitted while the search cluster is actively serving queries, thus something of an edge case.

The idea of passing a parameter to override this behavior assumes that it's deliberate. If changing the code such that _explicit_ fetchindex commands don't block that would be fine too.

> fetchIndex blocks incoming queries when issued on a replica in SolrCloud
> ------------------------------------------------------------------------
>
>                 Key: SOLR-9706
>                 URL: https://issues.apache.org/jira/browse/SOLR-9706
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.3, trunk
>            Reporter: Erick Erickson
>
> This is something of an edge case, but it's perfectly possible to issue a fetchIndex command through the core admin API to a replica in SolrCloud. While the fetch is going on, incoming queries are blocked. Then when the fetch completes, all the queued-up queries execute.
> In the normal case, this is probably the proper behavior, as a fetchIndex during "normal" SolrCloud operation indicates that the replica's index is too far out of date and _shouldn't_ serve queries. The scenario here is a special case.
> Why would one want to do this? Well, in _extremely_ high indexing throughput situations, the additional time taken for the leader forwarding the query on to a follower is too high. So there is an indexing cluster and a search cluster, and an external process that periodically issues a fetchIndex to each replica in the search cluster.
> What do people think about an "expert" option for fetchIndex that would cause a replica to behave like the old master/slave days and continue serving queries while the fetchindex was going on? Or another solution?
> FWIW, here are the stack traces where the blocking is going on (from roughly 6.3). This is not hard to reproduce if you introduce an artificial delay in the fetch command, then submit a fetchIndex and try to query.
> Blocked query thread(s)
> DefaultSolrCoreState.lock(159)
> DefaultSolrCoreState.getIndexWriter (104)
> SolrCore.openNewSearcher(1781)
> SolrCore.getSearcher(1931)
> SolrCore.getSearcher(1677)
> SolrCore.getSearcher(1577)
> SolrQueryRequestBase.getSearcher(115)
> QueryComponent.process(308).
> The stack trace that releases this is
> DefaultSolrCoreState.createMainIndexWriter(240)
> DefaultSolrCoreState.changeWriter(203)
> DefaultSolrCoreState.openIndexWriter(228) // LOCK RELEASED 2 lines later
> IndexFetcher.fetchLatestIndex(493) (approx, I have debugging code in there. It's in the "finally" clause anyway.)
> IndexFetcher.fetchLatestIndex(251).
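
The two traces above fit a simple pattern: the fetch holds an exclusive lock for its whole duration, and every query has to pass through the same lock before it can open a searcher. The sketch below is a toy model of that pattern, not Solr's code. It assumes a read/write lock, which the traces suggest but do not state, and the FetchBlockingSketch class and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FetchBlockingSketch {
    // Stand-in for whatever lock guards the index writer; an assumption, not Solr's field.
    private final ReentrantReadWriteLock writerLock = new ReentrantReadWriteLock();

    /** Models the fetch: the write lock is held until the simulated download ends. */
    void fetchIndex(long fetchMillis, CountDownLatch started) throws InterruptedException {
        writerLock.writeLock().lock();
        try {
            started.countDown();       // signal that the fetch now owns the lock
            Thread.sleep(fetchMillis); // simulated download, like the artificial delay
        } finally {
            writerLock.writeLock().unlock(); // queued queries proceed from here
        }
    }

    /** Models a query that needs the lock to get a searcher; returns its wait in ms. */
    long query() {
        long start = System.nanoTime();
        writerLock.readLock().lock();
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            writerLock.readLock().unlock();
        }
    }

    /** Runs one fetch and one concurrent query; returns how long the query waited. */
    static long demo(long fetchMillis) throws InterruptedException {
        FetchBlockingSketch core = new FetchBlockingSketch();
        CountDownLatch started = new CountDownLatch(1);
        Thread fetcher = new Thread(() -> {
            try {
                core.fetchIndex(fetchMillis, started);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();
        started.await();            // make sure the fetch holds the lock first
        long waited = core.query(); // blocks until the fetch releases the lock
        fetcher.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("query waited about " + demo(300) + " ms for the fetch");
    }
}
```

In this model the query's wait grows with the length of the fetch, which matches the observation that queued-up queries all execute once the fetch completes. An "expert" option of the kind proposed would amount to letting query() be served from the existing searcher instead of waiting on the lock.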


