Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/04/23 20:32:00 UTC

[jira] [Commented] (SOLR-12258) V2 API should "retry" for unresolved collections/aliases (like V1 does)

    [ https://issues.apache.org/jira/browse/SOLR-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448816#comment-16448816 ] 

David Smiley commented on SOLR-12258:
-------------------------------------

To be extra clear, here's a small excerpt from org.apache.solr.update.processor.TimeRoutedAliasUpdateProcessorTest#test (lines 97-111) that is minimally sufficient to show the problem. _The payload part is irrelevant; what matters is that it's simply a V2 request_:
{code:java}
    CollectionAdminRequest.createCollection(configName, configName, 1, 1).process(solrClient);

    // manipulate the config...
    checkNoError(solrClient.request(new V2Request.Builder("/collections/" + configName + "/config")
        .withMethod(SolrRequest.METHOD.POST)
        .withPayload("{" +
            "  'set-user-property' : {'update.autoCreateFields':false}," + // no data driven
            "  'add-updateprocessor' : {" +
            "    'name':'tolerant', 'class':'solr.TolerantUpdateProcessorFactory'" +
            "  }," +
            "  'add-updateprocessor' : {" + // for testing
            "    'name':'inc', 'class':'" + IncrementURPFactory.class.getName() + "'," +
            "    'fieldName':'" + intField + "'" +
            "  }," +
            "}").build()));
{code}
The second call, where we manipulate the config, occasionally fails because V2HttpCall can't resolve the collection (line 119). Its ZK state simply isn't up to date (I surmise). In principle, a V1 call could fail as well, but in practice it may be rarer because the "retry" aspect of V1 buys it sufficient extra time. Adding a SolrCloudTestCase.waitForState in between the calls here _may_ help, but again there's no guarantee, since waitForState waits for _the state of the client's state reader_ (not for the state readers of the Solr nodes).
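
For illustration, the V1-style behaviour (look the name up, refresh the local view once on a miss, look it up again, then give up) can be sketched roughly as below. This is only a sketch under assumptions, not Solr's code: RetryOnceSketch, resolveWithOneRetry, the resolver function and the refresh hook are all hypothetical names, and the two maps merely stand in for ZooKeeper and a node's cached state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative sketch only; none of these names exist in Solr.
final class RetryOnceSketch {

  /**
   * Looks up name; on a miss, runs refresh once and looks it up again.
   * At most two lookups are made, so an unknown name cannot loop forever.
   */
  static <T> Optional<T> resolveWithOneRetry(String name,
                                             Function<String, Optional<T>> resolver,
                                             Runnable refresh) {
    Optional<T> first = resolver.apply(name);
    if (first.isPresent()) {
      return first;
    }
    refresh.run(); // stand-in for refreshing the node's alias/collection view
    return resolver.apply(name);
  }

  public static void main(String[] args) {
    Map<String, String> authoritative = new HashMap<>(); // stand-in for ZooKeeper
    Map<String, String> localView = new HashMap<>();      // stand-in for a node's cached state
    authoritative.put("myCollection", "core1");

    Optional<String> found = resolveWithOneRetry(
        "myCollection",
        n -> Optional.ofNullable(localView.get(n)),
        () -> localView.putAll(authoritative));
    System.out.println(found.isPresent()
        ? "resolved after refresh"
        : "no such collection or alias");
  }
}
```

The single retry is a deliberate bound: a name that is still unknown after one refresh is reported as an error instead of being retried indefinitely.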

For aliases, we can call ZooKeeper.sync("/aliases.json",...), and in fact I made sure ZkStateReader now does this in update(). For cases where our code expects to operate on a collection (so it had better exist, or we have an error), could we do something similar for collections? In fact we have ZkStateReader.forceUpdateCollection(collection), added by [~shalinmangar] in SOLR-8745, though it doesn't call ZooKeeper.sync()... but shouldn't it?
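
As a rough sketch of the "sync, then read" idea: ZooKeeper's sync in the Java client is asynchronous, so a caller that wants a fresh read has to wait for the callback before reading. The code below shows only that waiting pattern. AsyncSyncClient is a hypothetical stand-in for a ZooKeeper-like client (it is not the real ZooKeeper or ZkStateReader API), and the result-code convention (0 means OK) is an assumption of the sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Illustrative sketch only. AsyncSyncClient is a hypothetical stand-in for a
// ZooKeeper-like client; it is not the real ZooKeeper or ZkStateReader API.
final class SyncThenReadSketch {

  interface AsyncSyncClient {
    /** Requests a sync for path; the callback later receives a result code (0 = OK). */
    void sync(String path, IntConsumer callback);

    /** Reads the data currently visible to this client for path. */
    String read(String path);
  }

  /** Blocks until the sync completes (or the timeout elapses), then reads. */
  static String syncThenRead(AsyncSyncClient client, String path, long timeoutMs) {
    CountDownLatch done = new CountDownLatch(1);
    int[] resultCode = {-1};
    client.sync(path, rc -> {
      resultCode[0] = rc;
      done.countDown();
    });
    try {
      if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
        throw new IllegalStateException("sync timed out for " + path);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      throw new IllegalStateException("interrupted while syncing " + path, e);
    }
    if (resultCode[0] != 0) {
      throw new IllegalStateException("sync failed for " + path + ", rc=" + resultCode[0]);
    }
    return client.read(path);
  }
}
```

Whether forceUpdateCollection should do something along these lines is exactly the open question above; the sketch makes no claim about how ZkStateReader is implemented.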

> V2 API should "retry" for unresolved collections/aliases (like V1 does)
> -----------------------------------------------------------------------
>
>                 Key: SOLR-12258
>                 URL: https://issues.apache.org/jira/browse/SOLR-12258
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud, v2 API
>            Reporter: David Smiley
>            Priority: Major
>
> When using V1, if the request refers to a possible collection/alias that fails to resolve, HttpSolrCall will invoke AliasesManager.update() then retry the request as if anew (in collaboration with SolrDispatchFilter).  If it fails to resolve again we stop there and return an error; it doesn't go on forever.
> V2 (V2HttpCall specifically) doesn't have this retry mechanism.  It'll return "no such collection or alias".
> The retry will not only work for an alias; the retrying is also a delay that will at least improve the odds of a newly made collection being known to this Solr node.  It'd be nice if this was more explicit – i.e. if there was a mechanism similar to AliasesManager.update() but for a collection.  I'm not sure how to do that?
> BTW I discovered this while debugging a Jenkins failure of TimeRoutedAliasUpdateProcessorTest.test where it early on simply goes to issue a V2 based request to change the configuration of a collection that was created immediately before it.  It's pretty mysterious.  I am aware of SolrCloudTestCase.waitForState which is maybe something that needs to be called?  But if that were true then *every* SolrCloud test would need to use it; it just seems wrong to me that we ought to use this method commonly.


