Posted to dev@lucene.apache.org by "Varun Thacker (JIRA)" <ji...@apache.org> on 2017/11/01 16:16:00 UTC

[jira] [Commented] (SOLR-11484) CloudSolrClient's cache of collection clusterstate can cause RouteExceptions when attempting directUpdates after collection modifications

    [ https://issues.apache.org/jira/browse/SOLR-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234311#comment-16234311 ] 

Varun Thacker commented on SOLR-11484:
--------------------------------------

Hi Everyone,

[~cpoerschke] what are your thoughts on this? I guess the word "Only" in the flag would mean that the update should fail if there are no leaders?

In which case our tests should not set this flag and should instead use the default behaviour, which is "If there is no leader, send the request to any live NRT node".
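
To make the two readings concrete, here is a minimal sketch of the routing decision as I understand the question above. It is not the CloudSolrClient code: the Replica class, the chooseTarget method and the leadersOnly parameter are hypothetical names made up for illustration, and the "fail when there is no leader" branch is only the interpretation being asked about, not confirmed behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not SolrJ code: models the routing choice discussed above.
class DirectUpdateRoutingSketch {

    // Hypothetical stand-in for one replica entry in the collection state.
    static final class Replica {
        final String url;
        final boolean leader;
        final boolean live;
        final String type; // e.g. "NRT", "TLOG", "PULL"

        Replica(String url, boolean leader, boolean live, String type) {
            this.url = url;
            this.leader = leader;
            this.live = live;
            this.type = type;
        }
    }

    // Returns the URL to send the update to, or empty if the update should fail.
    static Optional<String> chooseTarget(List<Replica> shardReplicas, boolean leadersOnly) {
        // A live leader is always the preferred target.
        for (Replica r : shardReplicas) {
            if (r.leader && r.live) {
                return Optional.of(r.url);
            }
        }
        // Assumed reading of "Only": with no leader, there is nowhere to send the update.
        if (leadersOnly) {
            return Optional.empty();
        }
        // Default described above: fall back to any live NRT replica.
        for (Replica r : shardReplicas) {
            if (r.live && "NRT".equals(r.type)) {
                return Optional.of(r.url);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Replica> noLeader = Arrays.asList(
            new Replica("http://node1/solr/c_shard1_replica_t1", false, true, "TLOG"),
            new Replica("http://node2/solr/c_shard1_replica_n2", false, true, "NRT"));
        System.out.println(chooseTarget(noLeader, true));  // Optional.empty
        System.out.println(chooseTarget(noLeader, false)); // Optional[http://node2/solr/c_shard1_replica_n2]
    }
}
```

If that reading is right, a test that sets the flag would see failures whenever a shard is briefly leaderless, which is why leaving the flag unset looks safer for the tests.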

> CloudSolrClient's cache of collection clusterstate can cause RouteExceptions when attempting directUpdates after collection modifications
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11484
>                 URL: https://issues.apache.org/jira/browse/SOLR-11484
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 7.2, master (8.0)
>
>         Attachments: SOLR-11484.patch, SOLR-11484.patch, jenkins.thetaphi.20662.txt
>
>
> This was discovered while auditing Jenkins failures from {{TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete}} (where a test explicitly deletes and then recreates a collection with the same name), but as noted in a comment below, SOLR-11392 is another example of non-obvious test failures that can pop up because of this bug.
> In practice, it can affect any CloudSolrClient user after changes have been made to a collection (to add/move replicas, etc...)
> ----
> Original jira notes...
> {{TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete}} seems to fail with non-trivial frequency, so I grabbed the logs from a recent failure and started trying to follow along with the actions to figure out what exactly is happening....
> https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20662/
> {noformat}
>    [junit4] ERROR   20.3s J1 | TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete <<<
>    [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at https://127.0.0.1:42959/solr/testcollection_shard1_replica_n3: Expected mime type application/octet-stream but got text/html. <html>
>    [junit4]    > <head>
>    [junit4]    > <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>    [junit4]    > <title>Error 404 </title>
> {noformat}
> The crux of this failure appears to be a genuine bug in how CloudSolrClient uses its cached ClusterState info when doing (direct) updates.  The key bits seem to be:
> * CloudSolrClient does _something_ (update,query,etc...) with a collection causing the current cluster state for the collection to be cached
> * The actual collection changes such that a Solr node/core no longer exists as part of the collection
> * CloudSolrClient is asked to process an UpdateRequest which triggers the code paths for the {{directUpdate()}} method -- which attempts to route the updates directly to a replica of the appropriate shard using the (cached) collection state info
> * CloudSolrClient (may) attempt to send that UpdateRequest to a node/core that doesn't exist, getting a 404 -- which does not (seem to) trigger a state refresh, or retry to find a correct URL to resend the update to.
> Details to follow in comment....
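
For anyone skimming, the sequence in the issue description can be reproduced with a toy model. This is only a sketch under stated assumptions: StaleStateSketch, liveState, cachedState, send and the refreshOn404 parameter are all hypothetical names, none of this is CloudSolrClient code, and the refresh-and-retry branch is one possible remedy shown for contrast, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not SolrJ code: a client that routes updates from a cached state.
class StaleStateSketch {
    // Authoritative state: shard name -> core URL currently serving it.
    final Map<String, String> liveState = new HashMap<>();
    // Client-side copy, filled on first use and never invalidated on its own.
    final Map<String, String> cachedState = new HashMap<>();

    // Simulated HTTP call: a core that is no longer part of the collection answers 404.
    int send(String url) {
        return liveState.containsValue(url) ? 200 : 404;
    }

    // Looks up the target URL, consulting the authoritative state only on a cache miss.
    String cachedUrl(String shard) {
        return cachedState.computeIfAbsent(shard, liveState::get);
    }

    // Routes an update using the cached URL; optionally refreshes and retries once on 404.
    int directUpdate(String shard, boolean refreshOn404) {
        int status = send(cachedUrl(shard));
        if (status == 404 && refreshOn404) {
            cachedState.remove(shard);       // drop the stale entry
            status = send(cachedUrl(shard)); // re-read the state and resend
        }
        return status;
    }

    public static void main(String[] args) {
        StaleStateSketch client = new StaleStateSketch();
        client.liveState.put("shard1", "https://host/solr/coll_shard1_replica_n1");
        System.out.println(client.directUpdate("shard1", false)); // 200, state is now cached

        // The collection is deleted and recreated: the shard is served by a different core.
        client.liveState.put("shard1", "https://host/solr/coll_shard1_replica_n3");
        System.out.println(client.directUpdate("shard1", false)); // 404, stale URL, no refresh
        System.out.println(client.directUpdate("shard1", true));  // 200 after refresh and retry
    }
}
```

The middle call corresponds to the failure in the Jenkins log: the update goes to a core URL that no longer exists, and nothing invalidates the cached entry.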



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
