Posted to dev@lucene.apache.org by "Grzegorz Lebek (JIRA)" <ji...@apache.org> on 2018/05/29 09:32:00 UTC

[jira] [Created] (SOLR-12415) Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr node goes down, it is unable to detect when it become live again due to 404 error

Grzegorz Lebek created SOLR-12415:
-------------------------------------

             Summary: Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr node goes down, it is unable to detect when it become live again due to 404 error
                 Key: SOLR-12415
                 URL: https://issues.apache.org/jira/browse/SOLR-12415
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrJ
    Affects Versions: 7.3.1, 7.2.1, 7.4
         Environment: Solr 7.2.1; 2 servers - master and slave.
            Reporter: Grzegorz Lebek


*Context*
When LBHttpSolrClient is constructed using *base urls* and a slave goes down and then comes back up, the client is unable to mark it as alive again because its alive check receives a 404 error.

Logs below:
{code:java}
 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "GET /solr/select?q=%3A&rows=0&sort=docid+asc&distrib=false&wt=javabin&version=2 HTTP/1.1[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Host: localhost:8984[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Connection: Keep-Alive[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "HTTP/1.1 404 Not Found[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "Cache-Control: must-revalidate,no-cache,no-store[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "Content-Type: text/html;charset=iso-8859-1[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "Content-Length: 243[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "[\r][\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<html>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<head>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<title>Error 404 Not Found</title>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</head>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<body><h2>HTTP ERROR 404</h2>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<p>Problem accessing /solr/select. Reason:[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<pre> Not Found</pre></p>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</body>[\n]"

 DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</html>[\n]"{code}

*Analysis*
When using only *base urls* in an LBHttpSolrClient, we need to pass a "*collection*" parameter when sending a request. This works fine, except that the method 
{code:java}
private void checkAZombieServer(ServerWrapper zombieServer){code}
queries Solr without the collection parameter to check whether the server is alive. This causes HTML content (apparently the dashboard) to be returned, so the method falls into its exception clause; as a result, even when the server is back up it is never marked as alive again.
I debugged this, and if we pass a collection name there as a second parameter, the server responds correctly.
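
To illustrate the difference, here is a minimal sketch of how the request URL might be assembled with and without a collection. The helper requestUrl and the collection name collection1 are hypothetical and are not part of SolrJ; the base URL matches the host seen in the logs above.

```java
import java.util.Optional;

public class AliveCheckPath {
    // Hypothetical helper (not SolrJ API): joins the base URL, an optional
    // collection name, and the handler path into one request URL.
    static String requestUrl(String baseUrl, Optional<String> collection, String handler) {
        // Drop a trailing slash so the joined URL never contains "//".
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return collection
                .map(c -> base + "/" + c + handler)
                .orElse(base + handler);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8984/solr";
        // Without a collection: the path the alive check requested in the logs.
        System.out.println(requestUrl(base, Optional.empty(), "/select"));
        // With an assumed collection name: the path a normal request would use.
        System.out.println(requestUrl(base, Optional.of("collection1"), "/select"));
    }
}
```

The first URL ends in /solr/select, which is the path that returned 404 in the logs; the second includes the collection segment.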

My suggestion is either to pass the collection name somehow or to change the way zombie servers are pinged.
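
As a rough illustration of the first option, the toy model below simulates one pass of a zombie check against a fake node that answers 200 only when the path names a collection. Everything here (the class, checkZombies, fakeNodeStatus, and the collection name collection1) is hypothetical and only mimics the behaviour described above; it is not the LBHttpSolrClient implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToIntFunction;

public class ZombieRecoverySketch {
    // Toy stand-in for a recovered node: 200 only if the path names the
    // assumed collection, 404 otherwise (as in the logs for /solr/select).
    static int fakeNodeStatus(String path) {
        return path.equals("/solr/collection1/select") ? 200 : 404;
    }

    // One pass of a hypothetical alive check: every zombie that answers 200
    // is moved back to the alive list. A null collection models the
    // reported behaviour of pinging without a collection.
    static void checkZombies(List<String> zombies, List<String> alive,
                             String collection, ToIntFunction<String> ping) {
        String path = "/solr" + (collection == null ? "" : "/" + collection) + "/select";
        Iterator<String> it = zombies.iterator();
        while (it.hasNext()) {
            String server = it.next();
            if (ping.applyAsInt(path) == 200) {
                it.remove();
                alive.add(server);
            }
        }
    }

    public static void main(String[] args) {
        List<String> zombies = new ArrayList<>(List.of("http://localhost:8984/solr"));
        List<String> alive = new ArrayList<>();

        // Ping without a collection: the node answers 404 and stays a zombie.
        checkZombies(zombies, alive, null, ZombieRecoverySketch::fakeNodeStatus);
        System.out.println("alive after ping without collection: " + alive.size());

        // Ping with the assumed collection: the node answers 200 and recovers.
        checkZombies(zombies, alive, "collection1", ZombieRecoverySketch::fakeNodeStatus);
        System.out.println("alive after ping with collection: " + alive.size());
    }
}
```

In this model the server stays in the zombie list for as long as the ping omits the collection, and recovers on the first ping that includes it.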

*Steps to reproduce*

Run 2 servers, a master and a slave. Create a client using base urls. Index some documents, test search, etc.

Turn off the slave server and, after a couple of seconds, turn it on again.

 


