Posted to solr-user@lucene.apache.org by search engn dev <sa...@gmail.com> on 2014/07/18 07:15:24 UTC

Understanding query behaviour in LBHttpSolrServer

I just want to understand the query flow and how load balancing works with
LBHttpSolrServer. We have set up SolrCloud with one collection; that
collection has 4 shards, and each shard has two nodes, i.e. one master and one
replica.

I have configured LBHttpSolrServer as below.
SolrServer lbHttpSolrServer = new LBHttpSolrServer(
    "http://shard1_master:8080/solr/", "http://shard2_master:8080/solr/",
    "http://shard3_master:8080/solr/", "http://shard4_master:8080/solr/",
    "http://shard1_replica:8080/solr/", "http://shard2_replica:8080/solr/",
    "http://shard3_replica:8080/solr/", "http://shard4_replica:8080/solr/");

From my understanding, Solr and SolrJ work as below:
1. LBHttpSolrServer keeps pinging the above list of servers and maintains a
list of live servers.
2. Every time a query arrives, it picks one server from the list (in
round-robin fashion).
3. It sends the query to the selected server.
4. When the query arrives at a Solr node, that node internally distributes
the query to the remaining shards, collects, merges, and ranks the results,
and sends the response back to the user.
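
Steps 1 to 3 above can be modelled with a small sketch. This is not the LBHttpSolrServer source; the class name RoundRobinSketch and its markDead/markAlive methods are made up for illustration, and the background pinging is assumed to happen elsewhere and simply call those two methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of round-robin selection over a list of live servers (hypothetical, not SolrJ code). */
class RoundRobinSketch {
    private final List<String> allServers;
    private final Set<String> dead = new HashSet<>();
    private int counter = 0;

    RoundRobinSketch(List<String> servers) {
        this.allServers = new ArrayList<>(servers);
    }

    /** Assumed to be called when a ping or request to the server fails. */
    synchronized void markDead(String url) { dead.add(url); }

    /** Assumed to be called when a later ping succeeds again. */
    synchronized void markAlive(String url) { dead.remove(url); }

    /** Returns the next live server in rotation, skipping servers marked dead. */
    synchronized String next() {
        for (int i = 0; i < allServers.size(); i++) {
            String candidate = allServers.get(counter);
            counter = (counter + 1) % allServers.size();
            if (!dead.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live servers");
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(Arrays.asList(
            "http://shard1_master:8080/solr/", "http://shard1_replica:8080/solr/"));
        lb.markDead("http://shard1_master:8080/solr/");
        System.out.println(lb.next()); // only the replica is live at this point
    }
}
```

Every call to next() advances the rotation, so consecutive queries land on different live servers.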

My confusion is at point number 4: is my understanding correct? If not,
please correct me. Also, do I need to pass all 8 nodes to LBHttpSolrServer, or
will just 4 be sufficient?
 



--
View this message in context: http://lucene.472066.n3.nabble.com/Understanding-query-behaviour-in-LBHttpSolrServer-tp4147835.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Understanding query behaviour in LBHttpSolrServer

Posted by Jack Krupansky <ja...@basetechnology.com>.

For traditional, non-SolrCloud "distributed" mode, load balancing and
sharded queries are independent concepts: you can use each separately
or both together, as you choose. If you want the query to be sharded for a
non-SolrCloud Solr server, then you need to pass the "shards" parameter on
each query. For SolrCloud, the sharding of queries takes place automatically
without any shards parameter. But you should use CloudSolrServer for
load balancing of SolrCloud anyway; internally it does the load balancing
automatically based on discovery of the SolrCloud configuration.
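
As a rough illustration of the non-SolrCloud case, the sketch below builds a request URL that carries a "shards" parameter. It uses only the JDK and needs Java 10 or later for this URLEncoder overload. The class and method names are hypothetical, the host names are borrowed from the question, and the shard addresses are assumptions (a real address usually also includes the core name).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Builds a select URL with an explicit shards parameter (illustrative helper, not part of SolrJ). */
class ShardsParamSketch {
    /** The shards value is a comma-separated list of host:port/path entries without a scheme. */
    static String shardsValue(List<String> shardAddresses) {
        return String.join(",", shardAddresses);
    }

    /** Appends URL-encoded q and shards parameters to the /select handler of baseUrl. */
    static String selectUrl(String baseUrl, String q, List<String> shardAddresses) {
        return baseUrl + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&shards=" + URLEncoder.encode(shardsValue(shardAddresses), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed addresses; one entry per shard.
        List<String> shards = Arrays.asList("shard1_master:8080/solr", "shard2_master:8080/solr");
        System.out.println(selectUrl("http://shard1_master:8080/solr", "*:*", shards));
    }
}
```

With SolrCloud none of this is needed, since the node that receives the query works out the shards itself.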

-- Jack Krupansky

Re: Understanding query behaviour in LBHttpSolrServer

Posted by Shawn Heisey <so...@elyograg.org>.

On 7/18/2014 12:51 AM, search engn dev wrote:
> From my understanding, Solr and SolrJ work as below:
> 1. LBHttpSolrServer keeps pinging the above list of servers and maintains a
> list of live servers.
> 2. Every time a query arrives, it picks one server from the list (in
> round-robin fashion).
> 3. It sends the query to the selected server.
> 4. When the query arrives at a Solr node, that node internally distributes
> the query to the remaining shards, collects, merges, and ranks the results,
> and sends the response back to the user.

The first three sound like what LBHttpSolrServer probably does, though I
haven't looked very deeply at the code, so I cannot say for sure.  The
fourth item is exactly how SolrCloud behaves.
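
The "collects, merges, ranks" part of the fourth item can be pictured with a simplified model. The sketch below is not Solr's implementation; the MergeSketch and Hit names are invented, and it assumes each shard has already returned its own hits with a relevance score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified model of merging per-shard results into one ranked list (hypothetical names). */
class MergeSketch {
    /** One hit returned by a shard: document id plus relevance score. */
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    /** Collects every shard's hits, ranks them by score descending, and keeps the top rows. */
    static List<Hit> merge(List<List<Hit>> perShardHits, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(rows, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> top = merge(Arrays.asList(
            Arrays.asList(new Hit("d1", 0.9), new Hit("d2", 0.4)),
            Arrays.asList(new Hit("d3", 0.7))), 2);
        for (Hit h : top) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

The node that received the request plays the role of merge() here, which is why any node in the collection can answer a query for the whole collection.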

If you are indeed running SolrCloud, you should not be using
LBHttpSolrServer.  Instead, use the cloud-aware client, CloudSolrServer.
It will work better than LBHttpSolrServer.

Thanks,
Shawn

Re: Understanding query behaviour in LBHttpSolrServer

Posted by search engn dev <sa...@gmail.com>.

Thanks Shawn,

I am also not sure about the query flow.

From my understanding, Solr and SolrJ work as below:
1. LBHttpSolrServer keeps pinging the above list of servers and maintains a
list of live servers.
2. Every time a query arrives, it picks one server from the list (in
round-robin fashion).
3. It sends the query to the selected server.
4. When the query arrives at a Solr node, that node internally distributes
the query to the remaining shards, collects, merges, and ranks the results,
and sends the response back to the user.

Are these steps correct?




Re: Understanding query behaviour in LBHttpSolrServer

Posted by Shawn Heisey <so...@elyograg.org>.

On 7/17/2014 11:15 PM, search engn dev wrote:
> I just want to understand the query flow and how load balancing works with
> LBHttpSolrServer. We have set up SolrCloud with one collection; that
> collection has 4 shards, and each shard has two nodes, i.e. one master and one
> replica.

If you're running SolrCloud and building client applications with SolrJ,
just use CloudSolrServer.  You pass it the same zkHost value that
you give to SolrCloud itself, listing all your ZooKeeper servers.  The
CloudSolrServer object is a ZooKeeper client, so my understanding is
that it will dynamically adjust to the current clusterstate: if
servers go down, get added, or get removed, the client will know as soon
as SolrCloud itself does, without restarting the application or building
a new client object.
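
For illustration, a zkHost value that lists every ZooKeeper server is a comma-separated string of host:port entries, optionally followed by a chroot path. The helper below is a sketch with made-up host names; whether you need the chroot suffix is an assumption about your own setup.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a ZooKeeper connect string for use as zkHost (hypothetical helper, not part of SolrJ). */
class ZkHostSketch {
    /** Joins host:port entries with commas and appends the chroot path, if one is given. */
    static String zkHost(List<String> hostPorts, String chroot) {
        String joined = String.join(",", hostPorts);
        return (chroot == null || chroot.isEmpty()) ? joined : joined + chroot;
    }

    public static void main(String[] args) {
        // Made-up ensemble of three ZooKeeper servers with a /solr chroot.
        String zkHost = zkHost(Arrays.asList(
            "zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"), "/solr");
        // This is the string you would hand to CloudSolrServer.
        System.out.println(zkHost);
    }
}
```

Note that the chroot appears once, at the end of the whole list, not after each host.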

CloudSolrServer will automatically load balance requests across the
nodes that comprise the collection that is being queried.  Newer
versions of the client will also route updates directly to the leader of
the correct shard, which reduces load on the servers and speeds up indexing.

Internally, CloudSolrServer uses an instance of LBHttpSolrServer, but
the list of URLs is dynamically managed, so your program doesn't need to
worry about it.

http://lucene.apache.org/solr/4_9_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html

Side questions/comments for experienced committers (I can file some
issues and work on these):

The javadoc for the first CloudSolrServer constructor just mentions
HOST:PORT for the format of zkHost.  Should that be expanded so that
it's apparent that if multiple ZK servers are present, they all need to
be listed?  All of the constructor javadocs could do with a little more
substance.

I think that "throws MalformedURLException" needs to be removed from the
second CloudSolrServer constructor.  When I tried removing it, Eclipse didn't
show any errors.

Although it's not complex code, the various CloudSolrServer constructors
are very similar; the actual work should probably be done by one
constructor that is called by all the others.
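
A minimal sketch of that pattern, assuming nothing about the real class: DelegatingClientSketch, its two fields, and the default value are all hypothetical and do not mirror the actual CloudSolrServer source.

```java
/** Illustrates constructor delegation: one constructor does the work, the others call it. */
class DelegatingClientSketch {
    final String zkHost;
    final boolean updatesToLeaders;

    /** Convenience constructor: fills in an assumed default and delegates. */
    DelegatingClientSketch(String zkHost) {
        this(zkHost, true);
    }

    /** The single constructor that validates arguments and assigns fields. */
    DelegatingClientSketch(String zkHost, boolean updatesToLeaders) {
        if (zkHost == null || zkHost.isEmpty()) {
            throw new IllegalArgumentException("zkHost is required");
        }
        this.zkHost = zkHost;
        this.updatesToLeaders = updatesToLeaders;
    }

    public static void main(String[] args) {
        DelegatingClientSketch client = new DelegatingClientSketch("zk1.example.com:2181");
        System.out.println(client.zkHost + " " + client.updatesToLeaders);
    }
}
```

Validation and field assignment then live in one place, so a change to the setup logic cannot be applied to one constructor and missed in another.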

Thanks,
Shawn