Posted to solr-user@lucene.apache.org by Jay Potharaju <js...@gmail.com> on 2016/10/23 00:19:00 UTC

solrcloud load balancing

Hi,
I am trying to understand how load balancing works in solrcloud.

As I understand it, SolrCloud provides load balancing when querying
through an HTTP endpoint. When a query is sent to any of the nodes, Solr
decides which server can fulfill the request, and the query is then
processed by one of the nodes in the cluster.

1) Does the logic change when there is only 1 shard vs multiple shards?

2) Is the displayed QTime the sum of the processing time for the query
request + latency (if processed by another node) + the time to decide which
node will process the request (which I am guessing is minimal and can be
ignored)?

3) In my Solr logs I display the "slow" queries. Does the QTime shown there
include all of the above and reflect the correct time taken?

Solr version: 5.5.0


-- 
Thanks
Jay

Re: solrcloud load balancing

Posted by Jay Potharaju <js...@gmail.com>.

Thanks Erick & Shawn for the response.

For non-distributed queries (a single shard with replicas), is there a
way for me to determine how long it takes to retrieve the documents
and send the response?

In my load test, I see that the response time at the client API is on the
order of seconds, but I do not see any high response times in the Solr logs.
Is it possible that under high load it takes a long time to retrieve
and send the documents?
If I run the same query individually in a browser, it comes back quickly.

Thanks
Jay



-- 
Thanks
Jay Potharaju

Re: solrcloud load balancing

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/22/2016 6:19 PM, Jay Potharaju wrote:
> I am trying to understand how load balancing works in solrcloud.
>
> As I understand it, SolrCloud provides load balancing when querying
> through an HTTP endpoint. When a query is sent to any of the nodes, Solr
> decides which server can fulfill the request, and the query is then
> processed by one of the nodes in the cluster.

Erick already responded, but I had this mostly written before I saw his
response.  I decided to send it anyway.

> 1) Does the logic change when there is only 1 shard vs multiple shards?

The way I understand it, each shard is independently load balanced.  You
might have a situation where one shard has more replicas than another
shard, and I believe that even in that situation, all replicas should
be used.
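
To make "each shard is independently load balanced" a bit more concrete,
here is a minimal Python sketch. It assumes a replica is picked uniformly
at random per shard, which is only an assumption for illustration; the
shard names, node URLs, and the pick_replicas helper are all hypothetical
and are not taken from Solr's code.

```python
import random
from typing import Dict, List, Optional


def pick_replicas(cluster: Dict[str, List[str]],
                  rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Pick one replica per shard, independently of the other shards.

    Assumption: uniform random choice. Solr's real selection logic may differ.
    """
    if rng is None:
        rng = random.Random()
    return {shard: rng.choice(replicas) for shard, replicas in cluster.items()}


# Hypothetical layout: shard1 has three replicas, shard2 has only two.
cluster = {
    "shard1": ["http://node1:8983/solr", "http://node2:8983/solr",
               "http://node3:8983/solr"],
    "shard2": ["http://node4:8983/solr", "http://node5:8983/solr"],
}

# Over many queries, every replica of every shard ends up being used,
# even though the shards have different replica counts.
used = {shard: set() for shard in cluster}
for _ in range(1000):
    for shard, replica in pick_replicas(cluster).items():
        used[shard].add(replica)
```

The point of the sketch is only that the choice for shard1 never depends
on the choice for shard2.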

> 2) Is the displayed QTime the sum of the processing time for the query
> request + latency (if processed by another node) + the time to decide which
> node will process the request (which I am guessing is minimal and can be
> ignored)?

There are three phases in a distributed (multi-shard) query.

1) Each shard is sent the query, with the field list set to include the
score, the unique key field, and if there is a sort parameter, whichever
fields are used for sorting.  These requests happen in parallel. 
Whichever request takes the longest will determine the total time for
this phase.

2) The responses from the subqueries are combined to determine which
documents will make up the final result.

3) Additional queries are sent to the individual shards to retrieve the
matching documents.  These requests are also in parallel, so the slowest
such request will determine the time for this whole phase.
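
As a rough model of the three phases above, the following Python sketch
adds up the wall-clock time: the parallel phases cost as much as their
slowest request, and the merge step is added on top. The function name
and all timings are made up for illustration; this is not how Solr
measures anything internally.

```python
from typing import List


def distributed_query_time(phase1_ms: List[float],
                           merge_ms: float,
                           phase3_ms: List[float]) -> float:
    """Wall-clock estimate for the three phases of a distributed query.

    Phases 1 and 3 fan out in parallel, so each one costs as much as its
    slowest request. Phase 3 may contact no shard at all if nothing matched.
    """
    return max(phase1_ms) + merge_ms + max(phase3_ms, default=0.0)


# Hypothetical timings in milliseconds for a three-shard collection.
phase1 = [12.0, 48.0, 20.0]  # id/score/sort-field requests, one per shard
merge = 3.0                  # combining the sub-responses
phase3 = [5.0, 9.0, 7.0]     # retrieving the matching documents

total = distributed_query_time(phase1, merge, phase3)
print(total)  # 48.0 + 3.0 + 9.0 = 60.0
```

Note how the two fast shards in phase 1 do not help: the 48 ms request
sets the time for that phase.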

> 3) In my Solr logs I display the "slow" queries. Does the QTime shown there
> include all of the above and reflect the correct time taken?

For non-distributed queries, QTime includes the time required to process
the query, but not the time to retrieve the documents and send the
response.  I *think* that when the query is distributed, QTime will be
the sum of the first two phases that I mentioned above, but I'm not 100%
sure.
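
One way to see the gap between QTime and what the client experiences is
to time the request on the client side and compare it with the QTime
value in the responseHeader of the JSON response. The sketch below does
that in Python. The timed_query and fake_fetch names are hypothetical,
and fake_fetch is only a stand-in that sleeps instead of talking to a
real Solr node.

```python
import json
import time
from typing import Callable, Tuple


def timed_query(fetch: Callable[[], str]) -> Tuple[int, float]:
    """Run one query and return (QTime in ms, client-side elapsed ms).

    `fetch` is any callable that returns the raw JSON response body.
    """
    start = time.perf_counter()
    body = fetch()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    qtime_ms = json.loads(body)["responseHeader"]["QTime"]
    return qtime_ms, elapsed_ms


def fake_fetch() -> str:
    # Stand-in for an HTTP call; the sleep mimics document retrieval
    # and transfer time that QTime does not cover.
    time.sleep(0.05)
    return ('{"responseHeader": {"status": 0, "QTime": 4}, '
            '"response": {"numFound": 0, "start": 0, "docs": []}}')


qtime, elapsed = timed_query(fake_fetch)
gap = elapsed - qtime  # time the client waited that QTime does not show
print(f"QTime={qtime} ms, client elapsed={elapsed:.0f} ms, gap={gap:.0f} ms")
```

Against a real server, fetch could wrap something like
urllib.request.urlopen(url).read().decode("utf-8") with a /select URL
of your own; a large gap under load would point at retrieval, transfer,
or queueing rather than at query processing.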

Thanks,
Shawn

Re: solrcloud load balancing

Posted by Jay Potharaju <js...@gmail.com>.

Thanks, Erick, for the response.
I am currently using a load balancer for my SolrCloud cluster, but I was
particularly interested to know whether SolrCloud does load balancing
internally in the case of a single shard.
All the documentation that I have seen assumes multi-shard scenarios and
does not cover a single shard. Can you please point me to some
code/documentation that can help me understand this better?

Thanks
Jay




-- 
Thanks
Jay Potharaju

Re: solrcloud load balancing

Posted by Erick Erickson <er...@gmail.com>.

1) Single shards have some short circuiting in them. And anyway it's
best to have some kind of load balancer in front or use SolrJ with
CloudSolrClient. If you just use an HTTP end-point, you have a single
point of failure if that node goes down.

2) Yes. What it does _not_ include is the time taken to assemble the
final document list, i.e. fetching the fields listed in the "fl"
parameter. Also note that there's "the laggard problem" here. The time
will be something close to the _longest_ time it takes any replica to
respond. Say you have 4 shards and the replica for one of them happens
to hit a 5 second stop-the-world GC pause. Your QTime will be 5
seconds+. I really have no idea whether the QTime includes the decision
process for selecting nodes, but I've also never heard of it being
significant.

3) I guess, although I'm not quite sure I understand the question.
Slow queries will include (roughly) the max of the sub-request QTimes.
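
The laggard problem from point 2 can be shown with a few lines of
Python. This is a rough model only: it treats the overall QTime as the
maximum of the sub-request times, and the numbers are invented to match
the 4-shard, 5-second GC example.

```python
from statistics import median
from typing import List


def approx_qtime(sub_request_ms: List[float]) -> float:
    """Rough model: the overall QTime tracks the slowest sub-request."""
    return max(sub_request_ms)


# Four shards; the replica serving the last one hits a 5-second GC pause.
sub_requests = [35.0, 42.0, 38.0, 5040.0]

print(median(sub_requests))        # 40.0, the typical shard is fast
print(approx_qtime(sub_requests))  # 5040.0, the query is slow anyway
```

So a single slow replica is enough to push the whole request into the
slow query log, even when the other shards answer in tens of
milliseconds.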

Best,
Erick
