Posted to solr-user@lucene.apache.org by zzT <zi...@gmail.com> on 2014/04/25 12:32:29 UTC

SolrCloud load balancing during heavy indexing

Hi all,

In SolrCloud all nodes are equal in the sense that they can perform indexing
as well as searching.

Let's say a leader node is busy performing heavy indexing; I wouldn't want
to also send search requests to that node. As far as I can tell from the
CloudSolrServer source code, all it does when dealing with search requests
is shuffle the live nodes (info retrieved from ZooKeeper) and choose one
of them. Does it do anything smarter than that, like excluding the leader
because of heavy indexing?
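
To make the behaviour described above concrete, here is a minimal Python
sketch of "shuffle the live nodes and choose one". It is a toy model only,
not SolrJ code: the host names and the pick_node helper are made up for
illustration, and the real client reads the live-node list from ZooKeeper.

```python
import random

def pick_node(live_nodes, rng=random):
    """Shuffle a copy of the live-node list and return the first entry.

    Hypothetical helper: every live node is equally likely to be chosen,
    whether or not it is a leader and however busy it is with indexing.
    """
    candidates = list(live_nodes)   # copy so the caller's list is untouched
    rng.shuffle(candidates)
    return candidates[0]

# Made-up node addresses standing in for the live nodes from ZooKeeper.
live_nodes = ["host1:8983", "host2:8983", "host3:8983"]

rng = random.Random(42)             # seeded only to make the run repeatable
counts = {node: 0 for node in live_nodes}
for _ in range(3000):
    counts[pick_node(live_nodes, rng)] += 1

print(counts)                       # roughly even spread across all nodes
```

With this kind of selection the leader receives about the same share of
queries as any other node, which is what prompts the question.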

Basically, what I'm worried about is a scenario where heavy indexing on
the leader node might affect search response times. Has anyone witnessed such a
case?

Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-during-heavy-indexing-tp4133099.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud load balancing during heavy indexing

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Fri, Apr 25, 2014 at 12:54 PM, zzT <zi...@gmail.com> wrote:

> Erick Erickson wrote
> > Back up, you're misunderstanding the update process. A leader node
> > distributes the update to every replica. So _all_ your nodes in a
> > slice are indexing when _any_ of them index. So the idea of sending
> > queries to just the replicas to avoid performance problems isn't
> > relevant.
>
> Hmm, I thought that it's not actual indexing taking place on the replicas
> but that the changes were somehow transferred to the replicas and thus it
> was less intensive for them.
>

Unfortunately that's not the case.  Each node that gets a doc still has to
analyze and index it.  I think at some point I sent a message to the list
and/or created a JIRA issue to suggest doing analysis on just the receiving
node, in which case the other nodes that need to index could skip that step
and do a little less work, but that hasn't been implemented yet.
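
As a rough illustration of the difference, the Python sketch below counts
analysis passes under the current behaviour and under the suggested one.
It is a toy model under stated assumptions: analyze, index_current and
index_proposed are hypothetical names, the "analysis chain" is just
lowercasing and whitespace splitting, and nothing here is Solr code.

```python
def analyze(text):
    """Toy analysis chain: lowercase and split on whitespace."""
    return text.lower().split()

def index_current(docs, node_count):
    """Current behaviour: every node analyzes every document itself.

    Returns the total number of analysis passes across the slice.
    """
    passes = 0
    for _ in range(node_count):
        for text in docs:
            analyze(text)
            passes += 1
    return passes

def index_proposed(docs, node_count):
    """Suggested behaviour (not implemented in Solr): the receiving node
    analyzes once and the other nodes index the pre-analyzed tokens.
    """
    passes = 0
    for text in docs:
        tokens = analyze(text)
        passes += 1
        for _ in range(node_count - 1):
            list(tokens)            # stand-in for indexing forwarded tokens
    return passes

docs = ["Heavy indexing load", "Search response times"]
current = index_current(docs, node_count=3)
proposed = index_proposed(docs, node_count=3)
print(current, proposed)
```

Even in the suggested variant every node still writes its own index, so
only the analysis step would be saved.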

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



>
>
> Erick Erickson wrote
> > In order to support NRT and HA/DR, it's required that all the nodes be
> > ready to take over, so the notion of the leader being the only node
> > that actually indexed the documents then distributing only the indexed
> > document to the other members of the slice isn't how it's done.
>
> So, this is where SolrCloud is different from legacy master/slave
> configuration? I mean master/slave sends segments to the slaves using e.g.
> rsync while SolrCloud forwards the indexing request to replicas where it's
> processed "locally" on each replica, right?
>
>
>
>

Re: SolrCloud load balancing during heavy indexing

Posted by Erick Erickson <er...@gmail.com>.
What Shawn said....

Erick

On Fri, Apr 25, 2014 at 10:13 AM, Shawn Heisey <so...@elyograg.org> wrote:
> On 4/25/2014 10:54 AM, zzT wrote:
>>
>> So, this is where SolrCloud is different from legacy master/slave
>> configuration? I mean master/slave sends segments to the slaves using e.g.
>> rsync while SolrCloud forwards the indexing request to replicas where it's
>> processed "locally" on each replica, right?
>
>
> That's correct.  Each replica indexes each new document independently.
>
> There is one detail about SolrCloud that can be very confusing:  The
> /replication handler must be defined in solrconfig.xml for SolrCloud to
> function properly.  This is because when an index *recovery* is required, it
> will use the old-style replication to copy the index from the leader ...
> which will usually copy the entire index.
>
> Related tangent: I think that SolrCloud should use a dedicated and invisible
> handler for index recovery rather than an explicitly defined handler named
> "/replication".  One possible name for this handler is "/cloudrecovery".
> SOLR-3990 should be fixed before that gets done.
>
> Thanks,
> Shawn
>

Re: SolrCloud load balancing during heavy indexing

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/25/2014 10:54 AM, zzT wrote:
> So, this is where SolrCloud is different from legacy master/slave
> configuration? I mean master/slave sends segments to the slaves using e.g.
> rsync while SolrCloud forwards the indexing request to replicas where it's
> processed "locally" on each replica, right?

That's correct.  Each replica indexes each new document independently.

There is one detail about SolrCloud that can be very confusing:  The 
/replication handler must be defined in solrconfig.xml for SolrCloud to 
function properly.  This is because when an index *recovery* is 
required, it will use the old-style replication to copy the index from 
the leader ... which will usually copy the entire index.
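
For reference, a minimal sketch of that handler definition in
solrconfig.xml is shown below. It assumes the stock
solr.ReplicationHandler class; in SolrCloud mode the handler is typically
declared without any master/slave sections, so check it against the
example config shipped with your Solr version.

```xml
<!-- solrconfig.xml: needed in SolrCloud mode because index recovery
     uses old-style replication to copy the index from the leader -->
<requestHandler name="/replication" class="solr.ReplicationHandler" />
```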

Related tangent: I think that SolrCloud should use a dedicated and 
invisible handler for index recovery rather than an explicitly defined 
handler named "/replication".  One possible name for this handler is 
"/cloudrecovery".  SOLR-3990 should be fixed before that gets done.

Thanks,
Shawn


Re: SolrCloud load balancing during heavy indexing

Posted by zzT <zi...@gmail.com>.
Erick Erickson wrote
> Back up, you're misunderstanding the update process. A leader node
> distributes the update to every replica. So _all_ your nodes in a
> slice are indexing when _any_ of them index. So the idea of sending
> queries to just the replicas to avoid performance problems isn't
> relevant.

Hmm, I thought that it's not actual indexing taking place on the replicas
but that the changes were somehow transferred to the replicas and thus it
was less intensive for them.


Erick Erickson wrote
> In order to support NRT and HA/DR, it's required that all the nodes be
> ready to take over, so the notion of the leader being the only node
> that actually indexed the documents then distributing only the indexed
> document to the other members of the slice isn't how it's done.

So, this is where SolrCloud is different from legacy master/slave
configuration? I mean master/slave sends segments to the slaves using e.g.
rsync while SolrCloud forwards the indexing request to replicas where it's
processed "locally" on each replica, right?



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-load-balancing-during-heavy-indexing-tp4133099p4133160.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud load balancing during heavy indexing

Posted by Erick Erickson <er...@gmail.com>.
Back up, you're misunderstanding the update process. A leader node
distributes the update to every replica. So _all_ your nodes in a
slice are indexing when _any_ of them index. So the idea of sending
queries to just the replicas to avoid performance problems isn't
relevant.

In order to support NRT and HA/DR, it's required that all the nodes be
ready to take over, so the notion of the leader being the only node
that actually indexed the documents then distributing only the indexed
document to the other members of the slice isn't how it's done.

You're right, if you have a heavy indexing load it'll take resources
from your query rate and it's something to be cognizant of, but it's a
cluster-wide issue, not just an issue with the leader.
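
A toy Python model of that update flow, in case it helps. It is only a
sketch: the Node and Slice classes and the node names are made up, and the
"index" is a plain dictionary rather than anything Lucene does.

```python
class Node:
    """One member of a slice, keeping its own toy inverted index."""

    def __init__(self, name):
        self.name = name
        self.terms = {}          # term -> set of document ids
        self.docs_analyzed = 0   # how many documents this node processed

    def index_doc(self, doc_id, text):
        # Every node analyzes and indexes the raw document by itself.
        self.docs_analyzed += 1
        for term in text.lower().split():
            self.terms.setdefault(term, set()).add(doc_id)

class Slice:
    """A leader plus its replicas; an update reaches all of them."""

    def __init__(self, leader, replicas):
        self.leader = leader
        self.replicas = replicas

    def update(self, doc_id, text):
        self.leader.index_doc(doc_id, text)
        for replica in self.replicas:
            # The document itself is forwarded, not finished index segments.
            replica.index_doc(doc_id, text)

leader = Node("leader")
replicas = [Node("replica1"), Node("replica2")]
shard = Slice(leader, replicas)
shard.update(1, "heavy indexing load")
shard.update(2, "search response times")

for node in [leader] + replicas:
    print(node.name, node.docs_analyzed)
```

Every node ends up having done the same amount of indexing work, which is
why routing queries away from the leader would not buy anything.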

Best,
Erick


On Fri, Apr 25, 2014 at 3:32 AM, zzT <zi...@gmail.com> wrote:
> Hi all,
>
> In SolrCloud all nodes are equal in the sense that they can perform indexing
> as well as searching.
>
> Let's say a leader node is busy performing heavy indexing; I wouldn't want
> to also send search requests to that node. As far as I can tell from the
> CloudSolrServer source code, all it does when dealing with search requests
> is shuffle the live nodes (info retrieved from ZooKeeper) and choose one
> of them. Does it do anything smarter than that, like excluding the leader
> because of heavy indexing?
>
> Basically, what I'm worried about is a scenario where heavy indexing on
> the leader node might affect search response times. Has anyone witnessed such a
> case?
>
> Thanks
>
>
>