Posted to solr-user@lucene.apache.org by Nguyen Nguyen <ng...@gmail.com> on 2018/05/22 01:23:50 UTC

How to maintain fast query speed during heavy indexing?

Hello everyone,

I'm running a SolrCloud cluster of 5 nodes with 5 shards and 3 replicas per
shard.  I usually see spikes in query response time during periods of heavy
indexing, and I would like response time to stay stable even then.  I
recently upgraded to Solr 7.3 and am now running 2 TLOG replicas and 1 PULL
replica per shard.  Even with a small maxWriteMBPerSec for replication and
with queries sent only to the PULL replicas during indexing, I still see
long query times for some queries (although not as often as before the
change).

My first question: is it possible to control replication on non-leader
replicas the way you can in a master/slave configuration (e.g. disablepoll,
fetchindex)?  That way, I could disable replication on the followers until
commits have completed on the leaders, while sending query requests only to
the followers (or just the PULL replicas).  Once the data is committed on
the leaders, I would send query requests to the leaders only and tell the
followers to start fetching the newly updated index.
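
For reference, the master/slave controls I mean are plain HTTP commands on
the ReplicationHandler.  Below is a minimal Python sketch that only builds
the request URLs.  The host and core names are hypothetical, and whether
SolrCloud TLOG/PULL replicas honor these commands is exactly what I'm
asking, so this illustrates the legacy setup only.

```python
from urllib.parse import urlencode

# Replication commands named in the question, plus enablepoll (the
# counterpart of disablepoll in the legacy ReplicationHandler).
REPLICATION_COMMANDS = {"disablepoll", "enablepoll", "fetchindex"}


def replication_url(base_url: str, core: str, command: str) -> str:
    """Build a ReplicationHandler URL for a legacy master/slave core.

    base_url and core are hypothetical placeholders, for example
    "http://slave1:8983/solr" and "mycore".
    """
    if command not in REPLICATION_COMMANDS:
        raise ValueError(f"unsupported replication command: {command}")
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


if __name__ == "__main__":
    base = "http://slave1:8983/solr"  # hypothetical slave node
    # Pause polling while the master is indexing, then pull once it commits.
    print(replication_url(base, "mycore", "disablepoll"))
    print(replication_url(base, "mycore", "fetchindex"))
```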

If manual replication control isn't possible, I'm planning to keep two
duplicate collections and use an alias to switch between them.  For
example: while 'collection1' is being indexed, the alias 'search' would
point to 'collection2' to serve query requests.  Once indexing is completed
on 'collection1', the 'search' alias would point to 'collection1', and
'collection2' would be updated to be in sync with 'collection1'.  The cycle
repeats for the next indexing cycle.  My question about this approach: is
there an existing way to sync one collection to another, so that I don't
have to send the same update requests to both collections?
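
The alias switch itself could be sketched as below, assuming the
Collections API CREATEALIAS action, which re-points an alias that already
exists.  The base URL is hypothetical and the helper names are mine.  The
sketch only builds the URL and works out which collection plays which
role; it does not address how to sync the two collections.

```python
from urllib.parse import urlencode


def create_alias_url(base_url: str, alias: str, collection: str) -> str:
    """Collections API URL that creates the alias or re-points it."""
    query = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": collection}
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def roles_for_cycle(live: str, pair: tuple) -> tuple:
    """Return (collection to index into, collection serving queries).

    `live` is the collection the alias currently points to; the other
    member of `pair` is the one that is free to take the heavy indexing.
    """
    first, second = pair
    if live not in pair:
        raise ValueError(f"{live!r} is not one of {pair!r}")
    idle = second if live == first else first
    return idle, live


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # hypothetical node
    pair = ("collection1", "collection2")
    to_index, serving = roles_for_cycle("collection2", pair)
    # Index into `to_index` while 'search' keeps pointing at `serving`,
    # then flip the alias once indexing has finished.
    print(create_alias_url(base, "search", to_index))
```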

I'm also wondering whether there are better methods that others are using.

Thanks much!

Cheers,

-Nguyen

Re: How to maintain fast query speed during heavy indexing?

Posted by Nguyen Nguyen <ng...@gmail.com>.
Great info!  Thanks, Erick!

Cheers,
Nguyen

On Tue, May 22, 2018 at 5:45 AM Erick Erickson <er...@gmail.com>
wrote:

> There are two issues:
>
> 1> autowarming on the replicas
>
> 2> Until https://issues.apache.org/jira/browse/SOLR-11982 (Solr 7.4,
> unreleased), requests would go to the leaders along with the PULL and
> TLOG replicas. Since the leaders were busily indexing, the entire
> query would suffer speed-wise.
>
> So what I'd do is see if you can apply the patch there and adjust your
> autowarming. Solr 7.4 will be out in the not-too-distant future,
> perhaps over the summer. No real schedule has been agreed on, though.
>
> Best,
> Erick

Re: How to maintain fast query speed during heavy indexing?

Posted by Erick Erickson <er...@gmail.com>.
There are two issues:

1> autowarming on the replicas

2> Until https://issues.apache.org/jira/browse/SOLR-11982 (Solr 7.4,
unreleased), requests would go to the leaders along with the PULL and
TLOG replicas. Since the leaders were busily indexing, the entire
query would suffer speed-wise.

So what I'd do is see if you can apply the patch there and adjust your
autowarming. Solr 7.4 will be out in the not-too-distant future,
perhaps over the summer. No real schedule has been agreed on, though.
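
Once that change is available, a query that prefers the non-leader replica
types might be built as in the sketch below.  It assumes the issue exposes
a shards.preference request parameter that takes replica.type values; the
host and collection names are hypothetical, so check the parameter against
the release you actually run.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, q, preferred_types=("PULL",)):
    """Build a /select URL that asks Solr to prefer certain replica types.

    Assumption: a shards.preference parameter of the form
    "replica.type:PULL,replica.type:TLOG" is supported by the server.
    """
    preference = ",".join(f"replica.type:{t}" for t in preferred_types)
    params = {"q": q, "shards.preference": preference}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Prefer PULL replicas, then TLOG replicas, during heavy indexing.
    print(select_url("http://localhost:8983/solr", "search", "*:*",
                     preferred_types=("PULL", "TLOG")))
```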

Best,
Erick
