Posted to solr-user@lucene.apache.org by Taisuke Miyazaki <mi...@lifull.com> on 2020/04/15 05:31:14 UTC

On the delay in electing a leader when the leader is dead(Solr 7.5)

Hi,

Cluster composition: TLOG replicas + PULL replicas
Writes: leader and TLOG replicas
Reads: PULL replicas only

Solr version: 7.5
Number of shards: 1
Write throughput: 10,000 docs/minute
Number of documents: 4,500,000
Size per document: about 4KB

During testing, replaying the transaction log took a very long time:
it took 1 day and 6 hours to elect a new leader.
The tlog files are a few GB at most, so I don't think I/O is the bottleneck,
but it's taking far longer than I expected.

I'd like to make the leader election quicker because I can't write until
the leader is elected.
Is there a faster way to do it?

We have a mechanism to re-send records that failed to write.
So, if leader election can't be made fast, I'm thinking of discarding the
TLOG replica and automatically starting a new one when the leader goes down
(if recovery is faster that way).

Thank you to everyone in the community for always being so supportive.

Re: On the delay in electing a leader when the leader is dead(Solr 7.5)

Posted by Taisuke Miyazaki <mi...@lifull.com>.

There are some additional things I'm curious about.

When I tested it in the development environment, I found that setting the
commit interval to 15 seconds was enough for a leader to be elected. (Thank you.)
But if I commit every 15 seconds, won't Solr open a new searcher each
time and clear the query result cache, document cache, and filter cache more
often?
Each replica receives about 10,000 read requests per minute, so I'm worried
that latency will degrade quickly.

For now, I'm trying to autowarm the caches.
The settings look like this:

    <filterCache class="solr.FastLRUCache"
                 size="${solr.query.filterCache.size:50000}"
                 initialSize="${solr.query.filterCache.initialSize:10000}"
                 autowarmCount="${solr.query.filterCache.autowarmCount:2000}"/>
    <queryResultCache class="solr.LRUCache"
                      size="${solr.query.queryResultCache.size:40000}"
                      initialSize="${solr.query.queryResultCache.initialSize:20000}"
                      autowarmCount="${solr.query.queryResultCache.autowarmCount:2000}"/>
    <documentCache class="solr.LRUCache"
                   size="${solr.query.documentCache.size:200000}"
                   initialSize="${solr.query.documentCache.initialSize:200000}"
                   autowarmCount="${solr.query.documentCache.autowarmCount:200000}"/>
    <fieldValueCache class="solr.FastLRUCache"
                     size="512"
                     autowarmCount="128"
                     showItems="32" />
    <cache name="perSegFilter"
           class="solr.search.LRUCache"
           size="10"
           initialSize="0"
           autowarmCount="10"
           regenerator="solr.NoOpRegenerator" />


Is there a way to have leader re-election finish within about 20 minutes
without making latency worse?

On Wed, Apr 15, 2020 at 21:27, Erick Erickson <er...@gmail.com> wrote:

> There’s no way leader election, even with tlog replay should take a day.
> 10,000 docs/minute doesn’t sound like enough to clog up
> replay either, so something’s definitely not what I’d expect.
>
> What is your hard commit interval? That controls how big the tlog
> is and thus how long it’d take to replay. Here’s more than you want to
> know about that:
>
>
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> I’d set it to, say, 15 seconds (openSearcher=false). This is entirely
> independent of the soft commit interval which governs the ability
> to search the docs…
>
> Best,
> Erick

Re: On the delay in electing a leader when the leader is dead(Solr 7.5)

Posted by Erick Erickson <er...@gmail.com>.

There’s no way leader election, even with tlog replay, should take a day.
10,000 docs/minute doesn’t sound like enough to clog up
replay either, so something’s definitely not what I’d expect.

What is your hard commit interval? That controls how big the tlog
is and thus how long it’d take to replay. Here’s more than you want to
know about that:

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

I’d set it to, say, 15 seconds (openSearcher=false). This is entirely
independent of the soft commit interval which governs the ability
to search the docs…
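
As a rough sketch of what that suggestion could look like in solrconfig.xml (not taken from the poster's actual configuration): the hard commit fires every 15 seconds with openSearcher=false, and the soft commit is configured separately. The 60000 ms soft commit value is only an assumed placeholder; pick it according to how quickly documents need to become searchable.

```xml
<!-- Sketch only: update handler section of solrconfig.xml. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: flushes segments to disk and starts a new tlog.
       openSearcher=false means this commit does not open a new searcher. -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: controls when documents become visible to searches.
       60000 ms is an assumed placeholder, not a recommendation. -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

For scale, at the stated 10,000 docs/minute and about 4KB per document, a 15-second window covers roughly 2,500 documents, or about 10 MB of raw document data since the last hard commit.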

Best,
Erick
