Posted to solr-user@lucene.apache.org by Papa Pappu <tu...@gmail.com> on 2018/04/24 05:56:52 UTC

Preventing solr cache flush when committing

Hi,
I've written up my question on Stack Overflow. Here is the link:
https://stackoverflow.com/questions/49993681/preventing-solr-cache-flush-when-commiting

In short, I am having trouble keeping my Solr caches populated when commits
happen; the question describes the problem in detail.

Based on my use case, if someone can recommend which settings I should use or
which practices I should follow, it would be really helpful.

Thanks and regards,
Dmitri

Re: Preventing solr cache flush when committing

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/23/2018 11:56 PM, Papa Pappu wrote:
> I've written down my query over stack-overflow. Here is the link for that :
> https://stackoverflow.com/questions/49993681/preventing-solr-cache-flush-when-commiting
>
> In short, I am facing troubles maintaining my solr caches when commits
> happen and the question provides detailed description of the same.

The information in Solr caches relies on Lucene internal doc IDs.

When changes to the index happen and a new searcher is created, there is
absolutely no guarantee that the Lucene document IDs will be the same as
they were on the old searcher.  Solr must assume that the IDs are
different, so it has no choice but to throw away its cache entries when
a new searcher is created.

> Based on my use-case if someone can recommend what settings I should use or
> practices I should follow it'll be really helpful.

This is much the same information you got in the SO post.  You can rely on
newSearcher cache warming, and the autowarming configured in the caches
themselves.  Be careful about making autowarmCount too large.  Large
values there can make commits very slow.
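
For reference, a minimal sketch of newSearcher warming in solrconfig.xml.
The warming query is an assumption for illustration (the COLOR:Blue filter
is borrowed from elsewhere in this thread, not from a known schema);
substitute queries that reflect real traffic.

```xml
<query>
  <!-- Explicit warming queries, run against every new searcher before it
       starts serving requests. The query below is a placeholder. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="fq">COLOR:Blue</str>
      </lst>
    </arr>
  </listener>
</query>
```

Each listed query adds to the time a new searcher needs before it can go
live, so the same caution as for autowarmCount applies.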

The basic advice for getting the most out of Solr caches is to put off
opening new searchers as long as you can.  Commit less frequently.

Thanks,
Shawn


Re: Preventing solr cache flush when committing

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Dmitri.

You can try segmented caches, which are more real-time.
The query should look like q={!parent which=COLOR:Blue v=''} instead of
q=COLOR:Blue.
Make sure you have the following definition in solrconfig.xml; this
regenerator should transfer filter bitsets between searchers.
  <query>
    <cache name="perSegFilter"
           class="solr.LRUCache"
           size="100"
           initialSize="10"
           autowarmCount="100%"
           regenerator="solr.NoOpRegenerator"/>
  </query>

It makes sense to check this cache right after a commit.


-- 
Sincerely yours
Mikhail Khludnev

Re: Preventing solr cache flush when committing

Posted by Erick Erickson <er...@gmail.com>.
Had this typed up yesterday and forgot to send.

"Is there no way to ensure that the top level filter caches are not
expunged when some documents are added to the index and have the
changes available at the same time?"

No. And it's not something that you can do without major architectural
changes. When you commit, background merging kicks in, which will
renumber the _internal_ Lucene document IDs. This ID ranges 0-maxDoc
and is used as the bit to set in the filterCache object. So if you
preserved the filterCache, the bits will be wrong. The
queryResultCache is


"If that is the case, then do I need to always have to rely on warmup
of caches to get some documents in caches?"

Yes, that's exactly what the "autowarm" feature is on the caches. Also
the newSearcher event can be used to hand-craft warmup searches where
you know certain things about the index and you specifically want to
ensure certain warming.

Please start out with modest numbers for autowarm, as in 20-30. It's
very often the case that you don't need much more than that. What
those numbers do in filterCache and queryResultCache is re-execute the
associated fq or q clause, respectively.
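
As a rough sketch of what "modest" looks like in solrconfig.xml: the cache
classes and sizes below are assumptions and depend on the Solr version;
only the autowarmCount values follow the 20-30 guidance above.

```xml
<query>
  <!-- Each autowarmed entry re-executes one fq against the new searcher. -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="20"/>

  <!-- Each autowarmed entry re-executes one q against the new searcher. -->
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="20"/>
</query>
```

Warming time grows with autowarmCount, so it is worth measuring how long a
new searcher takes to warm before raising these numbers.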

"Are there any other approaches then warmup which folks usually do to
avoid this; if they want to build a fast searchable product and having
some write throughput as well?" and "I can't afford to get my caches
flushed".

What evidence do you have for this last statement?

"Currently I do commits via my indexing application (after every batch
of documents)"

Please, please, please do _not_ do this. It's especially egregious
because you do it after every batch of docs. So rather than flushing
your caches every 5 minutes (say), you hammer Solr with commit after
commit after commit. Configure your soft commit interval to your
latency requirements and forget about it. Or just configure hard
commit with openSearcher set to true. Or perhaps even just specify
commitWithin when you send docs to Solr. At a guess you may have seen
warnings about "too many on deck searchers" if your commit interval is
shorter than your autowarm time.
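
For the commitWithin option, a sketch of an XML update message. The
10000 ms window, the document id, and the field names are assumptions for
illustration. The client sends documents like this and never issues an
explicit commit; Solr makes them visible within the stated window.

```xml
<!-- commitWithin is in milliseconds; 10000 is an illustrative value. -->
<add commitWithin="10000">
  <doc>
    <field name="id">doc-1</field>
    <field name="COLOR">Blue</field>
  </doc>
</add>
```

Because several batches arriving inside the same window can be covered by
one commit, this avoids the commit-per-batch pattern described above.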

I'll bend a little bit if the client only issues a commit at the very
end of the run and there's precisely one client running at a time and
you can _guarantee_ there's only one commit, but it's usually much
easier and more reliable to use the solr config settings.

Perhaps you're not entirely familiar with how openSearcher works, so
here's a brief review. This applies to either hard commit
(openSearcher=true) or soft commit.
1> a commit happens
2> a new searcher is being opened and autowarming kicks off
3> incoming searches are served by the _old_ searcher, using all the
_old_ caches.
4> autowarming completes
5a> incoming requests are routed to the new searcher
5b> the old searcher finishes serving the outstanding requests
received before <4> and closes
6> the old caches are flushed.
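
Two solrconfig.xml settings bear on steps 2 through 4. The values shown are
assumptions, not recommendations from this thread.

```xml
<query>
  <!-- Upper bound on searchers warming at the same time. Commits that
       arrive faster than warming completes run into this limit. -->
  <maxWarmingSearchers>2</maxWarmingSearchers>

  <!-- false: when no searcher is registered yet (for example at startup),
       requests wait for the first searcher to finish warming. -->
  <useColdSearcher>false</useColdSearcher>
</query>
```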

So having high read throughput


Re: Preventing solr cache flush when committing

Posted by Lee Carroll <le...@googlemail.com>.
From memory, try the following:
Don't manually commit from the client after batch indexing.
Set soft commit to a long time interval, as long as it is acceptable to run
stale: say 5 mins, or longer if you can.
Set hard commit to be short (seconds) to keep everything neat and tidy
as regards updates and to avoid backing up log files.
Set openSearcher=false on the hard commit.

I'm pretty sure that works for at least one of our indices. It's worth a go.
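
The settings listed above can be sketched in solrconfig.xml roughly as
follows. The 5 minute soft commit comes from the suggestion above; the
15 second hard commit is an assumed value for "seconds".

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to disk and rolls over the transaction log,
       but does not open a new searcher, so caches are left alone. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher (and so replaces the caches) at
       most once every 5 minutes. -->
  <autoSoftCommit>
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place the indexing client only sends documents; new documents
become searchable at the soft commit interval.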

Lee C
