Posted to solr-user@lucene.apache.org by Himanshu Sachdeva <hi...@limeroad.com> on 2017/04/06 11:00:15 UTC

Solr Index size keeps fluctuating, becomes ~4x normal size.

Hi all,

We use Solr on our website for product search. Currently, we have 2.1
million documents in the products core, and each document has around
350 fields, more than 90% of which are indexed. Our master instance of
Solr runs on a machine with 15GB RAM and a 200GB drive. We have also
configured 10 slaves to handle reads from the website. The slaves poll the
master at an interval of 20 minutes. We monitored the index size for a few
days and found that it varies widely, from 11GB to 43GB.
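
For reference, the slave polling is configured roughly like this in each
slave's solrconfig.xml (a sketch only; the master URL below is a
placeholder, not our actual hostname):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- placeholder master URL -->
      <str name="masterUrl">http://solr-master.example.com:8983/solr/products</str>
      <!-- poll the master every 20 minutes -->
      <str name="pollInterval">00:20:00</str>
    </lst>
  </requestHandler>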


Recently, we started getting a lot of out of memory errors on the master.
Every time, Solr becomes unresponsive and we need to restart Jetty to bring
it back up. At the same time, we observed the variation in index size. We
suspect that these two problems may be linked.

What could be the reason that the index size becomes almost 4x?  Why does
it vary so much? Any pointers will be appreciated. If you need any more
details on the config, please let me know.

-- 
Himanshu Sachdeva

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

Posted by Toke Eskildsen <to...@kb.dk>.
On Mon, 2017-04-10 at 13:27 +0530, Himanshu Sachdeva wrote:
> Thanks for your time and quick response. As you said, I changed our
> logging level from SEVERE to INFO and indeed found the performance
> warning *Overlapping onDeckSearchers=2* in the logs.

If you only see it occasionally, it is probably not a problem. If you
see it often, that means that you are re-opening at a high rate,
relative to the time it takes for a searcher to be ready.

Since each searcher holds a lock on the files it searches, and you have
multiple concurrent open searchers on a volatile index, that helps
explain the index size fluctuations.

Each searcher also requires heap, which might explain why you get Out
Of Memory errors.

This all boils down to avoiding (too many) overlapping warming
searchers. A few things to check:

* Reduce your auto-warm counts if they are high
* Prolong the time between searcher-opening commits
* Check that you have docValues on fields that you facet or group on (see
  the sketch below)
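
A rough illustration of modest auto-warming and of docValues on a facet
field (a sketch only; the cache size and the field name are made-up
examples, not recommendations for your particular index):

  <!-- solrconfig.xml: keep auto-warming modest so new searchers are ready quickly -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>

  <!-- schema.xml: docValues on a hypothetical facet field keeps faceting and
       grouping data off the Java heap instead of un-inverting it per searcher -->
  <field name="brand" type="string" indexed="true" stored="true" docValues="true"/>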

> I am considering limiting the *maxWarmingSearchers* count in
> configuration but want to be sure that nothing breaks in production
> in case simultaneous commits do happen afterwards.

That is one way of doing it, but it does not help you pinpoint where
your problem is. 

> What would happen if we set *maxWarmingSearchers* count to 1 and make
> simultaneous commits from different endpoints? I understand that solr
> will prevent opening a new searcher for the second commit but is that
> all there is to it? Does it mean solr will serve stale data (i.e.
> send stale data to the slaves) ignoring the changes from the second
> commit? [...]

Sorry, I am not that familiar with the details of master-slave-setups.
-- 
Toke Eskildsen, Royal Danish Library

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/10/2017 1:57 AM, Himanshu Sachdeva wrote:
> Thanks for your time and quick response. As you said, I changed our
> logging level from SEVERE to INFO and indeed found the performance
> warning *Overlapping onDeckSearchers=2* in the logs. I am considering
> limiting the *maxWarmingSearchers* count in configuration but want to
> be sure that nothing breaks in production in case simultaneous commits
> do happen afterwards.

Don't do commits from multiple sources.  A good general practice with
Solr is to either use autoSoftCommit or add a commitWithin parameter to
each indexing request, so commits are fully automated and can't
overlap.  Make the interval on whichever method you use as large as you
can.  I would personally use 60000 (one minute) as a bare minimum, and
would prefer a larger number.
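
As an example, a one-minute commitWithin on an update request looks
something like this (the core name matches your products core; the host
is a placeholder):

  http://localhost:8983/solr/products/update?commitWithin=60000

with the documents to index sent as the usual JSON or XML body.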

A soft commit takes less time/resources than a hard commit that opens a
searcher, but they are NOT even close to "free".  Opening the searcher
(which all soft commits do) is the expensive part, not the commit itself.

Regardless of what else you do, you should have autoCommit configured
with openSearcher set to false.  I would personally use a maxTime of
60000 (one minute) or 120000 (two minutes) for autoCommit. 
Recommendations and example configs will commonly have this set to 15
seconds.  That value works well, and does not usually cause problems,
but I like to put less of a load on the server, so I use a larger interval.
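
In solrconfig.xml, that setup looks roughly like this (a sketch using the
intervals mentioned above; tune them to your own load):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flush to disk and roll the transaction log,
         but do NOT open a new searcher -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: makes changes visible by opening a searcher;
         keep this interval as large as you can tolerate -->
    <autoSoftCommit>
      <maxTime>60000</maxTime>
    </autoSoftCommit>
  </updateHandler>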

See this blog post for a detailed discussion:

https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

> What would happen if we set *maxWarmingSearchers* count to 1 and make
> simultaneous commits from different endpoints? I understand that solr
> will prevent opening a new searcher for the second commit but is that
> all there is to it? Does it mean solr will serve stale data (i.e. send
> stale data to the slaves) ignoring the changes from the second commit?
> Will these changes reflect only when a new searcher is initialized and
> will they be ignored till then? Do we even need searchers on the
> master as we will be querying only the slaves? What purpose do the
> searchers serve exactly? Your time and guidance will be very much
> appreciated. Thank you. 

If the maxWarmingSearchers value prevents a commit from opening a
searcher, then changes between the previous commit and that commit will
not be visible *on the master* until a later commit happens and IS able
to open a new searcher.  What happens on the slaves may be a little bit
different, because commits normally only happen on the slave when a
changed index is replicated from the master.

The usual historical number for maxWarmingSearchers in example configs
on older versions is 2, while the intrinsic default is no limit
(Integer.MAX_VALUE).  Starting with 6.4.0, the intrinsic default has
been changed to 1, and the configuration has been removed from the
example configs.  Increasing it is almost always the wrong thing to do,
which is why the default has been lowered to 1.
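
On older versions where the setting still appears in the example config,
it is a single element in the <query> section of solrconfig.xml (shown
here only as an illustration):

  <query>
    <maxWarmingSearchers>1</maxWarmingSearchers>
  </query>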

https://issues.apache.org/jira/browse/SOLR-9712

https://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

On the master, you should set up automatic commits as I described above
and should not make explicit commit requests from update clients.  On the
slaves, autoCommit should be set up just like the master, but the other
automatic settings aren't typically necessary.  On slaves, as already
mentioned, commits only happen when the index is replicated from the
master -- you generally don't need to worry about any special
commit-related configuration, aside from making sure that the
autowarmCount value on the caches is not too high.  Masters that do not
receive queries can have autowarmCount set to zero, which can improve
commit speed by making the searcher open faster.
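
For a master that never serves queries, the cache definitions in
solrconfig.xml could look something like this (a sketch; the sizes here
are arbitrary):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>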

To fix problems with exceeding the warming searcher limit, you must
reduce the commit frequency or make commits happen faster.

Side issue:  If you don't want the verbosity of INFO logging, which is
really noisy, set it to WARN.  A properly configured Solr server that is
not having problems should not log ANYTHING when the severity is WARN. 
If the configuration is not optimal, you may see some WARN messages. 
Setting the level to SEVERE is extremely restrictive, and will prevent
you from seeing informative error messages when problems happen.

Recent Solr versions do have a tendency to log information like this
repeatedly, followed by a stacktrace:

2017-04-14 19:40:00.207 WARN  (qtp895947612-598) [   x:spark2live]
o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_o0e]
java.nio.file.NoSuchFileException:
/index/solr6/data/data/spark2_0/index/segments_o0e

We have an issue filed for this message, but it hasn't yet been fixed.
It does not seem to cause actual problems, just an annoying log
message.  Until the reason for this error is found and the problem is
fixed, the message can be eliminated from the logs, without hiding
other problems, by changing the level on
org.apache.solr.handler.admin.LukeRequestHandler to ERROR.  This can
either be done in the logging UI, or if you don't want to do it manually
after every restart, in log4j.properties.
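
If you do it in log4j.properties, the entry would look roughly like this
(assuming the log4j 1.x properties format that Solr ships with):

  # silence the spurious LukeRequestHandler file-length warnings
  log4j.logger.org.apache.solr.handler.admin.LukeRequestHandler=ERROR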

https://issues.apache.org/jira/browse/SOLR-9120

Thanks,
Shawn


Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

Posted by kshitij tyagi <ks...@gmail.com>.
Hi Himanshu,

Limiting maxWarmingSearchers would break nothing in production. Whenever you
ask solr to open a new searcher, solr autowarms the new searcher's caches so
that it can benefit from caching; only after autowarming is complete is the
new searcher put into use.

The questions you need to address here are:

1. Are you using soft commits or hard commits? If you are using hard commits
and your update frequency is high, then you should switch to soft commits.

2. You are dealing with only 2.1 million documents, which is a small set, but
you are still facing issues. Why are you indexing nearly all the fields in
solr? Consider changing the schema to index only the fields you actually
query on, rather than indexing everything.

3. Check your segment/merge configuration in solrconfig.xml. It should be
neither too high nor too low, as it affects both indexing and search: allowing
more segments gives good indexing speed but slower searches.

Hope these points help you tune your setup better.

Regards,
Kshitij

On Mon, Apr 10, 2017 at 1:27 PM, Himanshu Sachdeva <hi...@limeroad.com>
wrote:

> Hi Toke,
>
> Thanks for your time and quick response. As you said, I changed our logging
> level from SEVERE to INFO and indeed found the performance warning
> *Overlapping
> onDeckSearchers=2* in the logs. I am considering limiting the
> *maxWarmingSearchers* count in configuration but want to be sure that
> nothing breaks in production in case simultaneous commits do happen
> afterwards.
>
> What would happen if we set *maxWarmingSearchers* count to 1 and make
> simultaneous commits from different endpoints? I understand that solr will
> prevent opening a new searcher for the second commit but is that all there
> is to it? Does it mean solr will serve stale data (i.e. send stale data to
> the slaves) ignoring the changes from the second commit? Will these changes
> reflect only when a new searcher is initialized and will they be ignored
> till
> then? Do we even need searchers on the master as we will be querying only
> the slaves? What purpose do the searchers serve exactly? Your time and
> guidance will be very much appreciated. Thank you.
>
> On Thu, Apr 6, 2017 at 6:12 PM, Toke Eskildsen <to...@kb.dk> wrote:
>
> > On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> > > We monitored the index size for a few days and found that it varies
> > > widely from 11GB to 43GB.
> >
> > Lucene/Solr indexes consist of segments, each holding a number of
> > documents. When a document is deleted, its bytes are not removed
> > immediately, only marked. When a document is updated, it is effectively
> > a delete and an add.
> >
> > If you have an index with 3 documents
> >   segment-0 (live docs [0, 1, 2], deleted docs [])
> > and update documents 0 and 1, you will have
> >   segment-0 (live docs [2], deleted docs [0, 1])
> >   segment-1 (live docs [0, 1], deleted docs [])
> > if you then update document 1 again, you will have
> >   segment-0 (live docs [2], deleted docs [0, 1])
> >   segment-1 (live docs [0], deleted docs [1])
> >   segment-2 (live docs [1], deleted docs [])
> >
> > for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.
> >
> > The space is reclaimed when segments are merged, but depending on your
> > setup and update pattern that may take some time. Furthermore there is a
> > temporary overhead of merging, when the merged segment is being written
> and
> > the old segments are still available. 4x the minimum size is fairly
> large,
> > but not unrealistic, with enough index-updates.
> >
> > > Recently, we started getting a lot of out of memory errors on the
> > > master. Every time, solr becomes unresponsive and we need to restart
> > > jetty to bring it back up. At the same time, we observed the variation
> > > in index size. We suspect that these two problems may be linked.
> >
> > Quick sanity check: Look for "Overlapping onDeckSearchers" in your
> > solr.log to see if your memory problems are caused by multiple open
> > searchers:
> > https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
> > --
> > Toke Eskildsen, Royal Danish Library
> >
>
>
>
> --
> Himanshu Sachdeva
>

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

Posted by Himanshu Sachdeva <hi...@limeroad.com>.
Hi Toke,

Thanks for your time and quick response. As you said, I changed our logging
level from SEVERE to INFO and indeed found the performance warning *Overlapping
onDeckSearchers=2* in the logs. I am considering limiting the
*maxWarmingSearchers* count in configuration but want to be sure that
nothing breaks in production in case simultaneous commits do happen
afterwards.

What would happen if we set *maxWarmingSearchers* count to 1 and make
simultaneous commits from different endpoints? I understand that solr will
prevent opening a new searcher for the second commit but is that all there
is to it? Does it mean solr will serve stale data (i.e. send stale data to
the slaves) ignoring the changes from the second commit? Will these changes
reflect only when a new searcher is initialized and will they be ignored till
then? Do we even need searchers on the master as we will be querying only
the slaves? What purpose do the searchers serve exactly? Your time and
guidance will be very much appreciated. Thank you.

On Thu, Apr 6, 2017 at 6:12 PM, Toke Eskildsen <to...@kb.dk> wrote:

> On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> > We monitored the index size for a few days and found that it varies
> > widely from 11GB to 43GB.
>
> Lucene/Solr indexes consist of segments, each holding a number of
> documents. When a document is deleted, its bytes are not removed
> immediately, only marked. When a document is updated, it is effectively
> a delete and an add.
>
> If you have an index with 3 documents
>   segment-0 (live docs [0, 1, 2], deleted docs [])
> and update documents 0 and 1, you will have
>   segment-0 (live docs [2], deleted docs [0, 1])
>   segment-1 (live docs [0, 1], deleted docs [])
> if you then update document 1 again, you will have
>   segment-0 (live docs [2], deleted docs [0, 1])
>   segment-1 (live docs [0], deleted docs [1])
>   segment-2 (live docs [1], deleted docs [])
>
> for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.
>
> The space is reclaimed when segments are merged, but depending on your
> setup and update pattern that may take some time. Furthermore there is a
> temporary overhead of merging, when the merged segment is being written and
> the old segments are still available. 4x the minimum size is fairly large,
> but not unrealistic, with enough index-updates.
>
> > Recently, we started getting a lot of out of memory errors on the
> > master. Every time, solr becomes unresponsive and we need to restart
> > jetty to bring it back up. At the same time, we observed the variation
> > in index size. We suspect that these two problems may be linked.
>
> Quick sanity check: Look for "Overlapping onDeckSearchers" in your
> solr.log to see if your memory problems are caused by multiple open
> searchers:
> https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
> --
> Toke Eskildsen, Royal Danish Library
>



-- 
Himanshu Sachdeva

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

Posted by Toke Eskildsen <to...@kb.dk>.
On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> We monitored the index size for a few days and found that it varies
> widely from 11GB to 43GB. 

Lucene/Solr indexes consist of segments, each holding a number of
documents. When a document is deleted, its bytes are not removed
immediately; the document is only marked as deleted. When a document is
updated, it is effectively a delete and an add.

If you have an index with 3 documents
  segment-0 (live docs [0, 1, 2], deleted docs [])
and update documents 0 and 1, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0, 1], deleted docs [])
if you then update document 1 again, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0], deleted docs [1])
  segment-2 (live docs [1], deleted docs [])

for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.

The space is reclaimed when segments are merged, but depending on your setup and update pattern that may take some time. Furthermore there is a temporary overhead of merging, when the merged segment is being written and the old segments are still available. 4x the minimum size is fairly large, but not unrealistic, with enough index-updates.

> Recently, we started getting a lot of out of memory errors on the
> master. Every time, solr becomes unresponsive and we need to restart
> jetty to bring it back up. At the same time, we observed the variation
> in index size. We suspect that these two problems may be linked.

Quick sanity check: Look for "Overlapping onDeckSearchers" in your
solr.log to see if your memory problems are caused by multiple open
searchers:
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
-- 
Toke Eskildsen, Royal Danish Library