Posted to solr-user@lucene.apache.org by vsolakhian <vi...@zoominfo.com> on 2016/09/20 22:13:24 UTC

Very Slow Commits After Solr Index Optimization

We are using Solr Cloud 4.10.3-cdh5.4.5, which is part of Cloudera CDH 5.4.5.
Our collection (one shard with three replicas) became really big and we
decided to delete some old records to improve performance (tests in staging
environment have shown that after reaching 500 million records the index
becomes very slow and Solr is less responsive). After deleting about 100
million records (out of 260 million), they were still shown as "Deleted Docs"
on the Solr Admin Statistics page.  This page was showing 'Optimized: No (red)'
and 'Current: No (red)'.  In theory, keeping 100 million deleted (but not yet
removed) records is a performance problem, and besides, people prefer a clean
picture.

Information found in Solr forums was that the only way to remove deleted
records is to optimize the index.

We knew that optimization is not a good idea; it has been discussed in forums
that it should be removed entirely from the API and Solr Admin, but discussing
is one thing and doing it is another.  To make a long story short, we tried to
optimize through the Solr API to remove the deleted records:

    # merge the index down to at most 18 segments
    URL=http://<host>:8983/solr/<Collection>/update
    curl "$URL?optimize=true&maxSegments=18&waitFlush=true"

and all three replicas of the collection were merged to 18 segments and Solr
Admin was showing "Optimized: Yes (green)", but the deleted records were not
removed (which is an inconsistency with Solr Admin or a bug in the API).
Finally, because people usually trust features found in the UI (even when
official documentation for them cannot be found; see
https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface),
the "Optimize Now" button in Solr Admin was pressed.  It removed all
deleted records and made the collection look very good (in the UI).  Here is the
problem:

1. The index was reduced to one large (60 GB) segment (some consider this a
good thing, but I doubt it).
2. Our use case includes batch updates followed by a soft commit (after which
the user sees results).  A commit operation that used to take about 1.5
minutes now takes 12 to 25 minutes.

Overall performance of our application is severely degraded.

I am not going to dwell on how confusing Solr optimization is, but I am
asking whether anyone knows *what caused the slowness of the commit operation
after optimization*.  If the issue is having one large segment, then how is it
possible to split this segment into smaller ones (without sharding)?

Thanks,

Victor




Re: Very Slow Commits After Solr Index Optimization

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/22/2016 3:27 PM, vsolakhian wrote:
> This is not the cause of the problem though. The disk cache is
> important for queries and overall performance during optimization, but
> once it is done, everything should go back to "normal" (whatever that
> normal is). In our case it is the SOFT COMMIT (that opens a new
> Searcher) that takes 10 times longer AFTER the index was optimized and
> deleted records were removed (and index size went down to 60 GB).

It's difficult to say without hard numbers, and that is complicated by
my very limited understanding of how HDFS gets cached.

"Normal" is achieved only when relevant data is in the disk cache. 
Which will most likely not be the case after an optimize, unless you
have enough caching memory for both the before and after index to fit at
the same time.  Similar performance issues are likely to occur right
after a server reboot.

A soft commit opens a new searcher.  When a new searcher is opened, the
*Solr* caches (which are entirely different from the disk cache) look at
their autowarmCount settings.  Each cache gathers the top N queries
contained in the cache, up to the autowarmCount number, and proceeds to
execute those queries on the index to create a brand new cache for
the new searcher.  The new searcher is not put into place until the
warming is done.  The commit will not finish until the new searcher is
online.

If the info sitting in the OS disk cache when the warming queries happen
is not useful for fast queries, then those queries will be very slow,
which makes the commit take longer.

For better commit times, reduce autowarmCount on your Solr caches.  This
will make it more likely that users will notice slow queries, though.
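
As a sketch (the numbers here are made up, not recommendations), the
relevant settings live in solrconfig.xml:

    <!-- Hypothetical values: lower autowarmCount to shorten commit times. -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>     <!-- 0 disables warming for this cache -->
    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="16"/> <!-- keep a few if cold queries hurt -->

The tradeoff is direct: the lower the counts, the faster the new searcher
comes online, but the colder the Solr caches are for the first queries.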

Good Solr performance with large indexes requires a LOT of memory.  The
amount required is usually very surprising to admins.

Thanks,
Shawn


Re: Very Slow Commits After Solr Index Optimization

Posted by vsolakhian <vi...@zoominfo.com>.
Thanks again, Shawn.

You are completely right about the use of the disk cache and the special note
regarding the optimize operation in the Solr wiki.

This is not the cause of the problem though. The disk cache is important for
queries and overall performance during optimization, but once it is done,
everything should go back to "normal" (whatever that normal is). In our case
it is the SOFT COMMIT (that opens a new Searcher) that takes 10 times longer
AFTER the index was optimized and deleted records were removed (and index
size went down to 60 GB).

Regards,

Victor




Re: Very Slow Commits After Solr Index Optimization

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/22/2016 1:01 PM, vsolakhian wrote:
> Our index is in HDFS, but we did not change any configuration after we
> deleted 35% of records and optimized.
>
> The relatively slow commit (soft commit and warming up took 1.5 minutes) is
> OK for our use case (adding hundreds of thousands and even millions of
> records and then committing).
>
> The question is why it takes much longer after optimization, when disk
> caches, network and other configuration remained the same and the index is
> smaller?

When you optimize an index down to one segment, you are reading one
entire copy of the index and creating a second copy of the index.  This
is going to greatly affect the data that is in the disk cache.

Presumably you do not have enough caching memory to hold anywhere near
the entire 300GB index.  Memory sizes that large are possible, but not
common.  With HDFS, I think the amount of memory used for caching is
configurable.  I do not know if both HDFS clients and servers can do
caching, or if that's just a server-side option.  With a 300GB index,
150 to 250GB of memory should be available for caching if you want to
have stellar performance.  If you can get the entire 300GB to fit, then
you'd nearly be guaranteed good performance.
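
If I remember correctly, the HDFS block cache is sized through system
properties on the Solr process; the sketch below uses property names from
the "Running Solr on HDFS" documentation, so verify them against your CDH
version (each slab is 128 MB by default, so 128 slabs is roughly 16 GB):

    # Hypothetical sizing: ~16 GB of off-heap HDFS block cache per Solr node
    -Dsolr.hdfs.blockcache.enabled=true
    -Dsolr.hdfs.blockcache.direct.memory.allocation=true
    -Dsolr.hdfs.blockcache.slab.count=128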

Assuming I'm right about the amount of caching memory available relative
to the index size, when the optimize is finished, chances are very good
that the particular data sitting in the disk cache is completely useless
for queries, so the first few warming and user queries will need to
actually read the *disk*, and put different data in the cache.  When
enough queries have been processed, eventually the disk cache will be
populated with enough relevant data that subsequent queries will be fast.

If there are other programs or Solr indexes competing for the same
caching memory, then the problem might be even worse.

You might want to refrain from optimizing indexes this large, at least
on a frequent basis, and just rely on normal index merging to handle
your deletes.
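
If the deleted-document count has to be forced down without a full optimize,
a commit with expungeDeletes=true is sometimes used as a middle ground (my
suggestion, not something covered above): it merges only segments whose
percentage of deleted documents crosses the merge policy's threshold, though
it can still trigger large merges.  A sketch:

    URL=http://<host>:8983/solr/<Collection>/update
    # merge away segments that are mostly deletes, without a full optimize
    curl "$URL?commit=true&expungeDeletes=true"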

Optimizing is a special case when it comes to cache memory, and for
that, you need even more than in the general case.  There's a special
note about optimizes here:

https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

Thanks,
Shawn


Re: Very Slow Commits After Solr Index Optimization

Posted by vsolakhian <vi...@zoominfo.com>.
Hi Shawn,

Thank you for your response.  Everything you said is correct in general.

Our index is in HDFS, but we did not change any configuration after we
deleted 35% of records and optimized.

The relatively slow commit (soft commit and warming up took 1.5 minutes) is
OK for our use case (adding hundreds of thousands and even millions of
records and then committing).

The question is why it takes much longer after optimization, when disk
caches, network and other configuration remained the same and the index is
smaller?

Thanks,

Victor




Re: Very Slow Commits After Solr Index Optimization

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/20/2016 4:13 PM, vsolakhian wrote:
> We knew that optimization is not a good idea; it has been discussed in forums
> that it should be removed entirely from the API and Solr Admin, but discussing
> is one thing and doing it is another.  To make a long story short, we tried to
> optimize through the Solr API to remove the deleted records:
>
>     URL=http://<host>:8983/solr/<Collection>/update
>     curl "$URL?optimize=true&maxSegments=18&waitFlush=true"
>
> and all three replicas of the collection were merged to 18 segments and Solr
> Admin was showing "Optimized: Yes (green)", but the deleted records were not
> removed (which is an inconsistency with Solr Admin or a bug in the API).

Very likely the deleted documents were contained in segments that were NOT
merged and that still counted toward the final total of 18 segments.  An
optimize will only guarantee that all deleted documents are gone if it
optimizes down to ONE segment, which is what the "Optimize" button in the
admin UI does.
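
Roughly speaking, that button is equivalent to requesting an optimize with
maxSegments=1, something like:

    curl "http://<host>:8983/solr/<Collection>/update?optimize=true&maxSegments=1"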

> Finally, because people usually trust features found in the UI (even when
> official documentation for them cannot be found; see
> https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface),
> the "Optimize Now" button in Solr Admin was pressed.  It removed all
> deleted records and made the collection look very good (in the UI).  Here is the
> problem:
>
> 1. The index was reduced to one large (60 GB) segment (some consider this a
> good thing, but I doubt it).
> 2. Our use case includes batch updates followed by a soft commit (after which
> the user sees results).  A commit operation that used to take about 1.5
> minutes now takes 12 to 25 minutes.
>
> Overall performance of our application is severely degraded.
>
> I am not going to dwell on how confusing Solr optimization is, but I am
> asking whether anyone knows *what caused the slowness of the commit operation
> after optimization*.  If the issue is having one large segment, then how is it
> possible to split this segment into smaller ones (without sharding)?

Best guess is that actual disk I/O was required after the optimization,
because the important parts of the index were no longer in the OS disk
cache.  For good performance, Solr requires that data be cached and
immediately available -- disks are slow.  Performance would likely
increase as additional queries were made until it returned to normal.

If your indexes are in a filesystem local to the Solr server, then you
probably need more memory in the Solr server (not allocated to the Java
heap).  If they are in a remote filesystem (HDFS, NFS, etc) then the
remote filesystem device/server might need more memory and/or
configuration adjustments.  The speed of the network might be a factor
with remote filesystems.

Side note:  A commit that takes 1.5 minutes is ALREADY very slow. 
Commits should normally take seconds.  Well-tuned NRT environments will
probably have commit times well below one second.
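
For comparison, NRT-style setups usually lean on automatic commits in
solrconfig.xml, along these lines (the intervals are illustrative only, and
a batch-oriented use case like yours may well want longer ones):

    <autoCommit>
      <maxTime>15000</maxTime>           <!-- hard commit for durability -->
      <openSearcher>false</openSearcher> <!-- do not open a new searcher -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>            <!-- soft commit makes docs visible -->
    </autoSoftCommit>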

Here's some specific info on slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

Thanks,
Shawn