Posted to solr-user@lucene.apache.org by Jeff Courtade <co...@gmail.com> on 2018/10/02 13:04:21 UTC

Opinions on index optimization...

We run an old master/slave Solr 4.3.0 cluster.

14 nodes (7 masters / 7 slaves).
Indexes average 47.5 GB per shard, with around 2 million docs per shard.

We have constant daily additions and a small number of deletes.

We currently optimize nightly, and it is a system hog.

Is it feasible to never run optimize?

I ask because it seems like it would be very bad not to, but there is
information out there apparently recommending exactly that: never
optimizing.

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/

https://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Re: Opinions on index optimization...

Posted by Erick Erickson <er...@gmail.com>.
The problem you're in now is that, having run optimize, that single
massive segment will accumulate deletes until it has < 2.5G of "live"
documents (half of TieredMergePolicy's default 5G maximum merged segment
size); until then, normal merging skips it. So once you do optimize (and
until you get to Solr 7.5), unless you can live with this one segment
accumulating deletes for a very long time, you must continue to optimize.
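The 2.5G figure can be made concrete with a little arithmetic. The sketch below is a minimal model, assuming TieredMergePolicy's default maximum merged segment size of 5 GB and that natural merging skips any segment whose live (non-deleted) size is at least half of that; it ignores the other factors the real policy weighs. The function names are hypothetical, not Solr or Lucene APIs.

```python
# Minimal model of why an optimized segment stops taking part in natural
# merges (pre-Solr 7.5 TieredMergePolicy behavior, as described above).
# ASSUMPTIONS: 5 GB default max merged segment size; a segment is skipped
# while its live size is >= half of that. Names here are hypothetical.

DEFAULT_MAX_MERGED_SEGMENT_GB = 5.0  # assumed default (maxMergedSegmentMB=5120)


def live_size_gb(segment_gb: float, deleted_fraction: float) -> float:
    """Segment size pro-rated by the fraction of deleted documents."""
    return segment_gb * (1.0 - deleted_fraction)


def eligible_for_natural_merge(
    segment_gb: float,
    deleted_fraction: float,
    max_merged_gb: float = DEFAULT_MAX_MERGED_SEGMENT_GB,
) -> bool:
    """True once the live portion drops below half the max merged size."""
    return live_size_gb(segment_gb, deleted_fraction) < max_merged_gb / 2.0


def deleted_fraction_needed(
    segment_gb: float,
    max_merged_gb: float = DEFAULT_MAX_MERGED_SEGMENT_GB,
) -> float:
    """Deleted fraction beyond which the segment becomes eligible again."""
    threshold_gb = max_merged_gb / 2.0
    if segment_gb < threshold_gb:
        return 0.0  # already small enough to be merged normally
    return 1.0 - threshold_gb / segment_gb


if __name__ == "__main__":
    # Hypothetical input: one optimized 47.5 GB segment (reading the
    # "47/5 gig" shard size in the question as 47.5 GB).
    print(f"{deleted_fraction_needed(47.5):.1%}")  # → 94.7%
```

Under these assumptions, roughly 95% of that shard's documents would have to be deleted before the optimized segment is merged away on its own.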

Or you could re-index from scratch if possible and never optimize.

Best,
Erick
On Tue, Oct 2, 2018 at 7:28 AM Walter Underwood <wu...@wunderwood.org> wrote:

Re: Opinions on index optimization...

Posted by Walter Underwood <wu...@wunderwood.org>.
Don’t optimize. The first article isn’t as clear as it should be. The important sentence is “Unless you are running into resource problems, it’s best to leave merging alone.”

I’ve been running Solr in production since version 1.3, with several different kinds and sizes of collections. I’ve never run a daily optimize, even on collections that only change once per day.

The section titled “What? I can’t afford 50% ‘wasted’ space” should have just been “Then don’t run Solr”. Really, you should have 100% free space, so a 22 GB index would be on a volume with 22 GB of free space.
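The 100% free space rule is simple arithmetic: a merge writes the new segment before the old segments are deleted, so a full forced merge can briefly need roughly the index size again in free disk space. A minimal sketch with hypothetical helper names; only `shutil.disk_usage` is a real (standard-library) call.

```python
import shutil


def min_volume_gb(index_gb: float) -> float:
    """Volume size that leaves free space equal to the index size."""
    return 2.0 * index_gb


def has_merge_headroom(index_gb: float, free_gb: float) -> bool:
    """Rule of thumb from the thread: free space >= index size."""
    return free_gb >= index_gb


def free_gb_at(path: str) -> float:
    """Free space (in GiB) on the volume holding `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    print(min_volume_gb(22))           # → 44.0
    print(has_merge_headroom(22, 22))  # → True
```

So the 22 GB index in the example implies a volume of at least 44 GB, with the index itself taking half of it.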

It was a mistake to name it “optimize”. It should have been “force merge”.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 2, 2018, at 6:04 AM, Jeff Courtade <co...@gmail.com> wrote:


RE: Opinions on index optimization...

Posted by Markus Jelsma <ma...@openindex.io>.
There are a few bugs that require you to merge the index; see SOLR-8807 and related bugs.

https://issues.apache.org/jira/browse/SOLR-8807

-----Original message-----
> From:Erick Erickson <er...@gmail.com>
> Sent: Wednesday 3rd October 2018 4:50
> To: solr-user <so...@lucene.apache.org>
> Subject: Re: Opinions on index optimization...