Posted to solr-user@lucene.apache.org by "michael.boom" <my...@yahoo.com> on 2013/11/14 11:39:24 UTC

Optimizing cores in SolrCloud

A few weeks ago, optimization in SolrCloud was discussed in this thread:
http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020

That thread covered distributed optimization inside a collection.
My use case requires manually running an optimize every week or so:
I delete by query often, so the deletedDocs count grows huge, and the
only way to regain that space is by optimizing.

Since I have a pretty steady high load, I can't do it overnight, so I was
thinking of doing it one core at a time -> meaning optimizing shard1_replica1,
then shard1_replica2, and so on, using
curl
'http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false'
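
For what it's worth, a minimal sketch of that loop over replicas (the core
names are just examples from my setup; each curl call blocks until the
optimize finishes, since waitSearcher defaults to true, so this runs one
replica at a time):

  for core in collection1_shard1_replica1 collection1_shard1_replica2; do
    # distrib=false keeps the optimize from fanning out to the other replicas
    curl "http://localhost:8983/solr/$core/update?optimize=true&distrib=false"
  done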

My question is: how would this reflect on the performance of the system?
All queries routed to that shard replica would be very slow, I assume.

Would there be any problems if one replica is optimized and another is not?
Has anybody tried something like this? Any tips or stories?
Thank you!



-----
Thanks,
Michael

Re: Optimizing cores in SolrCloud

Posted by Walter Underwood <wu...@wunderwood.org>.
Earlier, you said that optimize is the only way that deleted documents are expunged. That is false. They are expunged when the segment they are in is merged. A forced merge (optimize) merges all segments, so it will expunge all deleted documents. But those documents will be expunged by ordinary merges eventually.

When you have deleted docs in the largest segment, you have to wait for a merge of that segment.
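
One way to watch this happen, assuming the stock Luke request handler is
enabled on the core (it is in the example configs), is to compare the doc
counts before and after merges:

  curl 'http://localhost:8983/solr/collection1_shard1_replica1/admin/luke?numTerms=0&wt=json'

The "index" section of the response reports numDocs, maxDoc, and
deletedDocs; deletedDocs drops whenever the segments holding those
documents get merged.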

My best advice is to stop looking at the deleted documents count and worry about something that makes a difference to your users.

For about 10 years, I worked on Ultraseek Server, a search engine with the same design for merging and document deletion. With over 10K installations, we never had a customer who had a problem caused by deleted documents.

wunder


Re: Optimizing cores in SolrCloud

Posted by "michael.boom" <my...@yahoo.com>.
Thanks Erick!

That's a really interesting idea, I'll try it!
Another question: when does merging actually happen? Is it
triggered or conditioned by something?

Currently I have a core with ~13M maxDocs and ~3M deleted docs, and although
I see a lot of merges in SPM, the deleted documents aren't really going
anywhere.
For merging I'm using the example settings; I haven't changed them.




-----
Thanks,
Michael

Re: Optimizing cores in SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
I'm going to answer with something completely different <G>....

First, though, optimization happens in the background, so it
shouldn't have too big an impact on query performance outside of
I/O contention. There also "shouldn't" be any problem with one
shard being optimized and one not.

Second, have you considered tweaking some of the TieredMergePolicy
knobs? In particular reclaimDeletesWeight, which defaults to 2.0.
You can set this in your solrconfig.xml. Through a clever bit of
reflection, you can actually set most (all?) of the member vars in
TieredMergePolicy.java.

Bumping up the weight might cause segment merges to merge away
the deleted docs frequently enough to satisfy you.
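
For instance, a minimal sketch of what that might look like in
solrconfig.xml (Solr 4.x style config; the 4.0 here is just an
illustrative value, not a recommendation):

  <indexConfig>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <!-- default is 2.0; higher values bias merge selection toward
           segments with lots of deleted docs -->
      <double name="reclaimDeletesWeight">4.0</double>
    </mergePolicy>
  </indexConfig>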

Best,
Erick


On Thu, Nov 14, 2013 at 5:39 AM, michael.boom <my...@yahoo.com> wrote:

> A few weeks ago optimization in SolrCloud was discussed in this thred:
>
> http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020
>
> The thread was covering the distributed optimization inside a collection.
> My use case requires manually running optimizations every week or so,
> because I do delete by query often, and deletedDocs number gets to huge
> amounts, and the only way to regain that space is by optimizing.
>
> Since I have a pretty steady high load, I can't do it over night and i was
> thinking to do it one core at a time -> meaning optimizing shard1_replica1
> and then shard1_replica2 and so on, using
> curl
> '
> http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false
> '
>
> My question is how would this reflect on the performance of the system? All
> queries that would be routed to that shard replica would be very slow I
> assume.
>
> Would there be any problems if a replica is optimized and another is not?
> Anybody tried something like this? Any tips or stories ?
> Thank you!
>
>
>
> -----
> Thanks,
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Optimizing-cores-in-SolrCloud-tp4100871.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>