Posted to solr-user@lucene.apache.org by "michael.boom" <my...@yahoo.com> on 2013/10/24 12:37:33 UTC

SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

Hi!

I have a SolrCloud setup on two servers: 3 shards, replicationFactor=2.
Today I triggered an optimization on core *shard2_replica2*, which
contained only 3M docs and 2.7G.
The sizes of the other shards were shard3=2.7G and shard1=48G (the routing is
implicit, but after some update deadlocks and restarts the shard range in
ZooKeeper became null and everything since then has apparently been indexed
to shard1).

So, half an hour after I triggered the optimization via the Admin UI, I
noticed that used space was increasing a lot on *both servers* for cores
*shard1_replica1 and shard1_replica2*.
It was at 67G and still increasing. In the end, about 40 minutes after the
start of the operation, shard1 was done optimizing on both servers, leaving
shard1_replica1 and shard1_replica2 at about 33G.

Any idea what is happening, and why the core I wanted to optimize was not
optimized while another shard was optimized instead, on both servers?



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimizations-of-all-cores-in-that-collection-tp4097499.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

Posted by Shawn Heisey <so...@elyograg.org>.
On 10/25/2013 2:29 PM, michael.boom wrote:
> As for why I am optimizing: I do lots of deletes by id and by query, and
> after a while about 30% of maxDocs are deletedDocs. On a 50G index that
> means about 15G of space, which I am trying to free by doing the
> optimization.
>
> "it's usually better NOT to optimize...."
> Could you provide some more details on this?
> Thank you!

Improvements in Lucene have made performance better on multi-segment 
indexes than it was in the past.  There is still a small performance 
gain when optimizing multiple segments down to one, but it's not as much 
as it once was.

Optimizing is the only real way to shrink the index when there are large 
numbers of deleted documents, so in your case, doing an optimize is not 
a bad thing.  It might be the kind of thing you manually trigger when 
you notice that there are a lot of deleted documents.

All of the arguments against optimization really boil down to one, and 
it's a really good one.  Optimization rewrites your entire index.  This 
means it has to read the whole thing, look at each document, and write 
non-deleted documents back out.  This takes some of your CPU resources, 
but it's not usually a lot on modern hardware.  The part that's really
bad is that it generates a HUGE amount of I/O, and unless you have
enough extra RAM to hold your index *twice*, it will generally make
the OS disk cache far less efficient while it's happening.  This
major I/O burden will generally make queries very slow while the
optimize is running.

Some might make the argument that optimizing requires a lot of disk 
space, but regular merges during indexing can result in the same 
behavior, so it's always recommended that you have enough space for 2-3 
times your actual index size.
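
A rough way to check that headroom before triggering an optimize might look like the sketch below. It is not taken from Solr itself: the function names and the default factor of 2.0 are assumptions chosen for illustration, and the index directory is whatever path your core's data/index directory happens to be.

```python
import os
import shutil
import sys


def index_size_bytes(index_dir):
    """Sum the sizes of all files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Segment files can disappear while a merge is running.
                pass
    return total


def has_headroom(index_bytes, free_bytes, factor=2.0):
    """True if the current index plus free space covers factor x index size."""
    return index_bytes + free_bytes >= factor * index_bytes


if __name__ == "__main__":
    # Pass the path to the core's index directory as the first argument.
    index_dir = sys.argv[1]
    size = index_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    print("index bytes:", size)
    print("free bytes:", free)
    print("enough room for 2x:", has_headroom(size, free, 2.0))
```

With the numbers from the original post, a 48G shard that grew past 67G during the optimize needed at least 19G of free space, so a factor of 2 would have been a safe check there.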

If optimizes happen really fast because your index is not very big, or 
you have a period of time during the day or night where your index is 
mostly idle, then it can make a lot of sense to do regular optimizes for 
performance reasons or to shrink the index when there are a lot of deletes.

Thanks,
Shawn


Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

Posted by "michael.boom" <my...@yahoo.com>.
Thanks Erick! I will try specifying the distrib parameter.

As for why I am optimizing: I do lots of deletes by id and by query, and
after a while about 30% of maxDocs are deletedDocs. On a 50G index that
means about 15G of space, which I am trying to free by doing the
optimization.
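
The estimate above can be written out as a small calculation. This is only a sketch: it assumes deleted documents take up roughly the same space as live ones, and the document counts in the example are made up to match the 30% figure, not taken from the real index.

```python
def reclaimable_bytes(index_bytes, max_doc, num_docs):
    """Estimate the space held by deleted docs, assuming average-sized docs."""
    if max_doc <= 0:
        return 0
    deleted_docs = max_doc - num_docs
    return index_bytes * deleted_docs // max_doc


if __name__ == "__main__":
    gib = 2 ** 30
    # Hypothetical counts: 10M maxDoc, 7M live docs, so 30% are deleted.
    estimate = reclaimable_bytes(50 * gib, 10_000_000, 7_000_000)
    print(estimate / gib, "GiB could be freed")
```

The actual saving depends on which documents were deleted, so the result is a ballpark figure rather than a guarantee.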

"it's usually better NOT to optimize.... "
Could you provide some more details on this?
Thank you!



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-tp4097499p4097828.html

Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

Posted by Erick Erickson <er...@gmail.com>.
I don't know if this works for optimizing or not, but try attaching
&distrib=false to the optimization request.
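
As a sketch of what such a request could look like, the snippet below only builds the URL for the core's update handler with optimize=true and distrib=false. The base URL and the helper name are assumptions for illustration, the core name is copied from the original post and may need the collection prefix in a real setup, and whether distrib=false is actually honored for optimize is the open question here, so it needs testing.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, distrib=False):
    """Build an update-handler URL asking one core to optimize."""
    params = {
        "optimize": "true",
        "distrib": "true" if distrib else "false",
    }
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and port; adjust to your own Solr node.
    print(optimize_url("http://localhost:8983/solr", "shard2_replica2"))
```

Sending the request is left out on purpose: issuing it against the resulting URL (for example with curl) starts a real optimize on that node.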

Hmmm, something that might be added to the UI, any admin UI guys listening?
:).

But I have to ask why you're optimizing anyway. Unless you have a very
specific reason, it's usually better NOT to optimize....

Best,
Erick



Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

Posted by "michael.boom" <my...@yahoo.com>.
Thanks @Mark & @Erick

Should I create a JIRA issue for this?



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-tp4097499p4098020.html

Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?

Posted by Mark Miller <ma...@gmail.com>.
On Oct 24, 2013, at 6:37 AM, michael.boom <my...@yahoo.com> wrote:

> Any idea what is happening, and why the core I wanted to optimize was not
> optimized while another shard was optimized instead, on both servers?

Sounds like a bug we should fix. If you don’t specify distrib=false, it should optimize your whole collection. I’ve never really looked into this though - I’m sure we need some tests.

- Mark