Posted to solr-user@lucene.apache.org by Sesha Sendhil Subramanian <se...@indix.com> on 2014/02/06 07:20:26 UTC

Optimize Index in solr 4.6

Hi,

I am running solr cloud with 10 shards. I do a batch indexing once every day,
and once indexing is done I call optimize.

I see that optimize happens on each shard one at a time and not in
parallel. Is it possible for the optimize to happen in parallel? Each shard
is on a separate box.

Thanks
Sesha

Re: Optimize Index in solr 4.6

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/6/2014 4:00 AM, Shawn Heisey wrote:
> I would not recommend it, but if you know for sure that your
> infrastructure can handle it, then you should be able to optimize them
> all at once by sending parallel optimize requests with distrib=false
> directly to the Solr cores that hold the shard replicas, not the collection.

Followup on this thread:

Evidence now suggests (thank you, Yago!) that sending an optimize 
request with distrib=false might *NOT* optimize only the core that 
receives the request.  I can confirm that this is the case on a 
SolrCloud 4.2.1 setup with one shard and replicationFactor=2.  It 
optimized that core and then, once that was finished, optimized the 
other replica.
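One way to see which replicas were actually optimized is to compare the segment count of each core before and after the request, since an optimize with default settings merges a core down to a single segment. The sketch below assumes that the Luke handler at /admin/luke returns index statistics containing a segmentCount field; the core URLs are hypothetical placeholders, not anything from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_url(core_url: str) -> str:
    # numTerms=0 skips per-field term statistics so the request stays cheap.
    params = {"numTerms": "0", "wt": "json"}
    return f"{core_url.rstrip('/')}/admin/luke?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    # Plain HTTP GET; the response body is parsed as JSON.
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def segment_counts(core_urls, fetch=fetch_json) -> dict:
    # Assumed response shape: {"index": {"segmentCount": N, ...}, ...}
    return {u: fetch(luke_url(u))["index"]["segmentCount"] for u in core_urls}
```

For example, calling segment_counts() on both hypothetical replica URLs of a one-shard collection right after optimizing only one of them would show whether the other replica also dropped to a single segment.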

I would have already filed an issue in Jira, except that I do not 
currently have any way to test this on 4.6.1, so I do not know if this 
is still the way it works.  Also, I do not have a distributed SolrCloud 
index available.  I will be looking into writing a unit test, but my 
grasp of SolrCloud tests is very weak.

Thanks,
Shawn


Re: Optimize Index in solr 4.6

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/5/2014 11:20 PM, Sesha Sendhil Subramanian wrote:
> I am running solr cloud with 10 shards. I do a batch indexing once every day,
> and once indexing is done I call optimize.
> 
> I see that optimize happens on each shard one at a time and not in
> parallel. Is it possible for the optimize to happen in parallel? Each shard
> is on a separate box.

I assume that you are optimizing the collection, and that SolrCloud is
taking care of the optimization of each core automatically.  I've not
looked into how this works, so I could be completely wrong about this.
If this is what you are doing, here's my best guess as to why it works
the way it does:

Optimizing an index is extremely I/O intensive.  The full index must be
read from the original files and re-written.  Unless the index is small
or available RAM is very large, it is also likely that doing an optimize
will temporarily push relevant data out of the OS disk cache.  This has
a strong negative impact on performance.  If you do this on all your
shards at once, the performance impact could be catastrophic, even if
they are all on separate machines.

I would not recommend it, but if you know for sure that your
infrastructure can handle it, then you should be able to optimize them
all at once by sending parallel optimize requests with distrib=false
directly to the Solr cores that hold the shard replicas, not the collection.
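A minimal Python sketch of that approach, assuming each shard core is reachable at its own URL and that the /update handler accepts optimize=true and distrib=false as request parameters. The max_parallel argument is an illustrative knob, not a Solr setting, and the caveat from the follow-up in this thread still applies: distrib=false may not keep the optimize confined to the receiving core on some versions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(core_url: str) -> str:
    # Target the core directly, not the collection; distrib=false asks Solr
    # not to distribute the request to other cores.
    params = {"optimize": "true", "distrib": "false", "wt": "json"}
    return f"{core_url.rstrip('/')}/update?{urlencode(params)}"


def send(url: str, timeout: float = 3600.0) -> int:
    # Assumes the request returns only after the optimize completes,
    # so allow a long timeout for a large index.
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def optimize_all(core_urls, max_parallel=None, fetch=send) -> dict:
    # max_parallel caps how many cores optimize at once (1 = one at a time);
    # the default runs every core concurrently.
    core_urls = list(core_urls)
    urls = [optimize_url(u) for u in core_urls]
    workers = max_parallel or len(urls) or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(core_urls, pool.map(fetch, urls)))
```

For example, optimize_all(core_urls, max_parallel=2) with a hypothetical list of ten core URLs would keep at most two cores optimizing at a time, a middle ground between the one-at-a-time behavior in the original question and hitting all ten boxes at once.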

Thanks,
Shawn