Posted to solr-user@lucene.apache.org by "Joshi, Shital" <Sh...@gs.com> on 2013/08/28 22:20:23 UTC

purge and optimize questions for solr 4.4.0

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes with 500 million documents. We're using custom sharding, where we direct all documents with a specific business date to a specific shard.

With Solr 3.6 we used this command to optimize the index on the master and then let replication take care of updating slave1 and slave2.

curl --proxy "" 'http://prod-solr-master.xyz.com:8983/solr/core1/update?optimize=true&waitFlush=false&maxSegments=1'

How do we optimize documents across all shards in SolrCloud? Do we have to fire five different optimize commands, one to each of the five leaders? Also, it looks like optimize will be going away and might no longer be necessary; see SOLR-3141 (https://issues.apache.org/jira/browse/SOLR-3141). Is that true? With Solr 3.6 we purge millions of documents every month and then run optimize. We're planning to do the same with our SolrCloud setup.

With Solr 3.6 we used the following curl command to purge documents. Now that we have multiple shards, can we still use the same command? We will definitely experiment with our QA setup of 500 million documents.

curl --proxy "" 'http://prod-solr-master.xyz.com:8983/solr/core1/update?commit=true' -H "Content-Type: text/xml" --data-binary '<delete><query>busdate_i:[* TO 20130208]</query></delete>'

Thanks!







Re: purge and optimize questions for solr 4.4.0

Posted by Chris Hostetter <ho...@fucit.org>.
: We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes with 500
: million documents. We're using custom sharding, where we direct all
: documents with a specific business date to a specific shard.

	...

: How do we optimize documents for all shards in Solr Cloud? Do we have to 
: fire five different optimize commands to all five leaders? Also, looks 

Commands like optimize and deleteByQuery are automatically propagated to
all shards: you only need to send the command to one node in the
collection.
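A minimal shell sketch of that single request, assuming a hypothetical node at http://localhost:8983/solr and a collection named collection1 (substitute your own). The function names and the DRY_RUN switch are illustrative helpers, not part of Solr. It reuses optimize=true and maxSegments from the 3.6 command in the original post (waitFlush is simply omitted here) and sends the request once, to one node, instead of to each of the five leaders.

```shell
#!/bin/sh
# Sketch only: SOLR_URL and COLLECTION are hypothetical placeholders.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-collection1}"

# Build the update URL; maxSegments (default 1) is the target segment count.
optimize_url() {
  printf '%s/%s/update?optimize=true&maxSegments=%s\n' \
    "$SOLR_URL" "$COLLECTION" "${1:-1}"
}

# Send the optimize to the single node in SOLR_URL; that node forwards it
# to every shard. With DRY_RUN=1 the curl command is printed, not run.
solr_optimize() {
  url=$(optimize_url "$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl --proxy '' '$url'"
  else
    curl --proxy '' "$url"
  fi
}
```

For example, DRY_RUN=1 solr_optimize shows the request that would be sent, and solr_optimize 1 sends it.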

: like optimize will be going away and might no longer be necessary; see
: SOLR-3141 (https://issues.apache.org/jira/browse/SOLR-3141). Is that true?

It's still up for debate, and as you can see from the comments it hasn't had
much traction lately. Even if, at some point in the future, sending a
command named "optimize" ceases to work, the underlying functionality of
being able to say "force merge down to N segments" will always exist under
some name, provided you don't go out of your way to use a MergePolicy that
ignores that command.

: With Solr 3.6 we used the following curl command to purge documents. Now
: with multiple shards can we still use the same command? We will

As mentioned above, a deleteByQuery command can be sent to a single node
and it will be propagated automatically.
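The monthly purge follows the same pattern. The sketch below sends the deleteByQuery from the original post once, to one node, taking the cutoff date as an argument. SOLR_URL, COLLECTION, the function names and DRY_RUN are hypothetical placeholders; busdate_i and the XML delete body come from the original post.

```shell
#!/bin/sh
# Sketch only: SOLR_URL and COLLECTION are hypothetical placeholders.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-collection1}"

# XML body deleting every document whose busdate_i is <= the cutoff.
delete_body() {
  printf '<delete><query>busdate_i:[* TO %s]</query></delete>\n' "$1"
}

# POST the body (read from stdin via @-) to one node and commit; the node
# propagates the delete to all shards. DRY_RUN=1 only prints the body.
solr_purge_before() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    delete_body "$1"
  else
    delete_body "$1" | curl --proxy '' \
      "$SOLR_URL/$COLLECTION/update?commit=true" \
      -H 'Content-Type: text/xml' --data-binary @-
  fi
}
```

For example, solr_purge_before 20130208 would issue the same purge as the 3.6 command, but against the whole collection.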

However, if you are already using custom sharding to shard by date, then a
blanket deleteByQuery across all shards may not be necessary. You may
find it easier/faster/cleaner to just delete the shards you no longer need
as the data in them "expires"...

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DeleteaShard
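With one business-date range per shard, retiring old data becomes one Collections API call per expired shard. A minimal sketch, assuming the DELETESHARD action described on the page above; SOLR_URL, COLLECTION, the shard name shard1 and the helper names are hypothetical. That documentation restricts which shards DELETESHARD will accept (for example, inactive shards or shards with no hash range under custom sharding), so check it against your Solr version first.

```shell
#!/bin/sh
# Sketch only: SOLR_URL, COLLECTION and shard names are hypothetical.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-collection1}"

# Collections API URL that deletes the named shard from the collection.
deleteshard_url() {
  printf '%s/admin/collections?action=DELETESHARD&collection=%s&shard=%s\n' \
    "$SOLR_URL" "$COLLECTION" "$1"
}

# Issue the request to any node; DRY_RUN=1 prints the URL instead.
solr_delete_shard() {
  url=$(deleteshard_url "$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$url"
  else
    curl --proxy '' "$url"
  fi
}
```

Compared with a deleteByQuery, this drops the expired shard outright rather than deleting its documents one query match at a time.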

-Hoss