Posted to solr-user@lucene.apache.org by Pranav Prakash <pr...@gmail.com> on 2011/08/10 09:12:28 UTC

Is optimize needed on slaves if it replicates from optimized master?

Do slaves need a separate optimize command if they replicate from optimized
master?

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Walter Underwood <wu...@wunderwood.org>.
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

The defaults are very good. I have never changed them, and I've had Solr in production at two major sites, Netflix and Chegg.

Don't spend any more time worrying about merges.

wunder

On May 31, 2012, at 10:51 AM, sudarshan wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by sudarshan <ch...@gmail.com>.
Walter,
Thanks again. Can you explain the criteria Solr uses to merge (force-merge)
segments automatically? Is this defined by the mergeFactor parameter, e.g. if
mergeFactor is 10, does a merge happen every 10 segments? Please explain.

Thanks,
Sudarshan 

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3987086.html
Sent from the Solr - User mailing list archive at Nabble.com.
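The mergeFactor question can be pictured with a toy model of log-style merging. It assumes every flush writes one equal-size segment and that as soon as mergeFactor segments sit on the same level they are merged into one segment on the next level. This is a simplified sketch, not Lucene's code: real merge policies (LogByteSizeMergePolicy, LogDocMergePolicy, TieredMergePolicy) pick merges by size and have further limits. The function names are hypothetical.

```python
from typing import List


def simulate_log_merges(num_flushes: int, merge_factor: int) -> List[int]:
    """Segments per level after num_flushes flushes (toy model, not Lucene).

    Assumptions: every flush writes one level-0 segment, and as soon as
    merge_factor segments sit on one level they are merged into a single
    segment on the next level up.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels: List[int] = [0]
    for _ in range(num_flushes):
        levels[0] += 1  # the flush adds one small segment
        level = 0
        # Merges cascade upward, like carrying digits in base merge_factor.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return levels


def segment_count(num_flushes: int, merge_factor: int) -> int:
    """Total number of segments in the toy index."""
    return sum(simulate_log_merges(num_flushes, merge_factor))


if __name__ == "__main__":
    for n in (9, 10, 99, 100, 123):
        print(f"{n} flushes -> {segment_count(n, 10)} segments")
```

With merge_factor=10 this model reports 9, 1, 18, 1 and 6 segments after 9, 10, 99, 100 and 123 flushes. The segment count stays bounded without any explicit optimize, and occasionally (here at 100 flushes) the cascade happens to merge everything into a single segment.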

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Walter Underwood <wu...@wunderwood.org>.
You do not need to use optimize at all.

Solr continually merges segments ("optimizes") as needed.

wunder

On May 29, 2012, at 6:08 AM, sudarshan wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by sudarshan <ch...@gmail.com>.
Hi Walter,
Thank you. Do you mean that optimize need not be used at all?
If Solr merges segments (when needed, as you said), what criteria does Solr
use to decide when to do this automatically? If I want search to be faster
and Solr does not optimize for quite a long time, wouldn't that compromise my
query processing rate?

To All,
I have another question. If I optimize and replicate, the first time it
would transfer all the segments from the master to the slave, irrespective
of which segment(s) were modified. After the first replication, how is the
transfer made: are all segments replicated again, or only the modified
segments? I believe that after the first replication (master and slave in
sync), only the modified segments would be transferred, just like a
non-optimized index transfer. Am I right?

Regards,
Sudarshan  

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Walter Underwood <wu...@wunderwood.org>.
A. Never optimize on the slave.
B. You probably do not need to optimize on the master.

"Optimize" does not optimize anything. It is forced merge, combining segments. Solr automatically combines segments as needed.

wunder

On May 26, 2012, at 1:57 PM, sudarshan wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by sudarshan <ch...@gmail.com>.
Hi All,
I happened to see this message board just now. I want to clarify
certain things. I'm new to Solr. I'm trying to combine Solr's index
replication and optimization. I have some questions about how
replication works in a master-slave setup.

From the post, I understand that if the index is not optimized, only the
modified segments are transferred from the master to the slave. I have two
scenarios.

1. Optimizing the index only on the master and replicating the optimized
index to the slave: from my understanding, this would copy the whole index
every time (poll interval).

2. Optimizing the index only on the slave: from my testing, I observed that
the slave replaces its optimized index with the master's non-optimized
index during every replication (poll interval). So even if the index has not
changed on the master, if the slave optimizes after every replication,
sooner or later its index will be replaced by the master's, based on my
observations.

Questions:
In my opinion, if I want to optimize at all, doing it on the master and
replicating the optimized index to the slaves would be more sensible. Am I right?

1. Is there a way to combine optimization with replication?
2. I could not understand when merging indexes would be useful. I believe
that the master and slave should always have a consistent view of the index,
which is what replication guarantees. So why should I merge the index?
3. If I optimize either on the master or on the slave, will the
entire index always be copied to the slave?
4. During replication, I found that the size of the index and the number of
files in the index differ between the master and the slave. Still, they were in
sync. Is there some internal metadata used to find the
difference (number of files per index version) between the master and the
slave to initiate replication?

Your suggestions/guidance would be very helpful to get a clear
understanding. Please help.

Thanks,
Sudarshan 

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986259.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Pranav Prakash <pr...@gmail.com>.
Very well explained. Thanks. Yes, we do optimize the index before replication. I
am not particularly worried about disk space usage; I was more curious about
that behavior.

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>


On Wed, Aug 10, 2011 at 19:55, Erick Erickson <er...@gmail.com>wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Erick Erickson <er...@gmail.com>.
This is expected behavior. You might be optimizing
your index on the master after every set of changes,
in which case the entire index is copied. During this
period, the space on disk will at least double; there's no
way around that.

If you do NOT optimize, then the slave will only copy changed
segments instead of the entire index. Optimizing isn't
usually necessary except periodically (daily, perhaps weekly,
perhaps never actually).

All that said, depending on how merging happens, you will always
have the possibility of the entire index being copied sometimes
because you'll happen to hit a merge that merges all segments
into one.
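Erick's explanation can be modeled as a file-list diff. Roughly, a Solr slave polls the master's ReplicationHandler for its index version and file list and downloads what it lacks. The sketch below is a simplified model, not Solr's code: the generation check, the comparison by file name and size, and all file names and sizes are made-up assumptions. Because segment files are written once and never modified, an unchanged segment keeps its name and size and is skipped, while an optimize produces all-new files.

```python
from typing import Dict, List, Tuple

# Hypothetical file listing: file name -> size in bytes.
FileList = Dict[str, int]


def files_to_fetch(master: FileList, slave: FileList) -> List[str]:
    """Master files the slave is missing or holds with a different size."""
    return sorted(n for n, size in master.items() if slave.get(n) != size)


def plan_pull(master_gen: int, master: FileList,
              slave_gen: int, slave: FileList) -> Tuple[List[str], int, int]:
    """Return (files to download, bytes to download, rough peak disk use)."""
    if master_gen == slave_gen:
        return [], 0, sum(slave.values())  # already in sync, nothing to do
    fetch = files_to_fetch(master, slave)
    download = sum(master[n] for n in fetch)
    # The old index stays on disk until the new commit is in place, so the
    # peak is roughly the current index plus everything downloaded.
    return fetch, download, sum(slave.values()) + download


# Made-up example: a slave at generation 5 and two possible masters at 6.
SLAVE = {"_0.cfs": 900, "_1.cfs": 90, "segments_5": 1}
MASTER_AFTER_COMMIT = {"_0.cfs": 900, "_1.cfs": 90, "_2.cfs": 10, "segments_6": 1}
MASTER_AFTER_OPTIMIZE = {"_3.cfs": 990, "segments_6": 1}

if __name__ == "__main__":
    print(plan_pull(6, MASTER_AFTER_COMMIT, 5, SLAVE))
    print(plan_pull(6, MASTER_AFTER_OPTIMIZE, 5, SLAVE))
```

In this example a normal commit on the master costs the slave 11 bytes (the new segment plus the segments file), while pulling from the optimized master downloads all 991 bytes and briefly needs about twice the index size on disk, which matches the doubling Pranav observed.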

There are some advanced options that can control some parts
of merging, but you need to get to the bottom of why the whole
index is getting copied every time before you go there. I'd bet
you're issuing an optimize.

Best
Erick

On Wed, Aug 10, 2011 at 5:30 AM, Pranav Prakash <pr...@gmail.com> wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Pranav Prakash <pr...@gmail.com>.
> That is not true. Replication is roughly a copy of the diff between the
> master and the slave's index.

In my case, during replication the entire index is copied from master to slave,
during which the size of the index goes a little over double. Then it shrinks to
its original size. Am I doing something wrong? How can I get the master to
serve only the delta instead of the whole index, with the slaves merging
the new and old index?

*Pranav Prakash*

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
Sure, there is actually no need to optimize on the slave,
but calling optimize on the slave removes the write.lock.
So why doesn't the replication process do this?

Regards
Bernd


On 10.08.2011 10:57, Shalin Shekhar Mangar wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Aug 10, 2011 at 1:11 PM, Bernd Fehling <
bernd.fehling@uni-bielefeld.de> wrote:

>
> From what I see on my slaves, yes.
> After replication has finished and new index is in place and new reader has
> started
> I have always a write.lock file in my index directory on slaves, even
> though the index
> on master is optimized.
>

That is not true. Replication is roughly a copy of the diff between the
master and the slave's index. An optimized index is a merged and re-written
index so replication from an optimized master will give an optimized copy on
the slave.

The write lock is due to the fact that an IndexWriter is always open in Solr
even on the slaves.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Is optimize needed on slaves if it replicates from optimized master?

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
From what I see on my slaves, yes.
After replication has finished, the new index is in place, and the new reader has started,
I always have a write.lock file in my index directory on the slaves, even though the index
on the master is optimized.

Regards
Bernd


On 10.08.2011 09:12, Pranav Prakash wrote: