Posted to solr-user@lucene.apache.org by sureshrk19 <su...@gmail.com> on 2014/01/23 22:26:54 UTC

SOLR 4.4 - Slave always replicates full index

Hi,

I have configured single-core master and slave nodes on two different
machines. The replication configuration is fine and it is working, but I
observed that every change to the master index triggers a full replication
on the slave.
I was planning to have only the incremental changes replicated on every
change.

*Master config:*

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:20</str>
  </lst>
  <str name="maxNumberOfBackups">1</str>
</requestHandler>

*Slave config:*

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="slave">
    <str name="masterUrl">http://<IP>:<Port>/solr/core0/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>


What I observed is that the index directory name on the slave instance has a
timestamp appended, i.e., /index.<timestamp>/.

I have seen a similar issue reported against an older version of Solr, and it
was fixed in 4.2 (per the description), so I am not sure whether this is
related.

https://issues.apache.org/jira/browse/SOLR-4471
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-amp-Index-versions-td4041256.html#a4041808


Any pointers would be highly appreciated.

Thanks,
Suresh



--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-4-4-Slave-always-replicates-full-index-tp4113089.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SOLR 4.4 - Slave always replicates full index

Posted by Robin Woods <wo...@gmail.com>.
Thanks, Shawn. That makes sense.






Re: SOLR 4.4 - Slave always replicates full index

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/22/2014 5:00 PM, Robin Woods wrote:
> I think I found the issue!
>
> I forgot to mention a very important step that I did: a CORE SWAP.
> Without it, the full index is not replicated.
>
> When we do a CORE SWAP, doesn't it do the same checks and copy only the
> deltas?

Yes, it will look for differences and only copy what's changed ... but
when you swap cores, you're pretty much guaranteed that the entire index
is different on the master compared to the slave, so it will have to
copy the entire thing.  Even if you build the index in exactly the same
way in two cores at exactly the same time on the same machine, the end
result will have minor differences, such as the timestamp on each file.
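
As a rough illustration of that comparison (a simplified model, not Solr's
actual replication code), the sketch below treats each index as a mapping
from file name to size and fetches only the files the slave lacks or holds
at a different size. The function name files_to_fetch and the sample file
names and sizes are hypothetical.

```python
def files_to_fetch(master_files, slave_files):
    """Return the master file names the slave must download.

    Both arguments map file name -> size in bytes. A file is fetched when
    the slave does not have it, or has it with a different size.
    """
    return sorted(
        name
        for name, size in master_files.items()
        if slave_files.get(name) != size
    )


# Normal incremental case: one new segment was added on the master.
slave = {"_0.cfs": 1000, "_1.cfs": 400, "segments_2": 90}
master = {"_0.cfs": 1000, "_1.cfs": 400, "_2.cfs": 50, "segments_3": 95}
incremental = files_to_fetch(master, slave)

# After a core swap the master serves an independently built index,
# so none of its files line up with what the slave already holds.
swapped_master = {"_a.cfs": 1010, "_b.cfs": 395, "segments_4": 92}
after_swap = files_to_fetch(swapped_master, slave)
```

In the incremental case only the new segment and the new segments file are
fetched; after the swap every file on the master has to be copied.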

Thanks,
Shawn


Re: SOLR 4.4 - Slave always replicates full index

Posted by Robin Woods <wo...@gmail.com>.
I think I found the issue!

I forgot to mention a very important step that I did: a CORE SWAP.
Without it, the full index is not replicated.

When we do a CORE SWAP, doesn't it do the same checks and copy only the
deltas?






Re: SOLR 4.4 - Slave always replicates full index

Posted by Robin Woods <wo...@gmail.com>.
I observed the same thing.

1. Updated an existing document, which means marking the previous
document as "deleted" and adding a new version of it. I posted the JSON doc
using the Documents interface in the Admin UI and left the default "Commit
Within" value of 1000 ms on the Documents UI.

2. Did NOT optimize.

3. The slave seems to trigger a full replication (the full index appears to
be replicated).

4. Using Solr 4.9.

5. The master has the following autoCommit (the Commit Within value from the
Documents UI might be overriding this; mentioning it here just in case):
<autoCommit>
  <maxDocs>100000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

Can someone look into this and suggest what might be wrong?

Thanks!





Re: SOLR 4.4 - Slave always replicates full index

Posted by Dominik Siebel <me...@dsiebel.de>.
Erick:

I know that. I didn't optimize the index frequently. The problem was more
that a lot of documents had been added to the index (without commit or
autoCommit configured), and the MergePolicy kicked in and started merging
the segments (I guess). This led to all segments being replicated because
all of them had changed.

I haven't found a solution yet, so I am *now* optimizing the index after
every full import (once a day) and replicating then to save bandwidth.

I was more interested in the change to index strategy that Suresh mentioned.

~ Dom


2014-06-26 1:57 GMT+02:00 Erick Erickson <er...@gmail.com>:

> Dominik:
>
> If you optimize your index, then the entire thing will be replicated
> from the master to the slave every time. In general, optimizing isn't
> necessary even though it sounds like something that's A Good Thing.
>
> I suspect that's the nub of the issue.
>
> Erick
>

Re: SOLR 4.4 - Slave always replicates full index

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Note that this problem can also happen if the RealTimeGet handler is
missing from your solrconfig.xml, because PeerSync will always fail and a
full replication will be triggered. I added warn-level logging to complain
when this happens, but it is possible that you are using an older version of
Solr which does not have that logging.


On Thu, Jun 26, 2014 at 5:27 AM, Erick Erickson <er...@gmail.com>
wrote:

> Dominik:
>
> If you optimize your index, then the entire thing will be replicated
> from the master to the slave every time. In general, optimizing isn't
> necessary even though it sounds like something that's A Good Thing.
>
> I suspect that's the nub of the issue.
>
> Erick
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: SOLR 4.4 - Slave always replicates full index

Posted by Erick Erickson <er...@gmail.com>.
Dominik:

If you optimize your index, then the entire thing will be replicated
from the master to the slave every time. In general, optimizing isn't
necessary even though it sounds like something that's A Good Thing.

I suspect that's the nub of the issue.

Erick

On Tue, Jun 24, 2014 at 11:14 PM, Dominik Siebel <me...@dsiebel.de> wrote:
> Hey Suresh,
>
> could you get a little more specific on what solved your problem here?
> I am currently facing the same problem and am trying to find a proper
> solution.
> Thanks!
>
> ~ Dom
>
>

Re: SOLR 4.4 - Slave always replicates full index

Posted by Dominik Siebel <me...@dsiebel.de>.
Hey Suresh,

could you get a little more specific on what solved your problem here?
I am currently facing the same problem and am trying to find a proper
solution.
Thanks!

~ Dom


2014-02-28 7:46 GMT+01:00 sureshrk19 <su...@gmail.com>:

> Thanks Shawn and Erick.
>
> I followed SOLR configuration document and modified index strategy.
>
> Looks good now. I haven't seen any problems in last 1 week.
>
> Thanks for your suggestions.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-4-4-Slave-always-replicates-full-index-tp4113089p4120337.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: SOLR 4.4 - Slave always replicates full index

Posted by sureshrk19 <su...@gmail.com>.
Thanks Shawn and Erick.

I followed the Solr configuration documentation and modified my indexing
strategy.

It looks good now. I haven't seen any problems in the last week.

Thanks for your suggestions.




Re: SOLR 4.4 - Slave always replicates full index

Posted by Shawn Heisey <so...@elyograg.org>.
On 1/24/2014 10:36 AM, sureshrk19 wrote:
> I'm not committing each document, but I have the following configuration
> in solrconfig.xml (commit every 5 minutes).
>
>   <autoCommit>
>     <maxTime>300000</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
> Also, if you look at my master config, I do not have 'optimize'.
>
>   <str name="replicateAfter">startup</str>
>   <str name="replicateAfter">commit</str>
>
> Is there any other option which triggers 'optimize'?

I think Erick was actually asking if you are optimizing your index 
frequently, not whether you have replication configured to replicate 
after optimize.

Optimizing your index (a forced merge down to one Lucene index segment) 
is something you have to do yourself.  It won't happen automatically.  
If you optimize your index, all old segments are gone and only a single 
new segment remains.  Even if you don't replicate immediately, the next 
time you commit, the entire index will need to be copied to the slave.
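
To make the cost concrete, here is a toy model (not Solr or Lucene code) that
represents an index as segment name -> size and assumes the slave only
downloads segments it does not already hold. The helper names, segment names,
and sizes are made up for illustration, and the merged size is simply the sum
of the inputs, which ignores space reclaimed from deleted documents.

```python
def optimize(segments, new_name):
    """Model a forced merge: every segment is replaced by one new segment."""
    return {new_name: sum(segments.values())}


def bytes_to_copy(master, slave):
    """Bytes the slave must download: segments it does not already hold."""
    return sum(size for name, size in master.items() if name not in slave)


slave = {"_0": 500, "_1": 300, "_2": 200}

# Ordinary commit: the old segments survive and one small segment is added.
master_after_commit = {**slave, "_3": 10}
incremental_bytes = bytes_to_copy(master_after_commit, slave)

# Optimize: all old segments are gone, one brand-new segment remains.
master_after_optimize = optimize(master_after_commit, "_4")
full_bytes = bytes_to_copy(master_after_optimize, slave)
```

With these numbers the ordinary commit costs 10 bytes of transfer, while the
optimized index costs 1010, i.e. the whole index.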

Your autoCommit cannot be the only committing that you do, because that 
configuration will not make new documents visible - it has 
openSearcher=false.  Therefore if you are adding new content, you must 
be doing additional soft commits, or hard commits with 
openSearcher=true.  This might be accomplished with a parameter on your 
updates, like commit, softCommit, or commitWithin. It might also be an 
explicit commit.
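
For reference, this is roughly what those update parameters look like on the
request URL. The sketch only builds the URLs and does not send anything; the
host, port, and helper name update_url are assumptions for illustration (the
core name core0 is taken from the masterUrl earlier in the thread).

```python
from urllib.parse import urlencode


def update_url(base_url, **params):
    """Build a Solr update URL carrying the given request parameters."""
    # Solr expects lowercase true/false, so booleans are converted by hand.
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{base_url}/update?{urlencode(normalized)}"


BASE = "http://localhost:8983/solr/core0"  # hypothetical host and port

hard_commit = update_url(BASE, commit=True)            # explicit hard commit
soft_commit = update_url(BASE, softCommit=True)        # soft commit
commit_within = update_url(BASE, commitWithin=10000)   # commit within 10 s
```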

Optimizing *IS* a useful feature, but if you optimize very frequently 
(especially if it's done every time you add new documents), Solr's 
performance will really suffer.

Personal anecdote: One of my shards is very tiny and holds all new
content.  That gets optimized once an hour.  In general, this is pretty
frequent, but it happens very quickly, so in my setup it's not
excessive.  That is a LOT more often than what I do for my other shards,
the large ones.  I optimize one of those once every day, so each one
only gets optimized once every six days.
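
A rotation like that can be sketched as below. This is only an illustration
of the scheduling arithmetic, assuming six large shards (which the six-day
cycle implies); the shard names and the function name are hypothetical, and
the actual optimize call is left out.

```python
from datetime import date, timedelta

# Hypothetical names for the six large shards.
LARGE_SHARDS = ["shard1", "shard2", "shard3", "shard4", "shard5", "shard6"]


def shard_to_optimize(day):
    """Pick the one large shard to optimize on the given calendar day."""
    return LARGE_SHARDS[day.toordinal() % len(LARGE_SHARDS)]


start = date(2014, 1, 24)
# One pick per day over a full cycle of six consecutive days.
week_plan = [shard_to_optimize(start + timedelta(days=i)) for i in range(6)]
```

Each shard shows up exactly once in any six consecutive days, so no shard is
optimized more often than once every six days.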

Thanks,
Shawn


Re: SOLR 4.4 - Slave always replicates full index

Posted by sureshrk19 <su...@gmail.com>.
Erick,

Thanks for the reply..

I'm not committing each document, but I have the following configuration
in solrconfig.xml (commit every 5 minutes).

  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

Also, if you look at my master config, I do not have 'optimize'.

  <str name="replicateAfter">startup</str>
  <str name="replicateAfter">commit</str>

Is there any other option which triggers 'optimize'?

Thanks,
Suresh






Re: SOLR 4.4 - Slave always replicates full index

Posted by Erick Erickson <er...@gmail.com>.
How are you committing? Are you committing every document? (You shouldn't.)

Or, sin of all sins, are you _optimizing_ frequently? That'll cause your
entire index to be replicated every time.

Best,
Erick
