Posted to solr-user@lucene.apache.org by Alessandro Bon <v-...@expedia.com> on 2016/07/22 10:02:37 UTC

Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)

Hi everyone,
I am experiencing a replication issue on a master/slave configuration.

Issue: Full index replications sometimes occur on master startup and after commits, even though only the <str name="replicateAfter">optimize</str> directive is specified. Replication on commit occurs only for sufficiently big commits. Replication correctly starts again at the end of my indexing job, after the optimization phase. As a result of this behaviour, I get incomplete indexes on the slaves during the indexing process.
Solr version: 5.5.2
Configuration:

<config>
    <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

    <luceneMatchVersion>5.5.1</luceneMatchVersion>

    <dataDir>${solr.data.dir:}</dataDir>

    <directoryFactory name="DirectoryFactory"
                      class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

    <indexConfig>
        <writeLockTimeout>1000</writeLockTimeout>
        <useCompoundFile>false</useCompoundFile>
        <ramBufferSizeMB>32</ramBufferSizeMB>
        <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
           <int name="maxMergeAtOnce">10</int>
           <int name="segmentsPerTier">10</int>
        </mergePolicyFactory>
        <lockType>native</lockType>
        <reopenReaders>true</reopenReaders>
        <deletionPolicy class="solr.SolrDeletionPolicy">
            <str name="maxCommitsToKeep">1</str>
            <str name="maxOptimizedCommitsToKeep">0</str>
        </deletionPolicy>
        <maxFieldLength>100000</maxFieldLength>
        <infoStream file="INFOSTREAM.txt">false</infoStream>
    </indexConfig>

    <jmx />

    <updateHandler class="solr.DirectUpdateHandler2">
        <autoCommit>
            <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before an autocommit is triggered -->
            <maxTime>600000</maxTime> <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
            <openSearcher>false</openSearcher>
        </autoCommit>
    </updateHandler>

    [...]

    <!--Replication from the Solr Master to the Slaves. The Job SrsIndexUpdateJob will create the index on the master which will be replicated on the slaves.-->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
        <lst name="master">
            <str name="enable">${solr.master.enable:false}</str>
            <str name="replicateAfter">optimize</str>
            <str name="backupAfter">optimize</str>
        </lst>
        <str name="maxNumberOfBackups">${solr.numberOfVersionToKeep:3}</str>
        <lst name="slave">
            <str name="enable">${solr.slave.enable:false}</str>
            <str name="masterUrl">${solr.master.url:}/replication</str>
            <str name="pollInterval">${solr.replication.pollInterval:00:00:30}</str>
        </lst>
    </requestHandler>

    [...]

</config>


Any idea on how to solve this issue would be greatly appreciated.

Many thanks,
Alessandro

Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)

Posted by Erick Erickson <er...@gmail.com>.
Well, if it is a bug, you can work around it by not issuing any commits
until the indexing is completed. Certainly not elegant, and you risk
having to re-index from scratch if your machine dies.

Or take explicit control over it through the replication API, which in
your case might be preferable; see:
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
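
As a rough illustration of what "explicit control" could look like, here is a minimal Python sketch. It assumes the slaves are told to stop polling (disablepoll) while the indexing job runs, and are then re-enabled (enablepoll) and asked to pull once (fetchindex) when the job, including the optimize, has finished. The host names, the core name and the helper names are hypothetical; only the command names come from the ReplicationHandler HTTP API linked above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core_url, command, **params):
    """Build a ReplicationHandler URL, e.g. .../replication?command=fetchindex."""
    query = urlencode({"command": command, **params})
    return f"{core_url.rstrip('/')}/replication?{query}"


def send_command(core_url, command, **params):
    """Send one replication command over HTTP and return the response body."""
    with urlopen(replication_url(core_url, command, **params), timeout=30) as resp:
        return resp.read().decode("utf-8")


def run_with_polling_paused(job, slave_urls, send=send_command):
    """Pause polling on every slave, run the indexing job, then resume.

    If the job raises, polling stays disabled, so the slaves keep serving
    their last complete index until someone re-enables it by hand.
    """
    for url in slave_urls:
        send(url, "disablepoll")
    job()  # hypothetical callable: index everything, then optimize
    for url in slave_urls:
        send(url, "enablepoll")
        send(url, "fetchindex")  # pull now instead of waiting for the next poll


# Example call (hypothetical URLs and job function):
# run_with_polling_paused(my_indexing_job,
#                         ["http://slave1.example.com:8983/solr/mycore"])
```

The `send` parameter is there only so the sequence can be dry-run without a live Solr instance.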

Best,
Erick

On Fri, Jul 22, 2016 at 7:00 AM, Alessandro Bon <v-...@expedia.com> wrote:
> Thanks for your answer Shawn,
>
> If I understand correctly, you are saying that regardless of whether the "replicateAfter" directive is "commit" or "optimize", a replication is triggered whenever a segment merge occurs. Is that right?
> Or is it triggered only when a full index merge occurs, which could happen after a commit as well as after an optimization?
>
> I would love to switch to SolrCloud, and for sure I will in the future, but right now I just have to get the old master/slave architecture to work properly.
>
> Thanks again,
> Alessandro

RE: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)

Posted by Alessandro Bon <v-...@expedia.com>.
Thanks for your answer Shawn,

If I understand correctly, you are saying that regardless of whether the "replicateAfter" directive is "commit" or "optimize", a replication is triggered whenever a segment merge occurs. Is that right?
Or is it triggered only when a full index merge occurs, which could happen after a commit as well as after an optimization?

I would love to switch to SolrCloud, and for sure I will in the future, but right now I just have to get the old master/slave architecture to work properly.

Thanks again,
Alessandro

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: Friday, July 22, 2016 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)

On 7/22/2016 4:02 AM, Alessandro Bon wrote:
> Issue: Full index replications sometimes occur on master startup and
> after commits, even though only the
> <str name="replicateAfter">optimize</str> directive is specified.
> Replication on commit occurs only for sufficiently big commits.
> Replication correctly starts again at the end of my indexing job,
> after the optimization phase. As a result of this behaviour, I get
> incomplete indexes on the slaves during the indexing process.

There's a known bug where full index replication happens after master restart.  This was supposed to be fixed in 5.5.2 and 6.1.0, but you say you are running 5.5.2.

https://issues.apache.org/jira/browse/SOLR-9036

All replications are *supposed* to be delta replications -- only new/changed files.  Note that normal commits can cause segment merging, up to and including the entire index if conditions are just right. 
Segment merges can result in new segment files that are very large, which could take a long time to replicate.

Optimizing the index is a forced merge to one segment.  This will always lead to a full-index replication, because the entire index is rewritten into a single segment and all the other segment files are deleted.

You might want to give SolrCloud a try.  There are no masters and no slaves.  It is a true redundant cluster.

Thanks,
Shawn


Re: Solr "replicateAfter optimize" is specified, but replication starts also on commits and master startup (tested on solr 5.5.2)

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/22/2016 4:02 AM, Alessandro Bon wrote:
> Issue: Full index replications sometimes occur on master startup and
> after commits, even though only the
> <str name="replicateAfter">optimize</str> directive is specified.
> Replication on commit occurs only for sufficiently big commits.
> Replication correctly starts again at the end of my indexing job,
> after the optimization phase. As a result of this behaviour, I get
> incomplete indexes on the slaves during the indexing process.

There's a known bug where full index replication happens after master
restart.  This was supposed to be fixed in 5.5.2 and 6.1.0, but you say
you are running 5.5.2.

https://issues.apache.org/jira/browse/SOLR-9036

All replications are *supposed* to be delta replications -- only
new/changed files.  Note that normal commits can cause segment merging,
up to and including the entire index if conditions are just right. 
Segment merges can result in new segment files that are very large,
which could take a long time to replicate.

Optimizing the index is a forced merge to one segment.  This will always
lead to a full-index replication, because the entire index is rewritten
into a single segment and all the other segment files are deleted.
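
To make the difference between a delta and a full replication concrete, here is a deliberately simplified Python model. It assumes each segment is one file and that the slave fetches every file name it does not already have; the segment names are made up, and the real replication handler does more checks than a name comparison.

```python
def files_to_fetch(master_files, slave_files):
    """Files present on the master that the slave does not have yet."""
    return sorted(set(master_files) - set(slave_files))


def files_reused(master_files, slave_files):
    """Files the slave already holds and can keep as they are."""
    return sorted(set(master_files) & set(slave_files))


# Hypothetical segment names; one name stands in for all files of a segment.
slave = {"_0", "_1", "_2"}

# A small commit adds one new segment: only that segment is copied.
after_small_commit = {"_0", "_1", "_2", "_3"}
print(files_to_fetch(after_small_commit, slave))  # ['_3']
print(files_reused(after_small_commit, slave))    # ['_0', '_1', '_2']

# A commit that triggers a merge of every segment rewrites them into one
# new segment, so nothing on the slave is reusable, even though the copy
# is still technically a "delta".
after_big_merge = {"_4"}
print(files_to_fetch(after_big_merge, slave))     # ['_4']
print(files_reused(after_big_merge, slave))       # []

# An optimize (forced merge to one segment) always ends up in this case.
after_optimize = {"_5"}
print(files_to_fetch(after_optimize, slave))      # ['_5']
```

In this model a large merge and an optimize look the same to the slave: one new file that contains the whole index.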

You might want to give SolrCloud a try.  There are no masters and no
slaves.  It is a true redundant cluster.

Thanks,
Shawn