Posted to solr-user@lucene.apache.org by "MOIS Martin (MORPHO)" <ma...@morpho.com> on 2015/10/12 10:01:41 UTC

Replication and soft commits for NRT searches

Hello,

I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been created with replicationFactor=2, i.e. I have one replica for each shard. Beyond that, I am using autoCommit/maxDocs=10000 and autoSoftCommit/maxDocs=1 in order to achieve near-real-time search behavior.

As far as I understand from section "Write Side Fault Tolerance" in the documentation (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance), I cannot enforce that an update gets replicated to all replicas, but I can only get the achieved replication factor by requesting the return value rf.

My question now is: what exactly does rf=2 mean? Does it only mean that the replica has written the update to its transaction log? Or has the replica also performed the soft commit as configured with autoSoftCommit/maxDocs=1? The answer is important for me, because if the update were only written to the transaction log, I could not reliably search for it, as the replica may not have added it to the searchable index.

My second question is: does rf=1 mean that the update was definitely not successful on the replica, or could it also represent a timeout of the replication request from the shard leader? If it could also represent a timeout, then there would be a small chance that the replication was successful despite the timeout.

Is there a way to retrieve the replication factor for a specific document after the update in order to check if replication was successful in the meantime?

Thanks in advance.

Best Regards,
Martin Mois
#
" This e-mail and any attached documents may contain confidential or proprietary information. If you are not the intended recipient, you are notified that any dissemination, copying of this e-mail and any attachments thereto or use of their contents by any means whatsoever is strictly prohibited. If you have received this e-mail in error, please advise the sender immediately and delete this e-mail and all attached documents from your computer system."
#

Re: Replication and soft commits for NRT searches

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Comments inline:

On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
<ma...@morpho.com> wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been created with replicationFactor=2, i.e. I have one replica for each shard. Beyond that, I am using autoCommit/maxDocs=10000 and autoSoftCommit/maxDocs=1 in order to achieve near-real-time search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the documentation (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance), I cannot enforce that an update gets replicated to all replicas, but I can only get the achieved replication factor by requesting the return value rf.
>
> My question now is: what exactly does rf=2 mean? Does it only mean that the replica has written the update to its transaction log? Or has the replica also performed the soft commit as configured with autoSoftCommit/maxDocs=1? The answer is important for me, because if the update were only written to the transaction log, I could not reliably search for it, as the replica may not have added it to the searchable index.

rf=2 means that the update was successfully replicated to and
acknowledged by two replicas (including the leader). The rf only deals
with the durability of the update and has no relation to visibility of
the update to searchers. The auto(soft)commit settings are applied
asynchronously and do not block an update request.
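
As a rough illustration of how a client might ask for the achieved replication factor, the sketch below builds an update URL that carries the min_rf parameter and reads rf from the JSON response header. It uses only the Python standard library. The host, the collection name, and the sample response body are assumptions made for illustration; the response is not output captured from a real cluster.

```python
import json
from urllib.parse import urlencode


def build_update_url(base_url, collection, min_rf):
    """Build a JSON update URL that asks Solr to report the achieved rf."""
    params = urlencode({"min_rf": min_rf, "wt": "json"})
    return f"{base_url}/{collection}/update?{params}"


def achieved_rf(response_body):
    """Return the rf value from the response header, or None if absent."""
    header = json.loads(response_body).get("responseHeader", {})
    return header.get("rf")


# Hypothetical host and collection name.
url = build_update_url("http://localhost:8983/solr", "mycollection", 2)

# Illustrative response body, shaped like a Solr JSON update response.
sample = '{"responseHeader": {"rf": 1, "min_rf": 2, "status": 0, "QTime": 12}}'
rf = achieved_rf(sample)
```

In this made-up response rf is 1 although min_rf was 2, so only the leader acknowledged the update. Either way, the value says nothing about whether the document is already visible to searchers.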

>
> My second question is: does rf=1 mean that the update was definitely not successful on the replica, or could it also represent a timeout of the replication request from the shard leader? If it could also represent a timeout, then there would be a small chance that the replication was successful despite the timeout.

Well, rf=1 implies that the update was applied only to the leader's
index + tlog, and that the replicas either weren't available or returned
an error, or that the request timed out. So yes, you are right that it
can represent a timeout, and as such there is a chance that the
replication was indeed successful despite the timeout.
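
Because rf=1 cannot distinguish a failed replication from a timed-out one, one possible client-side reaction is simply to send the same document again until the reported rf reaches the desired value. The following is a minimal sketch of that idea, not something Solr provides. It assumes the document carries a uniqueKey, so that re-sending it overwrites the earlier copy, and send_update is a hypothetical caller-supplied function that performs the update and returns the achieved rf.

```python
import time


def send_with_retry(send_update, min_rf, attempts=3, delay_seconds=0.0):
    """Call send_update() until it reports rf >= min_rf or attempts run out.

    send_update is a caller-supplied function (hypothetical here) that sends
    the same document again and returns the achieved rf as an integer.
    Returns the last rf that was reported.
    """
    rf = 0
    for attempt in range(attempts):
        rf = send_update()
        if rf >= min_rf:
            break
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return rf


# Stand-in for a real client: the replica answers in time on the second try.
reported = iter([1, 2])
final_rf = send_with_retry(lambda: next(reported), min_rf=2)
```

If the attempts are exhausted, the function returns the last rf it saw, and the caller still has to decide how to handle the degraded state, for example by logging the document id for a later retry.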

>
> Is there a way to retrieve the replication factor for a specific document after the update in order to check if replication was successful in the meantime?
>

No, there is no way to do that.




-- 
Regards,
Shalin Shekhar Mangar.

Re: Replication and soft commits for NRT searches

Posted by Erick Erickson <er...@gmail.com>.
First of all, setting soft commit with maxDocs=1 is almost (but not
quite) guaranteed to lead to problems. For _every_ document you add to
Solr, all your top-level caches (i.e. the ones configured in
solrconfig.xml) will be thrown away, all autowarming will be performed,
etc. Essentially, assuming a constant indexing load, none of your
top-level caches are doing you any good.
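
To put rough numbers on this, the sketch below counts how often a soft commit would be triggered under a document-count threshold versus a time-based interval. It is back-of-the-envelope arithmetic, not a model of Solr's internals: it counts commit triggers only, and the indexing rate and the 5-second interval are assumed values chosen for illustration.

```python
import math


def soft_commit_triggers_by_doc_count(docs_indexed, max_docs):
    """Soft commits triggered when one fires every max_docs documents."""
    return docs_indexed // max_docs


def soft_commit_triggers_by_time(duration_seconds, max_time_seconds):
    """Upper bound on soft commits when one fires at most once per interval."""
    return math.ceil(duration_seconds / max_time_seconds)


# Hypothetical load: 500 documents per second for one minute.
docs = 500 * 60
per_doc = soft_commit_triggers_by_doc_count(docs, max_docs=1)           # 30000
per_interval = soft_commit_triggers_by_time(60, max_time_seconds=5)     # 12
```

Under these assumptions, maxDocs=1 asks for a cache flush and autowarming 30000 times in a minute, while a 5-second interval asks for at most 12.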

This might help:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

By the time an indexing request returns, the document(s) have all been
forwarded to all replicas and indexed to the in-memory structures
_and_ written to the tlog. The next expiration of the soft commit
interval will allow them to be searched, assuming that autowarming is
completed.

I'm going to guess that you'll see a bunch of warnings like
"overlapping ondeck searchers" and you'll be tempted to set
maxWarmingSearchers to some number greater than 2 in solrconfig.xml. I
recommend against this too; that setting is there for a reason.

Do you have any evidence of a problem or is this theoretical?

All that said, I would _strongly_ urge you to revisit the requirement
of having your soft commit maxDocs set to 1.

Best,
Erick
