Posted to solr-user@lucene.apache.org by Pavel Micka <Pa...@zoomint.com> on 2018/07/23 10:01:33 UTC

SolrCloud acceptable latency, when to use CDCR?

Hi,

We are discussing the advantages of SolrCloud replication versus Cross Data Center Replication (CDCR). The CDCR docs state that
"The SolrCloud architecture is not particularly well suited for situations where a single SolrCloud cluster consists of nodes in separated data clusters connected by an expensive pipe".

But we cannot find what latency is acceptable for SolrCloud/ZK, or when we should start considering CDCR (master-slave). And what issues would we hit if we installed SolrCloud on a problematic network?

Thanks in advance,

Pavel

Re: SolrCloud acceptable latency, when to use CDCR?

Posted by Erick Erickson <er...@gmail.com>.
It Depends (tm).

There are several issues when communications are unreliable, basically
all having to do with timeouts.

> ZK not getting "keep alive" requests back in time and marking the node as down
> leaders not getting responses back from followers in time and putting them into recovery.
> client timeouts because of slow connections

Plus, consider a single Solr query in a sharded environment. It has to:
> send a sub-request to one replica of each shard
> get the sub-request responses back
> sort the true top N
> request the actual docs from the replicas queried in the first step
> send the final response to the client.
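
As a rough illustration of the query steps above, here is a minimal Python sketch of how the inter-node round trips add up. The function name, its parameters, and the example numbers are all hypothetical. The model assumes sub-requests go out to all shards in parallel, so each of the two phases costs about one round trip to the slowest shard, and it ignores search time, serialization, and retries.

```python
def estimate_query_latency_ms(shard_rtts_ms, client_rtt_ms=0.0):
    """Rough lower bound on distributed query latency (hypothetical model).

    shard_rtts_ms: round-trip time from the coordinating node to the
                   replica chosen for each shard.
    client_rtt_ms: round-trip time between the client and the coordinating node.
    """
    if not shard_rtts_ms:
        raise ValueError("need at least one shard")
    # Sub-requests fan out in parallel, so each phase waits for the slowest shard.
    slowest_shard_rtt = max(shard_rtts_ms)
    # Phase 1: collect the top-N candidates; phase 2: fetch the actual docs.
    phases = 2
    return client_rtt_ms + phases * slowest_shard_rtt


# All replicas in one data center (assumed 1 ms between nodes).
local = estimate_query_latency_ms([1.0, 1.0, 1.0], client_rtt_ms=5.0)
# One shard's replica sits across an assumed 80 ms WAN link.
remote = estimate_query_latency_ms([1.0, 1.0, 80.0], client_rtt_ms=5.0)
print(local, remote)
```

Under these assumptions a single remote replica dominates the total, because every query waits for the slowest shard twice.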

Or indexing. Let's say a doc comes in to replica N. Then
> the doc is forwarded to the leader for the appropriate shard
> the leader sends the doc to each replica in that shard
> the response comes back to the leader
> the response is ack'd back to the original node receiving the request
> the response is ack'd back to the client.
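
The indexing path above can be sketched the same way. This is a minimal Python model, not Solr code: the function name, parameters, and numbers are hypothetical, and it assumes the leader sends to its followers in parallel and waits for the slowest one before acknowledging. Indexing work on each node is ignored.

```python
def estimate_index_latency_ms(client_rtt_ms, forward_rtt_ms, follower_rtts_ms):
    """Rough lower bound on the latency of indexing one doc (hypothetical model).

    client_rtt_ms:    round trip between the client and the receiving node.
    forward_rtt_ms:   round trip between the receiving node and the shard leader.
    follower_rtts_ms: round trips between the leader and each other replica.
    """
    # The leader waits for the slowest follower; no followers means no extra wait.
    slowest_follower_rtt = max(follower_rtts_ms, default=0.0)
    # The three legs are sequential: client -> node -> leader -> followers and back.
    return client_rtt_ms + forward_rtt_ms + slowest_follower_rtt


# Everything in one data center (assumed 1 ms between nodes).
same_dc = estimate_index_latency_ms(5.0, 1.0, [1.0, 1.0])
# One follower sits across an assumed 80 ms WAN link.
cross_dc = estimate_index_latency_ms(5.0, 1.0, [1.0, 80.0])
print(same_dc, cross_dc)
```

With these made-up numbers, every update pays for the slow link once, even though only one replica is remote.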

The point is that there's a _lot_ of communication between Solr nodes,
and if you have a slow pipe connecting them, overall latency is
increased. But whether that latency is acceptable for your application,
only you can tell.

But be a little careful here. None of the above addresses a
"problematic network" in the sense of an unreliable one; a slow network
is totally expected. CDCR is intended for the case where "problematic"
means separate data centers connected by a high-latency network.

Best,
Erick
