Posted to solr-user@lucene.apache.org by Marcin Rzewucki <mr...@gmail.com> on 2012/11/19 12:54:30 UTC

CloudSolrServer or load-balancer for indexing

Hi,

As far as I know, CloudSolrServer is the recommended way to index into
SolrCloud. What are the advantages of this approach over an external
load balancer? Let's say I have a 4-node SolrCloud (2 shards + replicas)
plus 1 server running ZooKeeper. I can either index through
CloudSolrServer or use a load balancer and send updates to any live
node. In the former case ZooKeeper seems to be a single point of
failure: indexing is not possible if it is down. In the latter case I
can still index data even if some nodes are down (no data outage). Which
is better for reliable indexing: CloudSolrServer, a load balancer, or
some other method worth considering?

Regards.

Re: CloudSolrServer or load-balancer for indexing

Posted by Upayavira <uv...@odoko.co.uk>.
A single ZooKeeper node could be a single point of failure. It is
recommended that you run at least three ZooKeeper nodes as an
ensemble.

Zookeeper has a simple rule - over half of your nodes must be available
to achieve quorum and thus be functioning. This is to avoid
'split-brain'. Thus, with three servers, you could handle the loss of
one zookeeper node. Five would allow the loss of two nodes.
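
The arithmetic behind that rule can be sketched in a few lines of Java. This is only an illustration of the majority rule described above; the class and method names (QuorumMath, quorumSize, tolerableFailures) are made up for this example and are not part of ZooKeeper or Solr.

```java
// Illustrative only: these names are hypothetical, not a ZooKeeper API.
final class QuorumMath {

    // Smallest number of nodes that is strictly more than half the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many nodes can be lost while a quorum is still reachable.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " node(s): quorum " + quorumSize(n)
                    + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

Note that an even-sized ensemble buys nothing extra under this rule: four nodes tolerate one failure, the same as three, which is why odd sizes are the usual choice.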

More to the point, with CloudSolrServer the static configuration in
your client changes from a list of Solr nodes to a list of ZooKeeper
nodes. The expectation is clearly that you'll need to change your set
of ZooKeeper nodes far less often than your set of Solr nodes.

Upayavira

On Mon, Nov 19, 2012, at 09:39 PM, Marcin Rzewucki wrote:
> OK, got it. Thanks.

Re: CloudSolrServer or load-balancer for indexing

Posted by Marcin Rzewucki <mr...@gmail.com>.
OK, got it. Thanks.

On 19 November 2012 15:00, Mark Miller <ma...@gmail.com> wrote:

> Nodes stop accepting updates if they cannot talk to Zookeeper, so the
> external load balancer is no advantage there.
>
> CloudSolrServer will be smart about knowing who the leaders are,
> eventually will do hashing, will auto add/remove nodes from rotation based
> on the cluster state in Zookeeper, and is probably out of the box more
> intelligent about retrying on some responses (for example responses that
> are returned on shutdown or startup).
>
> - Mark

Re: CloudSolrServer or load-balancer for indexing

Posted by Mark Miller <ma...@gmail.com>.
Nodes stop accepting updates if they cannot talk to ZooKeeper, so the external load balancer offers no advantage there.

CloudSolrServer knows which nodes are the shard leaders, will eventually do the document hashing itself, automatically adds and removes nodes from rotation based on the cluster state in ZooKeeper, and is probably more intelligent out of the box about retrying on some responses (for example, responses that are returned during shutdown or startup).
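
To make the "rotation based on the cluster state" point concrete, here is a toy model in plain Java. It is a sketch under loud assumptions: the Replica class, the rotation and leaderFor methods, and the node URLs are all hypothetical and are not SolrJ classes or the actual CloudSolrServer implementation. It only shows the idea that a client reading cluster state can skip nodes marked down and find the leader of a shard, which a plain load balancer with a static node list cannot do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a cluster-state-aware client.
// None of these types are SolrJ classes.
final class ClusterStateSketch {

    static final class Replica {
        final String url;
        final String shard;
        final boolean leader;
        final boolean live;

        Replica(String url, String shard, boolean leader, boolean live) {
            this.url = url;
            this.shard = shard;
            this.leader = leader;
            this.live = live;
        }
    }

    // Nodes eligible for rotation: only replicas the state marks as live.
    static List<String> rotation(List<Replica> state) {
        List<String> urls = new ArrayList<String>();
        for (Replica r : state) {
            if (r.live) {
                urls.add(r.url);
            }
        }
        return urls;
    }

    // URL of the live leader for a shard, or null if the state shows none.
    static String leaderFor(List<Replica> state, String shard) {
        for (Replica r : state) {
            if (r.live && r.leader && r.shard.equals(shard)) {
                return r.url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up 4-node layout: 2 shards, each with a leader and a replica.
        List<Replica> state = new ArrayList<Replica>();
        state.add(new Replica("http://node1:8983/solr", "shard1", true, true));
        state.add(new Replica("http://node2:8983/solr", "shard1", false, false)); // down
        state.add(new Replica("http://node3:8983/solr", "shard2", true, true));
        state.add(new Replica("http://node4:8983/solr", "shard2", false, true));

        System.out.println("rotation: " + rotation(state));
        System.out.println("shard1 leader: " + leaderFor(state, "shard1"));
    }
}
```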

- Mark
