Posted to solr-user@lucene.apache.org by Marcin Rzewucki <mr...@gmail.com> on 2012/11/21 14:54:37 UTC

SolrCloud and external Zookeeper ensemble

Hi,

I have 4 Solr collections with 2-3 million documents each and roughly up to
100K updates per collection daily. I'm going to create SolrCloud4x on
Amazon's m1.large instances (7 GB memory, 2 x 2.4 GHz CPU each). My question
is about ZooKeeper. It's going to be an external ensemble, but is it better
to run it on the same nodes as Solr or on dedicated micro instances?
ZooKeeper does not seem to be a resource-demanding process, so which would
be better in this case: keeping it on the SolrCloud nodes or running it
separately (micro instances seem to be enough here)?

Thanks in advance.
Regards.

Re: SolrCloud and external Zookeeper ensemble

Posted by Jack Krupansky <ja...@basetechnology.com>.
That's a tradeoff for you to make based on your own requirements, but the 
point is that it is LESS SAFE to run zookeeper on the same machine as a Solr 
instance.

Also keep in mind that the goal is to have at least THREE zookeeper
instances running at any moment, so if you run zookeeper on the same machine
as a Solr instance, you will need more than three zookeepers. Figure three
plus the MAXIMUM number of Solr nodes that you expect could be down
simultaneously.

Also keep in mind that SolrCloud is about scaling, but the intention is NOT
to scale the zookeeper ensemble linearly with the number of Solr nodes. That 
means you would have to deal with the messiness of sometimes running 
zookeeper with Solr and sometimes not. So, unless you are running a very 
small SolrCloud cluster, you are much better off keeping zookeeper off your 
Solr machines.

The intent is that there will be a relatively small "ensemble" of zookeepers 
that service a large "army" or "armada" of Solr nodes.

-- Jack Krupansky



Re: SolrCloud and external Zookeeper ensemble

Posted by Jack Krupansky <ja...@basetechnology.com>.
That is an interesting point - what size of instance is needed for a 
zookeeper. Can it run well in a micro?

Another issue I wanted to raise is that maybe questions, advice, and 
guidelines should be relative to the "shirt size" of your cluster - small, 
medium, or large. SolrCloud is clearly more optimized for medium to large 
clusters. Sure, you can use it for small clusters, but then some of the 
features and guidance do seem like overkill. Nonetheless, I would hate to 
see anybody take the compromised guidance for very small clusters (3 or 4 
machines) and apply it to even medium-size clusters (10 to 20 machines), let 
alone large clusters (dozens to 100 or more machines).

-- Jack Krupansky


Re: SolrCloud and external Zookeeper ensemble

Posted by Otis Gospodnetic <ot...@gmail.com>.
If your Solr instances don't max out your EC2 instances, you should be fine.
But maybe even micro instances will suffice. Or 1 on-demand and 2 spot
ones. If cost is the concern, that is.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm

Re: SolrCloud and external Zookeeper ensemble

Posted by Marcin Rzewucki <mr...@gmail.com>.
Yes, this is exactly my case. I prefer the 3rd option too. Since I have 2
more instances for my own purposes (SolrCloud4x + 2 more instances for
loading), it will be easier to configure the zookeeper ensemble (I can use
those 2 additional machines + 1 from SolrCloud) and avoid purchasing and
maintaining more instances.

On 22 November 2012 10:18, Luis Cappa Banda <lu...@gmail.com> wrote:

> Hello,
>
> I've been dealing with the same question these days. In architecture terms,
> it's always better to separate services (Solr and Zookeeper, in this case)
> rather than keep them in a single instance. However, when we have to deal
> with cost issues, all of us are quite limited and must choose the best
> option in terms of architecture, scalability, and single points of failure.
> As I see it, the options are:
>
> 1. Solr servers with embedded Zookeeper.
> 2. Solr servers with external Zookeeper.
> 3. Solr servers with external Zookeeper ensemble.

Re: SolrCloud and external Zookeeper ensemble

Posted by Otis Gospodnetic <ot...@gmail.com>.
Note that the number of zookeeper nodes is independent of the number of shards.
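
As a rough illustration of why the shard count does not enter into it: ZooKeeper keeps serving requests as long as a strict majority of the ensemble is up, so the ensemble is sized by how many ZooKeeper failures you want to tolerate. The sketch below is only arithmetic; the function names are made up for this example and are not part of any Solr or ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def ensemble_size_for(failures_to_tolerate: int) -> int:
    """Smallest ensemble that survives the given number of failures."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must not be negative")
    return 2 * failures_to_tolerate + 1


if __name__ == "__main__":
    # The number of Solr shards never appears in these calculations.
    for size in range(1, 8):
        print(size, quorum_size(size), tolerated_failures(size))
```

Under this rule an ensemble of 3 tolerates 1 failure, an ensemble of 4 still tolerates only 1, and an ensemble of 5 tolerates 2, which is why odd sizes are the usual choice.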

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Nov 22, 2012 4:19 AM, "Luis Cappa Banda" <lu...@gmail.com> wrote:

> Note: as far as I know, the recommended number of Zookeeper services to
> avoid single points of failure is: ZkNum = 2 * Numshards - 1. If you have

Re: SolrCloud and external Zookeeper ensemble

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/22/2012 2:18 AM, Luis Cappa Banda wrote:
> I've been dealing with the same question these days. In architecture terms,
> it's always better to separate services (Solr and Zookeeper, in this case)
> rather than keep them in a single instance. However, when we have to deal
> with cost issues, all of us are quite limited and must choose the best
> option in terms of architecture, scalability, and single points of failure.
> As I see it, the options are:
>
> 1. Solr servers with embedded Zookeeper.
> 2. Solr servers with external Zookeeper.
> 3. Solr servers with external Zookeeper ensemble.

I've never used SolrCloud, so this is all speculation based on what I've 
been reading.  That has been mostly on this list, but also on dev@l.o 
and the IRC channel.

I have a four-node Solr 3.5 deployment with about 80 million documents 
(130GB) in the distributed index.  I think of my installation as small.  
Others might disagree with my opinion, but I know there are a lot of 
indexes out there that make mine look tiny.

If I needed to set up a similarly small installation with SolrCloud on four
Solr servers, what I would pitch to management would be one extra machine
(cheap, 1U, low-end processor, etc.) to act as a standalone zookeeper
node.  For the other two zookeeper instances, I would run standalone
zookeeper (separate JVM from Solr) on two of the Solr servers.  I might
ask for a small boost in RAM and/or CPU on the two servers that serve
double-duty.  I would not run zookeeper in the same JVM as Solr.
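
A layout like this (one dedicated ZooKeeper machine plus standalone ZooKeeper on two of the Solr servers) could be wired up roughly as sketched below. The Python only generates text: a `zoo.cfg` body using the standard ZooKeeper keys and the comma-separated connection string that Solr takes through its `zkHost` system property. The hostnames, the `dataDir` path, and the timing values are assumptions for illustration, not recommendations.

```python
# Hypothetical hosts: one dedicated ZooKeeper box and two Solr servers that
# also run a standalone ZooKeeper process in its own JVM.
ZK_HOSTS = ["zk1.example.com", "solr1.example.com", "solr2.example.com"]


def zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Return the text of a zoo.cfg shared by every ensemble member."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # server.N must match the number stored in that host's dataDir/myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def zk_host_string(hosts, client_port=2181):
    """Return the connection string passed to Solr, e.g. as -DzkHost=..."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


if __name__ == "__main__":
    print(zoo_cfg(ZK_HOSTS))
    print("-DzkHost=" + zk_host_string(ZK_HOSTS))
```

Every Solr node, including the two that share a machine with ZooKeeper, would be given the same full connection string, so moving ZooKeeper to dedicated machines later mainly means changing that list.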

With a little bit of growth in the cluster, I would ask for a second 
standalone zookeeper node, pulling zookeeper off one of the Solr 
servers.  If it continued to grow, then I would ask for the third.  I 
would leave blank spots in the rack for those standalone servers.

Thanks,
Shawn


Re: SolrCloud and external Zookeeper ensemble

Posted by Luis Cappa Banda <lu...@gmail.com>.
Hello,

I've been dealing with the same question these days. In architecture terms,
it's always better to separate services (Solr and Zookeeper, in this case)
rather than keep them in a single instance. However, when we have to deal
with cost issues, all of us are quite limited and must choose the best
option in terms of architecture, scalability, and single points of failure.
As I see it, the options are:


1. Solr servers with embedded Zookeeper.
2. Solr servers with external Zookeeper.
3. Solr servers with external Zookeeper ensemble.

Note: as far as I know, the recommended number of Zookeeper services to
avoid single points of failure is: ZkNum = 2 * Numshards - 1. If you have


The best option is the third one. Reasons:

1. If one of your Solr servers goes down, the Zookeeper services are still up.
2. If one of your Zookeeper services goes down, the Solr servers and the rest
of the Zookeeper services are still up.

Considering that option, we have two ways to implement it in production:

1. Each service (Solr and Zookeeper) on separate machines. Let's imagine
that we have 2 shards for a given collection, so we need at least 4 Solr
servers to complete the leader-replica configuration. The best option is to
deploy them on four Amazon instances, one per server. We need at least
3 Zookeeper services in a Zookeeper ensemble configuration. The optimal way
to install them is on separate machines (a micro instance will be fine for
Zookeeper), so we will have 7 Amazon instances. The reason is that if one
machine goes down (a Solr or a Zookeeper one), the other services will still
be up and your production environment will be safe. For me this is the
best case, but it's the most expensive one, so in my case it is impossible
to implement.

2. As we need at least 4 Solr servers and 3 Zookeeper services up, I
would set up three Amazon instances with both Solr and Zookeeper, and one
with Solr only. So we'll have 3 combined Amazon instances (Solr +
Zookeeper) and 1 Solr-only Amazon instance. If one of them goes
down, the production environment will be safe. This architecture is not the
best one, as I said, but I think it is optimal in terms of
robustness, single points of failure, and cost.
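
The two layouts above can be compared with a small sketch. It assumes that the only availability condition checked is a strict majority of ZooKeeper servers being up; Solr replica placement is ignored. The machine names and the helper functions are invented for this example.

```python
from itertools import combinations

# Layout 1: seven machines, each running a single service.
SEPARATE = {
    "solr1": {"solr"}, "solr2": {"solr"}, "solr3": {"solr"}, "solr4": {"solr"},
    "zk1": {"zk"}, "zk2": {"zk"}, "zk3": {"zk"},
}

# Layout 2: three machines run Solr + ZooKeeper, one runs Solr only.
COLOCATED = {
    "node1": {"solr", "zk"}, "node2": {"solr", "zk"}, "node3": {"solr", "zk"},
    "node4": {"solr"},
}


def zk_quorum_holds(layout, down):
    """True if a strict majority of ZooKeeper servers is still up."""
    zk_nodes = [name for name, roles in layout.items() if "zk" in roles]
    zk_up = [name for name in zk_nodes if name not in down]
    return len(zk_up) > len(zk_nodes) // 2


def max_failures_tolerated(layout):
    """Largest k such that ANY k machines may fail without losing quorum."""
    names = list(layout)
    tolerated = 0
    for k in range(1, len(names) + 1):
        if all(zk_quorum_holds(layout, set(c)) for c in combinations(names, k)):
            tolerated = k
        else:
            break
    return tolerated


def double_loss_machines(layout):
    """Machines whose failure removes a Solr node and a ZooKeeper node at once."""
    return sorted(n for n, roles in layout.items() if {"solr", "zk"} <= roles)


if __name__ == "__main__":
    for label, layout in (("separate", SEPARATE), ("colocated", COLOCATED)):
        print(label, max_failures_tolerated(layout), double_loss_machines(layout))
```

Under these assumptions both layouts survive the loss of any single machine. The difference is that in the colocated layout three of the four machines take a Solr node and a ZooKeeper server down together, while in the separate layout no machine does.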


It would be a pleasure to hear new suggestions from other people who have
dealt with this kind of issue.

Regards,


- Luis Cappa.



Re: SolrCloud and external Zookeeper ensemble

Posted by Marcin Rzewucki <mr...@gmail.com>.
Yes, I meant the same (not -zkRun). However, I was asking whether it is safe
to run the zookeeper and solr processes on the same node, or whether it is
better to put them on different machines.

On 21 November 2012 21:18, Rafał Kuć <r....@solr.pl> wrote:

> Hello!
>
> As I said, I wouldn't use the Zookeeper that is embedded in Solr, but
> would rather set up a standalone one.

Re: SolrCloud and external Zookeeper ensemble

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

As I said, I wouldn't use the Zookeeper that is embedded in Solr, but
would rather set up a standalone one.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> First of all: thank you for your answers. Yes, I meant side by side
> configuration. I think the worst case for ZKs here is to lose two of them.
> However, I'm going to use 4 availability zones in the same region, so at
> least this will reduce the risk of losing both of them at the same time.
> Regards.

> On 21 November 2012 17:06, Rafał Kuć <r....@solr.pl> wrote:

>> Hello!
>>
>> Zookeeper by itself is not demanding, but if something happens to the
>> nodes that have Solr on them, you'll lose ZooKeeper too if you have
>> them installed side by side. However, if you have 4 Solr nodes and
>> 3 ZK instances, you can run them side by side.
>>
>> --
>> Regards,
>>  Rafał Kuć
>>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>>
>> > Separate is generally nice because then you can restart Solr nodes
>> > without consideration for ZooKeeper.
>>
>> > Performance-wise, I doubt it's a big deal either way.
>>
>> > - Mark
>>
>> > On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki <mr...@gmail.com>
>> wrote:
>>
>> >> Hi,
>> >>
>> >> I have 4 solr collections, 2-3mn documents per collection, up to 100K
>> >> updates per collection daily (roughly). I'm going to create SolrCloud4x
>> on
>> >> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
>> >> what about zookeeper? It's going to be external ensemble, but is it
>> better
>> >> to use same nodes as solr or dedicated micro instances? Zookeeper does
>> not
>> >> seem to be resources demanding process, but what would be better in this
>> >> case ? To keep it inside of solrcloud or separately (micro instances
>> seem
>> >> to be enough here) ?
>> >>
>> >> Thanks in advance.
>> >> Regards.
>>
>>


Re: SolrCloud and external Zookeeper ensemble

Posted by Marcin Rzewucki <mr...@gmail.com>.
First of all: thank you for your answers. Yes, I meant side by side
configuration. I think the worst case for ZKs here is to lose two of them.
However, I'm going to use 4 availability zones in the same region, so at
least this will reduce the risk of losing both of them at the same time.
Regards.
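
The "lose two of them" worst case follows from ZooKeeper's majority rule:
an ensemble stays available only while more than half of its servers are
up. A small sketch of that arithmetic is below; the function names are
illustrative and not part of any ZooKeeper or Solr API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With three servers the ensemble survives one failure but not two, and a
fourth server does not raise that number, which is why odd ensemble sizes
are the usual choice.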

On 21 November 2012 17:06, Rafał Kuć <r....@solr.pl> wrote:

> Hello!
>
> ZooKeeper by itself is not demanding, but if something happens to the
> nodes that run Solr, you'll lose ZooKeeper too if you have
> them installed side by side. However, if you have 4 Solr nodes and
> 3 ZK instances, you can run them side by side.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Re: SolrCloud and external Zookeeper ensemble

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

ZooKeeper by itself is not demanding, but if something happens to the
nodes that run Solr, you'll lose ZooKeeper too if you have
them installed side by side. However, if you have 4 Solr nodes and
3 ZK instances, you can run them side by side.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
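
To make the side by side trade-off concrete, the sketch below enumerates
host failures for the layout mentioned above (4 Solr hosts, 3 of which
also run ZooKeeper) and reports which combinations would cost the ensemble
its majority. The host names and the helper functions are hypothetical and
exist only for this illustration.

```python
from itertools import combinations

# Hypothetical layout: four Solr hosts, three of them also running ZooKeeper.
SOLR_HOSTS = {"node1", "node2", "node3", "node4"}
ZK_HOSTS = {"node1", "node2", "node3"}


def zk_has_quorum(failed_hosts) -> bool:
    """True if a strict majority of ZooKeeper servers is still up."""
    alive = len(ZK_HOSTS - set(failed_hosts))
    return alive > len(ZK_HOSTS) / 2


def quorum_breaking_failures(count: int):
    """All sets of `count` failed hosts that leave ZooKeeper without quorum."""
    return [set(combo)
            for combo in combinations(sorted(SOLR_HOSTS), count)
            if not zk_has_quorum(combo)]


if __name__ == "__main__":
    for count in (1, 2):
        bad = quorum_breaking_failures(count)
        print(f"{count} host(s) down: {len(bad)} quorum-breaking case(s)")
```

In this layout any single host can fail safely, but a Solr outage that
takes down two of the three shared hosts also takes down ZooKeeper, which
is the coupling that dedicated ZooKeeper machines avoid.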

> Separate is generally nice because then you can restart Solr nodes
> without consideration for ZooKeeper.

> Performance-wise, I doubt it's a big deal either way.

> - Mark


Re: SolrCloud and external Zookeeper ensemble

Posted by Mark Miller <ma...@gmail.com>.
Separate is generally nice because then you can restart Solr nodes without consideration for ZooKeeper.

Performance-wise, I doubt it's a big deal either way.

- Mark

On Nov 21, 2012, at 8:54 AM, Marcin Rzewucki <mr...@gmail.com> wrote:

> Hi,
> 
> I have 4 solr collections, 2-3mn documents per collection, up to 100K
> updates per collection daily (roughly). I'm going to create SolrCloud4x on
> Amazon's m1.large instances (7GB mem,2x2.4GHz cpu each). The question is
> what about zookeeper? It's going to be external ensemble, but is it better
> to use same nodes as solr or dedicated micro instances? Zookeeper does not
> seem to be resources demanding process, but what would be better in this
> case ? To keep it inside of solrcloud or separately (micro instances seem
> to be enough here) ?
> 
> Thanks in advance.
> Regards.