Posted to solr-user@lucene.apache.org by Nick Chase <nc...@earthlink.net> on 2012/11/11 16:12:40 UTC

Internal Vs. External ZooKeeper

OK, I can't find a definitive answer on this.  The wiki says not to use 
the embedded ZooKeeper servers for production.  But my question is: why 
not?  Basically, what are the reasons and circumstances that make you 
better off using an external ZooKeeper ensemble?

Thanks...

---- Nick

Re: Internal Vs. External ZooKeeper

Posted by Anirudha Jadhav <an...@nyu.edu>.
Thanks, Mark!




-- 
Anirudha P. Jadhav

Re: Internal Vs. External ZooKeeper

Posted by Mark Miller <ma...@gmail.com>.
When SolrCloud is in a steady state (e.g., the number of nodes in the 
cluster is not changing and config is not changing), Solr does not 
really talk to ZooKeeper other than really light stuff like a heartbeat 
and maintaining a connection. So performance is not likely a large 
concern here.

Mostly it's just a hassle because ZooKeeper does not currently support 
dynamically changing the nodes in an ensemble without doing a rolling 
restart. There are JIRA issues that are being worked on that will help 
with this though.

Until then, it's just kind of a pain that some nodes have to be special 
or you have to do rolling restarts to make additional nodes part of the 
zk quorum.
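
To illustrate the hassle described above: each ZooKeeper server reads the ensemble membership from static lines of the form server.N=host:peerPort:electionPort in its zoo.cfg, so growing the ensemble means editing that list on every existing member and restarting them one at a time. The following is only a rough Python sketch that renders those lines. The host names (zk1 through zk5) are hypothetical, and 2888/3888 are just the ports conventionally shown in the ZooKeeper documentation.

```python
def server_lines(hosts, peer_port=2888, election_port=3888):
    """Render the static ensemble membership lines for zoo.cfg.

    Host names passed in are illustrative; the ports are the
    conventional defaults and are an assumption here.
    """
    return [
        f"server.{i}={host}:{peer_port}:{election_port}"
        for i, host in enumerate(hosts, start=1)
    ]


# Membership before and after growing the ensemble from 3 to 5 nodes.
before = server_lines(["zk1", "zk2", "zk3"])
after = server_lines(["zk1", "zk2", "zk3", "zk4", "zk5"])

# Every existing node needs these extra lines in its config,
# which is why adding members implies a rolling restart.
added = [line for line in after if line not in before]
print("\n".join(added))
```

With an external ensemble this list stays small and rarely changes; with embedded ZooKeeper on every Solr node, it would have to change each time a Solr node joins the quorum.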

It's really up to you though - having the services separate just seems 
"nicer" to me. Easier to maintain. Often, once you start running 
ZooKeeper for one thing, you may end up running other things that use 
ZooKeeper as well - many people like to colocate this stuff on a single 
dedicated ZooKeeper ensemble.

Embedded will run just fine - we simply recommend the other way to save 
headaches. If you know what you are getting into, it's certainly a valid 
choice.

- Mark



Re: Internal Vs. External ZooKeeper

Posted by Anirudha Jadhav <an...@nyu.edu>.
Let me see if I get this correctly:

the greater the number of ZooKeeper nodes, the more time it takes to come
to a consensus.

During an indexing operation, how many times does a Solr client need to
contact ZooKeeper for consensus?
- per doc? per commit?

thanks,
Ani




-- 
Anirudha P. Jadhav

Re: Internal Vs. External ZooKeeper

Posted by Nick Chase <nc...@earthlink.net>.
Thanks, Jack, this is a great explanation!  And since a greater number 
of ZK nodes tends to degrade write performance, that would be a factor 
in making every Solr node a ZK node as well.  Much obliged!

----  Nick


Re: Internal Vs. External ZooKeeper

Posted by Jack Krupansky <ja...@basetechnology.com>.
"Production" typically implies "high availability" and in a distributed 
system the goal is that the overall cluster integrity and performance should 
not be compromised just because a few "worker" nodes go down. Solr nodes do 
a lot of complex operations and are quite prone to running into "issues" 
that compromise their integrity and require that they be taken down, 
restarted, etc. In fact, taking down a "bunch" of Solr "worker" nodes should 
not be a big deal (unless they are all of the nodes/replicas from a single 
shard/slice), while taking down a "bunch" of zookeepers could be 
catastrophic to maintaining the integrity of the zookeeper ensemble. (OTOH, 
if every Solr node is also a zookeeper node, a "bunch" of Solr nodes would 
generally be less than a quorum, so maybe that is not an absolute issue per 
se.) Zookeeper nodes are categorically distinct in terms of their importance 
to maintaining the integrity and availability of the overall cluster. They 
are special in that sense. And they are special because they are maintaining 
the integrity of the cluster's configuration information. Even for large 
clusters their number will be relatively "few" compared to the "many" of 
"worker" nodes (replicas), so zookeeper nodes need to be "protected" from 
the vagaries that can disrupt and take Solr nodes down, not the least of 
which is incoming traffic.

I'm not sure what the implications would be if you had a large cluster and 
because Zookeeper was embedded you had a large number of zookeepers. Any of 
the inter-zookeeper operations would take longer and could be compromised by 
even a single busy/overloaded/dead Solr node. OTOH, the Zookeeper ensemble 
design is supposed to be able to handle a fair number of missing zookeeper 
nodes.
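
As a rough illustration of the quorum arithmetic behind this: a ZooKeeper ensemble stays available as long as a strict majority of its servers are up, and a write has to be acknowledged by a majority before it is committed. The Python sketch below just works out those numbers for a few ensemble sizes. The function names are made up for this example, and the sizes are arbitrary.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Small dedicated ensembles versus "every Solr node is a ZK node".
for n in (1, 3, 5, 7, 50):
    print(
        f"ensemble={n:2d}  quorum={quorum_size(n):2d}  "
        f"can lose={tolerated_failures(n):2d}"
    )
```

So a large embedded ensemble can indeed lose many nodes, but every write then needs acknowledgements from a correspondingly larger majority, which is one plausible reading of why more ZK nodes tend to cost write performance.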

OTOH, if high availability is not a requirement for a production cluster 
(use case?), then non-embedded zookeepers are certainly an annoyance.

Maybe you could think of embedded zookeeper like every employee having their 
manager sitting right next to them all the time. How could that be anything 
but a bad idea in terms of maximizing worker output - and 
distracting/preventing managers from focusing on their own "work"?

-- Jack Krupansky
