Posted to dev@zookeeper.apache.org by Neha Narkhede <ne...@gmail.com> on 2011/12/20 21:14:02 UTC

Performing no downtime hardware changes to a live zookeeper cluster

Hi,

As part of upgrading to ZooKeeper 3.3.4, we also have to migrate our
ZooKeeper cluster to new hardware. I'm trying to figure out the best
strategy to achieve that with no downtime.
Here are some possible solutions I see at the moment, though I may have
missed a few:

1. Swap each machine out for a new machine with the same host/IP.

Pros: No client-side config needs to change.
Cons: A relatively tedious task for Operations.

2. Add new machines with different hosts/IPs to the existing cluster,
and remove the older machines, taking care to maintain the quorum at
all times.

Pros: Easier for Operations.
Cons: Client-side configs need to change and clients need to be
restarted/bounced. Another problem is running a large quorum for
some time (potentially 9 nodes).

3. Hide the new cluster behind either a hardware load balancer or a
DNS name resolving to all host IPs.

Pros: Makes it easier to move hardware around in the future.
Cons: Possible timeout issues, with load balancers interfering with
ZooKeeper functionality or performance.
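For the DNS flavor of option 3, a single published name carries one A record per ZooKeeper server, and clients can see all of them through the standard resolver. A minimal sketch, using `localhost` as a stand-in for a hypothetical round-robin name such as `zk.example.com`:

```java
import java.net.InetAddress;

public class ResolveEnsemble {
    public static void main(String[] args) throws Exception {
        // In a real deployment this would be a round-robin DNS name
        // (hypothetical "zk.example.com") with one A record per server.
        InetAddress[] addrs = InetAddress.getAllByName("localhost");
        for (InetAddress a : addrs) {
            System.out.println(a.getHostAddress());
        }
    }
}
```

Swapping a server then only means re-pointing the DNS record — provided clients actually re-resolve, which is the crux of the rest of this thread.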

I read this and found it helpful -
http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
I would also like to hear from the authors and from users who might have
tried this in a real production setup.

I'm very interested in finding a long-term solution for masking the
ZooKeeper host names. Any input here is appreciated!

In addition, it would be great to know what people think
about options 1 and 2 as a solution for hardware changes in
ZooKeeper.

Thanks,
Neha

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Patrick Hunt <ph...@apache.org>.
Looks like it to me. I marked it as such.

Patrick

On Mon, Jan 9, 2012 at 6:49 PM, Neha Narkhede <ne...@gmail.com> wrote:
> Patrick,
>
> Looks like https://issues.apache.org/jira/browse/ZOOKEEPER-1356 is a
> duplicate of ZOOKEEPER-338? If yes, then I'll mark it to reflect that.
>
> Thanks,
> Neha
>
> On Mon, Jan 9, 2012 at 5:36 PM, Patrick Hunt <ph...@apache.org> wrote:
>> dup of https://issues.apache.org/jira/browse/ZOOKEEPER-338 ?
>>
>> Patrick
>>
>> On Mon, Jan 9, 2012 at 3:17 PM, Ted Dunning <te...@gmail.com> wrote:
>>> Neha
>>>
>>> Filing a jira is a great way to further the discussion.
>>>
>>> Sent from my iPhone
>>>
>>> On Jan 9, 2012, at 15:33, Neha Narkhede <ne...@gmail.com> wrote:
>>>
>>>>>> If you just have machine names in a list that you pass in, then yes, we
>>>>>> could re-resolve on every reconnect and you could just re-alias that name
>>>>>> to a new IP. But you'll have to put in logic that will do that but not
>>>>>> break people using DNS RR.
>>>>
>>>> Having a list of machine names that can be changed to point to new IPs
>>>> seems reasonable too. To be able to do the upgrade without having to
>>>> restart all clients, besides turning off DNS caching in the JVM, we
>>>> still have to solve the problem of zookeeper client caching the IPs in
>>>> code. Having two levels of DNS caching, one in the JVM and one in code
>>>> (which cannot be turned off), doesn't look like a good idea, unless I'm
>>>> missing the purpose of such IP caching in ZooKeeper.
>>>>
>>>>>> I realize that moving machines is difficult when you have lots of clients.
>>>>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>>>>> machine move given a cluster of that complexity, though.
>>>>
>>>> It's not that it can't be done, but it definitely has quite some
>>>> operational overhead. We are trying to brainstorm various approaches
>>>> and come up with one that will involve the least overhead on such
>>>> upgrades going forward.
>>>>
>>>> Having said that, re-resolving host names on reconnect doesn't seem
>>>> like a bad idea, provided it doesn't break the DNS RR use
>>>> case. If that sounds good, can I go ahead and file a JIRA for this?
>>>>
>>>> Thanks,
>>>> Neha
>>>>
>>>> On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <ca...@apache.org> wrote:
>>>>> We don't shuffle IPs after the initial resolution of IP addresses.
>>>>>
>>>>> In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
>>>>> robin through them trying to connect. If you re-resolve on every
>>>>> round-robin, you have to put in logic to know which ones have changed and
>>>>> somehow maintain that shuffle order or you aren't doing a fair back end
>>>>> round robin, which people using the ZK client against DNS RR are relying on
>>>>> today.
>>>>>
>>>>> If you just have machine names in a list that you pass in, then yes, we
>>>>> could re-resolve on every reconnect and you could just re-alias that name
>>>>> to a new IP. But you'll have to put in logic that will do that but not
>>>>> break people using DNS RR.
>>>>>
>>>>> I realize that moving machines is difficult when you have lots of clients.
>>>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>>>> machine move given a cluster of that complexity, though. I also think that
>>>>> if we're going to be putting special cases like this in we might just want
>>>>> to go all the way to a pluggable reconnection scheme, but maybe that is too
>>>>> aggressive.
>>>>>
>>>>> C
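Camille's description above — resolve once to a list of IPs, shuffle, then cycle through them — can be sketched with stdlib collections. This is purely illustrative and not the actual ZooKeeper client code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the scheme described above: resolve once,
// shuffle the resulting IP list for fairness across servers, then
// cycle through the shuffled order on each (re)connect attempt.
class ShuffledRoundRobin {
    private final List<String> ips;
    private int next = 0;

    ShuffledRoundRobin(List<String> resolvedIps) {
        this.ips = new ArrayList<>(resolvedIps);
        Collections.shuffle(this.ips);  // one-time shuffle
    }

    String nextIp() {
        String ip = ips.get(next);
        next = (next + 1) % ips.size();
        return ip;
    }
}
```

Because the shuffle happens exactly once, each client settles on its own fixed rotation; that is what makes naive per-reconnect re-resolution hard to combine with fair round-robin.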
>>>>>
>>>>> On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <ne...@gmail.com> wrote:
>>>>>
>>>>>> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
>>>>>> simplest implementation which resolves a hostname to multiple IPs.
>>>>>>
>>>>>> Whatever method you use to map host names to IPs, the problem is that
>>>>>> the zookeeper client code will always cache the IPs. So to be able to
>>>>>> swap out a machine, all clients would have to be restarted, which if
>>>>>> you have 100s of clients, is a major pain. If you want to move the
>>>>>> entire cluster to new machines, this becomes even harder.
>>>>>>
>>>>>> I don't see why re-resolving host names to IPs in the reconnect logic
>>>>>> is a problem for ZooKeeper, since you shuffle the list of IPs anyway.
>>>>>>
>>>>>> Thanks,
>>>>>> Neha
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org>
>>>>>> wrote:
>>>>>>> You can't sensibly round-robin within the client code if you re-resolve
>>>>>>> on every reconnect while using DNS RR. If that's your goal, you'd want a
>>>>>>> list of DNS alias names and re-resolve each hostname when you hit it on
>>>>>>> reconnect. But that will break people using DNS RR.
>>>>>>> You can look into writing pluggable reconnect logic for the ZK client;
>>>>>>> that's what would be required to do this, but at the end of the day you'll
>>>>>>> have to give your users special clients to make that work.
>>>>>>>
>>>>>>> C
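The pluggable reconnect logic Camille mentions could be as small as an interface like the following. All names here are invented for illustration; this is not a ZooKeeper API of the time, though ZooKeeper did later grow a `HostProvider` abstraction in the client along broadly similar lines:

```java
import java.net.InetSocketAddress;

// Hypothetical sketch of a pluggable reconnect/host-selection hook.
// All names are invented for illustration.
interface HostProviderSketch {
    // Called before each (re)connect attempt; implementations may
    // re-resolve DNS, consult a load balancer, etc.
    InetSocketAddress next();

    // Called once a connection is established, so the provider can
    // reset any backoff or rotation state.
    void onConnected(InetSocketAddress connectedTo);
}
```

A DNS-RR-safe implementation and a re-resolve-on-reconnect implementation could then coexist without special-casing either inside the core client.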
>>>>>>>  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>> I was reading through the client code and saw that zookeeper client
>>>>>>>> caches the server IPs during startup and maintains it for the rest of
>>>>>>>> its lifetime. If we go with the DNS RR approach or a load balancer
>>>>>>>> approach, and later swap out a server with a new one (with a new
>>>>>>>> IP), all clients would have to be restarted to be able to "forget" the
>>>>>>>> old IP and see the new one. That doesn't look like a clean approach to
>>>>>>>> such upgrades. One way of getting around this problem, is adding the
>>>>>>>> resolution of host names to IPs in the "reconnect" logic in addition
>>>>>>>> to the constructor. So when such upgrades happen and the client
>>>>>>>> reconnects, it will see the new list of IPs and won't need to
>>>>>>>> be restarted.
>>>>>>>>
>>>>>>>> Does this approach sound good, or am I missing something here?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Neha
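The change Neha proposes — resolving host names in the reconnect path rather than only in the constructor — might look roughly like this. It is a hypothetical sketch under stated assumptions, not the actual client code:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical sketch: keep the configured *hostnames* and re-resolve
// each one on every (re)connect attempt, so a DNS record re-pointed at
// new hardware is picked up without restarting the client.
class ReResolvingHostList {
    private final List<String> hostPorts;  // e.g. ["zoo1:2181", "zoo2:2181"]
    private int next = 0;

    ReResolvingHostList(List<String> hostPorts) {
        this.hostPorts = hostPorts;
    }

    InetSocketAddress nextServer() throws Exception {
        String hp = hostPorts.get(next);
        next = (next + 1) % hostPorts.size();
        String[] parts = hp.split(":");
        // Fresh lookup on every call instead of caching the InetAddress
        // at construction time (still subject to JVM-level DNS caching).
        InetAddress addr = InetAddress.getByName(parts[0]);
        return new InetSocketAddress(addr, Integer.parseInt(parts[1]));
    }
}
```

Note this only helps if JVM-level DNS caching is also bounded, and it still has to be reconciled with the shuffled round-robin behavior that DNS RR users rely on.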
>>>>>>>>
>>>>>>>> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
>>>>>>>> wrote:
>>>>>>>>> DNS RR is good. I had good experiences using that for my client
>>>>>>>>> configs for exactly the reasons you are listing.
>>>>>>>>>
>>>>>>>>> On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <
>>>>>> neha.narkhede@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>> Thanks for the responses!
>>>>>>>>>>
>>>>>>>>>>>> How are your clients configured to find the zks now?
>>>>>>>>>>
>>>>>>>>>> Our clients currently use the list of hostnames and ports that
>>>>>>>>>> comprise the zookeeper cluster. For example,
>>>>>>>>>> zoo1:port1,zoo2:port2,zoo3:port3
>>>>>>>>>>
>>>>>>>>>>>> - switch DNS,
>>>>>>>>>>>> - wait for caches to die,
>>>>>>>>>>
>>>>>>>>>> This is something we thought about. However, if I understand it
>>>>>>>>>> correctly, doesn't the JVM cache DNS entries forever until it is
>>>>>>>>>> restarted? We haven't specifically turned DNS caching off on our clients. So
>>>>>>>>>> this solution would require us to restart the clients to see the new
>>>>>>>>>> list of zookeeper hosts.
>>>>>>>>>>
>>>>>>>>>> Another thought is to use DNS RR, with the client ZK URL using one
>>>>>>>>>> name that resolves to a list of IPs for the zookeeper
>>>>>>>>>> client. This has the advantage of allowing future hardware
>>>>>>>>>> migrations without changing the client connection URL.
>>>>>>>>>> Do people have thoughts about using DNS RR?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Neha
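On the JVM caching question above: successful lookups can indeed be cached for a very long time (indefinitely when a security manager is installed), but the TTL is tunable via security properties set early in startup — for example:

```java
import java.security.Security;

public class DnsCacheConfig {
    public static void main(String[] args) {
        // Must be set before the first lookup is cached. "60" means
        // successful lookups are re-resolved after 60 seconds instead of
        // being cached indefinitely (the default under a security manager).
        Security.setProperty("networkaddress.cache.ttl", "60");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));  // prints "60"
    }
}
```

On Sun/Oracle JVMs the implementation-specific `-Dsun.net.inetaddr.ttl` system property has a similar effect. Either way, this only bounds the JVM-level cache; it does not help with IPs cached inside the ZooKeeper client itself.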
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> In particular, aren't you using DNS names?  If you are, then you can
>>>>>>>>>>>
>>>>>>>>>>> - expand the quorum with the new hardware on new IP addresses,
>>>>>>>>>>> - switch DNS,
>>>>>>>>>>> - wait for caches to die,
>>>>>>>>>>> - restart applications without reconfig or otherwise force new
>>>>>>>> connections,
>>>>>>>>>>> - decrease quorum size again
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <
>>>>>> camille@apache.org
>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How are your clients configured to find the zks now? How many
>>>>>> clients
>>>>>>>> do
>>>>>>>>>>>> you have?
>>>>>>>>>>>>
>>>>>>>>>>>> From my phone

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Neha Narkhede <ne...@gmail.com>.
Patrick,

Looks like https://issues.apache.org/jira/browse/ZOOKEEPER-1356 is a
duplicate of 338 ? If yes, then I'll mark it to reflect the same.

Thanks,
Neha

On Mon, Jan 9, 2012 at 5:36 PM, Patrick Hunt <ph...@apache.org> wrote:
> dup of https://issues.apache.org/jira/browse/ZOOKEEPER-338 ?
>
> Patrick
>
> On Mon, Jan 9, 2012 at 3:17 PM, Ted Dunning <te...@gmail.com> wrote:
>> Neha
>>
>> Filing a jira is a great way to further the discussion.
>>
>> Sent from my iPhone
>>
>> On Jan 9, 2012, at 15:33, Neha Narkhede <ne...@gmail.com> wrote:
>>
>>>>> If you just have machine names in a list that you pass in, then yes, we
>>> could re-resolve on every reconnect and you could just re-alias that name
>>> to a new IP. But you'll have to put in logic that will do that but not
>>> break people using DNS RR.
>>>
>>> Having a list of machine names that can be changed to point to new IPs
>>> seems reasonable too. To be able to do the upgrade without having to
>>> restart all clients, besides turning off DNS caching in the JVM, we
>>> still have to solve the problem of zookeeper client caching the IPs in
>>> code. Having 2 levels of DNS caching, one in the JVM and one in code
>>> (which cannot be turned off) doesn't look like a good idea. Unless I'm
>>> missing the purpose of such IP caching in zookeeper ?
>>>
>>>>> I realize that moving machines is difficult when you have lots of clients.
>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>> machine move given a cluster of that complexity, though
>>>
>>> Its not like it can't be done, it definitely has quite some
>>> operational overhead. We are trying to brainstorm various approaches
>>> and come up with one that will involve the least overhead on such
>>> upgrades going forward.
>>>
>>> Having said that, seems like re-resolving host names in reconnect
>>> doesn't look like a bad idea, provided it doesn't break the DNS RR use
>>> case. If that sounds good, can I go ahead a file a JIRA for this ?
>>>
>>> Thanks,
>>> Neha
>>>
>>> On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <ca...@apache.org> wrote:
>>>> We don't shuffle IPs after the initial resolution of IP addresses.
>>>>
>>>> In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
>>>> robin through them trying to connect. If you re-resolve on every
>>>> round-robin, you have to put in logic to know which ones have changed and
>>>> somehow maintain that shuffle order or you aren't doing a fair back end
>>>> round robin, which people using the ZK client against DNS RR are relying on
>>>> today.
>>>>
>>>> If you just have machine names in a list that you pass in, then yes, we
>>>> could re-resolve on every reconnect and you could just re-alias that name
>>>> to a new IP. But you'll have to put in logic that will do that but not
>>>> break people using DNS RR.
>>>>
>>>> I realize that moving machines is difficult when you have lots of clients.
>>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>>> machine move given a cluster of that complexity, though. I also think that
>>>> if we're going to be putting special cases like this in we might just want
>>>> to go all the way to a pluggable reconnection scheme, but maybe that is too
>>>> aggressive.
>>>>
>>>> C
>>>>
>>>> On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <ne...@gmail.com>wrote:
>>>>
>>>>> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
>>>>> simplest implementation which resolves a hostname to multiple IPs.
>>>>>
>>>>> Whatever method you use to map host names to IPs, the problem is that
>>>>> the zookeeper client code will always cache the IPs. So to be able to
>>>>> swap out a machine, all clients would have to be restarted, which if
>>>>> you have 100s of clients, is a major pain. If you want to move the
>>>>> entire cluster to new machines, this becomes even harder.
>>>>>
>>>>> I don't see why re-resolving host names to IPs in the reconnect logic
>>>>> is a problem for zookeeper, since you shuffle the list of IPs anyways.
>>>>>
>>>>> Thanks,
>>>>> Neha
>>>>>
>>>>>
>>>>> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org>
>>>>> wrote:
>>>>>> You can't sensibly round robin within the client code if you re-resolve
>>>>> on
>>>>>> every reconnect, if you're using dns rr. If that's your goal you'd want a
>>>>>> list of dns alias names and re-resolve each hostname when you hit it on
>>>>>> reconnect. But that will break people using dns rr.
>>>>>> You can look into writing a pluggable reconnect logic into the zk client,
>>>>>> that's what would be required to do this but at the end of the day you'll
>>>>>> have to give your users special clients to make that work.
>>>>>>
>>>>>> C
>>>>>>  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>> I was reading through the client code and saw that zookeeper client
>>>>>>> caches the server IPs during startup and maintains it for the rest of
>>>>>>> its lifetime. If we go with the DNS RR approach or a load balancer
>>>>>>> approach, and later swap out a server with a new one ( with a new IP
>>>>>>> ), all clients would have to be restarted to be able to "forget" the
>>>>>>> old IP and see the new one. That doesn't look like a clean approach to
>>>>>>> such upgrades. One way of getting around this problem, is adding the
>>>>>>> resolution of host names to IPs in the "reconnect" logic in addition
>>>>>>> to the constructor. So when such upgrades happen and the client
>>>>>>> reconnects, it will see the new list of IPs, and wouldn't require to
>>>>>>> be restarted.
>>>>>>>
>>>>>>> Does this approach sound good or am I missing something here ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Neha
>>>>>>>
>>>>>>> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
>>>>>>> wrote:
>>>>>>>> DNS RR is good. I had good experiences using that for my client
>>>>>>>> configs for exactly the reasons you are listing.
>>>>>>>>
>>>>>>>> On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <
>>>>> neha.narkhede@gmail.com>
>>>>>>> wrote:
>>>>>>>>> Thanks for the responses!
>>>>>>>>>
>>>>>>>>>>> How are your clients configured to find the zks now?
>>>>>>>>>
>>>>>>>>> Our clients currently use the list of hostnames and ports that
>>>>>>>>> comprise the zookeeper cluster. For example,
>>>>>>>>> zoo1:port1,zoo2:port2,zoo3:port3
>>>>>>>>>
>>>>>>>>>>>> - switch DNS,
>>>>>>>>>> - wait for caches to die,
>>>>>>>>>
>>>>>>>>> This is something we thought about however, if I understand it
>>>>>>>>> correctly, doesn't JVM cache DNS entries forever until it is
>>>>> restarted
>>>>>>>>> ? We haven't specifically turned DNS caching off on our clients. So
>>>>>>>>> this solution would require us to restart the clients to see the new
>>>>>>>>> list of zookeeper hosts.
>>>>>>>>>
>>>>>>>>> Another thought is to use DNS RR and have the client zk url have one
>>>>>>>>> name that resolves to and returns a list of IPs to the zookeeper
>>>>>>>>> client. This has the advantage of being able to perform hardware
>>>>>>>>> migration without changing the client connection url, in the future.
>>>>>>>>> Do people have thoughts about using a DNS RR ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Neha
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>> In particular, aren't you using DNS names?  If you are, then you can
>>>>>>>>>>
>>>>>>>>>> - expand the quorum with the new hardware on new IP addresses,
>>>>>>>>>> - switch DNS,
>>>>>>>>>> - wait for caches to die,
>>>>>>>>>> - restart applications without reconfig or otherwise force new
>>>>>>> connections,
>>>>>>>>>> - decrease quorum size again
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <
>>>>> camille@apache.org
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> How are your clients configured to find the zks now? How many
>>>>> clients
>>>>>>> do
>>>>>>>>>>> you have?
>>>>>>>>>>>
>>>>>>>>>>> From my phone
>>>>>>>>>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> As part of upgrading to Zookeeper 3.3.4, we also have to migrate
>>>>> our
>>>>>>>>>>>> zookeeper cluster to new hardware. I'm trying to figure out the
>>>>> best
>>>>>>>>>>>> strategy to achieve that with no downtime.
>>>>>>>>>>>> Here are some possible solutions I see at the moment, I could
>>>>> have
>>>>>>>>>>>> missed a few though -
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Swap each machine out with a new machine, but with the same
>>>>>>> host/IP.
>>>>>>>>>>>>
>>>>>>>>>>>> Pros: No client side config needs to be changed.
>>>>>>>>>>>> Cons: Relatively tedious task for Operations
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Add new machines, with different host/IPs to the existing
>>>>>>> cluster,
>>>>>>>>>>>> and remove the older machines, taking care to maintain the
>>>>> quorum at
>>>>>>>>>>>> all times
>>>>>>>>>>>>
>>>>>>>>>>>> Pros: Easier for Operations
>>>>>>>>>>>> Cons: Client side configs need to be changed and clients need to
>>>>> be
>>>>>>>>>>>> restarted/bounced. Another problem is having a large quorum for
>>>>>>>>>>>> sometime (potentially 9 nodes).
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Hide the new cluster behind either a Hardware load balancer
>>>>> or a
>>>>>>>>>>>> DNS server resolving to all host ips.
>>>>>>>>>>>>
>>>>>>>>>>>> Pros: Makes it easier to move hardware around in the future
>>>>>>>>>>>> Cons: Possible timeout issues with load balancers messing with
>>>>>>>>>>>> zookeeper functionality or performance
>>>>>>>>>>>>
>>>>>>>>>>>> Read this and found it helpful -
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>>>>>>>>>>>> But would like to hear from the authors and the users who might
>>>>> have
>>>>>>>>>>>> tried this in a real production setup.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm very interested in finding a long term solution for masking
>>>>> the
>>>>>>>>>>>> zookeeper host names. Any inputs here are appreciated !
>>>>>>>>>>>>
>>>>>>>>>>>> In addition to this, it will also be great to know what people
>>>>> think
>>>>>>>>>>>> about options 1 and 2, as a solution for hardware changes in
>>>>>>>>>>>> Zookeeper.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Neha
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Patrick Hunt <ph...@apache.org>.
dup of https://issues.apache.org/jira/browse/ZOOKEEPER-338 ?

Patrick

On Mon, Jan 9, 2012 at 3:17 PM, Ted Dunning <te...@gmail.com> wrote:
> Neha
>
> Filing a jira is a great way to further the discussion.
>
> Sent from my iPhone
>
> On Jan 9, 2012, at 15:33, Neha Narkhede <ne...@gmail.com> wrote:
>
>>>> If you just have machine names in a list that you pass in, then yes, we
>> could re-resolve on every reconnect and you could just re-alias that name
>> to a new IP. But you'll have to put in logic that will do that but not
>> break people using DNS RR.
>>
>> Having a list of machine names that can be changed to point to new IPs
>> seems reasonable too. To be able to do the upgrade without having to
>> restart all clients, besides turning off DNS caching in the JVM, we
>> still have to solve the problem of zookeeper client caching the IPs in
>> code. Having 2 levels of DNS caching, one in the JVM and one in code
>> (which cannot be turned off) doesn't look like a good idea. Unless I'm
>> missing the purpose of such IP caching in zookeeper ?
>>
>>>> I realize that moving machines is difficult when you have lots of clients.
>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>> machine move given a cluster of that complexity, though
>>
>> Its not like it can't be done, it definitely has quite some
>> operational overhead. We are trying to brainstorm various approaches
>> and come up with one that will involve the least overhead on such
>> upgrades going forward.
>>
>> Having said that, seems like re-resolving host names in reconnect
>> doesn't look like a bad idea, provided it doesn't break the DNS RR use
>> case. If that sounds good, can I go ahead a file a JIRA for this ?
>>
>> Thanks,
>> Neha
>>
>> On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <ca...@apache.org> wrote:
>>> We don't shuffle IPs after the initial resolution of IP addresses.
>>>
>>> In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
>>> robin through them trying to connect. If you re-resolve on every
>>> round-robin, you have to put in logic to know which ones have changed and
>>> somehow maintain that shuffle order or you aren't doing a fair back end
>>> round robin, which people using the ZK client against DNS RR are relying on
>>> today.
>>>
>>> If you just have machine names in a list that you pass in, then yes, we
>>> could re-resolve on every reconnect and you could just re-alias that name
>>> to a new IP. But you'll have to put in logic that will do that but not
>>> break people using DNS RR.
>>>
>>> I realize that moving machines is difficult when you have lots of clients.
>>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>>> machine move given a cluster of that complexity, though. I also think that
>>> if we're going to be putting special cases like this in we might just want
>>> to go all the way to a pluggable reconnection scheme, but maybe that is too
>>> aggressive.
>>>
>>> C
>>>
>>> On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <ne...@gmail.com>wrote:
>>>
>>>> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
>>>> simplest implementation which resolves a hostname to multiple IPs.
>>>>
>>>> Whatever method you use to map host names to IPs, the problem is that
>>>> the zookeeper client code will always cache the IPs. So to be able to
>>>> swap out a machine, all clients would have to be restarted, which if
>>>> you have 100s of clients, is a major pain. If you want to move the
>>>> entire cluster to new machines, this becomes even harder.
>>>>
>>>> I don't see why re-resolving host names to IPs in the reconnect logic
>>>> is a problem for zookeeper, since you shuffle the list of IPs anyways.
>>>>
>>>> Thanks,
>>>> Neha
>>>>
>>>>
>>>> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org>
>>>> wrote:
>>>>> You can't sensibly round robin within the client code if you re-resolve
>>>> on
>>>>> every reconnect, if you're using dns rr. If that's your goal you'd want a
>>>>> list of dns alias names and re-resolve each hostname when you hit it on
>>>>> reconnect. But that will break people using dns rr.
>>>>> You can look into writing a pluggable reconnect logic into the zk client,
>>>>> that's what would be required to do this but at the end of the day you'll
>>>>> have to give your users special clients to make that work.
>>>>>
>>>>> C
>>>>>  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com>
>>>> wrote:
>>>>>
>>>>>> I was reading through the client code and saw that zookeeper client
>>>>>> caches the server IPs during startup and maintains it for the rest of
>>>>>> its lifetime. If we go with the DNS RR approach or a load balancer
>>>>>> approach, and later swap out a server with a new one ( with a new IP
>>>>>> ), all clients would have to be restarted to be able to "forget" the
>>>>>> old IP and see the new one. That doesn't look like a clean approach to
>>>>>> such upgrades. One way of getting around this problem, is adding the
>>>>>> resolution of host names to IPs in the "reconnect" logic in addition
>>>>>> to the constructor. So when such upgrades happen and the client
>>>>>> reconnects, it will see the new list of IPs, and wouldn't require to
>>>>>> be restarted.
>>>>>>
>>>>>> Does this approach sound good or am I missing something here ?
>>>>>>
>>>>>> Thanks,
>>>>>> Neha
>>>>>>
>>>>>> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
>>>>>> wrote:
>>>>>>> DNS RR is good. I had good experiences using that for my client
>>>>>>> configs for exactly the reasons you are listing.
>>>>>>>
>>>>>>> On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <
>>>> neha.narkhede@gmail.com>
>>>>>> wrote:
>>>>>>>> Thanks for the responses!
>>>>>>>>
>>>>>>>>>> How are your clients configured to find the zks now?
>>>>>>>>
>>>>>>>> Our clients currently use the list of hostnames and ports that
>>>>>>>> comprise the zookeeper cluster. For example,
>>>>>>>> zoo1:port1,zoo2:port2,zoo3:port3
>>>>>>>>
>>>>>>>>>>> - switch DNS,
>>>>>>>>> - wait for caches to die,
>>>>>>>>
>>>>>>>> This is something we thought about however, if I understand it
>>>>>>>> correctly, doesn't JVM cache DNS entries forever until it is
>>>> restarted
>>>>>>>> ? We haven't specifically turned DNS caching off on our clients. So
>>>>>>>> this solution would require us to restart the clients to see the new
>>>>>>>> list of zookeeper hosts.
>>>>>>>>
>>>>>>>> Another thought is to use DNS RR and have the client zk url have one
>>>>>>>> name that resolves to and returns a list of IPs to the zookeeper
>>>>>>>> client. This has the advantage of being able to perform hardware
>>>>>>>> migration without changing the client connection url, in the future.
>>>>>>>> Do people have thoughts about using a DNS RR ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Neha
>>>>>>>>
>>>>>>>> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
>>>>>> wrote:
>>>>>>>>> In particular, aren't you using DNS names?  If you are, then you can
>>>>>>>>>
>>>>>>>>> - expand the quorum with the new hardware on new IP addresses,
>>>>>>>>> - switch DNS,
>>>>>>>>> - wait for caches to die,
>>>>>>>>> - restart applications without reconfig or otherwise force new
>>>>>> connections,
>>>>>>>>> - decrease quorum size again
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <
>>>> camille@apache.org
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How are your clients configured to find the zks now? How many
>>>> clients
>>>>>> do
>>>>>>>>>> you have?
>>>>>>>>>>
>>>>>>>>>> From my phone
>>>>>>>>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com>
>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> As part of upgrading to Zookeeper 3.3.4, we also have to migrate
>>>> our
>>>>>>>>>>> zookeeper cluster to new hardware. I'm trying to figure out the
>>>> best
>>>>>>>>>>> strategy to achieve that with no downtime.
>>>>>>>>>>> Here are some possible solutions I see at the moment, I could
>>>> have
>>>>>>>>>>> missed a few though -
>>>>>>>>>>>
>>>>>>>>>>> 1. Swap each machine out with a new machine, but with the same
>>>>>> host/IP.
>>>>>>>>>>>
>>>>>>>>>>> Pros: No client side config needs to be changed.
>>>>>>>>>>> Cons: Relatively tedious task for Operations
>>>>>>>>>>>
>>>>>>>>>>> 2. Add new machines, with different host/IPs to the existing
>>>>>> cluster,
>>>>>>>>>>> and remove the older machines, taking care to maintain the
>>>> quorum at
>>>>>>>>>>> all times
>>>>>>>>>>>
>>>>>>>>>>> Pros: Easier for Operations
>>>>>>>>>>> Cons: Client side configs need to be changed and clients need to
>>>> be
>>>>>>>>>>> restarted/bounced. Another problem is having a large quorum for
>>>>>>>>>>> sometime (potentially 9 nodes).
>>>>>>>>>>>
>>>>>>>>>>> 3. Hide the new cluster behind either a Hardware load balancer
>>>> or a
>>>>>>>>>>> DNS server resolving to all host ips.
>>>>>>>>>>>
>>>>>>>>>>> Pros: Makes it easier to move hardware around in the future
>>>>>>>>>>> Cons: Possible timeout issues with load balancers messing with
>>>>>>>>>>> zookeeper functionality or performance
>>>>>>>>>>>
>>>>>>>>>>> Read this and found it helpful -
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>>>>>>>>>>> But would like to hear from the authors and the users who might
>>>> have
>>>>>>>>>>> tried this in a real production setup.
>>>>>>>>>>>
>>>>>>>>>>> I'm very interested in finding a long term solution for masking
>>>> the
>>>>>>>>>>> zookeeper host names. Any inputs here are appreciated!
>>>>>>>>>>>
>>>>>>>>>>> In addition to this, it will also be great to know what people
>>>> think
>>>>>>>>>>> about options 1 and 2, as a solution for hardware changes in
>>>>>>>>>>> Zookeeper.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Neha
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Ted Dunning <te...@gmail.com>.
Neha

Filing a JIRA is a great way to further the discussion.

Sent from my iPhone

On Jan 9, 2012, at 15:33, Neha Narkhede <ne...@gmail.com> wrote:

>>> If you just have machine names in a list that you pass in, then yes, we
> could re-resolve on every reconnect and you could just re-alias that name
> to a new IP. But you'll have to put in logic that will do that but not
> break people using DNS RR.
> 
> Having a list of machine names that can be changed to point to new IPs
> seems reasonable too. To be able to do the upgrade without having to
> restart all clients, besides turning off DNS caching in the JVM, we
> still have to solve the problem of zookeeper client caching the IPs in
> code. Having 2 levels of DNS caching, one in the JVM and one in code
> (which cannot be turned off) doesn't look like a good idea. Unless I'm
> missing the purpose of such IP caching in zookeeper?
> 
>>> I realize that moving machines is difficult when you have lots of clients.
> I'm a bit surprised your admins can't maintain machine IP addresses on a
> machine move given a cluster of that complexity, though
> 
> It's not that it can't be done, but it definitely has quite some
> operational overhead. We are trying to brainstorm various approaches
> and come up with one that will involve the least overhead on such
> upgrades going forward.
> 
> Having said that, seems like re-resolving host names in reconnect
> doesn't look like a bad idea, provided it doesn't break the DNS RR use
> case. If that sounds good, can I go ahead and file a JIRA for this?
> 
> Thanks,
> Neha
> 
> On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <ca...@apache.org> wrote:
>> We don't shuffle IPs after the initial resolution of IP addresses.
>> 
>> In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
>> robin through them trying to connect. If you re-resolve on every
>> round-robin, you have to put in logic to know which ones have changed and
>> somehow maintain that shuffle order or you aren't doing a fair back end
>> round robin, which people using the ZK client against DNS RR are relying on
>> today.
>> 
>> If you just have machine names in a list that you pass in, then yes, we
>> could re-resolve on every reconnect and you could just re-alias that name
>> to a new IP. But you'll have to put in logic that will do that but not
>> break people using DNS RR.
>> 
>> I realize that moving machines is difficult when you have lots of clients.
>> I'm a bit surprised your admins can't maintain machine IP addresses on a
>> machine move given a cluster of that complexity, though. I also think that
>> if we're going to be putting special cases like this in we might just want
>> to go all the way to a pluggable reconnection scheme, but maybe that is too
>> aggressive.
>> 
>> C
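
[Editor's note: the resolve-once, shuffle, then round-robin behavior Camille describes above can be sketched roughly as follows. This is an illustration only — not ZooKeeper's actual StaticHostProvider code — and the class and method names are invented.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Rough sketch of the current client behavior: the address list is fixed and
 * shuffled once at construction, and every reconnect attempt simply takes the
 * next entry in that fixed order.
 */
public class ShuffledRoundRobin {
    private final List<String> addresses;
    private int next = 0;

    public ShuffledRoundRobin(List<String> resolvedAddresses, Random rnd) {
        this.addresses = new ArrayList<>(resolvedAddresses);
        // Each client shuffles independently, spreading load across servers.
        Collections.shuffle(this.addresses, rnd);
    }

    /** Returns the address to try on the next (re)connect attempt. */
    public String nextAddress() {
        String a = addresses.get(next % addresses.size());
        next++;
        return a;
    }
}
```

Re-resolving on every pass would change the list underneath the cursor, which is exactly the complication with preserving a fair shuffle order that Camille points out.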
>> 
>> On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <ne...@gmail.com>wrote:
>> 
>>> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
>>> simplest implementation which resolves a hostname to multiple IPs.
>>> 
>>> Whatever method you use to map host names to IPs, the problem is that
>>> the zookeeper client code will always cache the IPs. So to be able to
>>> swap out a machine, all clients would have to be restarted, which if
>>> you have 100s of clients, is a major pain. If you want to move the
>>> entire cluster to new machines, this becomes even harder.
>>> 
>>> I don't see why re-resolving host names to IPs in the reconnect logic
>>> is a problem for zookeeper, since you shuffle the list of IPs anyways.
>>> 
>>> Thanks,
>>> Neha
>>> 
>>> 
>>> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org>
>>> wrote:
>>>> You can't sensibly round robin within the client code if you re-resolve
>>> on
>>>> every reconnect, if you're using dns rr. If that's your goal you'd want a
>>>> list of dns alias names and re-resolve each hostname when you hit it on
>>>> reconnect. But that will break people using dns rr.
>>>> You can look into writing a pluggable reconnect logic into the zk client,
>>>> that's what would be required to do this but at the end of the day you'll
>>>> have to give your users special clients to make that work.
>>>> 
>>>> C
>>>>  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com>
>>> wrote:
>>>> 
>>>>> I was reading through the client code and saw that zookeeper client
>>>>> caches the server IPs during startup and maintains it for the rest of
>>>>> its lifetime. If we go with the DNS RR approach or a load balancer
>>>>> approach, and later swap out a server with a new one ( with a new IP
>>>>> ), all clients would have to be restarted to be able to "forget" the
>>>>> old IP and see the new one. That doesn't look like a clean approach to
>>>>> such upgrades. One way of getting around this problem, is adding the
>>>>> resolution of host names to IPs in the "reconnect" logic in addition
>>>>> to the constructor. So when such upgrades happen and the client
>>>>> reconnects, it will see the new list of IPs and wouldn't need to
>>>>> be restarted.
>>>>> 
>>>>> Does this approach sound good, or am I missing something here?
>>>>> 
>>>>> Thanks,
>>>>> Neha
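
[Editor's note: Neha's proposal above — keep only hostnames and re-resolve them in the reconnect path instead of reusing IPs cached at construction — could look roughly like this. A hypothetical sketch with invented names, not the actual client code.]

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the proposed reconnect logic: only hostnames are
 * retained, and they are re-resolved on every reconnect attempt, so a name
 * re-aliased to a new IP is picked up without restarting the client.
 */
public class ReResolvingHostList {
    private final List<String> hostnames;

    public ReResolvingHostList(List<String> hostnames) {
        this.hostnames = new ArrayList<>(hostnames); // no IPs cached here
    }

    /** Called from the reconnect path; returns freshly resolved addresses. */
    public List<String> resolveForReconnect() {
        List<String> fresh = new ArrayList<>();
        for (String host : hostnames) {
            try {
                for (InetAddress a : InetAddress.getAllByName(host)) {
                    fresh.add(a.getHostAddress());
                }
            } catch (UnknownHostException e) {
                // A name that no longer resolves is skipped this round.
            }
        }
        Collections.shuffle(fresh);
        return fresh;
    }
}
```

Note that JVM-level DNS caching can still hide an address change unless the cache TTL is bounded, which is the second half of the problem discussed in this thread.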
>>>>> 
>>>>> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
>>>>> wrote:
>>>>>> DNS RR is good. I had good experiences using that for my client
>>>>>> configs for exactly the reasons you are listing.
>>>>>> 
>>>>>> On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <
>>> neha.narkhede@gmail.com>
>>>>> wrote:
>>>>>>> Thanks for the responses!
>>>>>>> 
>>>>>>>>> How are your clients configured to find the zks now?
>>>>>>> 
>>>>>>> Our clients currently use the list of hostnames and ports that
>>>>>>> comprise the zookeeper cluster. For example,
>>>>>>> zoo1:port1,zoo2:port2,zoo3:port3
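
[Editor's note: a connect string like the one above is just a comma-separated list of host:port pairs. A simplified sketch of how such a string breaks down into endpoints — illustration only, not the client's actual parsing code:]

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch: split a "host1:port1,host2:port2" connect string into endpoints. */
public class ConnectStringSketch {
    public static List<String[]> parse(String connectString) {
        List<String[]> endpoints = new ArrayList<>();
        for (String part : connectString.split(",")) {
            // Assumes every entry contains both a host and a port.
            int colon = part.lastIndexOf(':');
            endpoints.add(new String[] { part.substring(0, colon), part.substring(colon + 1) });
        }
        return endpoints;
    }
}
```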
>>>>>>> 
>>>>>>>>>> - switch DNS,
>>>>>>>> - wait for caches to die,
>>>>>>> 
>>>>>>> This is something we thought about; however, if I understand it
>>>>>>> correctly, doesn't the JVM cache DNS entries forever until it is
>>>>>>> restarted? We haven't specifically turned DNS caching off on our clients. So
>>>>>>> this solution would require us to restart the clients to see the new
>>>>>>> list of zookeeper hosts.
>>>>>>> 
>>>>>>> Another thought is to use DNS RR and have the client ZK URL contain one
>>>>>>> name that resolves to the full list of IPs for the zookeeper
>>>>>>> client. This has the advantage of being able to perform hardware
>>>>>>> migration without changing the client connection url, in the future.
>>>>>>> Do people have thoughts about using a DNS RR ?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Neha
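
[Editor's note: on the JVM caching question above — with a security manager installed, the JVM caches successful lookups forever by default; without one, the default positive TTL is implementation-specific (commonly around 30 seconds). The TTL can be bounded explicitly via security properties, e.g. as below; the 60- and 10-second values are arbitrary choices for illustration.]

```java
import java.security.Security;

/** Bounds the JVM-wide DNS cache so re-aliased hostnames are eventually re-resolved. */
public class DnsTtlConfig {
    /**
     * Sets the positive and negative DNS cache TTLs (in seconds) and returns
     * the positive TTL now in effect. Must run early, before the first
     * lookup's result is cached.
     */
    public static String configureTtl(String positiveSeconds, String negativeSeconds) {
        Security.setProperty("networkaddress.cache.ttl", positiveSeconds);
        Security.setProperty("networkaddress.cache.negative.ttl", negativeSeconds);
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        System.out.println(configureTtl("60", "10"));
    }
}
```

The same properties can also be set in `$JAVA_HOME/lib/security/java.security`, which avoids any code change in the clients.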

RE: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Alexander Shraer <sh...@yahoo-inc.com>.
Hi Neha, Camille,

I wanted to share something we did as part of the dynamic membership change feature development (ZK-107) that seems very related to the discussion here and might solve the problem. 

When the membership changes, similarly to what you wrote below, clients sometimes need to move. Obviously this is the case if the server they are connected to is no longer in the cluster. The idea is that they should move in a way that minimizes unnecessary client migration (such as what would happen if every client re-shuffled the new server list) and yet leaves the system in a balanced state (the number of clients connected to each server is the same in expectation).

The idea was to come up with a set of probabilistic rules that each client applies locally to see whether and where it should migrate. 

The resulting rules as well as the evaluation in Zookeeper are in the document attached to the JIRA (https://issues.apache.org/jira/browse/ZOOKEEPER-1355).
I'll provide a patch soon. (This is part of a larger paper about reconfiguration that is under submission.)

In terms of implementation changes, I added a command "zk.updateServerList(hostlist);"
and an implementation of the probabilistic rules in StaticHostProvider.

I wanted to get your feedback; hopefully this solves the problem you're discussing here.

Thanks a lot,
Alex
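
[Editor's note: to make the balancing idea concrete, here is one simple special case of such a probabilistic rule, for an ensemble that only grows from oldSize to newSize servers. This is an editorial sketch, not the rules from ZOOKEEPER-1355, which also cover removals and mixed changes.]

```java
import java.util.Random;

/**
 * Special-case sketch: when servers are only added, each connected client
 * moves to a uniformly chosen new server with probability
 * (newSize - oldSize) / newSize. With C clients in total, the expected load
 * per old server drops from C/oldSize to C/newSize, and each new server also
 * receives C/newSize clients in expectation, so the cluster stays balanced.
 */
public class MigrationRuleSketch {
    public static double moveProbability(int oldSize, int newSize) {
        if (newSize <= oldSize) {
            return 0.0; // this sketch only covers pure growth
        }
        return (newSize - oldSize) / (double) newSize;
    }

    /** Local decision applied by each client when it learns of the new membership. */
    public static boolean shouldMove(int oldSize, int newSize, Random rnd) {
        return rnd.nextDouble() < moveProbability(oldSize, newSize);
    }
}
```

Because every client decides locally from the same two numbers, no coordination is needed, and clients that stay connected cause no unnecessary migration.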


> -----Original Message-----
> From: cf@renttherunway.com [mailto:cf@renttherunway.com] On Behalf Of
> Camille Fournier
> Sent: Monday, January 09, 2012 12:47 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Performing no downtime hardware changes to a live
> zookeeper cluster
> 
> Sounds fine with me, probably should make it a flaggable option.
> 
> C
> 

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Camille Fournier <ca...@apache.org>.
Sounds fine with me, probably should make it a flaggable option.

C



Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Neha Narkhede <ne...@gmail.com>.
>> If you just have machine names in a list that you pass in, then yes, we
could re-resolve on every reconnect and you could just re-alias that name
to a new IP. But you'll have to put in logic that will do that but not
break people using DNS RR.

Having a list of machine names that can be changed to point to new IPs
seems reasonable too. But to do the upgrade without having to restart
all clients, besides turning off DNS caching in the JVM, we still have
to solve the problem of the zookeeper client caching the IPs in code.
Having two levels of DNS caching, one in the JVM and one in code
(which cannot be turned off), doesn't look like a good idea. Unless I'm
missing the purpose of such IP caching in zookeeper?
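On the JVM side, the lookup cache can at least be bounded without restarting clients. A minimal sketch (assuming a Java client; the 60/10 second TTLs here are arbitrary illustrative choices, not ZooKeeper defaults, and this does nothing about the IP list the zookeeper client itself caches):

```java
import java.security.Security;

public class DnsCacheConfig {
    public static void main(String[] args) {
        // With a security manager installed, the JVM caches successful DNS
        // lookups forever by default; this security property caps the cache
        // so a re-aliased ZooKeeper hostname is picked up within ~60 seconds.
        Security.setProperty("networkaddress.cache.ttl", "60");
        // Failed lookups are cached under a separate property.
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
        System.out.println("ttl=" + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Note these properties have to be set before the first InetAddress lookup (or in the JRE's java.security file) to take effect.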

>> I realize that moving machines is difficult when you have lots of clients.
I'm a bit surprised your admins can't maintain machine IP addresses on a
machine move given a cluster of that complexity, though

It's not that it can't be done, but it definitely has quite a bit of
operational overhead. We are trying to brainstorm various approaches
and come up with one that will involve the least overhead on such
upgrades going forward.

Having said that, re-resolving host names on reconnect seems like a
reasonable idea, provided it doesn't break the DNS RR use case. If
that sounds good, can I go ahead and file a JIRA for this?

Thanks,
Neha

On Mon, Jan 9, 2012 at 11:04 AM, Camille Fournier <ca...@apache.org> wrote:
> We don't shuffle IPs after the initial resolution of IP addresses.
>
> In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
> robin through them trying to connect. If you re-resolve on every
> round-robin, you have to put in logic to know which ones have changed and
> somehow maintain that shuffle order or you aren't doing a fair back end
> round robin, which people using the ZK client against DNS RR are relying on
> today.
>
> If you just have machine names in a list that you pass in, then yes, we
> could re-resolve on every reconnect and you could just re-alias that name
> to a new IP. But you'll have to put in logic that will do that but not
> break people using DNS RR.
>
> I realize that moving machines is difficult when you have lots of clients.
> I'm a bit surprised your admins can't maintain machine IP addresses on a
> machine move given a cluster of that complexity, though. I also think that
> if we're going to be putting special cases like this in we might just want
> to go all the way to a pluggable reconnection scheme, but maybe that is too
> aggressive.
>
> C
>
> On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <ne...@gmail.com>wrote:
>
>> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
>> simplest implementation which resolves a hostname to multiple IPs.
>>
>> Whatever method you use to map host names to IPs, the problem is that
>> the zookeeper client code will always cache the IPs. So to be able to
>> swap out a machine, all clients would have to be restarted, which if
>> you have 100s of clients, is a major pain. If you want to move the
>> entire cluster to new machines, this becomes even harder.
>>
>> I don't see why re-resolving host names to IPs in the reconnect logic
>> is a problem for zookeeper, since you shuffle the list of IPs anyways.
>>
>> Thanks,
>> Neha
>>
>>
>> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org>
>> wrote:
>> > You can't sensibly round robin within the client code if you re-resolve
>> on
>> > every reconnect, if you're using dns rr. If that's your goal you'd want a
>> > list of dns alias names and re-resolve each hostname when you hit it on
>> > reconnect. But that will break people using dns rr.
>> > You can look into writing a pluggable reconnect logic into the zk client,
>> > that's what would be required to do this but at the end of the day you'll
>> > have to give your users special clients to make that work.
>> >
>> > C
>> >  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com>
>> wrote:
>> >
>> >> I was reading through the client code and saw that zookeeper client
>> >> caches the server IPs during startup and maintains it for the rest of
>> >> its lifetime. If we go with the DNS RR approach or a load balancer
>> >> approach, and later swap out a server with a new one ( with a new IP
>> >> ), all clients would have to be restarted to be able to "forget" the
>> >> old IP and see the new one. That doesn't look like a clean approach to
>> >> such upgrades. One way of getting around this problem, is adding the
>> >> resolution of host names to IPs in the "reconnect" logic in addition
>> >> to the constructor. So when such upgrades happen and the client
>> >> reconnects, it will see the new list of IPs, and wouldn't require to
>> >> be restarted.
>> >>
>> >> Does this approach sound good or am I missing something here ?
>> >>
>> >> Thanks,
>> >> Neha
>> >>
>> >> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
>> >> wrote:
>> >> > DNS RR is good. I had good experiences using that for my client
>> >> > configs for exactly the reasons you are listing.
>> >> >
>> >> > On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <
>> neha.narkhede@gmail.com>
>> >> wrote:
>> >> >> Thanks for the responses!
>> >> >>
>> >> >>>> How are your clients configured to find the zks now?
>> >> >>
>> >> >> Our clients currently use the list of hostnames and ports that
>> >> >> comprise the zookeeper cluster. For example,
>> >> >> zoo1:port1,zoo2:port2,zoo3:port3
>> >> >>
>> >> >>>> > - switch DNS,
>> >> >>> - wait for caches to die,
>> >> >>
>> >> >> This is something we thought about however, if I understand it
>> >> >> correctly, doesn't JVM cache DNS entries forever until it is
>> restarted
>> >> >> ? We haven't specifically turned DNS caching off on our clients. So
>> >> >> this solution would require us to restart the clients to see the new
>> >> >> list of zookeeper hosts.
>> >> >>
>> >> >> Another thought is to use DNS RR and have the client zk url have one
>> >> >> name that resolves to and returns a list of IPs to the zookeeper
>> >> >> client. This has the advantage of being able to perform hardware
>> >> >> migration without changing the client connection url, in the future.
>> >> >> Do people have thoughts about using a DNS RR ?
>> >> >>
>> >> >> Thanks,
>> >> >> Neha
>> >> >>
>> >> >> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
>> >> wrote:
>> >> >>> In particular, aren't you using DNS names?  If you are, then you can
>> >> >>>
>> >> >>> - expand the quorum with the new hardware on new IP addresses,
>> >> >>> - switch DNS,
>> >> >>> - wait for caches to die,
>> >> >>> - restart applications without reconfig or otherwise force new
>> >> connections,
>> >> >>> - decrease quorum size again
>> >> >>>
>> >> >>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <
>> camille@apache.org
>> >> >wrote:
>> >> >>>
>> >> >>>> How are your clients configured to find the zks now? How many
>> clients
>> >> do
>> >> >>>> you have?
>> >> >>>>
>> >> >>>> From my phone
>> >> >>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com>
>> >> wrote:
>> >> >>>>
>> >> >>>> > Hi,
>> >> >>>> >
>> >> >>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate
>> our
>> >> >>>> > zookeeper cluster to new hardware. I'm trying to figure out the
>> best
>> >> >>>> > strategy to achieve that with no downtime.
>> >> >>>> > Here are some possible solutions I see at the moment, I could
>> have
>> >> >>>> > missed a few though -
>> >> >>>> >
>> >> >>>> > 1. Swap each machine out with a new machine, but with the same
>> >> host/IP.
>> >> >>>> >
>> >> >>>> > Pros: No client side config needs to be changed.
>> >> >>>> > Cons: Relatively tedious task for Operations
>> >> >>>> >
>> >> >>>> > 2. Add new machines, with different host/IPs to the existing
>> >> cluster,
>> >> >>>> > and remove the older machines, taking care to maintain the
>> quorum at
>> >> >>>> > all times
>> >> >>>> >
>> >> >>>> > Pros: Easier for Operations
>> >> >>>> > Cons: Client side configs need to be changed and clients need to
>> be
>> >> >>>> > restarted/bounced. Another problem is having a large quorum for
>> >> >>>> > sometime (potentially 9 nodes).
>> >> >>>> >
>> >> >>>> > 3. Hide the new cluster behind either a Hardware load balancer
>> or a
>> >> >>>> > DNS server resolving to all host ips.
>> >> >>>> >
>> >> >>>> > Pros: Makes it easier to move hardware around in the future
>> >> >>>> > Cons: Possible timeout issues with load balancers messing with
>> >> >>>> > zookeeper functionality or performance
>> >> >>>> >
>> >> >>>> > Read this and found it helpful -
>> >> >>>> >
>> >> >>>> >
>> >> >>>>
>> >>
>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>> >> >>>> > But would like to hear from the authors and the users who might
>> have
>> >> >>>> > tried this in a real production setup.
>> >> >>>> >
>> >> >>>> > I'm very interested in finding a long term solution for masking
>> the
>> >> >>>> > zookeeper host names. Any inputs here are appreciated !
>> >> >>>> >
>> >> >>>> > In addition to this, it will also be great to know what people
>> think
>> >> >>>> > about options 1 and 2, as a solution for hardware changes in
>> >> >>>> > Zookeeper.
>> >> >>>> >
>> >> >>>> > Thanks,
>> >> >>>> > Neha
>> >> >>>> >
>> >> >>>>
>> >>
>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Camille Fournier <ca...@apache.org>.
We don't shuffle IPs after the initial resolution of IP addresses.

In DNS RR, you resolve to a list of IPs, shuffle these, and then we round
robin through them trying to connect. If you re-resolve on every
round-robin, you have to put in logic to know which ones have changed and
somehow maintain that shuffle order or you aren't doing a fair back end
round robin, which people using the ZK client against DNS RR are relying on
today.
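To make that connect order concrete, here is a rough sketch of resolve-once / shuffle-once / cycle-forever. This is illustrative only, not the actual ZooKeeper client code, and the host names are made up:

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the behavior described above: resolve once, shuffle once,
// then cycle through the shuffled list on every reconnect attempt.
public class ShuffledHostList {
    private final List<InetSocketAddress> servers;
    private int next = 0;

    ShuffledHostList(List<InetSocketAddress> resolved) {
        servers = new ArrayList<>(resolved);
        // One shuffle per client spreads connections fairly across servers.
        Collections.shuffle(servers);
    }

    InetSocketAddress nextServer() {
        InetSocketAddress s = servers.get(next);
        next = (next + 1) % servers.size(); // plain round robin after the shuffle
        return s;
    }

    public static void main(String[] args) {
        ShuffledHostList l = new ShuffledHostList(Arrays.asList(
                InetSocketAddress.createUnresolved("zoo1", 2181),
                InetSocketAddress.createUnresolved("zoo2", 2181),
                InetSocketAddress.createUnresolved("zoo3", 2181)));
        for (int i = 0; i < 4; i++) {
            System.out.println(l.nextServer()); // the 4th attempt repeats the 1st
        }
    }
}
```

Re-resolving inside nextServer() would change the set underneath the shuffle, which is exactly the fairness problem described above.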

If you just have machine names in a list that you pass in, then yes, we
could re-resolve on every reconnect and you could just re-alias that name
to a new IP. But you'll have to put in logic that will do that but not
break people using DNS RR.

I realize that moving machines is difficult when you have lots of clients.
I'm a bit surprised your admins can't maintain machine IP addresses on a
machine move given a cluster of that complexity, though. I also think that
if we're going to be putting special cases like this in, we might just want
to go all the way to a pluggable reconnection scheme, but maybe that is too
aggressive.

C

On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <ne...@gmail.com>wrote:

> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
> simplest implementation which resolves a hostname to multiple IPs.
>
> Whatever method you use to map host names to IPs, the problem is that
> the zookeeper client code will always cache the IPs. So to be able to
> swap out a machine, all clients would have to be restarted, which if
> you have 100s of clients, is a major pain. If you want to move the
> entire cluster to new machines, this becomes even harder.
>
> I don't see why re-resolving host names to IPs in the reconnect logic
> is a problem for zookeeper, since you shuffle the list of IPs anyways.
>
> Thanks,
> Neha
>
>
> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org>
> wrote:
> > You can't sensibly round robin within the client code if you re-resolve
> on
> > every reconnect, if you're using dns rr. If that's your goal you'd want a
> > list of dns alias names and re-resolve each hostname when you hit it on
> > reconnect. But that will break people using dns rr.
> > You can look into writing a pluggable reconnect logic into the zk client,
> > that's what would be required to do this but at the end of the day you'll
> > have to give your users special clients to make that work.
> >
> > C
> >  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com>
> wrote:
> >
> >> I was reading through the client code and saw that zookeeper client
> >> caches the server IPs during startup and maintains it for the rest of
> >> its lifetime. If we go with the DNS RR approach or a load balancer
> >> approach, and later swap out a server with a new one ( with a new IP
> >> ), all clients would have to be restarted to be able to "forget" the
> >> old IP and see the new one. That doesn't look like a clean approach to
> >> such upgrades. One way of getting around this problem, is adding the
> >> resolution of host names to IPs in the "reconnect" logic in addition
> >> to the constructor. So when such upgrades happen and the client
> >> reconnects, it will see the new list of IPs, and wouldn't require to
> >> be restarted.
> >>
> >> Does this approach sound good or am I missing something here ?
> >>
> >> Thanks,
> >> Neha
> >>
> >> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
> >> wrote:
> >> > DNS RR is good. I had good experiences using that for my client
> >> > configs for exactly the reasons you are listing.
> >> >
> >> > On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <
> neha.narkhede@gmail.com>
> >> wrote:
> >> >> Thanks for the responses!
> >> >>
> >> >>>> How are your clients configured to find the zks now?
> >> >>
> >> >> Our clients currently use the list of hostnames and ports that
> >> >> comprise the zookeeper cluster. For example,
> >> >> zoo1:port1,zoo2:port2,zoo3:port3
> >> >>
> >> >>>> > - switch DNS,
> >> >>> - wait for caches to die,
> >> >>
> >> >> This is something we thought about however, if I understand it
> >> >> correctly, doesn't JVM cache DNS entries forever until it is
> restarted
> >> >> ? We haven't specifically turned DNS caching off on our clients. So
> >> >> this solution would require us to restart the clients to see the new
> >> >> list of zookeeper hosts.
> >> >>
> >> >> Another thought is to use DNS RR and have the client zk url have one
> >> >> name that resolves to and returns a list of IPs to the zookeeper
> >> >> client. This has the advantage of being able to perform hardware
> >> >> migration without changing the client connection url, in the future.
> >> >> Do people have thoughts about using a DNS RR ?
> >> >>
> >> >> Thanks,
> >> >> Neha
> >> >>
> >> >> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
> >> wrote:
> >> >>> In particular, aren't you using DNS names?  If you are, then you can
> >> >>>
> >> >>> - expand the quorum with the new hardware on new IP addresses,
> >> >>> - switch DNS,
> >> >>> - wait for caches to die,
> >> >>> - restart applications without reconfig or otherwise force new
> >> connections,
> >> >>> - decrease quorum size again
> >> >>>
> >> >>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <
> camille@apache.org
> >> >wrote:
> >> >>>
> >> >>>> How are your clients configured to find the zks now? How many
> clients
> >> do
> >> >>>> you have?
> >> >>>>
> >> >>>> From my phone
> >> >>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com>
> >> wrote:
> >> >>>>
> >> >>>> > Hi,
> >> >>>> >
> >> >>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate
> our
> >> >>>> > zookeeper cluster to new hardware. I'm trying to figure out the
> best
> >> >>>> > strategy to achieve that with no downtime.
> >> >>>> > Here are some possible solutions I see at the moment, I could
> have
> >> >>>> > missed a few though -
> >> >>>> >
> >> >>>> > 1. Swap each machine out with a new machine, but with the same
> >> host/IP.
> >> >>>> >
> >> >>>> > Pros: No client side config needs to be changed.
> >> >>>> > Cons: Relatively tedious task for Operations
> >> >>>> >
> >> >>>> > 2. Add new machines, with different host/IPs to the existing
> >> cluster,
> >> >>>> > and remove the older machines, taking care to maintain the
> quorum at
> >> >>>> > all times
> >> >>>> >
> >> >>>> > Pros: Easier for Operations
> >> >>>> > Cons: Client side configs need to be changed and clients need to
> be
> >> >>>> > restarted/bounced. Another problem is having a large quorum for
> >> >>>> > sometime (potentially 9 nodes).
> >> >>>> >
> >> >>>> > 3. Hide the new cluster behind either a Hardware load balancer
> or a
> >> >>>> > DNS server resolving to all host ips.
> >> >>>> >
> >> >>>> > Pros: Makes it easier to move hardware around in the future
> >> >>>> > Cons: Possible timeout issues with load balancers messing with
> >> >>>> > zookeeper functionality or performance
> >> >>>> >
> >> >>>> > Read this and found it helpful -
> >> >>>> >
> >> >>>> >
> >> >>>>
> >>
> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
> >> >>>> > But would like to hear from the authors and the users who might
> have
> >> >>>> > tried this in a real production setup.
> >> >>>> >
> >> >>>> > I'm very interested in finding a long term solution for masking
> the
> >> >>>> > zookeeper host names. Any inputs here are appreciated !
> >> >>>> >
> >> >>>> > In addition to this, it will also be great to know what people
> think
> >> >>>> > about options 1 and 2, as a solution for hardware changes in
> >> >>>> > Zookeeper.
> >> >>>> >
> >> >>>> > Thanks,
> >> >>>> > Neha
> >> >>>> >
> >> >>>>
> >>
>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Neha Narkhede <ne...@gmail.com>.
Maybe I didn't express myself clearly. When I said DNS RR, I meant its
simplest implementation which resolves a hostname to multiple IPs.
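For reference, in Java that simplest form is just a multi-A-record lookup. A small sketch (the production RR name would be something like a hypothetical zk.example.com; the demo resolves localhost so it works without a real DNS zone):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RoundRobinLookup {
    // A DNS round-robin name is just a name with several A records;
    // getAllByName returns all of them.
    static InetAddress[] resolve(String name) throws UnknownHostException {
        return InetAddress.getAllByName(name);
    }

    public static void main(String[] args) throws Exception {
        for (InetAddress a : resolve("localhost")) {
            System.out.println(a.getHostAddress());
        }
    }
}
```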

Whatever method you use to map host names to IPs, the problem is that
the zookeeper client code will always cache the IPs. So to be able to
swap out a machine, all clients would have to be restarted, which, if
you have 100s of clients, is a major pain. If you want to move the
entire cluster to new machines, this becomes even harder.

I don't see why re-resolving host names to IPs in the reconnect logic
is a problem for zookeeper, since you shuffle the list of IPs anyways.

Thanks,
Neha


On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <ca...@apache.org> wrote:
> You can't sensibly round robin within the client code if you re-resolve on
> every reconnect, if you're using dns rr. If that's your goal you'd want a
> list of dns alias names and re-resolve each hostname when you hit it on
> reconnect. But that will break people using dns rr.
> You can look into writing a pluggable reconnect logic into the zk client,
> that's what would be required to do this but at the end of the day you'll
> have to give your users special clients to make that work.
>
> C
>  On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com> wrote:
>
>> I was reading through the client code and saw that zookeeper client
>> caches the server IPs during startup and maintains it for the rest of
>> its lifetime. If we go with the DNS RR approach or a load balancer
>> approach, and later swap out a server with a new one ( with a new IP
>> ), all clients would have to be restarted to be able to "forget" the
>> old IP and see the new one. That doesn't look like a clean approach to
>> such upgrades. One way of getting around this problem, is adding the
>> resolution of host names to IPs in the "reconnect" logic in addition
>> to the constructor. So when such upgrades happen and the client
>> reconnects, it will see the new list of IPs, and wouldn't require to
>> be restarted.
>>
>> Does this approach sound good or am I missing something here ?
>>
>> Thanks,
>> Neha
>>
>> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
>> wrote:
>> > DNS RR is good. I had good experiences using that for my client
>> > configs for exactly the reasons you are listing.
>> >
>> > On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <ne...@gmail.com>
>> wrote:
>> >> Thanks for the responses!
>> >>
>> >>>> How are your clients configured to find the zks now?
>> >>
>> >> Our clients currently use the list of hostnames and ports that
>> >> comprise the zookeeper cluster. For example,
>> >> zoo1:port1,zoo2:port2,zoo3:port3
>> >>
>> >>>> > - switch DNS,
>> >>> - wait for caches to die,
>> >>
>> >> This is something we thought about however, if I understand it
>> >> correctly, doesn't JVM cache DNS entries forever until it is restarted
>> >> ? We haven't specifically turned DNS caching off on our clients. So
>> >> this solution would require us to restart the clients to see the new
>> >> list of zookeeper hosts.
>> >>
>> >> Another thought is to use DNS RR and have the client zk url have one
>> >> name that resolves to and returns a list of IPs to the zookeeper
>> >> client. This has the advantage of being able to perform hardware
>> >> migration without changing the client connection url, in the future.
>> >> Do people have thoughts about using a DNS RR ?
>> >>
>> >> Thanks,
>> >> Neha
>> >>
>> >> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>> >>> In particular, aren't you using DNS names?  If you are, then you can
>> >>>
>> >>> - expand the quorum with the new hardware on new IP addresses,
>> >>> - switch DNS,
>> >>> - wait for caches to die,
>> >>> - restart applications without reconfig or otherwise force new
>> connections,
>> >>> - decrease quorum size again
>> >>>
>> >>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <camille@apache.org
>> >wrote:
>> >>>
>> >>>> How are your clients configured to find the zks now? How many clients
>> do
>> >>>> you have?
>> >>>>
>> >>>> From my phone
>> >>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com>
>> wrote:
>> >>>>
>> >>>> > Hi,
>> >>>> >
>> >>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
>> >>>> > zookeeper cluster to new hardware. I'm trying to figure out the best
>> >>>> > strategy to achieve that with no downtime.
>> >>>> > Here are some possible solutions I see at the moment, I could have
>> >>>> > missed a few though -
>> >>>> >
>> >>>> > 1. Swap each machine out with a new machine, but with the same
>> host/IP.
>> >>>> >
>> >>>> > Pros: No client side config needs to be changed.
>> >>>> > Cons: Relatively tedious task for Operations
>> >>>> >
>> >>>> > 2. Add new machines, with different host/IPs to the existing
>> cluster,
>> >>>> > and remove the older machines, taking care to maintain the quorum at
>> >>>> > all times
>> >>>> >
>> >>>> > Pros: Easier for Operations
>> >>>> > Cons: Client side configs need to be changed and clients need to be
>> >>>> > restarted/bounced. Another problem is having a large quorum for
>> >>>> > sometime (potentially 9 nodes).
>> >>>> >
>> >>>> > 3. Hide the new cluster behind either a Hardware load balancer or a
>> >>>> > DNS server resolving to all host ips.
>> >>>> >
>> >>>> > Pros: Makes it easier to move hardware around in the future
>> >>>> > Cons: Possible timeout issues with load balancers messing with
>> >>>> > zookeeper functionality or performance
>> >>>> >
>> >>>> > Read this and found it helpful -
>> >>>> >
>> >>>> >
>> >>>>
>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>> >>>> > But would like to hear from the authors and the users who might have
>> >>>> > tried this in a real production setup.
>> >>>> >
>> >>>> > I'm very interested in finding a long term solution for masking the
>> >>>> > zookeeper host names. Any inputs here are appreciated !
>> >>>> >
>> >>>> > In addition to this, it will also be great to know what people think
>> >>>> > about options 1 and 2, as a solution for hardware changes in
>> >>>> > Zookeeper.
>> >>>> >
>> >>>> > Thanks,
>> >>>> > Neha
>> >>>> >
>> >>>>
>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Camille Fournier <ca...@apache.org>.
You can't sensibly round robin within the client code if you re-resolve on
every reconnect, if you're using dns rr. If that's your goal you'd want a
list of dns alias names and re-resolve each hostname when you hit it on
reconnect. But that will break people using dns rr.
You can look into writing pluggable reconnect logic into the zk client;
that's what would be required to do this, but at the end of the day you'll
have to give your users special clients to make that work.
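A pluggable scheme could look roughly like the interface below. Both names are invented for this sketch, not ZooKeeper's actual client API; the point is only that resolution policy lives behind the interface:

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

// Hypothetical shape of a pluggable host provider.
interface HostProvider {
    // Called before each (re)connection attempt; an implementation is free
    // to re-resolve DNS here without the core client knowing.
    InetSocketAddress next();

    // Called once a connection succeeds, e.g. to reset backoff state.
    void onConnected();
}

// Trivial fixed-list implementation; a DNS-aware variant would re-resolve
// its hostnames inside next() instead of holding addresses forever.
class StaticListProvider implements HostProvider {
    private final List<InetSocketAddress> servers;
    private int i = -1;

    StaticListProvider(List<InetSocketAddress> servers) {
        this.servers = servers;
    }

    public InetSocketAddress next() {
        i = (i + 1) % servers.size();
        return servers.get(i);
    }

    public void onConnected() {
        // nothing to reset in this trivial version
    }
}

public class HostProviderDemo {
    public static void main(String[] args) {
        HostProvider p = new StaticListProvider(Arrays.asList(
                InetSocketAddress.createUnresolved("zoo1", 2181),
                InetSocketAddress.createUnresolved("zoo2", 2181)));
        System.out.println(p.next() + " then " + p.next() + " then " + p.next());
    }
}
```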

C
 On Jan 9, 2012 1:16 PM, "Neha Narkhede" <ne...@gmail.com> wrote:

> I was reading through the client code and saw that zookeeper client
> caches the server IPs during startup and maintains it for the rest of
> its lifetime. If we go with the DNS RR approach or a load balancer
> approach, and later swap out a server with a new one ( with a new IP
> ), all clients would have to be restarted to be able to "forget" the
> old IP and see the new one. That doesn't look like a clean approach to
> such upgrades. One way of getting around this problem, is adding the
> resolution of host names to IPs in the "reconnect" logic in addition
> to the constructor. So when such upgrades happen and the client
> reconnects, it will see the new list of IPs, and wouldn't require to
> be restarted.
>
> Does this approach sound good or am I missing something here ?
>
> Thanks,
> Neha
>
> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org>
> wrote:
> > DNS RR is good. I had good experiences using that for my client
> > configs for exactly the reasons you are listing.
> >
> > On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <ne...@gmail.com>
> wrote:
> >> Thanks for the responses!
> >>
> >>>> How are your clients configured to find the zks now?
> >>
> >> Our clients currently use the list of hostnames and ports that
> >> comprise the zookeeper cluster. For example,
> >> zoo1:port1,zoo2:port2,zoo3:port3
> >>
> >>>> > - switch DNS,
> >>> - wait for caches to die,
> >>
> >> This is something we thought about however, if I understand it
> >> correctly, doesn't JVM cache DNS entries forever until it is restarted
> >> ? We haven't specifically turned DNS caching off on our clients. So
> >> this solution would require us to restart the clients to see the new
> >> list of zookeeper hosts.
> >>
> >> Another thought is to use DNS RR and have the client zk url have one
> >> name that resolves to and returns a list of IPs to the zookeeper
> >> client. This has the advantage of being able to perform hardware
> >> migration without changing the client connection url, in the future.
> >> Do people have thoughts about using a DNS RR ?
> >>
> >> Thanks,
> >> Neha
> >>
> >> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >>> In particular, aren't you using DNS names?  If you are, then you can
> >>>
> >>> - expand the quorum with the new hardware on new IP addresses,
> >>> - switch DNS,
> >>> - wait for caches to die,
> >>> - restart applications without reconfig or otherwise force new
> connections,
> >>> - decrease quorum size again
> >>>
> >>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <camille@apache.org
> >wrote:
> >>>
> >>>> How are your clients configured to find the zks now? How many clients
> do
> >>>> you have?
> >>>>
> >>>> From my phone
> >>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com>
> wrote:
> >>>>
> >>>> > Hi,
> >>>> >
> >>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
> >>>> > zookeeper cluster to new hardware. I'm trying to figure out the best
> >>>> > strategy to achieve that with no downtime.
> >>>> > Here are some possible solutions I see at the moment, I could have
> >>>> > missed a few though -
> >>>> >
> >>>> > 1. Swap each machine out with a new machine, but with the same
> host/IP.
> >>>> >
> >>>> > Pros: No client side config needs to be changed.
> >>>> > Cons: Relatively tedious task for Operations
> >>>> >
> >>>> > 2. Add new machines, with different host/IPs to the existing
> cluster,
> >>>> > and remove the older machines, taking care to maintain the quorum at
> >>>> > all times
> >>>> >
> >>>> > Pros: Easier for Operations
> >>>> > Cons: Client side configs need to be changed and clients need to be
> >>>> > restarted/bounced. Another problem is having a large quorum for
> >>>> > sometime (potentially 9 nodes).
> >>>> >
> >>>> > 3. Hide the new cluster behind either a Hardware load balancer or a
> >>>> > DNS server resolving to all host ips.
> >>>> >
> >>>> > Pros: Makes it easier to move hardware around in the future
> >>>> > Cons: Possible timeout issues with load balancers messing with
> >>>> > zookeeper functionality or performance
> >>>> >
> >>>> > Read this and found it helpful -
> >>>> >
> >>>> >
> >>>>
> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
> >>>> > But would like to hear from the authors and the users who might have
> >>>> > tried this in a real production setup.
> >>>> >
> >>>> > I'm very interested in finding a long term solution for masking the
> >>>> > zookeeper host names. Any inputs here are appreciated !
> >>>> >
> >>>> > In addition to this, it will also be great to know what people think
> >>>> > about options 1 and 2, as a solution for hardware changes in
> >>>> > Zookeeper.
> >>>> >
> >>>> > Thanks,
> >>>> > Neha
> >>>> >
> >>>>
>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Neha Narkhede <ne...@gmail.com>.
I was reading through the client code and saw that zookeeper client
caches the server IPs during startup and maintains it for the rest of
its lifetime. If we go with the DNS RR approach or a load balancer
approach, and later swap out a server with a new one (with a new IP),
all clients would have to be restarted to be able to "forget" the
old IP and see the new one. That doesn't look like a clean approach to
such upgrades. One way of getting around this problem is adding the
resolution of host names to IPs in the "reconnect" logic in addition
to the constructor. So when such upgrades happen and the client
reconnects, it will see the new list of IPs, and wouldn't require to
be restarted.
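A sketch of what that could look like. This is illustrative only (the host:port strings and class name are hypothetical); the essential change is that the configured host names are kept and resolved again on every reconnect rather than once in the constructor:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Keep the configured host *names*; re-resolve them on each reconnect.
public class ReResolvingList {
    private final List<String> hostPorts; // e.g. "zoo1:2181" (hypothetical)

    ReResolvingList(List<String> hostPorts) {
        this.hostPorts = hostPorts;
    }

    // Would be called from the client's reconnect path.
    List<InetSocketAddress> resolveAll() {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String hp : hostPorts) {
            String[] parts = hp.split(":");
            try {
                for (InetAddress a : InetAddress.getAllByName(parts[0])) {
                    out.add(new InetSocketAddress(a, Integer.parseInt(parts[1])));
                }
            } catch (UnknownHostException e) {
                // A host that no longer resolves may be mid-migration; skip it.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new ReResolvingList(
                Arrays.asList("localhost:2181")).resolveAll());
    }
}
```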

Does this approach sound good or am I missing something here ?

Thanks,
Neha

On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <ca...@apache.org> wrote:
> DNS RR is good. I had good experiences using that for my client
> configs for exactly the reasons you are listing.
>
> On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <ne...@gmail.com> wrote:
>> Thanks for the responses!
>>
>>>> How are your clients configured to find the zks now?
>>
>> Our clients currently use the list of hostnames and ports that
>> comprise the zookeeper cluster. For example,
>> zoo1:port1,zoo2:port2,zoo3:port3
>>
>>>> > - switch DNS,
>>> - wait for caches to die,
>>
>> This is something we thought about. However, if I understand it
>> correctly, doesn't the JVM cache DNS entries forever until it is
>> restarted? We haven't specifically turned DNS caching off on our
>> clients, so this solution would require us to restart the clients to
>> see the new list of zookeeper hosts.
>>
>> Another thought is to use DNS RR (round robin) and have the client zk
>> url be a single name that resolves to the full list of server IPs for
>> the zookeeper client. This has the advantage that future hardware
>> migrations can be performed without changing the client connection
>> url. Do people have thoughts about using DNS RR?
>>
>> Thanks,
>> Neha
>>
>> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com> wrote:
>>> In particular, aren't you using DNS names?  If you are, then you can
>>>
>>> - expand the quorum with the new hardware on new IP addresses,
>>> - switch DNS,
>>> - wait for caches to die,
>>> - restart applications without reconfig or otherwise force new connections,
>>> - decrease quorum size again
>>>
>>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <ca...@apache.org>wrote:
>>>
>>>> How are your clients configured to find the zks now? How many clients do
>>>> you have?
>>>>
>>>> From my phone
>>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
>>>> > zookeeper cluster to new hardware. I'm trying to figure out the best
>>>> > strategy to achieve that with no downtime.
>>>> > Here are some possible solutions I see at the moment, I could have
>>>> > missed a few though -
>>>> >
>>>> > 1. Swap each machine out with a new machine, but with the same host/IP.
>>>> >
>>>> > Pros: No client side config needs to be changed.
>>>> > Cons: Relatively tedious task for Operations
>>>> >
>>>> > 2. Add new machines, with different host/IPs to the existing cluster,
>>>> > and remove the older machines, taking care to maintain the quorum at
>>>> > all times
>>>> >
>>>> > Pros: Easier for Operations
>>>> > Cons: Client side configs need to be changed and clients need to be
>>>> > restarted/bounced. Another problem is having a large quorum for
>>>> > sometime (potentially 9 nodes).
>>>> >
>>>> > 3. Hide the new cluster behind either a Hardware load balancer or a
>>>> > DNS server resolving to all host ips.
>>>> >
>>>> > Pros: Makes it easier to move hardware around in the future
>>>> > Cons: Possible timeout issues with load balancers messing with
>>>> > zookeeper functionality or performance
>>>> >
>>>> > Read this and found it helpful -
>>>> >
>>>> >
>>>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>>>> > But would like to hear from the authors and the users who might have
>>>> > tried this in a real production setup.
>>>> >
>>>> > I'm very interested in finding a long term solution for masking the
>>>> > zookeeper host names. Any inputs here are appreciated !
>>>> >
>>>> > In addition to this, it will also be great to know what people think
>>>> > about options 1 and 2, as a solution for hardware changes in
>>>> > Zookeeper.
>>>> >
>>>> > Thanks,
>>>> > Neha
>>>> >
>>>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Camille Fournier <ca...@apache.org>.
DNS RR is good. I had good experiences using that for my client
configs for exactly the reasons you are listing.

On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <ne...@gmail.com> wrote:
> Thanks for the responses!
>
>>> How are your clients configured to find the zks now?
>
> Our clients currently use the list of hostnames and ports that
> comprise the zookeeper cluster. For example,
> zoo1:port1,zoo2:port2,zoo3:port3
>
>>> > - switch DNS,
>> - wait for caches to die,
>
> This is something we thought about. However, if I understand it
> correctly, doesn't the JVM cache DNS entries forever until it is
> restarted? We haven't specifically turned DNS caching off on our
> clients, so this solution would require us to restart the clients to
> see the new list of zookeeper hosts.
>
> Another thought is to use DNS RR (round robin) and have the client zk
> url be a single name that resolves to the full list of server IPs for
> the zookeeper client. This has the advantage that future hardware
> migrations can be performed without changing the client connection
> url. Do people have thoughts about using DNS RR?
>
> Thanks,
> Neha
>
> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com> wrote:
>> In particular, aren't you using DNS names?  If you are, then you can
>>
>> - expand the quorum with the new hardware on new IP addresses,
>> - switch DNS,
>> - wait for caches to die,
>> - restart applications without reconfig or otherwise force new connections,
>> - decrease quorum size again
>>
>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <ca...@apache.org>wrote:
>>
>>> How are your clients configured to find the zks now? How many clients do
>>> you have?
>>>
>>> From my phone
>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
>>> > zookeeper cluster to new hardware. I'm trying to figure out the best
>>> > strategy to achieve that with no downtime.
>>> > Here are some possible solutions I see at the moment, I could have
>>> > missed a few though -
>>> >
>>> > 1. Swap each machine out with a new machine, but with the same host/IP.
>>> >
>>> > Pros: No client side config needs to be changed.
>>> > Cons: Relatively tedious task for Operations
>>> >
>>> > 2. Add new machines, with different host/IPs to the existing cluster,
>>> > and remove the older machines, taking care to maintain the quorum at
>>> > all times
>>> >
>>> > Pros: Easier for Operations
>>> > Cons: Client side configs need to be changed and clients need to be
>>> > restarted/bounced. Another problem is having a large quorum for
>>> > sometime (potentially 9 nodes).
>>> >
>>> > 3. Hide the new cluster behind either a Hardware load balancer or a
>>> > DNS server resolving to all host ips.
>>> >
>>> > Pros: Makes it easier to move hardware around in the future
>>> > Cons: Possible timeout issues with load balancers messing with
>>> > zookeeper functionality or performance
>>> >
>>> > Read this and found it helpful -
>>> >
>>> >
>>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>>> > But would like to hear from the authors and the users who might have
>>> > tried this in a real production setup.
>>> >
>>> > I'm very interested in finding a long term solution for masking the
>>> > zookeeper host names. Any inputs here are appreciated !
>>> >
>>> > In addition to this, it will also be great to know what people think
>>> > about options 1 and 2, as a solution for hardware changes in
>>> > Zookeeper.
>>> >
>>> > Thanks,
>>> > Neha
>>> >
>>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Neha Narkhede <ne...@gmail.com>.
Thanks for the responses!

>> How are your clients configured to find the zks now?

Our clients currently use the list of hostnames and ports that
comprise the zookeeper cluster. For example,
zoo1:port1,zoo2:port2,zoo3:port3

>> > - switch DNS,
> - wait for caches to die,

This is something we thought about. However, if I understand it
correctly, doesn't the JVM cache DNS entries forever until it is
restarted? We haven't specifically turned DNS caching off on our
clients, so this solution would require us to restart the clients to
see the new list of zookeeper hosts.
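For what it's worth, the JVM's positive DNS cache can be bounded with
the `networkaddress.cache.ttl` security property (the legacy
-Dsun.net.inetaddr.ttl system property has a similar effect). Without
it, the cache lifetime is implementation-specific, and with a security
manager installed successful lookups are cached forever. A minimal
sketch -- the 60 second value is just an example, not a recommendation:

```java
import java.security.Security;

public class DnsTtl {
    public static void main(String[] args) {
        // Cache successful DNS lookups for at most 60 seconds instead
        // of the default (forever under a security manager,
        // JVM-specific otherwise). This must run before the first
        // lookup to take effect.
        Security.setProperty("networkaddress.cache.ttl", "60");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```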

Another thought is to use DNS RR (round robin) and have the client zk
url be a single name that resolves to the full list of server IPs for
the zookeeper client. This has the advantage that future hardware
migrations can be performed without changing the client connection
url. Do people have thoughts about using DNS RR?

Thanks,
Neha

On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <te...@gmail.com> wrote:
> In particular, aren't you using DNS names?  If you are, then you can
>
> - expand the quorum with the new hardware on new IP addresses,
> - switch DNS,
> - wait for caches to die,
> - restart applications without reconfig or otherwise force new connections,
> - decrease quorum size again
>
> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <ca...@apache.org>wrote:
>
>> How are your clients configured to find the zks now? How many clients do
>> you have?
>>
>> From my phone
>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
>> > zookeeper cluster to new hardware. I'm trying to figure out the best
>> > strategy to achieve that with no downtime.
>> > Here are some possible solutions I see at the moment, I could have
>> > missed a few though -
>> >
>> > 1. Swap each machine out with a new machine, but with the same host/IP.
>> >
>> > Pros: No client side config needs to be changed.
>> > Cons: Relatively tedious task for Operations
>> >
>> > 2. Add new machines, with different host/IPs to the existing cluster,
>> > and remove the older machines, taking care to maintain the quorum at
>> > all times
>> >
>> > Pros: Easier for Operations
>> > Cons: Client side configs need to be changed and clients need to be
>> > restarted/bounced. Another problem is having a large quorum for
>> > sometime (potentially 9 nodes).
>> >
>> > 3. Hide the new cluster behind either a Hardware load balancer or a
>> > DNS server resolving to all host ips.
>> >
>> > Pros: Makes it easier to move hardware around in the future
>> > Cons: Possible timeout issues with load balancers messing with
>> > zookeeper functionality or performance
>> >
>> > Read this and found it helpful -
>> >
>> >
>> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
>> > But would like to hear from the authors and the users who might have
>> > tried this in a real production setup.
>> >
>> > I'm very interested in finding a long term solution for masking the
>> > zookeeper host names. Any inputs here are appreciated !
>> >
>> > In addition to this, it will also be great to know what people think
>> > about options 1 and 2, as a solution for hardware changes in
>> > Zookeeper.
>> >
>> > Thanks,
>> > Neha
>> >
>>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Ted Dunning <te...@gmail.com>.
In particular, aren't you using DNS names?  If you are, then you can

- expand the quorum with the new hardware on new IP addresses,
- switch DNS,
- wait for caches to die,
- restart applications without reconfig or otherwise force new connections,
- decrease quorum size again
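During the overlap window, each server's zoo.cfg would list both the
old and the new machines, rolled out with rolling restarts (since 3.3
has no dynamic reconfiguration). Hostnames and ports below are made up,
and this is only a sketch:

```
# zoo.cfg during migration: original ensemble (zoo1-3) plus new
# hardware (zoo4-7). Keep a quorum up at every restart step; an odd
# total keeps majority voting clean. Shrink back to the new machines
# once clients have moved over.
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
server.4=zoo4:2888:3888
server.5=zoo5:2888:3888
server.6=zoo6:2888:3888
server.7=zoo7:2888:3888
```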

On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <ca...@apache.org>wrote:

> How are your clients configured to find the zks now? How many clients do
> you have?
>
> From my phone
> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com> wrote:
>
> > Hi,
> >
> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
> > zookeeper cluster to new hardware. I'm trying to figure out the best
> > strategy to achieve that with no downtime.
> > Here are some possible solutions I see at the moment, I could have
> > missed a few though -
> >
> > 1. Swap each machine out with a new machine, but with the same host/IP.
> >
> > Pros: No client side config needs to be changed.
> > Cons: Relatively tedious task for Operations
> >
> > 2. Add new machines, with different host/IPs to the existing cluster,
> > and remove the older machines, taking care to maintain the quorum at
> > all times
> >
> > Pros: Easier for Operations
> > Cons: Client side configs need to be changed and clients need to be
> > restarted/bounced. Another problem is having a large quorum for
> > sometime (potentially 9 nodes).
> >
> > 3. Hide the new cluster behind either a Hardware load balancer or a
> > DNS server resolving to all host ips.
> >
> > Pros: Makes it easier to move hardware around in the future
> > Cons: Possible timeout issues with load balancers messing with
> > zookeeper functionality or performance
> >
> > Read this and found it helpful -
> >
> >
> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
> > But would like to hear from the authors and the users who might have
> > tried this in a real production setup.
> >
> > I'm very interested in finding a long term solution for masking the
> > zookeeper host names. Any inputs here are appreciated !
> >
> > In addition to this, it will also be great to know what people think
> > about options 1 and 2, as a solution for hardware changes in
> > Zookeeper.
> >
> > Thanks,
> > Neha
> >
>

Re: Performing no downtime hardware changes to a live zookeeper cluster

Posted by Camille Fournier <ca...@apache.org>.
How are your clients configured to find the zks now? How many clients do
you have?

From my phone
On Dec 20, 2011 3:14 PM, "Neha Narkhede" <ne...@gmail.com> wrote:

> Hi,
>
> As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
> zookeeper cluster to new hardware. I'm trying to figure out the best
> strategy to achieve that with no downtime.
> Here are some possible solutions I see at the moment, I could have
> missed a few though -
>
> 1. Swap each machine out with a new machine, but with the same host/IP.
>
> Pros: No client side config needs to be changed.
> Cons: Relatively tedious task for Operations
>
> 2. Add new machines, with different host/IPs to the existing cluster,
> and remove the older machines, taking care to maintain the quorum at
> all times
>
> Pros: Easier for Operations
> Cons: Client side configs need to be changed and clients need to be
> restarted/bounced. Another problem is having a large quorum for
> sometime (potentially 9 nodes).
>
> 3. Hide the new cluster behind either a Hardware load balancer or a
> DNS server resolving to all host ips.
>
> Pros: Makes it easier to move hardware around in the future
> Cons: Possible timeout issues with load balancers messing with
> zookeeper functionality or performance
>
> Read this and found it helpful -
>
> http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
> But would like to hear from the authors and the users who might have
> tried this in a real production setup.
>
> I'm very interested in finding a long term solution for masking the
> zookeeper host names. Any inputs here are appreciated !
>
> In addition to this, it will also be great to know what people think
> about options 1 and 2, as a solution for hardware changes in
> Zookeeper.
>
> Thanks,
> Neha
>