Posted to dev@zookeeper.apache.org by "Robert Kamphuis (JIRA)" <ji...@apache.org> on 2014/03/14 08:44:48 UTC
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934720#comment-13934720 ] 

Robert Kamphuis commented on ZOOKEEPER-1506:
--------------------------------------------

As this does not seem to be progressing much, here are some tips for people working in AWS or similar enough environments. This is likely neither the only approach nor the best one, but it is working for me.
- configure the zookeeper ensemble servers to connect to the elastic IP address instead of the hostname
- on a serious failure of one of the servers, boot a replacement and re-assign the corresponding elastic IP to that server.
- the other servers will then reconnect correctly
- you will need to set up the security group to explicitly allow the interconnect on 2888/3888(/2181), or your ports of choice, from the elastic IPs for the connections to work.
- downsides: 
-# traffic between zookeeper servers goes via whatever boxes do the elastic-IP-to-server mapping, which adds latency. My measurements as an example: ping using private IPs vs elastic IPs: 0.8 ms vs 1.4 ms (500-byte packets, servers in two different AZs in US-east)
-# you will need to pay for this traffic, whereas you would not when using names that are mapped to the internal IPs.
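
A minimal sketch of what the server list in zoo.cfg could look like with this setup. The addresses are placeholders from the documentation range 203.0.113.0/24, not real elastic IPs, and the remaining settings are assumed typical values, shown only for context:

```
# zoo.cfg (illustrative values only)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# one entry per ensemble member: <elastic IP>:<quorum port>:<election port>
server.1=203.0.113.11:2888:3888
server.2=203.0.113.12:2888:3888
server.3=203.0.113.13:2888:3888
```

Because the entries are fixed IPs, re-assigning an elastic IP to a replacement instance needs no change to this file on the other servers.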

Also: for the clients, I am using static DNS records as the connect string, with names like zookeeperN.<domain> pointing to ec2-A-B-C-D.compute-1.amazonaws.com, i.e. to the elastic IP's name and not to the IP itself. EC2 maps these to the active private IP once the elastic IP is assigned to an instance. The clients are then recognised properly as coming from the correct security group(s). There is no need to add all the client IPs, of which I have many, in a changing set; just give the clients' security groups access to the zookeeper security group.
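
As a rough illustration of the client side, the connect string can be assembled from the zookeeperN.<domain> names. The class name, method name, the domain "example.com" and the server count below are hypothetical, chosen only for this sketch:

```java
import java.util.StringJoiner;

// Illustrative sketch only: builds a ZooKeeper connect string of the form
// host1:port,host2:port,... from names following the zookeeperN.<domain> scheme.
final class ConnectStringSketch {

    static String connectString(String domain, int servers, int clientPort) {
        StringJoiner joiner = new StringJoiner(",");
        for (int n = 1; n <= servers; n++) {
            // zookeeper1.<domain>, zookeeper2.<domain>, ...
            joiner.add("zookeeper" + n + "." + domain + ":" + clientPort);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // "example.com" is a placeholder domain, not one from the original setup.
        System.out.println(connectString("example.com", 3, 2181));
    }
}
```

The resulting string is what would be handed to the ZooKeeper or Curator client as its connect string.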

BTW: if someone knows of good resources on running zookeeper and curator-based clients in AWS, I would very much like to know where...


> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michael Lasevich
>              Labels: patch
>             Fix For: 3.5.0
>
>         Attachments: zk-dns-caching-refresh.patch
>
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP address they map to can and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new instance and remap the hostname to the new instance's IP address. Our expectation is that when the original ZK node is terminated/shutdown, the remaining nodes in the ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve the hostname->IP mapping for the new server. Once the original ZK node is terminated, the existing servers continue to attempt contacting it at the old IP address. It would be great if the ZK servers could try to re-resolve the hostname when attempting to connect to a lost ZK server, instead of caching the lookup indefinitely. Currently we must do a rolling restart of the ZK ensemble after swapping a node -- which at three nodes means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach one of a set of three Elastic IP addresses. External to EC2 this IP address remains the same and maps to whatever instance it is attached to. Internal to EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it is attached to. Therefore, in our case we would like ZK to pick up the new 10.x.y.z address that the elastic IP hostname gets mapped to and reconnect appropriately.
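
For illustration, the behaviour requested in the quoted description can be sketched in plain Java as below. This is not the attached patch and not ZooKeeper's own code; the class and method names are hypothetical. It relies only on the fact that constructing a new InetSocketAddress from a host name performs a fresh lookup, which is still subject to the JVM's own lookup cache (the networkaddress.cache.ttl security property):

```java
import java.net.InetSocketAddress;

// Illustrative sketch only; not the attached patch and not ZooKeeper's code.
final class ReResolveSketch {

    /**
     * Builds a fresh InetSocketAddress from the host string and port of an
     * existing one. The constructor attempts a new name lookup, so a changed
     * DNS record can be picked up instead of reusing the old IP address.
     */
    static InetSocketAddress reResolve(InetSocketAddress stale) {
        return new InetSocketAddress(stale.getHostString(), stale.getPort());
    }

    public static void main(String[] args) {
        // "localhost" stands in for an ensemble member's host name;
        // createUnresolved performs no lookup at all.
        InetSocketAddress peer = InetSocketAddress.createUnresolved("localhost", 3888);
        InetSocketAddress fresh = reResolve(peer);
        System.out.println(fresh.getHostString() + " resolved: " + !fresh.isUnresolved());
    }
}
```

A caller would invoke something like reResolve before each reconnection attempt to a lost peer, rather than keeping the address object created at startup.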



--
This message was sent by Atlassian JIRA
(v6.2#6252)