Posted to users@kafka.apache.org by Zuber <ob...@gmail.com> on 2016/08/03 14:22:53 UTC

Kafka DNS Caching in AWS

Hello –
 
We are planning to use Kafka as the event store in a system that is being built using an event-sourcing design.
Here is how we deployed the cluster in AWS to verify HA in the cloud (our POC had only 1 topic with 1 partition and a replication factor of 3):
1)    3 ZK servers running in different AZs (managed by an Auto Scaling Group)
2)    3 Kafka brokers on EC2 instances running in different AZs (managed by an Auto Scaling Group)
3)    Kafka logs are stored on EBS volumes
4)    A records are defined in Route53 for all ZK servers and Kafka brokers
Each EC2 instance registers its IP under its corresponding A record (in Route53) on startup.
 
But due to a bug in the ZKClient used by the Kafka broker, which caches ZK IPs forever, I don’t see any option other than bouncing all brokers whenever a ZK server is replaced and comes up with a new IP.
 
A Netflix presentation (linked below) mentions the issue, as do a couple of ZK JIRA defects, but I haven’t found any concrete solution yet.
I would really appreciate any help in this regard.
 
http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
https://issues.apache.org/jira/browse/ZOOKEEPER-338
https://issues.apache.org/jira/browse/ZOOKEEPER-1506
http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
 
Thanks,
Zuber


Re: Kafka DNS Caching in AWS

Posted by Zuber Saiyed <ob...@gmail.com>.
Thank you all for your responses.

Since EIP is not an option for me, here is what I tried, and it worked:
1) Created an ENI with a dedicated IP
2) Associated that IP with the A record
3) Assigned that ENI to the EC2 instance
4) Created an EBS volume to hold the ZK data

Because EBS volumes and ENIs are bound to an AZ, I created one Auto Scaling group per AZ.


Thanks,
Zuber


-- 
Thanks & Regards,
Zuber Saiyed

Re: Kafka DNS Caching in AWS

Posted by Joe Lawson <jl...@opensourceconnections.com>.
In the past, on classic EC2 with an autoscaling group of ZooKeeper
instances, I've used elastic IPs for my list. There we subscribed an SQS
queue to the autoscaling SNS topic, and when a new instance was brought
online, one of the spare IPs was allocated to it. The allocation sometimes
had to be retried over and over, for example if we had 3 EIPs and all were
still assigned while a new instance was being brought online. All of the EIPs
and SNS topic subscriptions were automatically created via CloudFormation.




-- 
-Joe

Re: Kafka DNS Caching in AWS

Posted by Gian Merlino <gi...@gmail.com>.
Hey Zuber,

Our AWS ZK deployment involves a subnet that is not used for other things,
fixed private IP addresses, and EBS volumes for ZK data. That way, if a ZK
instance fails, it can be replaced with another instance with the same IP
and data volume.


Re: Kafka DNS Caching in AWS

Posted by Alexis Midon <al...@airbnb.com.INVALID>.
Hi Gwen,

I have explored and tested this approach in the past. It does not work for
2 reasons:
 A. the first one relates to the ZKClient implementation,
 B. the second is the JVM behavior.


A. The ZkConnection [1] managed by ZKClient uses a legacy constructor of
org.apache.zookeeper.ZooKeeper [2]. The created ZooKeeper instance relies on a
StaticHostProvider [3].
This host provider implementation resolves the DNS names on instantiation. So
as soon as the Kafka broker creates its ZKClient instance, all server
addresses are resolved and the corresponding InetSocketAddress instances
will store the IP for their lifetime.  :(

I believe the right thing to do would be for ZKClient to use a custom
HostProvider implementation that creates a new InetSocketAddress instance on
each invocation of `HostProvider#next()` [4] (therefore resolving the
address again).
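
To make the idea concrete, here is a minimal sketch of such a provider. It is
only an illustration under stated assumptions: the class name
ReResolvingHostProvider and its methods are hypothetical, and it deliberately
does not implement the real ZooKeeper HostProvider interface (whose exact
signature depends on the ZooKeeper version). It only shows the key point,
which is building a fresh InetSocketAddress on every call.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: hands out server addresses round-robin and builds a
 * fresh InetSocketAddress on every call, so the host name is looked up again
 * instead of reusing an address resolved once at construction time.
 */
final class ReResolvingHostProvider {
    private final List<String> hosts = new ArrayList<>();
    private final List<Integer> ports = new ArrayList<>();
    private int index = -1;

    /** Parses a connect string of the form "host1:port1,host2:port2". */
    ReResolvingHostProvider(String connectString) {
        for (String entry : connectString.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected host:port, got: " + entry);
            }
            hosts.add(trimmed.substring(0, colon));
            ports.add(Integer.parseInt(trimmed.substring(colon + 1)));
        }
    }

    int size() {
        return hosts.size();
    }

    /** Returns the next server; the constructor call attempts resolution each time. */
    synchronized InetSocketAddress next() {
        index = (index + 1) % hosts.size();
        return new InetSocketAddress(hosts.get(index), ports.get(index));
    }
}
```

Each call to next() goes back to the JVM's resolver, so on its own this only
helps once the JVM-level cache is also bounded.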

You would think this is enough, but no, because the JVM itself caches DNS.


B. When an InetSocketAddress instance resolves a DNS name, the JVM will
cache the value! So even if a dynamic HostProvider implementation is used,
the JVM might return a cached value.
And the default TTL is implementation-specific. If I remember correctly, the
Oracle JVM caches them forever when a security manager is installed. [5]

So you must also configure the Kafka JVM correctly.
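
As a rough sketch of what that configuration can look like when done
programmatically, the snippet below sets the two security properties
described in [5]. The class name DnsCacheTtl and the TTL values are
assumptions chosen for illustration, not recommended settings, and the call
should run before the JVM performs its first name lookup so that the cache
policy picks it up.

```java
import java.security.Security;

/** Hypothetical helper that bounds the JVM's DNS cache for this process. */
final class DnsCacheTtl {
    private DnsCacheTtl() {
    }

    /**
     * Sets the TTLs (in seconds) for successful and failed name lookups.
     * Call this early, before any host name has been resolved.
     */
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }
}
```

For example, DnsCacheTtl.configure(60, 10) would cache successful lookups for
60 seconds and failed ones for 10. For a stock Kafka broker, where you do not
control the code, the same properties can instead be set in the JVM's
java.security file.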

hope it helps,

Alexis

[1]
https://github.com/sgroschupf/zkclient/blob/ec77080a5d7a5d920fa0e8ea5bd5119fb02a06f1/src/main/java/org/I0Itec/zkclient/ZkConnection.java#L69

[2]
https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/ZooKeeper.java#L1210

[3]
https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/client/StaticHostProvider.java

[4]
https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1071

[5]
http://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html
`networkaddress.cache.ttl`
Specified in java.security to indicate the caching policy for successful
name lookups from the name service. The value is specified as integer to
indicate the number of seconds to cache the successful lookup.

A value of -1 indicates "cache forever". The default behavior is to cache
forever when a security manager is installed, and to cache for an
implementation specific period of time, when a security manager is not
installed.
See also `networkaddress.cache.negative.ttl`.




On Wed, Aug 3, 2016 at 9:45 AM Gwen Shapira <gw...@confluent.io> wrote:

> Can you define a DNS name that round-robins to multiple IP addresses?
> This way ZKClient will cache the name and you can rotate IPs behind
> the scenes with no issues?

Re: Kafka DNS Caching in AWS

Posted by Gwen Shapira <gw...@confluent.io>.
Can you define a DNS name that round-robins to multiple IP addresses?
This way ZKClient will cache the name and you can rotate IPs behind
the scenes with no issues?


