Posted to users@kafka.apache.org by Tom Brown <to...@gmail.com> on 2015/10/07 00:31:53 UTC

Datacenter to datacenter over the open internet

Hello,

How do you consume a Kafka topic from a remote location without a dedicated
connection? How do you protect the server?

The setup: data streams into our datacenter. We process it and publish it
to a Kafka cluster. The consumer is located in a different datacenter with
no direct connection. The most efficient scenario would be to set up a
point-to-point link, but that idea has no traction with our executives. We
can set up a VPN; while that would work, our IT department assures us that
it won't be able to scale.

What we're currently planning is to expose the Kafka cluster IP addresses
to the internet, and only allow access via firewall. Each message will be
encrypted with a shared private key, so we're not worried about messages
being intercepted. What we are worried about is how brokers refer to each
other: when a broker directs the consumer to the server that is in charge
of a particular region, does it use the host name (that could be
externally mapped to the public IP) or does it use the detected/private IP
address?

What solution would you use to consume a remote cluster?

--Tom

Re: Datacenter to datacenter over the open internet

Posted by Pradeep Gollakota <pr...@gmail.com>.
At Lithium, we have multiple datacenters and we distcp our data across our
Hadoop clusters. We have 2 DCs in NA and 1 in EU. We have a non-redundant
direct connect from our EU cluster to one of our NA DCs. If and when this
fails, we have automatic failover to a VPN that goes over the internet. The
amount of data that's moving across the clusters is not much, so we can get
away with this. We don't have Kafka replication set up yet, but we will be
setting it up using MirrorMaker and the same constraints apply.

Of course opening up your Kafka cluster to be reachable by the internet
would work too, but IMHO a VPN is more secure and reduces the surface area
of your infrastructure that could come under attack. It sucks that you
can't get your executives on board for a p2p direct connect as that is the
best solution.

On Tue, Oct 6, 2015 at 5:48 PM, Gwen Shapira <gw...@confluent.io> wrote:

> You can configure "advertised.host.name" for each broker, which is the name
> external consumers and producers will use to refer to the brokers.

Re: Datacenter to datacenter over the open internet

Posted by Gwen Shapira <gw...@confluent.io>.
You can configure "advertised.host.name" for each broker, which is the name
external consumers and producers will use to refer to the brokers.
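
As a rough sketch of what that might look like in one broker's
server.properties (the addresses and the host name below are made-up
placeholders, not values from this thread, and "advertised.port" is a
companion setting that is an addition here, not something mentioned above):

```properties
# server.properties on one broker (illustrative sketch only)

# Address and port the broker binds to inside the datacenter.
# 10.0.0.11 is a hypothetical private address.
host.name=10.0.0.11
port=9092

# Host name and port the broker registers in ZooKeeper for clients to use.
# kafka1.example.com is a hypothetical DNS name that would need to resolve
# to this broker's public IP from the remote datacenter.
advertised.host.name=kafka1.example.com
advertised.port=9092
```

Each broker would need its own advertised name, since clients connect
directly to whichever broker leads the partition they are reading.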
