Posted to users@kafka.apache.org by Matt Wise <ma...@nextdoor.com> on 2013/04/22 20:31:27 UTC

Using Stunnel to encrypt/authenticate Kafka producers and consumers...

Hi there... we're currently looking into using Kafka as a pipeline for passing around log messages. We like its use of Zookeeper for coordination (as we already make heavy use of Zookeeper at Nextdoor), but I'm running into one big problem. Everything we do is a) in the cloud, b) secure, and c) cross-region/datacenter/cloud-provider.

We make use of SSL for both encryption and authentication of most of our services. My understanding is that Kafka 0.7.x producers and consumers connect to Zookeeper to retrieve a list of the current Kafka servers, and then make direct TCP connections to the individual servers they need in order to publish or subscribe to a stream. In 0.8.x that's changed, so now clients can connect to a single Kafka server and get a list of these servers via an API?

What I'm wondering is whether we can actually put an ELB in front of *all* of our Kafka servers, throw stunnel on them, and give our producers and consumers a single endpoint to connect to (through the ELB) rather than having them connect directly to the individual Kafka servers. This would give us both encryption of the data in transport and authentication of the producers and subscribers. Lastly, if it works, it would provide these features without impacting our ability to use the existing Kafka producers/consumers that people have written.

My concern is that the Kafka clients (producers or consumers?) would connect once through the ELB, then get the list of servers via the API, and finally try to connect directly to one of those Kafka servers rather than just leveraging the existing connection through the ELB.
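
To make the concern concrete, here's roughly what a producer pointed at the ELB would look like (a sketch against the 0.8 Java producer API; the ELB hostname and topic are made up):

    // Sketch: a 0.8 producer bootstrapping through a single (ELB) endpoint.
    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ElbBootstrapSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // The broker list is only used for the initial metadata fetch;
            // after that the producer connects straight to the partition
            // leaders, bypassing the ELB for the actual produce requests.
            props.put("metadata.broker.list", "kafka-elb.mydomain.com:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("logs", "hello world"));
            producer.close();
        }
    }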

Thoughts?

--Matt

Re: Using Stunnel to encrypt/authenticate Kafka producers and consumers...

Posted by Jason Rosenberg <jb...@squareup.com>.
I'm interested in the same topic (similar use case).

What I think would be nice too (and this has been discussed a bit in the past on this list) would be to have SSL support within the Kafka protocol. Zookeeper also doesn't support SSL, but at least now, in 0.8, producing clients no longer really need Zookeeper connections. So if we could just get encryption support into the Kafka protocol, it would go a long way.

Jason



Re: Using Stunnel to encrypt/authenticate Kafka producers and consumers...

Posted by Jonathan Creasy <jc...@box.com>.
Wouldn't it make more sense to do something like an encrypted tunnel between your core routers in each facility? Like IPsec on a GRE tunnel or something.

This concept would need adjustment for those in the cloud, but when you want to build an encrypted tunnel between two large sets of hosts, a giant pile of stunnels doesn't seem like the best method. Something at the networking level would make more sense, I think.
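
Roughly what I have in mind, sketched with Linux/iproute2 (all addresses are made up; you'd still run IPsec on top of the GRE tunnel for the actual encryption):

    # On the router in facility A: GRE tunnel to facility B's router.
    ip tunnel add gre1 mode gre local 198.51.100.1 remote 203.0.113.2 ttl 255
    ip addr add 10.10.10.1/30 dev gre1
    ip link set gre1 up
    # Send facility B's Kafka/Zookeeper subnet over the tunnel; every host
    # on either side then reaches every host on the other side with no
    # per-host proxies.
    ip route add 10.20.0.0/16 dev gre1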

-Jonathan


-- 
Jonathan Creasy | Sr. Ops Engineer

e: jc@box.com | t: 314.580.8909

Re: Using Stunnel to encrypt/authenticate Kafka producers and consumers...

Posted by Matt Wise <ma...@nextdoor.com>.
Unfortunately 'stunneling everything' is not really possible. Stunnel acts like a proxy service, in the sense that the stunnel client (on your log producer or log consumer) has to be explicitly configured to connect to an exact endpoint (i.e., kafka1.mydomain.com:1234), or to multiple endpoints that stunnel selects at random.

In a few cases you can use stunnel as an SSL offloader for certain protocols, but that's done on the server side, i.e., in front of a Postgres server, so that stunnel can do the encryption rather than Postgres itself.
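
To make this concrete, here's the kind of stunnel setup I'm describing (a sketch only; hostnames, ports, and certificate paths are all made up):

    ; Client side: stunnel.conf on the producer/consumer host. Each broker
    ; needs its own explicitly configured tunnel endpoint, which is exactly
    ; the problem -- the client can't discover brokers through the tunnel.
    client = yes

    [kafka1]
    accept  = 127.0.0.1:19092
    connect = kafka1.mydomain.com:9093
    CAfile  = /etc/stunnel/ca.pem
    verify  = 2

    [kafka2]
    accept  = 127.0.0.1:19093
    connect = kafka2.mydomain.com:9093
    CAfile  = /etc/stunnel/ca.pem
    verify  = 2

    ; Server side: a separate stunnel.conf in front of each broker,
    ; terminating SSL and forwarding plaintext to the local Kafka port:
    ; [kafka]
    ; cert    = /etc/stunnel/kafka1.pem
    ; accept  = 9093
    ; connect = 127.0.0.1:9092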

It would make a bit of a difference, I think, if our log producers were the only ones that needed to talk to *all* of the Kafka nodes. We could do something where we ship logs via an encrypted TCP session to some group of Kafka "log funnel" machines, which can reach the Kafka servers directly and dump the log data. Maybe.

I'm still digging around, but I'm really surprised this hasn't been a larger topic of discussion. If Kafka natively allowed a single connection through a single server to reach all of the other servers in the farm, it would be far easier to secure and encrypt the communication. ElasticSearch and RabbitMQ are good examples of this model.

--Matt


Re: Using Stunnel to encrypt/authenticate Kafka producers and consumers...

Posted by Scott Clasen <sc...@heroku.com>.
I think you are right: even if you did put an ELB in front of Kafka, it would only be used for getting the initial broker list, AFAIK. Producers and consumers need to be able to talk to each broker directly, and consumers also need to be able to talk to Zookeeper to store offsets.
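
For example, the 0.8 high-level consumer is configured against Zookeeper directly, so that traffic would never touch an ELB sitting in front of the brokers (a sketch; the hostnames and group id are made up):

    // Sketch: 0.8 high-level consumer. Broker discovery and offset commits
    // both go through Zookeeper, not through the brokers themselves.
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class OffsetPathSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect",
                    "zk1.mydomain.com:2181,zk2.mydomain.com:2181");
            props.put("group.id", "log-pipeline");
            ConsumerConnector consumer =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create message streams, consume, commit offsets (to Zookeeper) ...
            consumer.shutdown();
        }
    }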

Probably have to stunnel all the things. I'd be interested in hearing how it works out. IMO this would be a great thing to have in kafka-contrib.


