Posted to users@kafka.apache.org by Nicolas Motte <li...@gmail.com> on 2017/03/06 13:38:51 UTC

Performance and encryption

Hi everyone,

I understand that one of the reasons Kafka is so performant is its use of
zero-copy.

I often hear that when encryption is enabled, Kafka has to copy the data
into user space to decode the message, so it has a big impact on
performance.

If that is true, I don't get why the message has to be decoded by Kafka. I
would assume that whether or not the message is encrypted, Kafka simply
receives it, appends it to the file, and when a consumer wants to read it,
simply reads from the right offset...

I'm also wondering whether this is still the case if we don't use keys (a
pure queuing system with key=null).

Cheers
Nico

Re: Performance and encryption

Posted by Todd Palino <tp...@gmail.com>.
They are defined at the broker level as a default for all topics that do
not have an override for those configs. Both (and many other configs) can
be overridden for individual topics using the command line tools.
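
For example (the topic name and values here are just placeholders, not
something from this thread), a per-topic override of both settings with the
stock tooling looks roughly like this:

    # Set a per-topic retention and cleanup policy; the broker defaults
    # still apply to every topic without an override.
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name my-topic \
      --add-config retention.ms=86400000,cleanup.policy=compact

    # Check what is currently set on the topic.
    bin/kafka-configs.sh --zookeeper localhost:2181 --describe \
      --entity-type topics --entity-name my-topic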

-Todd


On Wed, Mar 8, 2017 at 12:36 PM, Nicolas Motte <li...@gmail.com> wrote:

> Hi everyone, I have another question.
> Is there any reason why retention and cleanup policy are defined at cluster
> level and not topic level?
> I can t see why it would not be possible from a technical point of view...
>
> 2017-03-06 14:38 GMT+01:00 Nicolas Motte <li...@gmail.com>:
>
> > Hi everyone,
> >
> > I understand one of the reasons why Kafka is performant is by using
> > zero-copy.
> >
> > I often hear that when encryption is enabled, then Kafka has to copy the
> > data in user space to decode the message, so it has a big impact on
> > performance.
> >
> > If it is true, I don t get why the message has to be decoded by Kafka. I
> > would assume that whether the message is encrypted or not, Kafka simply
> > receives it, appends it to the file, and when a consumer wants to read
> it,
> > it simply reads at the right offset...
> >
> > Also I m wondering if it s the case if we don t use keys (pure queuing
> > system with key=null).
> >
> > Cheers
> > Nico
> >
> >
>



-- 
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming



linkedin.com/in/toddpalino

Re: Performance and encryption

Posted by Stephane Maarek <st...@simplemachines.com.au>.
I believe these are defaults you can set at the broker level, so that if the
topic doesn’t have that setting set, it will inherit them. But you can
definitely override your topic configuration at the topic level.
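
In other words (the values below are only illustrative), the broker-wide
defaults live in server.properties, and a topic picks them up unless it
carries its own override:

    # server.properties - applies to every topic without an override
    log.retention.hours=168
    log.cleanup.policy=delete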

On 9 March 2017 at 7:42:14 am, Nicolas Motte (lingusinpg@gmail.com) wrote:

Hi everyone, I have another question.
Is there any reason why retention and cleanup policy are defined at cluster
level and not topic level?
I can t see why it would not be possible from a technical point of view...

2017-03-06 14:38 GMT+01:00 Nicolas Motte <li...@gmail.com>:

> Hi everyone,
>
> I understand one of the reasons why Kafka is performant is by using
> zero-copy.
>
> I often hear that when encryption is enabled, then Kafka has to copy the
> data in user space to decode the message, so it has a big impact on
> performance.
>
> If it is true, I don t get why the message has to be decoded by Kafka. I
> would assume that whether the message is encrypted or not, Kafka simply
> receives it, appends it to the file, and when a consumer wants to read
it,
> it simply reads at the right offset...
>
> Also I m wondering if it s the case if we don t use keys (pure queuing
> system with key=null).
>
> Cheers
> Nico
>
>

Re: Performance and encryption

Posted by Nicolas Motte <li...@gmail.com>.
Hi everyone, I have another question.
Is there any reason why retention and cleanup policy are defined at the
cluster level and not the topic level?
I can't see why it would not be possible from a technical point of view...

2017-03-06 14:38 GMT+01:00 Nicolas Motte <li...@gmail.com>:

> Hi everyone,
>
> I understand one of the reasons why Kafka is performant is by using
> zero-copy.
>
> I often hear that when encryption is enabled, then Kafka has to copy the
> data in user space to decode the message, so it has a big impact on
> performance.
>
> If it is true, I don t get why the message has to be decoded by Kafka. I
> would assume that whether the message is encrypted or not, Kafka simply
> receives it, appends it to the file, and when a consumer wants to read it,
> it simply reads at the right offset...
>
> Also I m wondering if it s the case if we don t use keys (pure queuing
> system with key=null).
>
> Cheers
> Nico
>
>

Re: Performance and encryption

Posted by IT Consultant <0b...@gmail.com>.
Hi Todd

Can you please help me with notes or a document on how you achieved
encryption?

I have followed the material available on the official sites but failed, as
I'm no good with TLS.

On Mar 6, 2017 19:55, "Todd Palino" <tp...@gmail.com> wrote:

> It’s not that Kafka has to decode it, it’s that it has to send it across
> the network. This is specific to enabling TLS support (transport
> encryption), and won’t affect any end-to-end encryption you do at the
> client level.
>
> The operation in question is called “zero copy”. In order to send a message
> batch to a consumer, the Kafka broker must read it from disk (sometimes
> it’s cached in memory, but that’s irrelevant here) and send it across the
> network. The Linux kernel allows this to happen without having to copy the
> data in memory (to move it from the disk buffers to the network buffers).
> However, if TLS is enabled, the broker must first encrypt the data going
> across the network. This means that it can no longer take advantage of the
> zero copy optimization as it has to make a copy in the process of applying
> the TLS encryption.
>
> Now, how much of an impact this has on the broker operations is up for
> debate, I think. Originally, when we ran into this problem was when TLS
> support was added to Kafka and the zero copy send for plaintext
> communications was accidentally removed as well. At the time, we saw a
> significant performance hit, and the code was patched to put it back.
> However, since then I’ve turned on inter-broker TLS in all of our clusters,
> and when we did that there was no performance hit. This is odd, because the
> replica fetchers should take advantage of the same zero copy optimization.
>
> It’s possible that it’s because it’s just one consumer (the replica
> fetchers). We’re about to start testing additional consumers over TLS, so
> we’ll see what happens at that point. All I can suggest right now is that
> you test in your environment and see what the impact is. Oh, and using
> message keys (or not) won’t matter here.
>
> -Todd
>
>
> On Mon, Mar 6, 2017 at 5:38 AM, Nicolas Motte <li...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > I understand one of the reasons why Kafka is performant is by using
> > zero-copy.
> >
> > I often hear that when encryption is enabled, then Kafka has to copy the
> > data in user space to decode the message, so it has a big impact on
> > performance.
> >
> > If it is true, I don t get why the message has to be decoded by Kafka. I
> > would assume that whether the message is encrypted or not, Kafka simply
> > receives it, appends it to the file, and when a consumer wants to read
> it,
> > it simply reads at the right offset...
> >
> > Also I m wondering if it s the case if we don t use keys (pure queuing
> > system with key=null).
> >
> > Cheers
> > Nico
> >
>
>
>
> --
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
>
>
>
> linkedin.com/in/toddpalino
>

Re: Performance and encryption

Posted by Todd Palino <tp...@gmail.com>.
It’s not that Kafka has to decode it, it’s that it has to send it across
the network. This is specific to enabling TLS support (transport
encryption), and won’t affect any end-to-end encryption you do at the
client level.
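
(For concreteness, and purely as an illustrative sketch rather than anything
from this thread, enabling a TLS listener on the broker typically involves
settings along these lines in server.properties; the paths, host names, and
passwords are placeholders, and the keystores have to be created separately,
e.g. with keytool:)

    # server.properties - illustrative TLS listener setup
    listeners=PLAINTEXT://broker1:9092,SSL://broker1:9093
    security.inter.broker.protocol=SSL
    ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
    ssl.keystore.password=changeme
    ssl.key.password=changeme
    ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
    ssl.truststore.password=changeme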

The operation in question is called “zero copy”. In order to send a message
batch to a consumer, the Kafka broker must read it from disk (sometimes
it’s cached in memory, but that’s irrelevant here) and send it across the
network. The Linux kernel allows this to happen without having to copy the
data in memory (to move it from the disk buffers to the network buffers).
However, if TLS is enabled, the broker must first encrypt the data going
across the network. This means that it can no longer take advantage of the
zero copy optimization as it has to make a copy in the process of applying
the TLS encryption.
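
As a rough sketch of what that plaintext path relies on (the file name, port,
and class name below are made up for illustration), the broker’s zero-copy
send boils down to Java’s FileChannel.transferTo, which maps to sendfile on
Linux:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            // Placeholder log segment and consumer address.
            try (FileChannel log = FileChannel.open(
                     Paths.get("/tmp/segment.log"), StandardOpenOption.READ);
                 SocketChannel consumer = SocketChannel.open(
                     new InetSocketAddress("localhost", 9999))) {

                long position = 0;
                long remaining = log.size();
                // transferTo() hands the copy to the kernel (sendfile on Linux):
                // bytes move from the page cache to the socket without ever
                // entering user space. With TLS this path is unavailable, because
                // the bytes must be pulled into user space to be encrypted first.
                while (remaining > 0) {
                    long sent = log.transferTo(position, remaining, consumer);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }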

Now, how much of an impact this has on broker operations is up for debate,
I think. We originally ran into this problem when TLS support was added to
Kafka and the zero-copy send for plaintext communications was accidentally
removed as well. At the time, we saw a significant performance hit, and the
code was patched to put it back.
However, since then I’ve turned on inter-broker TLS in all of our clusters,
and when we did that there was no performance hit. This is odd, because the
replica fetchers should take advantage of the same zero copy optimization.

It’s possible that it’s because it’s just one consumer (the replica
fetchers). We’re about to start testing additional consumers over TLS, so
we’ll see what happens at that point. All I can suggest right now is that
you test in your environment and see what the impact is. Oh, and using
message keys (or not) won’t matter here.

-Todd


On Mon, Mar 6, 2017 at 5:38 AM, Nicolas Motte <li...@gmail.com> wrote:

> Hi everyone,
>
> I understand one of the reasons why Kafka is performant is by using
> zero-copy.
>
> I often hear that when encryption is enabled, then Kafka has to copy the
> data in user space to decode the message, so it has a big impact on
> performance.
>
> If it is true, I don t get why the message has to be decoded by Kafka. I
> would assume that whether the message is encrypted or not, Kafka simply
> receives it, appends it to the file, and when a consumer wants to read it,
> it simply reads at the right offset...
>
> Also I m wondering if it s the case if we don t use keys (pure queuing
> system with key=null).
>
> Cheers
> Nico
>



-- 
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming



linkedin.com/in/toddpalino

Re: Performance and encryption

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Todd,

I agree that KAFKA-2561 would be good to have for the reasons you state.

Ismael

On Mon, Mar 6, 2017 at 5:17 PM, Todd Palino <tp...@gmail.com> wrote:

> Thanks for the link, Ismael. I had thought that the most recent kernels
> already implemented this, but I was probably confusing it with BSD. Most of
> my systems are stuck in the stone age right now anyway.
>
> It would be nice to get KAFKA-2561 in, either way. First off, if you can
> take advantage of it it’s a good performance boost. Second, especially with
> the security landscape getting worse and worse, it would be good to have
> options as far as the TLS implementation goes. A zero-day exploit in the
> Java TLS implementation would be devastating, and more difficult to react
> to as it would require a new JRE (bringing with it who knows what
> problems). Swapping an underlying OpenSSL version would be much more
> palatable.
>
> -Todd
>
>
> On Mon, Mar 6, 2017 at 9:01 AM, Ismael Juma <is...@juma.me.uk> wrote:
>
> > Even though OpenSSL is much faster than the Java 8 TLS implementation (I
> > haven't tested against Java 9, which is much faster than Java 8, but
> > probably still slower than OpenSSL), all the tests were without zero copy
> > in the sense that is being discussed here (i.e. sendfile). To benefit
> from
> > sendfile with TLS, kernel-level changes/modules are required:
> >
> > https://github.com/ktls/af_ktls
> > http://www.phoronix.com/scan.php?page=news_item&px=FreeBSD-
> Faster-Sendfile
> >
> > Ismael
> >
> > On Mon, Mar 6, 2017 at 4:18 PM, Todd Palino <tp...@gmail.com> wrote:
> >
> > > So that’s not quite true, Hans. First, as far as the performance hit
> > being
> > > not a big impact (25% is huge). Or that it’s to be expected. Part of
> the
> > > problem is that the Java TLS implementation does not support zero copy.
> > > OpenSSL does, and in fact there’s been a ticket open to allow Kafka to
> > > support using OpenSSL for a while now:
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-2561
> > >
> > >
> > >
> > >
> > > On Mon, Mar 6, 2017 at 6:30 AM, Hans Jespersen <ha...@confluent.io>
> > wrote:
> > >
> > > >
> > > > Its not a single message at a time that is encrypted with TLS its the
> > > > entire network byte stream so a Kafka broker can’t even see the Kafka
> > > > Protocol tunneled inside TLS unless it’s terminated at the broker.
> > > > It is true that losing the zero copy optimization impacts performance
> > > > somewhat  but it’s not what I would call a “big impact” because Kafka
> > > does
> > > > a lot of other things to get it’s performance (like using page cache
> > and
> > > > doing lots on sequential disk I/O). The difference should be
> something
> > in
> > > > the order of 25-30% slower with TLS enabled which is about what you
> > would
> > > > see with any other messaging protocol with TLS on vs off.
> > > >
> > > > If you wanted to encrypt each message independently before sending to
> > > > Kafka then zero copy would still be in effect and all the consumers
> > would
> > > > get the same encrypted message (and have to understand how to decrypt
> > > it).
> > > >
> > > > -hans
> > > >
> > > >
> > > >
> > > > > On Mar 6, 2017, at 5:38 AM, Nicolas Motte <li...@gmail.com>
> > > wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > I understand one of the reasons why Kafka is performant is by using
> > > > > zero-copy.
> > > > >
> > > > > I often hear that when encryption is enabled, then Kafka has to
> copy
> > > the
> > > > > data in user space to decode the message, so it has a big impact on
> > > > > performance.
> > > > >
> > > > > If it is true, I don t get why the message has to be decoded by
> > Kafka.
> > > I
> > > > > would assume that whether the message is encrypted or not, Kafka
> > simply
> > > > > receives it, appends it to the file, and when a consumer wants to
> > read
> > > > it,
> > > > > it simply reads at the right offset...
> > > > >
> > > > > Also I m wondering if it s the case if we don t use keys (pure
> > queuing
> > > > > system with key=null).
> > > > >
> > > > > Cheers
> > > > > Nico
> > > >
> > > >
> > >
> > >
> > > --
> > > *Todd Palino*
> > > Staff Site Reliability Engineer
> > > Data Infrastructure Streaming
> > >
> > >
> > >
> > > linkedin.com/in/toddpalino
> > >
> >
>
>
>
> --
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
>
>
>
> linkedin.com/in/toddpalino
>

Re: Performance and encryption

Posted by Todd Palino <tp...@gmail.com>.
Thanks for the link, Ismael. I had thought that the most recent kernels
already implemented this, but I was probably confusing it with BSD. Most of
my systems are stuck in the stone age right now anyway.

It would be nice to get KAFKA-2561 in, either way. First off, if you can
take advantage of it, it’s a good performance boost. Second, especially with
the security landscape getting worse and worse, it would be good to have
options as far as the TLS implementation goes. A zero-day exploit in the
Java TLS implementation would be devastating, and more difficult to react
to as it would require a new JRE (bringing with it who knows what
problems). Swapping an underlying OpenSSL version would be much more
palatable.

-Todd


On Mon, Mar 6, 2017 at 9:01 AM, Ismael Juma <is...@juma.me.uk> wrote:

> Even though OpenSSL is much faster than the Java 8 TLS implementation (I
> haven't tested against Java 9, which is much faster than Java 8, but
> probably still slower than OpenSSL), all the tests were without zero copy
> in the sense that is being discussed here (i.e. sendfile). To benefit from
> sendfile with TLS, kernel-level changes/modules are required:
>
> https://github.com/ktls/af_ktls
> http://www.phoronix.com/scan.php?page=news_item&px=FreeBSD-Faster-Sendfile
>
> Ismael
>
> On Mon, Mar 6, 2017 at 4:18 PM, Todd Palino <tp...@gmail.com> wrote:
>
> > So that’s not quite true, Hans. First, as far as the performance hit
> being
> > not a big impact (25% is huge). Or that it’s to be expected. Part of the
> > problem is that the Java TLS implementation does not support zero copy.
> > OpenSSL does, and in fact there’s been a ticket open to allow Kafka to
> > support using OpenSSL for a while now:
> >
> > https://issues.apache.org/jira/browse/KAFKA-2561
> >
> >
> >
> >
> > On Mon, Mar 6, 2017 at 6:30 AM, Hans Jespersen <ha...@confluent.io>
> wrote:
> >
> > >
> > > Its not a single message at a time that is encrypted with TLS its the
> > > entire network byte stream so a Kafka broker can’t even see the Kafka
> > > Protocol tunneled inside TLS unless it’s terminated at the broker.
> > > It is true that losing the zero copy optimization impacts performance
> > > somewhat  but it’s not what I would call a “big impact” because Kafka
> > does
> > > a lot of other things to get it’s performance (like using page cache
> and
> > > doing lots on sequential disk I/O). The difference should be something
> in
> > > the order of 25-30% slower with TLS enabled which is about what you
> would
> > > see with any other messaging protocol with TLS on vs off.
> > >
> > > If you wanted to encrypt each message independently before sending to
> > > Kafka then zero copy would still be in effect and all the consumers
> would
> > > get the same encrypted message (and have to understand how to decrypt
> > it).
> > >
> > > -hans
> > >
> > >
> > >
> > > > On Mar 6, 2017, at 5:38 AM, Nicolas Motte <li...@gmail.com>
> > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I understand one of the reasons why Kafka is performant is by using
> > > > zero-copy.
> > > >
> > > > I often hear that when encryption is enabled, then Kafka has to copy
> > the
> > > > data in user space to decode the message, so it has a big impact on
> > > > performance.
> > > >
> > > > If it is true, I don t get why the message has to be decoded by
> Kafka.
> > I
> > > > would assume that whether the message is encrypted or not, Kafka
> simply
> > > > receives it, appends it to the file, and when a consumer wants to
> read
> > > it,
> > > > it simply reads at the right offset...
> > > >
> > > > Also I m wondering if it s the case if we don t use keys (pure
> queuing
> > > > system with key=null).
> > > >
> > > > Cheers
> > > > Nico
> > >
> > >
> >
> >
> > --
> > *Todd Palino*
> > Staff Site Reliability Engineer
> > Data Infrastructure Streaming
> >
> >
> >
> > linkedin.com/in/toddpalino
> >
>



-- 
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming



linkedin.com/in/toddpalino

Re: Performance and encryption

Posted by Ismael Juma <is...@juma.me.uk>.
Even though OpenSSL is much faster than the Java 8 TLS implementation (I
haven't tested against Java 9, which is much faster than Java 8, but
probably still slower than OpenSSL), all the tests were without zero copy
in the sense that is being discussed here (i.e. sendfile). To benefit from
sendfile with TLS, kernel-level changes/modules are required:

https://github.com/ktls/af_ktls
http://www.phoronix.com/scan.php?page=news_item&px=FreeBSD-Faster-Sendfile

Ismael

On Mon, Mar 6, 2017 at 4:18 PM, Todd Palino <tp...@gmail.com> wrote:

> So that’s not quite true, Hans. First, as far as the performance hit being
> not a big impact (25% is huge). Or that it’s to be expected. Part of the
> problem is that the Java TLS implementation does not support zero copy.
> OpenSSL does, and in fact there’s been a ticket open to allow Kafka to
> support using OpenSSL for a while now:
>
> https://issues.apache.org/jira/browse/KAFKA-2561
>
>
>
>
> On Mon, Mar 6, 2017 at 6:30 AM, Hans Jespersen <ha...@confluent.io> wrote:
>
> >
> > Its not a single message at a time that is encrypted with TLS its the
> > entire network byte stream so a Kafka broker can’t even see the Kafka
> > Protocol tunneled inside TLS unless it’s terminated at the broker.
> > It is true that losing the zero copy optimization impacts performance
> > somewhat  but it’s not what I would call a “big impact” because Kafka
> does
> > a lot of other things to get it’s performance (like using page cache and
> > doing lots on sequential disk I/O). The difference should be something in
> > the order of 25-30% slower with TLS enabled which is about what you would
> > see with any other messaging protocol with TLS on vs off.
> >
> > If you wanted to encrypt each message independently before sending to
> > Kafka then zero copy would still be in effect and all the consumers would
> > get the same encrypted message (and have to understand how to decrypt
> it).
> >
> > -hans
> >
> >
> >
> > > On Mar 6, 2017, at 5:38 AM, Nicolas Motte <li...@gmail.com>
> wrote:
> > >
> > > Hi everyone,
> > >
> > > I understand one of the reasons why Kafka is performant is by using
> > > zero-copy.
> > >
> > > I often hear that when encryption is enabled, then Kafka has to copy
> the
> > > data in user space to decode the message, so it has a big impact on
> > > performance.
> > >
> > > If it is true, I don t get why the message has to be decoded by Kafka.
> I
> > > would assume that whether the message is encrypted or not, Kafka simply
> > > receives it, appends it to the file, and when a consumer wants to read
> > it,
> > > it simply reads at the right offset...
> > >
> > > Also I m wondering if it s the case if we don t use keys (pure queuing
> > > system with key=null).
> > >
> > > Cheers
> > > Nico
> >
> >
>
>
> --
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
>
>
>
> linkedin.com/in/toddpalino
>

Re: Performance and encryption

Posted by Todd Palino <tp...@gmail.com>.
So that’s not quite true, Hans. First, I wouldn’t say the performance hit is
not a big impact (25% is huge), nor that it’s simply to be expected. Part of
the problem is that the Java TLS implementation does not support zero copy.
OpenSSL does, and in fact there’s been a ticket open to allow Kafka to
support using OpenSSL for a while now:

https://issues.apache.org/jira/browse/KAFKA-2561




On Mon, Mar 6, 2017 at 6:30 AM, Hans Jespersen <ha...@confluent.io> wrote:

>
> Its not a single message at a time that is encrypted with TLS its the
> entire network byte stream so a Kafka broker can’t even see the Kafka
> Protocol tunneled inside TLS unless it’s terminated at the broker.
> It is true that losing the zero copy optimization impacts performance
> somewhat  but it’s not what I would call a “big impact” because Kafka does
> a lot of other things to get it’s performance (like using page cache and
> doing lots on sequential disk I/O). The difference should be something in
> the order of 25-30% slower with TLS enabled which is about what you would
> see with any other messaging protocol with TLS on vs off.
>
> If you wanted to encrypt each message independently before sending to
> Kafka then zero copy would still be in effect and all the consumers would
> get the same encrypted message (and have to understand how to decrypt it).
>
> -hans
>
>
>
> > On Mar 6, 2017, at 5:38 AM, Nicolas Motte <li...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I understand one of the reasons why Kafka is performant is by using
> > zero-copy.
> >
> > I often hear that when encryption is enabled, then Kafka has to copy the
> > data in user space to decode the message, so it has a big impact on
> > performance.
> >
> > If it is true, I don t get why the message has to be decoded by Kafka. I
> > would assume that whether the message is encrypted or not, Kafka simply
> > receives it, appends it to the file, and when a consumer wants to read
> it,
> > it simply reads at the right offset...
> >
> > Also I m wondering if it s the case if we don t use keys (pure queuing
> > system with key=null).
> >
> > Cheers
> > Nico
>
>


-- 
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming



linkedin.com/in/toddpalino

Re: Performance and encryption

Posted by Hans Jespersen <ha...@confluent.io>.
It’s not a single message at a time that is encrypted with TLS, it’s the entire network byte stream, so a Kafka broker can’t even see the Kafka protocol tunneled inside TLS unless it’s terminated at the broker.
It is true that losing the zero-copy optimization impacts performance somewhat, but it’s not what I would call a “big impact”, because Kafka does a lot of other things to get its performance (like using the page cache and doing lots of sequential disk I/O). The difference should be something on the order of 25-30% slower with TLS enabled, which is about what you would see with any other messaging protocol with TLS on vs. off.

If you wanted to encrypt each message independently before sending it to Kafka, then zero copy would still be in effect and all the consumers would get the same encrypted message (and would have to understand how to decrypt it).
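
A minimal sketch of that approach, assuming AES-GCM via the standard
javax.crypto API and the regular Java producer with byte-array serializers
(the topic name, key material, and key distribution below are placeholders
and out of scope):

    import java.nio.ByteBuffer;
    import java.security.SecureRandom;
    import java.util.Properties;
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EncryptingProducer {
        public static void main(String[] args) throws Exception {
            // Placeholder 128-bit key; in practice this would come from a
            // KMS or keystore shared with the consumers.
            byte[] key = new byte[16];
            new SecureRandom().nextBytes(key);

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                byte[] plaintext = "hello".getBytes("UTF-8");

                // Encrypt the payload before it ever reaches the broker.
                byte[] iv = new byte[12];
                new SecureRandom().nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE,
                            new SecretKeySpec(key, "AES"),
                            new GCMParameterSpec(128, iv));
                byte[] ciphertext = cipher.doFinal(plaintext);

                // Ship IV + ciphertext as an opaque value. The broker just stores
                // bytes, so the plaintext zero-copy send path is untouched, and
                // every consumer receives the same encrypted payload
                // (message key = null, as in the original question).
                byte[] value = ByteBuffer.allocate(iv.length + ciphertext.length)
                                         .put(iv).put(ciphertext).array();
                producer.send(new ProducerRecord<>("my-topic", null, value));
            }
        }
    }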

-hans



> On Mar 6, 2017, at 5:38 AM, Nicolas Motte <li...@gmail.com> wrote:
> 
> Hi everyone,
> 
> I understand one of the reasons why Kafka is performant is by using
> zero-copy.
> 
> I often hear that when encryption is enabled, then Kafka has to copy the
> data in user space to decode the message, so it has a big impact on
> performance.
> 
> If it is true, I don t get why the message has to be decoded by Kafka. I
> would assume that whether the message is encrypted or not, Kafka simply
> receives it, appends it to the file, and when a consumer wants to read it,
> it simply reads at the right offset...
> 
> Also I m wondering if it s the case if we don t use keys (pure queuing
> system with key=null).
> 
> Cheers
> Nico