Posted to dev@kafka.apache.org by eugene miretsky <eu...@gmail.com> on 2015/07/30 21:16:06 UTC

Gauging Interest in adding Encryption to Kafka

Hi,

Based on the security wiki page
<https://cwiki.apache.org/confluence/display/KAFKA/Security>, encryption of
data at rest is out of scope for the time being. However, we are
implementing encryption in Kafka and would like to see if there is
interest in submitting a patch for it.

I suppose that one way to implement encryption would be to add an
'encrypted key' field to the Message/MessageSet structures in the
wire protocol; however, this is a very big and fundamental change.

A simpler way to add encryption support would be one of:
1) A custom Serializer, but it wouldn't be compatible with other custom
serializers (Avro, etc.)
2) A step in KafkaProducer after serialization that encrypts the data
before it is submitted to the accumulator (encryption is done in the
submitting thread, not in the producer I/O thread)
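
Option 2 can be pictured as a byte-level step that runs on whatever the
configured serializer produced. The following is only a minimal sketch under
stated assumptions, not the proposed patch: the class name EncryptingStep, the
IV-then-ciphertext layout, and the choice of AES-GCM are all illustrative, and
a plain Function stands in for the Kafka serializer so that the example needs
nothing beyond the JDK.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.function.Function;

// Hypothetical helper: encrypts bytes that some serializer already produced.
public class EncryptingStep {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns IV followed by ciphertext and tag (an assumed layout). */
    public static byte[] encrypt(SecretKey key, byte[] serialized) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV per message is required for GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(serialized);
        return ByteBuffer.allocate(iv.length + ciphertext.length)
                .put(iv).put(ciphertext).array();
    }

    /** Reverses encrypt(): reads the IV from the first 12 bytes of the payload. */
    public static byte[] decrypt(SecretKey key, byte[] payload) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, payload, 0, IV_BYTES));
        return cipher.doFinal(payload, IV_BYTES, payload.length - IV_BYTES);
    }

    public static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for any serializer (Avro, String, ...): value -> bytes.
        Function<String, byte[]> serializer = s -> s.getBytes(StandardCharsets.UTF_8);
        SecretKey key = newKey();
        byte[] wire = encrypt(key, serializer.apply("salary=100"));
        System.out.println(new String(decrypt(key, wire), StandardCharsets.UTF_8));
    }
}
```

Because the step only sees bytes, it composes with any serializer, which is
the compatibility problem that option 1 runs into.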

Is there interest in adding #2 to Kafka?

Cheers,
Eugene

Re: Gauging Interest in adding Encryption to Kafka

Posted by Don Bosco Durai <bo...@apache.org>.
>The client-broker protocol would have to be augmented to carry the
>encrypted encryption key, plus logic to handle redistribution to existing
>clients due to key rotation.
This is a good point. HDFS has the encryption zone concept, which could be
akin to a topic. The keys in HDFS are per file; I am not sure what would
be a good compromise here for the granularity. For simplicity, it could be
at the topic level itself, but the master key is never given to the
client.

Regardless, internal Kafka management of messages, batching, replication,
compression, compaction, performance, etc. might be some of the key
deciding factors.


On Mon, Aug 3, 2015 at 10:22 AM, Gwen Shapira <gw...@confluent.io> wrote:
>If I understand you correctly, you are saying that the kerberos keytab that
>the broker uses to authenticate with the KMS will be somewhere on the
>broker node and can be used by a malicious admin.
>
Yes. If the broker is doing encryption/decryption, then you need to bootstrap
the broker(s) with the encryption key. The key could be on local disk in a
Java keystore or remote in a KMS. If it is in a remote KMS, then you will need
to authenticate with the KMS via Kerberos or another authentication scheme.
Regardless, a Kafka admin with shell login access as the Linux user
that runs the Kafka broker process (assuming kafka) will have access to the key
and would be able to encrypt/decrypt the stored data.
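
To make the local-keystore case concrete, here is a small sketch, assuming a
JCEKS keystore, an alias of "topic-key", and a password the broker process can
read; all three are illustrative choices and nothing here is an existing Kafka
setting. It shows that anything able to read the keystore bytes and the
password recovers the raw key.

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.util.Arrays;

// Hypothetical sketch of a broker-local keystore holding one AES key.
public class BrokerKeystoreSketch {
    /** Writes the key into a password-protected keystore and returns the file bytes. */
    public static byte[] save(SecretKey key, String alias, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, null); // start from an empty keystore
        ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(password));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, password);
        return out.toByteArray();
    }

    /** What the broker, or an admin running as the broker user, would do at startup. */
    public static SecretKey load(byte[] keystoreBytes, String alias, char[] password)
            throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(new ByteArrayInputStream(keystoreBytes), password);
        return (SecretKey) ks.getKey(alias, password);
    }

    public static SecretKey newAesKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray(); // assumed to sit in broker config
        SecretKey original = newAesKey();
        byte[] file = save(original, "topic-key", password);
        SecretKey recovered = load(file, "topic-key", password);
        System.out.println(Arrays.equals(original.getEncoded(), recovered.getEncoded()));
    }
}
```

Moving the key to a remote KMS changes where it lives, but the broker's KMS
credentials then play the same role as the password above.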

This might not seem like a big deal, but in some enterprises "separation
of duties" or "privileged user access management (PIM)" are critical
compliance requirements.

More importantly, if the goal is just to store data in encrypted form on
disk, then honestly, you just need to encrypt the Kafka data volume
using LUKS and restrict the file permissions. This will take care of
issues like a disk being stolen, etc. You don't need to make any changes to
Kafka :-)

Thanks

Bosco




Re: Gauging Interest in adding Encryption to Kafka

Posted by Alejandro Abdelnur <tu...@gmail.com>.
Doing encryption on the client has the following benefits (most of them
already mentioned in the thread):

* brokers don't have additional CPU load
* brokers never see the data in unencrypted form (Kafka admins cannot snoop)
* secure multi-tenancy (keys are 100% in the client space)
* no need to secure Kafka wire transport, client-broker and broker-broker
(data is already encrypted)

It would be highly desirable, even if encryption is done on the client
side, that encryption is 'transparent'. Similar to how HDFS encryption
works, it is not the client writing/reading a topic that decides to
encrypt/decrypt; the broker tells the client to do so and provides the
encrypted encryption keys for the task. The client-broker
protocol would have to be augmented to carry the encrypted encryption key,
plus logic to handle redistribution to existing clients due to key rotation.
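
The "encrypted encryption key" idea can be sketched with the JDK's AES key
wrap. This is a minimal illustration, assuming a per-topic master key that
lives only in the KMS and a data key that encrypts the messages; the class and
method names are hypothetical and no Kafka or HDFS API is used.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.Arrays;

// Hypothetical sketch of envelope encryption: a master key wraps a data key.
public class EnvelopeKeys {
    public static SecretKey newAesKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** KMS side: wrap the data key under the master key. */
    public static byte[] wrap(SecretKey masterKey, SecretKey dataKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, masterKey);
        return cipher.wrap(dataKey);
    }

    /** KMS side: unwrap for a client that passed the ACL check (check not shown). */
    public static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws Exception {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = newAesKey(); // never leaves the KMS
        SecretKey dataKey = newAesKey();   // used by clients to encrypt messages
        byte[] encryptedDataKey = wrap(masterKey, dataKey);
        // The wrapped key is what the client-broker protocol would carry.
        SecretKey recovered = unwrap(masterKey, encryptedDataKey);
        System.out.println(Arrays.equals(dataKey.getEncoded(), recovered.getEncoded()));
    }
}
```

In this picture the broker only ever stores and forwards the wrapped key, so
it can hand it to clients without being able to read the data itself.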

A nice thing about doing broker-side encryption, though, is that you can shut
off clients at any time and they won't see unencrypted data anymore. But
this means the brokers will have to deal with the client ACLs for
encryption (I'd rather leave that outside of Kafka as a concern of
the KMS). You could achieve similar functionality with client-side
encryption by removing the client from the ACLs in the KMS and doing a key
rotation; the client will then not be able to decrypt any messages that use
the new key (though all previous ones that use the key the client already
has will remain visible to the client).
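
The revocation behaviour described above can be modelled without any
cryptography. The sketch below is a toy model under stated assumptions: key
versions are plain integers, the KMS is an in-memory object, and every name in
it is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of "remove the client from the KMS ACL, then rotate the key".
public class KeyRotationModel {
    private final Set<String> acl = new HashSet<>();
    private final Map<String, Set<Integer>> heldVersions = new HashMap<>();
    private int currentVersion = 1;

    public void grant(String client) { acl.add(client); }

    public void revoke(String client) { acl.remove(client); }

    /** Producers start encrypting with the new version after this call. */
    public int rotate() { return ++currentVersion; }

    /** A client asks the KMS for a key version; this works only while it is on the ACL. */
    public boolean fetch(String client, int version) {
        if (!acl.contains(client) || version < 1 || version > currentVersion) {
            return false;
        }
        heldVersions.computeIfAbsent(client, c -> new HashSet<>()).add(version);
        return true;
    }

    /** A client can decrypt only with key versions it already holds. */
    public boolean canDecrypt(String client, int version) {
        return heldVersions.getOrDefault(client, Collections.emptySet()).contains(version);
    }

    public static void main(String[] args) {
        KeyRotationModel kms = new KeyRotationModel();
        kms.grant("bob");
        kms.fetch("bob", 1);      // bob obtains version 1 while authorized
        kms.revoke("bob");
        int v2 = kms.rotate();    // new messages use version 2
        System.out.println("new key: " + kms.fetch("bob", v2));
        System.out.println("old key still usable: " + kms.canDecrypt("bob", 1));
    }
}
```

The model makes the caveat explicit: rotation protects only data written after
the rotation, since a key the client already fetched cannot be taken back.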


Thanks.


On Mon, Aug 3, 2015 at 10:22 AM, Gwen Shapira <gw...@confluent.io> wrote:

> If I understand you correctly, you are saying that the kerberos keytab that
> the broker uses to authenticate with the KMS will be somewhere on the
> broker node and can be used by a malicious admin.
>
> I agree this is a valid concern.
> I am not opposed to client-side encryption, I am more concerned that the
> modifications this will require in Kafka broker implementation make the
> idea impractical. And obviously, as in any security discussion - there are
> lots of details regarding key exchange, management and protection that are
> critical.
>
> Perhaps given a design doc, we can better evaluate the proposed tradeoffs.
>
> Gwen
>
>
>
> On Sat, Aug 1, 2015 at 10:10 AM, Don Bosco Durai <bo...@apache.org> wrote:
>
> > >Any reason you think its better to let the clients handle it?
> > Gwen, I agree with Todd, depending on the goal, the requirements might
> > vary. If the goal is that when someone steals the disk they should not be
> > able to access the data, then encrypting at the broker is enough. However, if
> > the requirement is that the admin/operator should not be able to access
> > the data, then client side is the only option.
> >
> > Hadoop/HDFS transparent data encryption has a similar philosophy, where
> > the actual encryption/decryption happens at the client side.
> >
> > >1. Key management
> > Hadoop common has a KMS. And there are industry standards like KMIP. If
> > Broker does the encrypt/decrypt, then the solution is much easier. If the
> > client does it, then sharing the key would be a challenge. It might be
> > even necessary to use asymmetric encryption to limit sharing of the keys.
> >
> > Bosco
> >
> >
> >
> >
> > On 7/31/15, 9:31 PM, "Jiangjie Qin" <jq...@linkedin.com.INVALID> wrote:
> >
> > >I agree with Todd, the major concern I have is still the complexity on
> > >the broker, which can kill the performance - which is a key advantage of
> > >Kafka. I think there are two separate issues here:
> > >1. Key management
> > >2. the actual encryption/decryption work.
> > >
> > >Personally I think it might be OK to have [1] supported in Kafka given we
> > >might need to be compatible with different key management systems anyway.
> > >But we should just make Kafka compatible with other key management systems
> > >instead of letting Kafka itself manage the keys. For [2], I think we
> > >should keep it on the client side.
> > >
> > >Jiangjie (Becket) Qin
> > >
> > >On Fri, Jul 31, 2015 at 5:06 PM, Todd Palino <tp...@gmail.com> wrote:
> > >
> > >> 1 - Yes, authorization combined with encryption does get us most of the
> > >> way there. However, depending on the auditor it might not be good enough.
> > >> The problem is that if you are encrypting at the broker, then by
> > >> definition anyone who has access to the broker (i.e. operations staff)
> > >> have access to the data. Consider the case where you are passing salary
> > >> and other information through the system, and those people do not need a
> > >> view of it. I admit, the 90% solution might be better here than going for
> > >> a perfect solution, but it is something to think about.
> > >>
> > >> 2 - My worry is people wanting to integrate with different key systems.
> > >> For example, one person may be fine with providing it in a config file,
> > >> while someone else may want to use the solution from vendor A, someone
> > >> else wants vendor B, and yet another person wants this obscure
> > >> hardware-based solution that exists elsewhere.
> > >>
> > >> The compaction concern is definitely a good one I hadn't thought of. I'm
> > >> wondering if it's reasonable to just say that compaction will not work
> > >> properly with encrypted keys if you do not have consistent encryption
> > >> (that is, the same string encrypts to the same string every time).
> > >>
> > >> Ultimately I don't like the idea of the broker doing any encrypt/decrypt
> > >> steps OR compression/decompression. This is all CPU overhead that you're
> > >> concentrating in one place instead of distributing the load out to the
> > >> clients. Now yes, I know that the broker decompresses to check the CRC
> > >> and assign offsets and then compresses, and we can potentially avoid the
> > >> compression step with assigning the batch an offset and a count instead
> > >> but we still need to consider the CRC. Adding encrypt/decrypt steps adds
> > >> even more overhead and it's going to get very difficult to handle even 2
> > >> Gbits worth of traffic at that rate.
> > >>
> > >> There are other situations that concern me, such as revocation of keys,
> > >> and I'm not sure whether it is better with client-based or server-based
> > >> encryption. For example, if I want to revoke a key with client-based
> > >> encryption it becomes similar to how we handle Avro schemas (internally)
> > >> now - you change keys, and depending on what your desire is you either
> > >> expire out the data for some period of time with the older keys, or you
> > >> just let it sit there and your consuming clients won't have an issue.
> > >> With broker-based encryption, the broker has to work with the multiple
> > >> keys per-topic.
> > >>
> > >> -Todd
> > >>
> > >>
> > >> On Fri, Jul 31, 2015 at 2:38 PM, Gwen Shapira <gs...@cloudera.com>
> > >> wrote:
> > >>
> > >> > Good points :)
> > >> >
> > >> > 1) Kafka already (pending commit) has an authorization layer, so
> > >> > theoretically we are good for SOX, HIPAA, PCI, etc. Transparent broker
> > >> > encryption will support PCI
> > >> > never-let-unencrypted-card-number-hit-disk.
> > >> >
> > >> > 2) Agree on Key Management being a complete PITA. It may be better to
> > >> > centralize this pain in the broker rather than distributing it to
> > >> > clients. Any reason you think it's better to let the clients handle it?
> > >> > The way I see it, we'll need to handle key management the way we did
> > >> > authorization - give an API for interfacing with existing systems.
> > >> >
> > >> > More important, we need the broker to be able to decrypt and encrypt
> > >> > in order to support compaction (unless we can find a cool
> > >> > key-uniqueness-preserving encryption algorithm, but this may not be as
> > >> > secure). I think we also need the broker to be able to re-compress
> > >> > data, and since we always encrypt compressed bits (compressing
> > >> > encrypted bits doesn't compress), we need the broker to decrypt before
> > >> > re-compressing.
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Jul 31, 2015 at 2:27 PM, Todd Palino <tp...@gmail.com> wrote:
> > >> > > It does limit it to clients that have an implementation for
> > >> > > encryption, however encryption on the client side is better from an
> > >> > > auditing point of view (whether that is SOX, HIPAA, PCI, or something
> > >> > > else). Most of those types of standards are based around allowing
> > >> > > visibility of data to just the people who need it. That includes the
> > >> > > admins of the system (who are often not the people who use the data).
> > >> > >
> > >> > > Additionally, key management is a royal pain, and there are lots of
> > >> > > different types of systems that one may want to use. This is a pretty
> > >> > > big complication for the brokers.
> > >> > >
> > >> > > -Todd
> > >> > >
> > >> > >
> > >> > > On Fri, Jul 31, 2015 at 2:21 PM, Gwen Shapira <gs...@cloudera.com> wrote:
> > >> > >
> > >> > >> I've seen interest in HDFS-like "encryption zones" in Kafka.
> > >> > >>
> > >> > >> This has the advantage of magically encrypting data at rest regardless
> > >> > >> of which client is used as a producer.
> > >> > >> Adding it on the client side limits the feature to the java client.
> > >> > >>
> > >> > >> Gwen
> > >> > >>
> > >> > >> On Fri, Jul 31, 2015 at 1:20 PM, eugene miretsky
> > >> > >> <eu...@gmail.com> wrote:
> > >> > >> > I think that Hadoop and Cassandra do [1] (Transparent Encryption)
> > >> > >> >
> > >> > >> > We're doing [2] (on a side note, for [2] you still need
> > >> > >> > authentication on the producer side - you don't want an unauthorized
> > >> > >> > user writing garbage).
> > >> > >> > Right now we have the 'user' doing the encryption and submitting raw
> > >> > >> > bytes to the producer. I was suggesting implementing an encryptor in
> > >> > >> > the producer itself - I think it's cleaner and can be reused by other
> > >> > >> > users (instead of having to do their own encryption)
> > >> > >> >
> > >> > >> > Cheers,
> > >> > >> > Eugene
> > >> > >> >
> > >> > >> > On Fri, Jul 31, 2015 at 4:04 PM, Jiangjie Qin <jqin@linkedin.com.invalid>
> > >> > >> > wrote:
> > >> > >> >
> > >> > >> >> I think the goal here is to make the actual message stored on the
> > >> > >> >> broker encrypted, because after we have SSL, the transmission would
> > >> > >> >> be encrypted.
> > >> > >> >>
> > >> > >> >> In general there might be two approaches:
> > >> > >> >> 1. Broker does the encryption/decryption
> > >> > >> >> 2. Client does the encryption/decryption
> > >> > >> >>
> > >> > >> >> From a performance point of view, I would prefer [2]. It is just that
> > >> > >> >> in that case, maybe the user does not necessarily need to use SSL
> > >> > >> >> anymore because the data would be encrypted anyway.
> > >> > >> >>
> > >> > >> >> If we let the client do the encryption, there are also two ways to do
> > >> > >> >> so - either we let the producer take an encryptor or users can do
> > >> > >> >> serialization/encryption outside the producer and send raw bytes. The
> > >> > >> >> only difference between the two might be flexibility. For example, if
> > >> > >> >> someone wants to know the actual bytes of a message that got sent over
> > >> > >> >> the wire, doing it outside the producer would probably be preferable.
> > >> > >> >>
> > >> > >> >> Jiangjie (Becket) Qin
> > >> > >> >>
> > >> > >> >> On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky
> > >> > >> >> <eugene.miretsky@gmail.com> wrote:
> > >> > >> >>
> > >> > >> >> > Hi,
> > >> > >> >> >
> > >> > >> >> > Based on the security wiki page
> > >> > >> >> > <https://cwiki.apache.org/confluence/display/KAFKA/Security>,
> > >> > >> >> > encryption of data at rest is out of scope for the time being.
> > >> > >> >> > However, we are implementing encryption in Kafka and would like to
> > >> > >> >> > see if there is interest in submitting a patch for it.
> > >> > >> >> >
> > >> > >> >> > I suppose that one way to implement encryption would be to add an
> > >> > >> >> > 'encrypted key' field to the Message/MessageSet structures in the
> > >> > >> >> > wire protocol - however, this is a very big and fundamental change.
> > >> > >> >> >
> > >> > >> >> > A simpler way to add encryption support would be:
> > >> > >> >> > 1) Custom Serializer, but it wouldn't be compatible with other custom
> > >> > >> >> > serializers (Avro, etc.)
> > >> > >> >> > 2) Add a step in KafkaProducer after serialization to encrypt the
> > >> > >> >> > data before it's being submitted to the accumulator (encryption is
> > >> > >> >> > done in the submitting thread, not in the producer io thread)
> > >> > >> >> >
> > >> > >> >> > Is there interest in adding #2 to Kafka?
> > >> > >> >> >
> > >> > >> >> > Cheers,
> > >> > >> >> > Eugene
> > >> > >> >> >
> > >> > >> >>
> > >> > >>
> > >> >
> > >>
> >
> >
> >
>

Re: Gauging Interest in adding Encryption to Kafka

Posted by Gwen Shapira <gw...@confluent.io>.
If I understand you correctly, you are saying that the kerberos keytab that
the broker uses to authenticate with the KMS will be somewhere on the
broker node and can be used by a malicious admin.

I agree this is a valid concern.
I am not opposed to client-side encryption, I am more concerned that the
modifications this will require in Kafka broker implementation make the
idea impractical. And obviously, as in any security discussion - there are
lots of details regarding key exchange, management and protection that are
critical.

Perhaps given a design doc, we can better evaluate the proposed tradeoffs.

Gwen



On Sat, Aug 1, 2015 at 10:10 AM, Don Bosco Durai <bo...@apache.org> wrote:

> >Any reason you think its better to let the clients handle it?
> Gwen, I agree with Todd, depending on the goal, the requirements might
> vary. If the goal is that when someone steals the disk they should not be
> able to access the data, then encrypting at the broker is enough. However, if
> the requirement is that the admin/operator should not be able to access
> the data, then client side is the only option.
>
> Hadoop/HDFS transparent data encryption has a similar philosophy, where
> the actual encryption/decryption happens at the client side.
>
> >1. Key management
> Hadoop common has a KMS. And there are industry standards like KMIP. If
> Broker does the encrypt/decrypt, then the solution is much easier. If the
> client does it, then sharing the key would be a challenge. It might be
> even necessary to use asymmetric encryption to limit sharing of the keys.
>
> Bosco

Re: Gauging Interest in adding Encryption to Kafka

Posted by Don Bosco Durai <bo...@apache.org>.
>Any reason you think it's better to let the clients handle it?
Gwen, I agree with Todd: depending on the goal, the requirements might
vary. If the goal is that someone who steals the disk should not be
able to access the data, then encrypting at the broker is enough. However,
if the requirement is that the admin/operator should not be able to access
the data, then client side is the only option.

Hadoop/HDFS transparent data encryption has a similar philosophy, where
the actual encryption/decryption happens at the client side.

>1. Key management
Hadoop Common has a KMS, and there are industry standards like KMIP. If
the broker does the encrypt/decrypt, then the solution is much easier. If
the client does it, then sharing the key would be a challenge. It might
even be necessary to use asymmetric encryption to limit sharing of the keys.
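
One way to picture the asymmetric option: the producer encrypts the payload
with a fresh AES data key and wraps that data key with the consumer's RSA
public key, so only the holder of the private key can recover it. This is a
minimal sketch using only the JDK's javax.crypto. The names EnvelopeSketch,
Envelope, seal, and open are hypothetical, and nothing here is an existing
Kafka API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration of envelope encryption; not part of Kafka.
final class EnvelopeSketch {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    /** Wrapped data key, IV, and ciphertext that would travel together as the message value. */
    static final class Envelope {
        final byte[] wrappedKey;
        final byte[] iv;
        final byte[] ciphertext;

        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    /** Generates the consumer's RSA key pair (in practice this lives with the consumer or a KMS). */
    static KeyPair newConsumerKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Producer side: encrypt with a one-off AES key, then wrap that key with the public key. */
    static Envelope seal(byte[] plaintext, PublicKey consumerKey) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey dataKey = keyGen.generateKey();

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(plaintext);

            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.WRAP_MODE, consumerKey);
            return new Envelope(rsa.wrap(dataKey), iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Consumer side: unwrap the data key with the private key, then decrypt the payload. */
    static byte[] open(Envelope envelope, PrivateKey consumerKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.UNWRAP_MODE, consumerKey);
            SecretKey dataKey = (SecretKey) rsa.unwrap(envelope.wrappedKey, "AES", Cipher.SECRET_KEY);

            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, envelope.iv));
            return aes.doFinal(envelope.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair consumer = newConsumerKeyPair();
        Envelope env = seal("salary=100".getBytes(StandardCharsets.UTF_8), consumer.getPublic());
        System.out.println(new String(open(env, consumer.getPrivate()), StandardCharsets.UTF_8));
    }
}
```

With this shape the producer never needs a shared secret, only the public
key, which is the property that limits key sharing. The cost is one RSA
operation per data key, so a real design would likely reuse a data key
across many messages.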

Bosco




On 7/31/15, 9:31 PM, "Jiangjie Qin" <jq...@linkedin.com.INVALID> wrote:

>I agree with Todd; the major concern I have is still the complexity on the
>broker, which can kill performance, a key advantage of Kafka. I
>think there are two separate issues here:
>1. Key management
>2. The actual encryption/decryption work.
>
>Personally I think it might be OK to have [1] supported in Kafka given we
>might need to be compatible with different key management systems anyway.
>But we should just make Kafka compatible with other key management systems
>instead of letting Kafka itself manage the keys. For [2], I think we
>should keep it on the client side.
>
>Jiangjie (Becket) Qin



Re: Gauging Interest in adding Encryption to Kafka

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
I agree with Todd; the major concern I have is still the complexity on the
broker, which can kill performance, a key advantage of Kafka. I
think there are two separate issues here:
1. Key management
2. The actual encryption/decryption work.

Personally I think it might be OK to have [1] supported in Kafka given we
might need to be compatible with different key management systems anyway.
But we should just make Kafka compatible with other key management systems
instead of letting Kafka itself manage the keys. For [2], I think we
should keep it on the client side.
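
A rough sketch of that split, with key lookup behind an interface and the
encryption step wrapped around whatever serializer the client already uses.
KeyProvider, StaticKeyProvider, and EncryptingSerializer are hypothetical
names for illustration, not Kafka classes, and the serializer is modeled as
a plain Function rather than Kafka's own Serializer interface. A real
KeyProvider would call out to an external key management system instead of
holding keys in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.function.Function;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration; none of these types exist in Kafka.
final class ClientSideEncryptionSketch {
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;

    /** Assumed hook: the client only asks for a key; an external system owns it. */
    interface KeyProvider {
        SecretKey keyFor(String topic);
    }

    /** Stand-in for a KMS-backed provider: fixed per-topic keys held in memory. */
    static final class StaticKeyProvider implements KeyProvider {
        private final Map<String, SecretKey> keys;

        StaticKeyProvider(Map<String, SecretKey> keys) {
            this.keys = keys;
        }

        @Override
        public SecretKey keyFor(String topic) {
            SecretKey key = keys.get(topic);
            if (key == null) {
                throw new IllegalArgumentException("no key for topic " + topic);
            }
            return key;
        }
    }

    /** Runs any existing serializer first, then encrypts its output with AES-GCM. */
    static final class EncryptingSerializer<T> {
        private final Function<T, byte[]> inner;
        private final KeyProvider keys;
        private final SecureRandom random = new SecureRandom();

        EncryptingSerializer(Function<T, byte[]> inner, KeyProvider keys) {
            this.inner = inner;
            this.keys = keys;
        }

        /** Output layout: [12-byte random IV][ciphertext + 16-byte GCM tag]. */
        byte[] serialize(String topic, T value) {
            try {
                byte[] iv = new byte[IV_LEN];
                random.nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, keys.keyFor(topic), new GCMParameterSpec(TAG_BITS, iv));
                byte[] ciphertext = cipher.doFinal(inner.apply(value));
                return ByteBuffer.allocate(IV_LEN + ciphertext.length).put(iv).put(ciphertext).array();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Consumer side: read the IV prefix, then decrypt and verify the rest. */
    static byte[] decrypt(String topic, byte[] record, KeyProvider keys) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, keys.keyFor(topic),
                    new GCMParameterSpec(TAG_BITS, record, 0, IV_LEN));
            return cipher.doFinal(record, IV_LEN, record.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = new SecretKeySpec(new byte[16], "AES"); // all-zero key, for the demo only
        KeyProvider keys = new StaticKeyProvider(Map.of("payroll", key));
        EncryptingSerializer<String> serializer =
                new EncryptingSerializer<String>(s -> s.getBytes(StandardCharsets.UTF_8), keys);
        byte[] record = serializer.serialize("payroll", "salary=100");
        System.out.println(new String(decrypt("payroll", record, keys), StandardCharsets.UTF_8));
    }
}
```

Because the wrapper takes the inner serializer as a parameter, it composes
with Avro or any other format rather than replacing it, and the broker only
ever sees opaque bytes.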

Jiangjie (Becket) Qin

On Fri, Jul 31, 2015 at 5:06 PM, Todd Palino <tp...@gmail.com> wrote:

> 1 - Yes, authorization combined with encryption does get us most of the way
> there. However, depending on the auditor it might not be good enough. The
> problem is that if you are encrypting at the broker, then by definition
> anyone who has access to the broker (i.e. operations staff) has access to
> the data. Consider the case where you are passing salary and other
> information through the system, and those people do not need a view of it.
> I admit, the 90% solution might be better here than going for a perfect
> solution, but it is something to think about.
>
> 2 - My worry is people wanting to integrate with different key systems. For
> example, one person may be fine with providing it in a config file, while
> someone else may want to use the solution from vendor A, someone else wants
> vendor B, and yet another person wants this obscure hardware-based solution
> that exists elsewhere.
>
> The compaction concern is definitely a good one I hadn't thought of. I'm
> wondering if it's reasonable to just say that compaction will not work
> properly with encrypted keys if you do not have consistent encryption (that
> is, the same string encrypts to the same string every time).
>
> Ultimately I don't like the idea of the broker doing any encrypt/decrypt
> steps OR compression/decompression. This is all CPU overhead that you're
> concentrating in one place instead of distributing the load out to the
> clients. Now yes, I know that the broker decompresses to check the CRC and
> assign offsets and then compresses, and we can potentially avoid the
> compression step by assigning the batch an offset and a count instead, but
> we still need to consider the CRC. Adding encrypt/decrypt steps adds even
> more overhead and it's going to get very difficult to handle even 2 Gbits
> worth of traffic at that rate.
>
> There are other situations that concern me, such as revocation of keys, and
> I'm not sure whether it is better with client-based or server-based
> encryption. For example, if I want to revoke a key with client-based
> encryption it becomes similar to how we handle Avro schemas (internally)
> now - you change keys, and depending on what your desire is you either
> expire out the data for some period of time with the older keys, or you
> just let it sit there and your consuming clients won't have an issue. With
> broker-based encryption, the broker has to work with the multiple keys
> per-topic.
>
> -Todd

Re: Gauging Interest in adding Encryption to Kafka

Posted by Todd Palino <tp...@gmail.com>.
1 - Yes, authorization combined with encryption does get us most of the way
there. However, depending on the auditor it might not be good enough. The
problem is that if you are encrypting at the broker, then by definition
anyone who has access to the broker (i.e. operations staff) has access to
the data. Consider the case where you are passing salary and other
information through the system, and those people do not need a view of it.
I admit, the 90% solution might be better here than going for a perfect
solution, but it is something to think about.

2 - My worry is people wanting to integrate with different key systems. For
example, one person may be fine with providing it in a config file, while
someone else may want to use the solution from vendor A, someone else wants
vendor B, and yet another person wants this obscure hardware-based solution
that exists elsewhere.

The compaction concern is definitely a good one I hadn't thought of. I'm
wondering if it's reasonable to just say that compaction will not work
properly with encrypted keys if you do not have consistent encryption (that
is, the same string encrypts to the same string every time).
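
To make the "consistent encryption" condition concrete: log compaction
treats two records as having the same key only if the key bytes are
identical, so an encrypted key has to come out byte-for-byte the same every
time. The sketch below derives the IV from an HMAC of the plaintext (a
synthetic-IV style construction) so that equal inputs give equal
ciphertexts. It is an illustration under stated assumptions, not a vetted
scheme; DeterministicKeyCipher and its methods are hypothetical names, and
deterministic encryption by design reveals when two keys are equal.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration of deterministic encryption for message keys.
final class DeterministicKeyCipher {
    private static final int TAG_LEN = 16;
    private final SecretKey encKey;
    private final SecretKey macKey;

    /** Takes two independent keys: one for AES (16, 24, or 32 bytes) and one for HMAC. */
    DeterministicKeyCipher(byte[] encKeyBytes, byte[] macKeyBytes) {
        this.encKey = new SecretKeySpec(encKeyBytes, "AES");
        this.macKey = new SecretKeySpec(macKeyBytes, "HmacSHA256");
    }

    /** Truncated HMAC of the plaintext; serves as both the IV and the integrity tag. */
    private byte[] tag(byte[] plaintext) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        return Arrays.copyOf(mac.doFinal(plaintext), TAG_LEN);
    }

    private byte[] ctr(int mode, byte[] iv, byte[] in, int off, int len) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, encKey, new IvParameterSpec(iv));
        return cipher.doFinal(in, off, len);
    }

    /** Output layout: [16-byte tag used as IV][AES-CTR ciphertext]. Same input, same output. */
    byte[] encrypt(byte[] plaintext) {
        try {
            byte[] iv = tag(plaintext);
            byte[] ciphertext = ctr(Cipher.ENCRYPT_MODE, iv, plaintext, 0, plaintext.length);
            byte[] out = Arrays.copyOf(iv, TAG_LEN + ciphertext.length);
            System.arraycopy(ciphertext, 0, out, TAG_LEN, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts, then recomputes the tag to detect tampering or a wrong key. */
    byte[] decrypt(byte[] record) {
        if (record.length < TAG_LEN) {
            throw new IllegalArgumentException("record too short");
        }
        try {
            byte[] iv = Arrays.copyOf(record, TAG_LEN);
            byte[] plaintext = ctr(Cipher.DECRYPT_MODE, iv, record, TAG_LEN, record.length - TAG_LEN);
            if (!MessageDigest.isEqual(iv, tag(plaintext))) {
                throw new IllegalStateException("authentication failed");
            }
            return plaintext;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Fixed demo keys only; real keys would come from a key management system.
        DeterministicKeyCipher cipher = new DeterministicKeyCipher(
                new byte[16], "demo-mac-key".getBytes(StandardCharsets.UTF_8));
        byte[] first = cipher.encrypt("user-42".getBytes(StandardCharsets.UTF_8));
        byte[] second = cipher.encrypt("user-42".getBytes(StandardCharsets.UTF_8));
        System.out.println("same ciphertext for same key: " + Arrays.equals(first, second));
    }
}
```

A cipher with a random IV per message would give a different ciphertext for
every write of the same key, which is exactly the case where compaction
would stop collapsing records.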

Ultimately I don't like the idea of the broker doing any encrypt/decrypt
steps OR compression/decompression. This is all CPU overhead that you're
concentrating in one place instead of distributing the load out to the
clients. Now yes, I know that the broker decompresses to check the CRC and
assign offsets and then compresses, and we can potentially avoid the
compression step by assigning the batch an offset and a count instead, but
we still need to consider the CRC. Adding encrypt/decrypt steps adds even
more overhead and it's going to get very difficult to handle even 2 Gbits
worth of traffic at that rate.

There are other situations that concern me, such as revocation of keys, and
I'm not sure whether it is better with client-based or server-based
encryption. For example, if I want to revoke a key with client-based
encryption it becomes similar to how we handle Avro schemas (internally)
now - you change keys, and depending on what your desire is you either
expire out the data for some period of time with the older keys, or you
just let it sit there and your consuming clients won't have an issue. With
broker-based encryption, the broker has to work with the multiple keys
per-topic.
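
The client-side rotation flow described above could look roughly like this:
each record carries the id of the key that encrypted it, producers always
use the newest key, and consumers can still read older records until a
retired key is dropped. VersionedKeyring and its methods are hypothetical
names, the record layout is an assumption made for the example, and how
keys reach the clients is out of scope here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration of key rotation with a key version stored in each record.
final class VersionedKeyring {
    private static final int IV_LEN = 12;
    private final Map<Integer, SecretKey> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();
    private int current = -1;

    /** Registers a new key version; new records are encrypted with it from now on. */
    void rotateTo(int version, byte[] keyBytes) {
        keys.put(version, new SecretKeySpec(keyBytes, "AES"));
        current = version;
    }

    /** Drops a key version; records encrypted under it can no longer be read. */
    void revoke(int version) {
        keys.remove(version);
    }

    /** Output layout (assumed): [4-byte key version][12-byte IV][ciphertext + GCM tag]. */
    byte[] encrypt(byte[] plaintext) {
        SecretKey key = keys.get(current);
        if (key == null) {
            throw new IllegalStateException("no current key available");
        }
        try {
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(4 + IV_LEN + ciphertext.length)
                    .putInt(current).put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Looks up the key named in the record, so old and new records decrypt side by side. */
    byte[] decrypt(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int version = buf.getInt();
        SecretKey key = keys.get(version);
        if (key == null) {
            throw new IllegalStateException("key version " + version + " is not available");
        }
        try {
            byte[] iv = new byte[IV_LEN];
            buf.get(iv);
            byte[] ciphertext = new byte[buf.remaining()];
            buf.get(ciphertext);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        VersionedKeyring ring = new VersionedKeyring();
        ring.rotateTo(1, new byte[16]); // all-zero key, for the demo only
        byte[] oldRecord = ring.encrypt("written under key 1".getBytes(StandardCharsets.UTF_8));
        byte[] key2 = new byte[16];
        key2[0] = 1;
        ring.rotateTo(2, key2);
        // The older record is still readable after rotation because key 1 is retained.
        System.out.println(new String(ring.decrypt(oldRecord), StandardCharsets.UTF_8));
    }
}
```

Letting old data "just sit there" corresponds to keeping version 1 in the
keyring; expiring it corresponds to calling revoke once retention has
removed the old records.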

-Todd


On Fri, Jul 31, 2015 at 2:38 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Good points :)
>
> 1) Kafka already (pending commit) has an authorization layer, so
> theoretically we are good for SOX, HIPAA, PCI, etc. Transparent broker
> encryption will support PCI
> never-let-unencrypted-card-number-hit-disk.
>
> 2) Agree on Key Management being a complete PITA. It may be better to
> centralize this pain in the broker rather than distributing it to
> clients. Any reason you think it's better to let the clients handle it?
> The way I see it, we'll need to handle key management the way we did
> authorization - give an API for interfacing with existing systems.
>
> More important, we need the broker to be able to decrypt and encrypt
> in order to support compaction (unless we can find a cool
> key-uniqueness-preserving encryption algorithm, but this may not be as
> secure). I think we also need the broker to be able to re-compress
> data, and since we always encrypt compressed bits (compressing
> encrypted bits doesn't compress), we need the broker to decrypt before
> re-compressing.
>
>
>
> On Fri, Jul 31, 2015 at 2:27 PM, Todd Palino <tp...@gmail.com> wrote:
> > It does limit it to clients that have an implementation for encryption,
> > however encryption on the client side is better from an auditing point of
> > view (whether that is SOX, HIPAA, PCI, or something else). Most of those
> > types of standards are based around allowing visibility of data to just
> the
> > people who need it. That includes the admins of the system (who are often
> > not the people who use the data).
> >
> > Additionally, key management is a royal pain, and there are lots of
> > different types of systems that one may want to use. This is a pretty big
> > complication for the brokers.
> >
> > -Todd
> >
> >
> > On Fri, Jul 31, 2015 at 2:21 PM, Gwen Shapira <gs...@cloudera.com>
> wrote:
> >
> >> I've seen interest in HDFS-like "encryption zones" in Kafka.
> >>
> >> This has the advantage of magically encrypting data at rest regardless
> >> of which client is used as a producer.
> >> Adding it on the client side limits the feature to the java client.
> >>
> >> Gwen
> >>
> >> On Fri, Jul 31, 2015 at 1:20 PM, eugene miretsky
> >> <eu...@gmail.com> wrote:
> >> > I think that Hadoop and Cassandra do [1] (Transparent Encryption)
> >> >
> >> > We're doing [2] (on a side note, for [2] you still need
> authentication on
> >> > the producer side - you don't want an unauthorized user writing
> garbage).
> >> > Right now we have the 'user' doing the  encryption and submitting raw
> >> bytes
> >> > to the producer. I was suggesting implementing an encryptor in the
> >> > producer itself - I think it's cleaner and can be reused by other
> users
> >> > (instead of having to do their own encryption)
> >> >
> >> > Cheers,
> >> > Eugene
> >> >
> >> > On Fri, Jul 31, 2015 at 4:04 PM, Jiangjie Qin
> <jqin@linkedin.com.invalid
> >> >
> >> > wrote:
> >> >
> >> >> I think the goal here is to make the actual message stored on broker
> to
> >> be
> >> >> encrypted, because after we have SSL, the transmission would be
> >> encrypted.
> >> >>
> >> >> In general there might be tow approaches:
> >> >> 1. Broker do the encryption/decryption
> >> >> 2. Client do the encryption/decryption
> >> >>
> >> >> From performance point of view, I would prefer [2]. It is just in
> that
> >> >> case, maybe user does not necessarily need to use SSL anymore because
> >> the
> >> >> data would be encrypted anyway.
> >> >>
> >> >> If we let client do the encryption, there are also two ways to do so
> -
> >> >> either we let producer take an encryptor or users can do
> >> >> serialization/encryption outside the producer and send raw bytes. The
> >> only
> >> >> difference between the two might be flexibility. For example, if
> someone
> >> >> wants to know the actual bytes of a message that got sent over the
> wire,
> >> >> doing it outside the producer would probably more preferable.
> >> >>
> >> >> Jiangjie (Becket) Qin
> >> >>
> >> >> On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky <
> >> >> eugene.miretsky@gmail.com
> >> >> > wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > Based on the security wiki page
> >> >> > <https://cwiki.apache.org/confluence/display/KAFKA/Security>
> >> encryption
> >> >> of
> >> >> > data at rest is out of scope for the time being. However, we are
> >> >> >  implementing  encryption in Kafka and would like to see if there
> is
> >> >> > interest in submitting a patch got it.
> >> >> >
> >> >> > I suppose that one way to implement  encryption would be to add an
> >> >> > 'encrypted key' field to the Message/MessageSet  structures in the
> >> >> > wire protocole - however, this is a very big and fundamental
> change.
> >> >> >
> >> >> > A simpler way to add encryption support would be:
> >> >> > 1) Custom Serializer, but it wouldn't be compatible with other
> custom
> >> >> > serializers (Avro, etc. )
> >> >> > 2)  Add a step in KafkaProducer after serialization to encrypt the
> >> data
> >> >> > before it's being submitted to the accumulator (encryption is done
> in
> >> the
> >> >> > submitting thread, not in the producer io thread)
> >> >> >
> >> >> > Is there interest in adding #2 to Kafka?
> >> >> >
> >> >> > Cheers,
> >> >> > Eugene
> >> >> >
> >> >>
> >>
>

Re: Gauging Interest in adding Encryption to Kafka

Posted by Gwen Shapira <gs...@cloudera.com>.
Good points :)

1) Kafka already (pending commit) has an authorization layer, so
theoretically we are good for SOX, HIPAA, PCI, etc. Transparent broker
encryption will support PCI
never-let-unencrypted-card-number-hit-disk.

2) Agree on Key Management being a complete PITA. It may be better to
centralize this pain in the broker rather than distributing it to
clients. Any reason you think it's better to let the clients handle it?
The way I see it, we'll need to handle key management the way we did
authorization - give an API for interfacing with existing systems.

More important, we need the broker to be able to decrypt and encrypt
in order to support compaction (unless we can find a cool
key-uniqueness-preserving encryption algorithm, but this may not be as
secure). I think we also need the broker to be able to re-compress
data, and since we always encrypt compressed bits (compressing
encrypted bits doesn't compress), we need the broker to decrypt before
re-compressing.
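
The ordering point above (compress first, then encrypt) can be checked with a short experiment. This is a minimal sketch using only the JDK: the class name CompressThenEncrypt, the sample batch, and the choice of GZIP and AES-GCM are assumptions made for illustration, not anything Kafka does internally. The IV is discarded because only the output sizes matter here.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class comparing the two possible orderings.
final class CompressThenEncrypt {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    static byte[] encrypt(SecretKey key, byte[] data) throws GeneralSecurityException {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(data); // IV dropped: this demo never decrypts
    }

    // A repetitive batch of 500 records, standing in for typical log data.
    static byte[] sampleBatch() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("{\"user\":\"alice\",\"event\":\"click\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey();
        byte[] batch = sampleBatch();

        int compressThenEncrypt = encrypt(key, gzip(batch)).length;
        int encryptThenCompress = gzip(encrypt(key, batch)).length;

        System.out.println("original bytes:        " + batch.length);
        System.out.println("compress then encrypt: " + compressThenEncrypt);
        System.out.println("encrypt then compress: " + encryptThenCompress);
    }
}
```

Ciphertext looks like random bytes, so the second ordering saves nothing and ends up slightly larger than its input. That is why a broker that wants to re-compress a batch would first have to decrypt it.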



On Fri, Jul 31, 2015 at 2:27 PM, Todd Palino <tp...@gmail.com> wrote:
> It does limit it to clients that have an implementation for encryption,
> however encryption on the client side is better from an auditing point of
> view (whether that is SOX, HIPAA, PCI, or something else). Most of those
> types of standards are based around allowing visibility of data to just the
> people who need it. That includes the admins of the system (who are often
> not the people who use the data).
>
> Additionally, key management is a royal pain, and there are lots of
> different types of systems that one may want to use. This is a pretty big
> complication for the brokers.
>
> -Todd
>
>
> On Fri, Jul 31, 2015 at 2:21 PM, Gwen Shapira <gs...@cloudera.com> wrote:
>
>> I've seen interest in HDFS-like "encryption zones" in Kafka.
>>
>> This has the advantage of magically encrypting data at rest regardless
>> of which client is used as a producer.
>> Adding it on the client side limits the feature to the java client.
>>
>> Gwen
>>
>> On Fri, Jul 31, 2015 at 1:20 PM, eugene miretsky
>> <eu...@gmail.com> wrote:
>> > I think that Hadoop and Cassandra do [1] (Transparent Encryption)
>> >
>> > We're doing [2] (on a side note, for [2] you still need authentication on
>> > the producer side - you don't want an unauthorized user writing garbage).
>> > Right now we have the 'user' doing the  encryption and submitting raw
>> bytes
>> > to the producer. I was suggesting implementing an encryptor in the
>> > producer itself - I think it's cleaner and can be reused by other users
>> > (instead of having to do their own encryption)
>> >
>> > Cheers,
>> > Eugene
>> >
>> > On Fri, Jul 31, 2015 at 4:04 PM, Jiangjie Qin <jqin@linkedin.com.invalid
>> >
>> > wrote:
>> >
>> >> I think the goal here is to make the actual message stored on broker to
>> be
>> >> encrypted, because after we have SSL, the transmission would be
>> encrypted.
>> >>
>> >> In general there might be tow approaches:
>> >> 1. Broker do the encryption/decryption
>> >> 2. Client do the encryption/decryption
>> >>
>> >> From performance point of view, I would prefer [2]. It is just in that
>> >> case, maybe user does not necessarily need to use SSL anymore because
>> the
>> >> data would be encrypted anyway.
>> >>
>> >> If we let client do the encryption, there are also two ways to do so -
>> >> either we let producer take an encryptor or users can do
>> >> serialization/encryption outside the producer and send raw bytes. The
>> only
>> >> difference between the two might be flexibility. For example, if someone
>> >> wants to know the actual bytes of a message that got sent over the wire,
>> >> doing it outside the producer would probably more preferable.
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >> On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky <
>> >> eugene.miretsky@gmail.com
>> >> > wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > Based on the security wiki page
>> >> > <https://cwiki.apache.org/confluence/display/KAFKA/Security>
>> encryption
>> >> of
>> >> > data at rest is out of scope for the time being. However, we are
>> >> >  implementing  encryption in Kafka and would like to see if there is
>> >> > interest in submitting a patch got it.
>> >> >
>> >> > I suppose that one way to implement  encryption would be to add an
>> >> > 'encrypted key' field to the Message/MessageSet  structures in the
>> >> > wire protocole - however, this is a very big and fundamental change.
>> >> >
>> >> > A simpler way to add encryption support would be:
>> >> > 1) Custom Serializer, but it wouldn't be compatible with other  custom
>> >> > serializers (Avro, etc. )
>> >> > 2)  Add a step in KafkaProducer after serialization to encrypt the
>> data
>> >> > before it's being submitted to the accumulator (encryption is done in
>> the
>> >> > submitting thread, not in the producer io thread)
>> >> >
>> >> > Is there interest in adding #2 to Kafka?
>> >> >
>> >> > Cheers,
>> >> > Eugene
>> >> >
>> >>
>>

Re: Gauging Interest in adding Encryption to Kafka

Posted by Todd Palino <tp...@gmail.com>.
It does limit it to clients that have an implementation for encryption;
however, encryption on the client side is better from an auditing point of
view (whether that is SOX, HIPAA, PCI, or something else). Most of those
types of standards are based around allowing visibility of data to just the
people who need it. That includes the admins of the system (who are often
not the people who use the data).

Additionally, key management is a royal pain, and there are lots of
different types of systems that one may want to use. This is a pretty big
complication for the brokers.

-Todd


On Fri, Jul 31, 2015 at 2:21 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> I've seen interest in HDFS-like "encryption zones" in Kafka.
>
> This has the advantage of magically encrypting data at rest regardless
> of which client is used as a producer.
> Adding it on the client side limits the feature to the java client.
>
> Gwen
>
> On Fri, Jul 31, 2015 at 1:20 PM, eugene miretsky
> <eu...@gmail.com> wrote:
> > I think that Hadoop and Cassandra do [1] (Transparent Encryption)
> >
> > We're doing [2] (on a side note, for [2] you still need authentication on
> > the producer side - you don't want an unauthorized user writing garbage).
> > Right now we have the 'user' doing the  encryption and submitting raw
> bytes
> > to the producer. I was suggesting implementing an encryptor in the
> > producer itself - I think it's cleaner and can be reused by other users
> > (instead of having to do their own encryption)
> >
> > Cheers,
> > Eugene
> >
> > On Fri, Jul 31, 2015 at 4:04 PM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > wrote:
> >
> >> I think the goal here is to make the actual message stored on broker to
> be
> >> encrypted, because after we have SSL, the transmission would be
> encrypted.
> >>
> >> In general there might be tow approaches:
> >> 1. Broker do the encryption/decryption
> >> 2. Client do the encryption/decryption
> >>
> >> From performance point of view, I would prefer [2]. It is just in that
> >> case, maybe user does not necessarily need to use SSL anymore because
> the
> >> data would be encrypted anyway.
> >>
> >> If we let client do the encryption, there are also two ways to do so -
> >> either we let producer take an encryptor or users can do
> >> serialization/encryption outside the producer and send raw bytes. The
> only
> >> difference between the two might be flexibility. For example, if someone
> >> wants to know the actual bytes of a message that got sent over the wire,
> >> doing it outside the producer would probably more preferable.
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky <
> >> eugene.miretsky@gmail.com
> >> > wrote:
> >>
> >> > Hi,
> >> >
> >> > Based on the security wiki page
> >> > <https://cwiki.apache.org/confluence/display/KAFKA/Security>
> encryption
> >> of
> >> > data at rest is out of scope for the time being. However, we are
> >> >  implementing  encryption in Kafka and would like to see if there is
> >> > interest in submitting a patch got it.
> >> >
> >> > I suppose that one way to implement  encryption would be to add an
> >> > 'encrypted key' field to the Message/MessageSet  structures in the
> >> > wire protocole - however, this is a very big and fundamental change.
> >> >
> >> > A simpler way to add encryption support would be:
> >> > 1) Custom Serializer, but it wouldn't be compatible with other  custom
> >> > serializers (Avro, etc. )
> >> > 2)  Add a step in KafkaProducer after serialization to encrypt the
> data
> >> > before it's being submitted to the accumulator (encryption is done in
> the
> >> > submitting thread, not in the producer io thread)
> >> >
> >> > Is there interest in adding #2 to Kafka?
> >> >
> >> > Cheers,
> >> > Eugene
> >> >
> >>
>

Re: Gauging Interest in adding Encryption to Kafka

Posted by Gwen Shapira <gs...@cloudera.com>.
I've seen interest in HDFS-like "encryption zones" in Kafka.

This has the advantage of magically encrypting data at rest regardless
of which client is used as a producer.
Adding it on the client side limits the feature to the Java client.

Gwen

On Fri, Jul 31, 2015 at 1:20 PM, eugene miretsky
<eu...@gmail.com> wrote:
> I think that Hadoop and Cassandra do [1] (Transparent Encryption)
>
> We're doing [2] (on a side note, for [2] you still need authentication on
> the producer side - you don't want an unauthorized user writing garbage).
> Right now we have the 'user' doing the  encryption and submitting raw bytes
> to the producer. I was suggesting implementing an encryptor in the
> producer itself - I think it's cleaner and can be reused by other users
> (instead of having to do their own encryption)
>
> Cheers,
> Eugene
>
> On Fri, Jul 31, 2015 at 4:04 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
>> I think the goal here is to make the actual message stored on broker to be
>> encrypted, because after we have SSL, the transmission would be encrypted.
>>
>> In general there might be tow approaches:
>> 1. Broker do the encryption/decryption
>> 2. Client do the encryption/decryption
>>
>> From performance point of view, I would prefer [2]. It is just in that
>> case, maybe user does not necessarily need to use SSL anymore because the
>> data would be encrypted anyway.
>>
>> If we let client do the encryption, there are also two ways to do so -
>> either we let producer take an encryptor or users can do
>> serialization/encryption outside the producer and send raw bytes. The only
>> difference between the two might be flexibility. For example, if someone
>> wants to know the actual bytes of a message that got sent over the wire,
>> doing it outside the producer would probably more preferable.
>>
>> Jiangjie (Becket) Qin
>>
>> On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky <
>> eugene.miretsky@gmail.com
>> > wrote:
>>
>> > Hi,
>> >
>> > Based on the security wiki page
>> > <https://cwiki.apache.org/confluence/display/KAFKA/Security> encryption
>> of
>> > data at rest is out of scope for the time being. However, we are
>> >  implementing  encryption in Kafka and would like to see if there is
>> > interest in submitting a patch got it.
>> >
>> > I suppose that one way to implement  encryption would be to add an
>> > 'encrypted key' field to the Message/MessageSet  structures in the
>> > wire protocole - however, this is a very big and fundamental change.
>> >
>> > A simpler way to add encryption support would be:
>> > 1) Custom Serializer, but it wouldn't be compatible with other  custom
>> > serializers (Avro, etc. )
>> > 2)  Add a step in KafkaProducer after serialization to encrypt the data
>> > before it's being submitted to the accumulator (encryption is done in the
>> > submitting thread, not in the producer io thread)
>> >
>> > Is there interest in adding #2 to Kafka?
>> >
>> > Cheers,
>> > Eugene
>> >
>>

Re: Gauging Interest in adding Encryption to Kafka

Posted by eugene miretsky <eu...@gmail.com>.
I think that Hadoop and Cassandra do [1] (Transparent Encryption).

We're doing [2] (on a side note, for [2] you still need authentication on
the producer side - you don't want an unauthorized user writing garbage).
Right now we have the 'user' doing the encryption and submitting raw bytes
to the producer. I was suggesting implementing an encryptor in the
producer itself - I think it's cleaner and can be reused by other users
(instead of having to do their own encryption).

Cheers,
Eugene

On Fri, Jul 31, 2015 at 4:04 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> I think the goal here is to make the actual message stored on broker to be
> encrypted, because after we have SSL, the transmission would be encrypted.
>
> In general there might be tow approaches:
> 1. Broker do the encryption/decryption
> 2. Client do the encryption/decryption
>
> From performance point of view, I would prefer [2]. It is just in that
> case, maybe user does not necessarily need to use SSL anymore because the
> data would be encrypted anyway.
>
> If we let client do the encryption, there are also two ways to do so -
> either we let producer take an encryptor or users can do
> serialization/encryption outside the producer and send raw bytes. The only
> difference between the two might be flexibility. For example, if someone
> wants to know the actual bytes of a message that got sent over the wire,
> doing it outside the producer would probably more preferable.
>
> Jiangjie (Becket) Qin
>
> On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky <
> eugene.miretsky@gmail.com
> > wrote:
>
> > Hi,
> >
> > Based on the security wiki page
> > <https://cwiki.apache.org/confluence/display/KAFKA/Security> encryption
> of
> > data at rest is out of scope for the time being. However, we are
> >  implementing  encryption in Kafka and would like to see if there is
> > interest in submitting a patch got it.
> >
> > I suppose that one way to implement  encryption would be to add an
> > 'encrypted key' field to the Message/MessageSet  structures in the
> > wire protocole - however, this is a very big and fundamental change.
> >
> > A simpler way to add encryption support would be:
> > 1) Custom Serializer, but it wouldn't be compatible with other  custom
> > serializers (Avro, etc. )
> > 2)  Add a step in KafkaProducer after serialization to encrypt the data
> > before it's being submitted to the accumulator (encryption is done in the
> > submitting thread, not in the producer io thread)
> >
> > Is there interest in adding #2 to Kafka?
> >
> > Cheers,
> > Eugene
> >
>

Re: Gauging Interest in adding Encryption to Kafka

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
I think the goal here is to have the actual messages stored on the broker be
encrypted, because once we have SSL, the transmission will be encrypted.

In general there might be two approaches:
1. The broker does the encryption/decryption
2. The client does the encryption/decryption

From a performance point of view, I would prefer [2]. It is just that in that
case, the user may not necessarily need to use SSL anymore, because the
data would be encrypted anyway.

If we let the client do the encryption, there are also two ways to do so:
either we let the producer take an encryptor, or users can do
serialization/encryption outside the producer and send raw bytes. The only
difference between the two might be flexibility. For example, if someone
wants to know the actual bytes of a message that got sent over the wire,
doing it outside the producer would probably be preferable.

Jiangjie (Becket) Qin
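
The second client-side option (serialize and encrypt outside the producer, then send raw bytes) might look roughly like the sketch below. It assumes the application already has the serialized bytes and a shared AES key; the class name PayloadCrypto, the IV-prefixed layout, and the use of AES-GCM are illustrative assumptions, not an existing Kafka API. The resulting byte array is what would be handed to a producer configured for byte-array values.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper that encrypts already-serialized bytes; not part of Kafka.
final class PayloadCrypto {
    private static final int IV_BYTES = 12;  // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns IV followed by ciphertext and tag, so the consumer can recover the IV.
    static byte[] encrypt(SecretKey key, byte[] serialized) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV per message; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(serialized);
        return ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array();
    }

    // Reverses encrypt(); throws if the message was tampered with or the key is wrong.
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] serialized = "order-42:paid".getBytes(StandardCharsets.UTF_8);
        byte[] wire = encrypt(key, serialized); // raw bytes the producer would send
        byte[] back = decrypt(key, wire);       // what a consumer with the key recovers
        System.out.println(Arrays.equals(serialized, back));
    }
}
```

Because the random IV makes every ciphertext different, the same approach applied to message keys would give each record a distinct key on the broker, which is the compaction concern raised elsewhere in this thread.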

On Thu, Jul 30, 2015 at 12:16 PM, eugene miretsky <eugene.miretsky@gmail.com
> wrote:

> Hi,
>
> Based on the security wiki page
> <https://cwiki.apache.org/confluence/display/KAFKA/Security> encryption of
> data at rest is out of scope for the time being. However, we are
>  implementing  encryption in Kafka and would like to see if there is
> interest in submitting a patch got it.
>
> I suppose that one way to implement  encryption would be to add an
> 'encrypted key' field to the Message/MessageSet  structures in the
> wire protocole - however, this is a very big and fundamental change.
>
> A simpler way to add encryption support would be:
> 1) Custom Serializer, but it wouldn't be compatible with other  custom
> serializers (Avro, etc. )
> 2)  Add a step in KafkaProducer after serialization to encrypt the data
> before it's being submitted to the accumulator (encryption is done in the
> submitting thread, not in the producer io thread)
>
> Is there interest in adding #2 to Kafka?
>
> Cheers,
> Eugene
>