Posted to dev@cassandra.apache.org by Stefan Miklosovic <st...@instaclustr.com> on 2021/11/13 11:00:47 UTC

Resurrection of CASSANDRA-9633 - SSTable encryption

Hi list,

an engineer from Intel - Shylaja Kokoori (who is watching this list
closely) - has retrofitted the original code from the CASSANDRA-9633 work,
written back in the 3.4 era, to the current trunk, with my help here and
there, mostly cosmetic.

I would like to know whether there is a general consensus that I should
create a CEP for this feature, or what your perception of this is. I
know we have it a little bit backwards here, as we should discuss first
and code second, but I am glad that we have a POC we can elaborate on
further, and a CEP would just cement and summarise the approach and
other implementation aspects of this feature.

I think that having 9633 merged will fill quite a big operational gap
when it comes to security. There are a lot of enterprises that want
this feature badly. I cannot remember when I last saw a ticket with
50 watchers that was inactive for such a long time.

Regards

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "J. D. Jordan" <je...@gmail.com>.
Another comment here. I tried to find the patch to check, but couldn’t find it linked to the ticket. If it is not already the case: given that the TDE key class is pluggable in the yaml, when a file is written, everything needed to instantiate the class that decrypts it should be in the metadata, just as happens now for compression. So if someone switches to a different TDE class, you still know to instantiate the old one to read existing files. The class from the yaml config should only be used for encrypting new files.
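To illustrate the idea, a purely hypothetical sketch follows - the class and field names are made up, not the patch's actual API - mirroring how compression parameters are recorded today:

    import java.util.Collections;
    import java.util.Map;

    // Hypothetical sketch only: a per-SSTable record carrying everything needed to
    // re-instantiate the class that decrypts the file, analogous to how compression
    // parameters are persisted today. Names are illustrative, not the patch's API.
    public final class EncryptionParamsSketch
    {
        public final String encryptorClassName;    // the pluggable TDE class that wrote the file
        public final Map<String, String> options;  // key alias, cipher transformation, chunk length, ...

        public EncryptionParamsSketch(String encryptorClassName, Map<String, String> options)
        {
            this.encryptorClassName = encryptorClassName;
            this.options = Collections.unmodifiableMap(options);
        }

        // On read, the class recorded here is instantiated - not the one currently in the
        // yaml - so files written before a provider switch remain readable.
        public Object instantiateProvider() throws Exception
        {
            return Class.forName(encryptorClassName)
                        .getDeclaredConstructor(Map.class)
                        .newInstance(options);
        }
    }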

> On Nov 14, 2021, at 3:54 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
> 
> Hey,
> 
> there are two points we are not completely sure about.
> 
> The first one is streaming. If there is a cluster of 5 nodes, each
> node has its own unique encryption key. Hence, if a SSTable is stored
> on a disk with the key for node 1 and this is streamed to node 2 -
> which has a different key - it would not be able to decrypt that. Our
> idea is to actually send data over the wire _decrypted_ however it
> would be still secure if internode communication is done via TLS. Is
> this approach good with you?
> 
> The second question is about key rotation. If an operator needs to
> roll the key because it was compromised or there is some policy around
> that, we should be able to provide some way to rotate it. Our idea is
> to write a tool (either a subcommand of nodetool (rewritesstables)
> command or a completely standalone one in tools) which would take the
> first, original key, the second, new key and dir with sstables as
> input and it would literally took the data and it would rewrite it to
> the second set of sstables which would be encrypted with the second
> key. What do you think about this?
> 
> Regards
> 
>> On Sat, 13 Nov 2021 at 19:35, <sc...@paradoxica.net> wrote:
>> 
>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>> 
>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>> 
>> – Scott
>> 
>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams <dr...@gmail.com> wrote:
>>> 
>>> We already have a ticket and this predated CEPs, and being an
>>> obviously good improvement to have that many have been asking for for
>>> some time now, I don't see the need for a CEP here.
>>> 
>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>> <st...@instaclustr.com> wrote:
>>>> 
>>>> Hi list,
>>>> 
>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>> cosmetic.
>>>> 
>>>> I would like to know if there is a general consensus about me going to
>>>> create a CEP for this feature or what is your perception on this. I
>>>> know we have it a little bit backwards here as we should first discuss
>>>> and then code but I am super glad that we have some POC we can
>>>> elaborate further on and CEP would just cement  and summarise the
>>>> approach / other implementation aspects of this feature.
>>>> 
>>>> I think that having 9633 merged will fill quite a big operational gap
>>>> when it comes to security. There are a lot of enterprises who desire
>>>> this feature so much. I can not remember when I last saw a ticket with
>>>> 50 watchers which was inactive for such a long time.
>>>> 
>>>> Regards
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "benedict@apache.org" <be...@apache.org>.
If decrypting before transmission, we’ll want to require the cluster to have an internode authenticator set up; otherwise a nefarious process could simply ask for data to be streamed to it to circumvent the encryption.

I agree it would be nice to have the nodes share the secret in some way, to avoid the additional performance penalties and potential security issues associated with decryption/encryption for transmission, but I don’t have any practical experience securely sharing secrets between nodes, so I’ll leave that decision to those who do.
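For context, internode authentication is already pluggable via the internode_authenticator setting; below is a minimal sketch of the kind of allow-list authenticator that could be required before streaming decrypted data. The interface shape and the allow-list source are assumptions for illustration, not taken from the patch:

    import java.net.InetAddress;
    import java.util.Set;
    import org.apache.cassandra.auth.IInternodeAuthenticator;
    import org.apache.cassandra.exceptions.ConfigurationException;

    // Sketch only: accept internode connections solely from an operator-maintained
    // allow list of peer addresses. A real deployment would load the list from
    // configuration; the constructor here is illustrative.
    public class AllowListInternodeAuthenticator implements IInternodeAuthenticator
    {
        private final Set<InetAddress> allowedPeers;

        public AllowListInternodeAuthenticator(Set<InetAddress> allowedPeers)
        {
            this.allowedPeers = allowedPeers;
        }

        @Override
        public boolean authenticate(InetAddress remoteAddress, int remotePort)
        {
            return allowedPeers.contains(remoteAddress);
        }

        @Override
        public void validateConfiguration() throws ConfigurationException
        {
            if (allowedPeers.isEmpty())
                throw new ConfigurationException("peer allow list must not be empty");
        }
    }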


From: Jeremiah D Jordan <je...@datastax.com>
Date: Monday, 15 November 2021 at 22:09
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption


> On Nov 15, 2021, at 2:25 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
>
> On Mon, 15 Nov 2021 at 19:42, Jeremiah D Jordan
> <jeremiah.jordan@gmail.com <ma...@gmail.com>> wrote:
>>
>>
>>
>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
>>>
>>> Hey,
>>>
>>> there are two points we are not completely sure about.
>>>
>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>> which has a different key - it would not be able to decrypt that. Our
>>> idea is to actually send data over the wire _decrypted_ however it
>>> would be still secure if internode communication is done via TLS. Is
>>> this approach good with you?
>>>
>>
>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>>
>
> Yes, I would likely fail the start when encryption is enabled and
> there is no TLS between nodes and yes, zero copy streaming should not
> be triggered here.
>
> I have not considered that distribution. Honestly this seems like a
> very complex setup. Due to the nature of Cassandra how one can easily
> add / remove nodes, there would be a lot of hassle to distribute the
> key of a new node to all other nodes somehow conveniently. I can't
> even imagine how it would look in practice.

One option: since the default implementation seems to just read the key out of a keystore, you could require having the same keystore on every node, and then the same key would simply be used by all nodes.  Or there could actually be a different key for each node as the “key_alias” in the yaml, but every node would still have all the other nodes’ keys in its local keystore.  When adding a new node with a new key, the operator would have to add the new key to each existing keystore before adding the new node to the cluster.
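As a rough sketch of the keystore-backed option (the path, passwords and alias below are placeholders, not actual defaults):

    import java.io.FileInputStream;
    import java.security.Key;
    import java.security.KeyStore;

    // Minimal sketch of loading a secret key by alias from a local JCEKS keystore, along
    // the lines of the default key provider discussed above. Path, passwords and alias
    // are illustrative placeholders.
    public final class KeystoreKeyLookup
    {
        public static Key loadKey(String keystorePath, char[] storePassword,
                                  String alias, char[] keyPassword) throws Exception
        {
            KeyStore store = KeyStore.getInstance("JCEKS");
            try (FileInputStream in = new FileInputStream(keystorePath))
            {
                store.load(in, storePassword);
            }
            // "same keystore everywhere": every node resolves the same alias;
            // per-node keys: each node's alias must be present in every peer's keystore
            return store.getKey(alias, keyPassword);
        }
    }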

Another method could be to get the keys from a key server rather than a local file.

But yes, decrypt before streaming, stream encrypted using TLS, re-encrypt before writing to disk is an option - just one that loses all the performance advantages gained when “do not decompress streaming” and, later, “zero copy streaming” were implemented.

>
>>> The second question is about key rotation. If an operator needs to
>>> roll the key because it was compromised or there is some policy around
>>> that, we should be able to provide some way to rotate it. Our idea is
>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>> command or a completely standalone one in tools) which would take the
>>> first, original key, the second, new key and dir with sstables as
>>> input and it would literally took the data and it would rewrite it to
>>> the second set of sstables which would be encrypted with the second
>>> key. What do you think about this?
>>
>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>
> How would this key be added when Cassandra runs? Via JMX? So that means
> JMX itself has to be secure to send that key to it or it would not
> make sense. Or adding a new key would mean a node needs to go down and
> we would somehow configure it on startup? But if a node needs to go
> down first, we can just rewrite these tables while it is offline and
> there is no need to do it that way.

Given the current comments in the cassandra.yaml I would expect it to work like I just said:
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1328-L1332

# Enables encrypting data at-rest (on disk). Different key providers can be plugged in, but the default reads from
# a JCE-style keystore. A single keystore can hold multiple keys, but the one referenced by
# the "key_alias" is the only key that will be used for encrypt opertaions; previously used keys
# can still (and should!) be in the keystore and will be used on decrypt operations
# (to handle the case of key rotation).

The intended method to rotate keys, from those comments, seems to be:
1. Stop the node
2. Add a new key to the keystore
3. Change the “key_alias” to reference the new key
4. Start the node

All newly encrypted things would use the new key; existing things would use metadata to get at the previous key they were encrypted with.  And then, with such a method, running “nodetool upgradesstables --all” or doing an offline “upgradesstables” would rewrite things with the new key.
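A minimal sketch of that lookup, assuming the alias used at write time is recorded in the sstable metadata (class and method names below are hypothetical):

    import java.security.Key;
    import java.security.KeyStore;

    // Hypothetical sketch of the write-time vs. read-time key selection described above:
    // new SSTables use the key named by the configured "key_alias", while reads resolve
    // whatever alias is recorded in each SSTable's own metadata, so older aliases only
    // need to remain present in the keystore.
    public final class KeyAliasResolution
    {
        private final KeyStore keystore;
        private final char[] keyPassword;
        private final String configuredAlias; // the yaml "key_alias" currently in effect

        public KeyAliasResolution(KeyStore keystore, char[] keyPassword, String configuredAlias)
        {
            this.keystore = keystore;
            this.keyPassword = keyPassword;
            this.configuredAlias = configuredAlias;
        }

        public Key keyForNewSSTable() throws Exception
        {
            return keystore.getKey(configuredAlias, keyPassword);
        }

        public Key keyForExistingSSTable(String aliasRecordedInMetadata) throws Exception
        {
            return keystore.getKey(aliasRecordedInMetadata, keyPassword);
        }
    }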

>
> The tangential topic to this problem is if we are trying to do this
> while the whole cluster is fully up and operational or we are ok to
> bring that respective node down for a while to rewrite sstables for
> it. I consider the "always up" scenario very complex to nail down
> correctly and that is not a task I can do on my own with my current
> understanding of Cassandra codebase.

I think we need to be able to do this with the nodes up and running, per the procedure described above.  It can take a long while to decrypt and re-encrypt 2 TB of data, so I would not want to keep a node down long enough to do that.


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Jeremiah D Jordan <je...@datastax.com>.

> On Nov 15, 2021, at 2:25 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
> 
> On Mon, 15 Nov 2021 at 19:42, Jeremiah D Jordan
> <jeremiah.jordan@gmail.com <ma...@gmail.com>> wrote:
>> 
>> 
>> 
>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
>>> 
>>> Hey,
>>> 
>>> there are two points we are not completely sure about.
>>> 
>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>> which has a different key - it would not be able to decrypt that. Our
>>> idea is to actually send data over the wire _decrypted_ however it
>>> would be still secure if internode communication is done via TLS. Is
>>> this approach good with you?
>>> 
>> 
>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>> 
> 
> Yes, I would likely fail the start when encryption is enabled and
> there is no TLS between nodes and yes, zero copy streaming should not
> be triggered here.
> 
> I have not considered that distribution. Honestly this seems like a
> very complex setup. Due to the nature of Cassandra how one can easily
> add / remove nodes, there would be a lot of hassle to distribute the
> key of a new node to all other nodes somehow conveniently. I can't
> even imagine how it would look in practice.

One option: since the default implementation seems to just read the key out of a keystore, you could require having the same keystore on every node, and then the same key would simply be used by all nodes.  Or there could actually be a different key for each node as the “key_alias” in the yaml, but every node would still have all the other nodes’ keys in its local keystore.  When adding a new node with a new key, the operator would have to add the new key to each existing keystore before adding the new node to the cluster.

Another method could be to get the keys from a key server rather than a local file.

But yes, decrypt before streaming, stream encrypted using TLS, re-encrypt before writing to disk is an option - just one that loses all the performance advantages gained when “do not decompress streaming” and, later, “zero copy streaming” were implemented.

> 
>>> The second question is about key rotation. If an operator needs to
>>> roll the key because it was compromised or there is some policy around
>>> that, we should be able to provide some way to rotate it. Our idea is
>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>> command or a completely standalone one in tools) which would take the
>>> first, original key, the second, new key and dir with sstables as
>>> input and it would literally took the data and it would rewrite it to
>>> the second set of sstables which would be encrypted with the second
>>> key. What do you think about this?
>> 
>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> 
> How would this key be added when Cassandra runs? Via JMX? So that means
> JMX itself has to be secure to send that key to it or it would not
> make sense. Or adding a new key would mean a node needs to go down and
> we would somehow configure it on startup? But if a node needs to go
> down first, we can just rewrite these tables while it is offline and
> there is no need to do it that way.

Given the current comments in the cassandra.yaml I would expect it to work like I just said:
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1328-L1332

# Enables encrypting data at-rest (on disk). Different key providers can be plugged in, but the default reads from
# a JCE-style keystore. A single keystore can hold multiple keys, but the one referenced by
# the "key_alias" is the only key that will be used for encrypt opertaions; previously used keys
# can still (and should!) be in the keystore and will be used on decrypt operations
# (to handle the case of key rotation).

The intended method to rotate keys, from those comments, seems to be:
1. Stop the node
2. Add a new key to the keystore
3. Change the “key_alias” to reference the new key
4. Start the node

All newly encrypted things would use the new key; existing things would use metadata to get at the previous key they were encrypted with.  And then, with such a method, running “nodetool upgradesstables --all” or doing an offline “upgradesstables” would rewrite things with the new key.

> 
> The tangential topic to this problem is if we are trying to do this
> while the whole cluster is fully up and operational or we are ok to
> bring that respective node down for a while to rewrite sstables for
> it. I consider the "always up" scenario very complex to nail down
> correctly and that is not a task I can do on my own with my current
> understanding of Cassandra codebase.

I think we need to be able to do this with the nodes up and running, per the procedure described above.  It can take a long while to decrypt and re-encrypt 2 TB of data, so I would not want to keep a node down long enough to do that.



Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
On Mon, 15 Nov 2021 at 19:42, Jeremiah D Jordan
<je...@gmail.com> wrote:
>
>
>
> > On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
> >
> > Hey,
> >
> > there are two points we are not completely sure about.
> >
> > The first one is streaming. If there is a cluster of 5 nodes, each
> > node has its own unique encryption key. Hence, if a SSTable is stored
> > on a disk with the key for node 1 and this is streamed to node 2 -
> > which has a different key - it would not be able to decrypt that. Our
> > idea is to actually send data over the wire _decrypted_ however it
> > would be still secure if internode communication is done via TLS. Is
> > this approach good with you?
> >
>
> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>

Yes, I would likely fail the startup when encryption is enabled and
there is no TLS between nodes, and yes, zero copy streaming should not
be triggered here.

I have not considered that distribution. Honestly, this seems like a
very complex setup. Given how easily one can add / remove nodes in
Cassandra, there would be a lot of hassle in conveniently distributing
the key of a new node to all the other nodes. I can't even imagine how
it would look in practice.

> > The second question is about key rotation. If an operator needs to
> > roll the key because it was compromised or there is some policy around
> > that, we should be able to provide some way to rotate it. Our idea is
> > to write a tool (either a subcommand of nodetool (rewritesstables)
> > command or a completely standalone one in tools) which would take the
> > first, original key, the second, new key and dir with sstables as
> > input and it would literally took the data and it would rewrite it to
> > the second set of sstables which would be encrypted with the second
> > key. What do you think about this?
>
> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.

How would this key be added when Cassandra runs? Via JMX? So that means
JMX itself has to be secured to send that key to it, or it would not
make sense. Or adding a new key would mean a node needs to go down and
we would somehow configure it on startup? But if a node needs to go
down first, we can just rewrite these tables while it is offline and
there is no need to do it that way.

The tangential topic to this problem is whether we are trying to do this
while the whole cluster is fully up and operational, or whether we are OK
with bringing the respective node down for a while to rewrite its
sstables. I consider the "always up" scenario very complex to nail down
correctly, and that is not a task I can do on my own with my current
understanding of the Cassandra codebase.

> >
> > Regards
> >
> > On Sat, 13 Nov 2021 at 19:35, <sc...@paradoxica.net> wrote:
> >>
> >> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
> >>
> >> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
> >>
> >> – Scott
> >>
> >>> On Nov 13, 2021, at 7:53 AM, Brandon Williams <dr...@gmail.com> wrote:
> >>>
> >>> We already have a ticket and this predated CEPs, and being an
> >>> obviously good improvement to have that many have been asking for for
> >>> some time now, I don't see the need for a CEP here.
> >>>
> >>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> >>> <st...@instaclustr.com> wrote:
> >>>>
> >>>> Hi list,
> >>>>
> >>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >>>> times of 3.4 to the current trunk with my help here and there, mostly
> >>>> cosmetic.
> >>>>
> >>>> I would like to know if there is a general consensus about me going to
> >>>> create a CEP for this feature or what is your perception on this. I
> >>>> know we have it a little bit backwards here as we should first discuss
> >>>> and then code but I am super glad that we have some POC we can
> >>>> elaborate further on and CEP would just cement  and summarise the
> >>>> approach / other implementation aspects of this feature.
> >>>>
> >>>> I think that having 9633 merged will fill quite a big operational gap
> >>>> when it comes to security. There are a lot of enterprises who desire
> >>>> this feature so much. I can not remember when I last saw a ticket with
> >>>> 50 watchers which was inactive for such a long time.
> >>>>
> >>>> Regards
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
Then you are reusing the same KEK for all SSTable files belonging to the
same Cassandra table.

The reason to derive the KEK from some unique information is to avoid
reusing keys, which may open up some attack vectors.

On that thought, table UUID+GEN is actually not good enough, because the
table UUID is the same across all nodes and the GEN is only unique on a
given node. The proper solution may require adding an additional UUID
field to each SSTable file header, and then using that UUID in the KDF. If
this is implemented, no additional information will need to be sent
during a streaming session, as the receiving end will have received the
SSTable file with the header information anyway.
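For concreteness, here is a minimal sketch of the wrapping scheme discussed in this thread (random data key Kr, master key Km, derived KEK, wrapped key WKr), built from standard JCA primitives. The SHA-256 stand-in for the KDF and all names are illustrative only; the per-file UUID suggested above would serve as the KDF context:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.SecretKeySpec;

    // Illustrative sketch of the Kr/Km/KEK/WKr scheme; a real implementation would use a
    // proper KDF (e.g. HKDF) instead of the single SHA-256 hash shown here.
    public final class WrappedSSTableKeySketch
    {
        // randomly generate the per-SSTable data key Kr
        static SecretKey newDataKey() throws Exception
        {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        }

        // KEK = KDF(context, Km); the context would be the per-file UUID proposed above
        static SecretKey deriveKek(String context, SecretKey km) throws Exception
        {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(context.getBytes(StandardCharsets.UTF_8));
            digest.update(km.getEncoded());
            return new SecretKeySpec(digest.digest(), "AES");
        }

        // WKr = KW(Kr, KEK), using the standard AES key-wrap algorithm
        static byte[] wrap(SecretKey kr, SecretKey kek) throws Exception
        {
            Cipher wrapper = Cipher.getInstance("AESWrap");
            wrapper.init(Cipher.WRAP_MODE, kek);
            return wrapper.wrap(kr);
        }

        static SecretKey unwrap(byte[] wkr, SecretKey kek) throws Exception
        {
            Cipher wrapper = Cipher.getInstance("AESWrap");
            wrapper.init(Cipher.UNWRAP_MODE, kek);
            return (SecretKey) wrapper.unwrap(wkr, "AES", Cipher.SECRET_KEY);
        }

        // hash(Km) identifies which master key wrapped Kr, for the encryption info file
        static byte[] masterKeyId(SecretKey km) throws Exception
        {
            return MessageDigest.getInstance("SHA-256").digest(km.getEncoded());
        }
    }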

On 16/11/2021 13:05, Stefan Miklosovic wrote:
> Ok, but this does not need to be something which is _explicitly_ sent
> to it as I believe a receiving node can derive this on its own - if we
> say that gen is a hash of keyspace + table + table id, for example
> (which is same across the cluster for each node).
>
> On Tue, 16 Nov 2021 at 13:55, Bowen Song <bo...@bso.ng.invalid> wrote:
>> If the same user chosen key Km is used across all nodes in the same
>> cluster, the sender will only need to share their SSTable generation GEN
>> with the receiving side. This is because the receiving side will need to
>> use the GEN to reproduce the KEK used in the source node. The receiving
>> side will then need to unwrap Kr with the KEK and re-wrap it with a new
>> KEK' derived from their own GEN. GEN is not considered as a secret.
>>
>>
>> On 16/11/2021 12:13, Stefan Miklosovic wrote:
>>> Thanks for the insights of everybody.
>>>
>>> I would like to return to Km. If we require that all Km's are the same
>>> before streaming, is it not true that we do not need to move any
>>> secrets around at all? So TLS would not be required either as only
>>> encrypted tables would ever be streamed. That way Kr would never ever
>>> leave the node and new Km would be rolled over first. To use correct
>>> Km, we would have hash of that upon received table from the
>>> recipient's perspective. This would also avoid the fairly complex
>>> algorithm in the last Bowen's reply when I got that right.
>>>
>>> On Tue, 16 Nov 2021 at 13:02, benedict@apache.org <be...@apache.org> wrote:
>>>> We already have the facility to authenticate peers, I am suggesting we should e.g. refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.
>>>>
>>>> It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.
>>>>
>>>>
>>>> From: Bowen Song <bo...@bso.ng.INVALID>
>>>> Date: Tuesday, 16 November 2021 at 11:56
>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
>>>> I think authenticating a receiving node is important, but it is perhaps
>>>> not in the scope of this ticket (or CEP if it becomes one). This applies
>>>> to not only encrypted SSTables, but also unencrypted SSTables. A
>>>> malicious node can join the cluster and send bogus requests to other
>>>> nodes is a general problem not specific to the on-disk encryption.
>>>>
>>>> On 16/11/2021 10:50, benedict@apache.org wrote:
>>>>> I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s).
>>>>>
>>>>> Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action.
>>>>>
>>>>>
>>>>> From: Stefan Miklosovic <st...@instaclustr.com>
>>>>> Date: Tuesday, 16 November 2021 at 10:45
>>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>>>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
>>>>> Ok but this also means that Km would need to be the same for all nodes right?
>>>>>
>>>>> If we are rolling in node by node fashion, Km is changed at node 1, we
>>>>> change the wrapped key which is stored on disk and we stream this
>>>>> table to the other node which is still on the old Km. Would this work?
>>>>> I think we would need to rotate first before anything is streamed. Or
>>>>> no?
>>>>>
>>>>> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
>>>>>> Yes, that's correct. The actual key used to encrypt the SSTable will
>>>>>> stay the same once the SSTable is created. This is a widely used
>>>>>> practice in many encrypt-at-rest applications. One good example is the
>>>>>> LUKS full disk encryption, which also supports multiple keys to unlock
>>>>>> (decrypt) the same data. Multiple unlocking keys is only possible
>>>>>> because the actual key used to encrypt the data is randomly generated
>>>>>> and then stored encrypted by (a key derived from) a user chosen key.
>>>>>>
>>>>>> If this approach is adopted, the streaming process can share the Kr
>>>>>> without disclosing the Km, therefore enableling zero-copy streaming.
>>>>>>
>>>>>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
>>>>>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
>>>>>>> key for the actual sstable encryption would be always the same, it is
>>>>>>> just what is wrapped would differ. So if we rotate, we basically only
>>>>>>> change Km hence KEK hence the result of wrapping but there would still
>>>>>>> be the original Kr key used.
>>>>>>>
>>>>>>> Jeremiah - I will prepare that branch very soon.
>>>>>>>
>>>>>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
>>>>>>>>>         The second question is about key rotation. If an operator needs to
>>>>>>>>>         roll the key because it was compromised or there is some policy around
>>>>>>>>>         that, we should be able to provide some way to rotate it. Our idea is
>>>>>>>>>         to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>>>>>         command or a completely standalone one in tools) which would take the
>>>>>>>>>         first, original key, the second, new key and dir with sstables as
>>>>>>>>>         input and it would literally took the data and it would rewrite it to
>>>>>>>>>         the second set of sstables which would be encrypted with the second
>>>>>>>>>         key. What do you think about this?
>>>>>>>>         I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>>>>>
>>>>>>>> There's a much better approach to solve this issue. You can stored a
>>>>>>>> wrapped key in an encryption info file alone side the SSTable file.
>>>>>>>> Here's how it works:
>>>>>>>> 1. randomly generate a key Kr
>>>>>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
>>>>>>>> file on disk
>>>>>>>> 3. derive a key encryption key KEK from the SSTable file's information
>>>>>>>> (e.g.: table UUID + generation) and the user chosen master key Km, so
>>>>>>>> you have KEK = KDF(UUID+GEN, Km)
>>>>>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
>>>>>>>> 5. hash the Km, the hash will used as a key ID to identify which master
>>>>>>>> key was used to encrypt the key Kr if the server has multiple master
>>>>>>>> keys in use
>>>>>>>> 6. store the the WKr and the hash of Km in a separate file alone side
>>>>>>>> the SSTable file
>>>>>>>>
>>>>>>>> In the read path, the Kr should be kept in memory to help improve
>>>>>>>> performance and this will also allow zero-downtime master key rotation.
>>>>>>>>
>>>>>>>> During a key rotation:
>>>>>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
>>>>>>>> 2. read the WKr from the encryption information file, and unwrap
>>>>>>>> (decrypt) it using the KEK to get the Kr
>>>>>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
>>>>>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
>>>>>>>> 5. hash the new master key Km', and store it together with the WKr' in
>>>>>>>> the encryption info file
>>>>>>>>
>>>>>>>> Since the key rotation only involves rewriting the encryption info file,
>>>>>>>> the operation should take only a few milliseconds per SSTable file, it
>>>>>>>> will be much faster than decrypting and then re-encrypting the SSTable data.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
>>>>>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
>>>>>>>>>>
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> there are two points we are not completely sure about.
>>>>>>>>>>
>>>>>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>>>>>>>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>>>>>>>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>>>>>>>>> which has a different key - it would not be able to decrypt that. Our
>>>>>>>>>> idea is to actually send data over the wire _decrypted_ however it
>>>>>>>>>> would be still secure if internode communication is done via TLS. Is
>>>>>>>>>> this approach good with you?
>>>>>>>>>>
>>>>>>>>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>>>>>>>>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>>>>>>>>>
>>>>>>>>>> The second question is about key rotation. If an operator needs to
>>>>>>>>>> roll the key because it was compromised or there is some policy around
>>>>>>>>>> that, we should be able to provide some way to rotate it. Our idea is
>>>>>>>>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>>>>>> command or a completely standalone one in tools) which would take the
>>>>>>>>>> first, original key, the second, new key and dir with sstables as
>>>>>>>>>> input and it would literally took the data and it would rewrite it to
>>>>>>>>>> the second set of sstables which would be encrypted with the second
>>>>>>>>>> key. What do you think about this?
>>>>>>>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
>>>>>>>>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>>>>>>>>>>>
>>>>>>>>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>>>>>>>>>>>
>>>>>>>>>>> – Scott
>>>>>>>>>>>
>>>>>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> We already have a ticket and this predated CEPs, and being an
>>>>>>>>>>>> obviously good improvement to have that many have been asking for for
>>>>>>>>>>>> some time now, I don't see the need for a CEP here.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>>>>>>>>>>> <st...@instaclustr.com>  wrote:
>>>>>>>>>>>>> Hi list,
>>>>>>>>>>>>>
>>>>>>>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>>>>>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>>>>>>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>>>>>>>>>>> cosmetic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to know if there is a general consensus about me going to
>>>>>>>>>>>>> create a CEP for this feature or what is your perception on this. I
>>>>>>>>>>>>> know we have it a little bit backwards here as we should first discuss
>>>>>>>>>>>>> and then code but I am super glad that we have some POC we can
>>>>>>>>>>>>> elaborate further on and CEP would just cement  and summarise the
>>>>>>>>>>>>> approach / other implementation aspects of this feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that having 9633 merged will fill quite a big operational gap
>>>>>>>>>>>>> when it comes to security. There are a lot of enterprises who desire
>>>>>>>>>>>>> this feature so much. I can not remember when I last saw a ticket with
>>>>>>>>>>>>> 50 watchers which was inactive for such a long time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Ok, but this does not need to be something which is _explicitly_ sent
to it, as I believe a receiving node can derive this on its own - if we
say that gen is a hash of keyspace + table + table id, for example
(which is the same across the cluster for each node).

On Tue, 16 Nov 2021 at 13:55, Bowen Song <bo...@bso.ng.invalid> wrote:
>
> If the same user chosen key Km is used across all nodes in the same
> cluster, the sender will only need to share their SSTable generation GEN
> with the receiving side. This is because the receiving side will need to
> use the GEN to reproduce the KEK used in the source node. The receiving
> side will then need to unwrap Kr with the KEK and re-wrap it with a new
> KEK' derived from their own GEN. GEN is not considered as a secret.
>
>
> On 16/11/2021 12:13, Stefan Miklosovic wrote:
> > Thanks for the insights of everybody.
> >
> > I would like to return to Km. If we require that all Km's are the same
> > before streaming, is it not true that we do not need to move any
> > secrets around at all? So TLS would not be required either as only
> > encrypted tables would ever be streamed. That way Kr would never ever
> > leave the node and new Km would be rolled over first. To use correct
> > Km, we would have hash of that upon received table from the
> > recipient's perspective. This would also avoid the fairly complex
> > algorithm in the last Bowen's reply when I got that right.
> >
> > On Tue, 16 Nov 2021 at 13:02, benedict@apache.org <be...@apache.org> wrote:
> >> We already have the facility to authenticate peers, I am suggesting we should e.g. refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.
> >>
> >> It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.
> >>
> >>
> >> From: Bowen Song <bo...@bso.ng.INVALID>
> >> Date: Tuesday, 16 November 2021 at 11:56
> >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> >> I think authenticating a receiving node is important, but it is perhaps
> >> not in the scope of this ticket (or CEP if it becomes one). This applies
> >> to not only encrypted SSTables, but also unencrypted SSTables. A
> >> malicious node can join the cluster and send bogus requests to other
> >> nodes is a general problem not specific to the on-disk encryption.
> >>
> >> On 16/11/2021 10:50, benedict@apache.org wrote:
> >>> I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s).
> >>>
> >>> Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action.
> >>>
> >>>
> >>> From: Stefan Miklosovic <st...@instaclustr.com>
> >>> Date: Tuesday, 16 November 2021 at 10:45
> >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> >>> Ok but this also means that Km would need to be the same for all nodes right?
> >>>
> >>> If we are rolling in node by node fashion, Km is changed at node 1, we
> >>> change the wrapped key which is stored on disk and we stream this
> >>> table to the other node which is still on the old Km. Would this work?
> >>> I think we would need to rotate first before anything is streamed. Or
> >>> no?
> >>>
> >>> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
> >>>> Yes, that's correct. The actual key used to encrypt the SSTable will
> >>>> stay the same once the SSTable is created. This is a widely used
> >>>> practice in many encrypt-at-rest applications. One good example is the
> >>>> LUKS full disk encryption, which also supports multiple keys to unlock
> >>>> (decrypt) the same data. Multiple unlocking keys is only possible
> >>>> because the actual key used to encrypt the data is randomly generated
> >>>> and then stored encrypted by (a key derived from) a user chosen key.
> >>>>
> >>>> If this approach is adopted, the streaming process can share the Kr
> >>>> without disclosing the Km, therefore enableling zero-copy streaming.
> >>>>
> >>>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
> >>>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
> >>>>> key for the actual sstable encryption would be always the same, it is
> >>>>> just what is wrapped would differ. So if we rotate, we basically only
> >>>>> change Km hence KEK hence the result of wrapping but there would still
> >>>>> be the original Kr key used.
> >>>>>
> >>>>> Jeremiah - I will prepare that branch very soon.
> >>>>>
> >>>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
> >>>>>>>        The second question is about key rotation. If an operator needs to
> >>>>>>>        roll the key because it was compromised or there is some policy around
> >>>>>>>        that, we should be able to provide some way to rotate it. Our idea is
> >>>>>>>        to write a tool (either a subcommand of nodetool (rewritesstables)
> >>>>>>>        command or a completely standalone one in tools) which would take the
> >>>>>>>        first, original key, the second, new key and dir with sstables as
> >>>>>>>        input and it would literally took the data and it would rewrite it to
> >>>>>>>        the second set of sstables which would be encrypted with the second
> >>>>>>>        key. What do you think about this?
> >>>>>>        I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >>>>>>
> >>>>>> There's a much better approach to solve this issue. You can stored a
> >>>>>> wrapped key in an encryption info file alone side the SSTable file.
> >>>>>> Here's how it works:
> >>>>>> 1. randomly generate a key Kr
> >>>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
> >>>>>> file on disk
> >>>>>> 3. derive a key encryption key KEK from the SSTable file's information
> >>>>>> (e.g.: table UUID + generation) and the user chosen master key Km, so
> >>>>>> you have KEK = KDF(UUID+GEN, Km)
> >>>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> >>>>>> 5. hash the Km, the hash will used as a key ID to identify which master
> >>>>>> key was used to encrypt the key Kr if the server has multiple master
> >>>>>> keys in use
> >>>>>> 6. store the the WKr and the hash of Km in a separate file alone side
> >>>>>> the SSTable file
> >>>>>>
> >>>>>> In the read path, the Kr should be kept in memory to help improve
> >>>>>> performance and this will also allow zero-downtime master key rotation.
> >>>>>>
> >>>>>> During a key rotation:
> >>>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> >>>>>> 2. read the WKr from the encryption information file, and unwrap
> >>>>>> (decrypt) it using the KEK to get the Kr
> >>>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
> >>>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> >>>>>> 5. hash the new master key Km', and store it together with the WKr' in
> >>>>>> the encryption info file
> >>>>>>
> >>>>>> Since the key rotation only involves rewriting the encryption info file,
> >>>>>> the operation should take only a few milliseconds per SSTable file, it
> >>>>>> will be much faster than decrypting and then re-encrypting the SSTable data.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> >>>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
> >>>>>>>>
> >>>>>>>> Hey,
> >>>>>>>>
> >>>>>>>> there are two points we are not completely sure about.
> >>>>>>>>
> >>>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each
> >>>>>>>> node has its own unique encryption key. Hence, if a SSTable is stored
> >>>>>>>> on a disk with the key for node 1 and this is streamed to node 2 -
> >>>>>>>> which has a different key - it would not be able to decrypt that. Our
> >>>>>>>> idea is to actually send data over the wire _decrypted_ however it
> >>>>>>>> would be still secure if internode communication is done via TLS. Is
> >>>>>>>> this approach good with you?
> >>>>>>>>
> >>>>>>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> >>>>>>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
> >>>>>>>
> >>>>>>>> The second question is about key rotation. If an operator needs to
> >>>>>>>> roll the key because it was compromised or there is some policy around
> >>>>>>>> that, we should be able to provide some way to rotate it. Our idea is
> >>>>>>>> to write a tool (either a subcommand of nodetool (rewritesstables)
> >>>>>>>> command or a completely standalone one in tools) which would take the
> >>>>>>>> first, original key, the second, new key and dir with sstables as
> >>>>>>>> input and it would literally took the data and it would rewrite it to
> >>>>>>>> the second set of sstables which would be encrypted with the second
> >>>>>>>> key. What do you think about this?
> >>>>>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >>>>>>>
> >>>>>>>> Regards
> >>>>>>>>
> >>>>>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
> >>>>>>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
> >>>>>>>>>
> >>>>>>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
> >>>>>>>>>
> >>>>>>>>> – Scott
> >>>>>>>>>
> >>>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
> >>>>>>>>>>
> >>>>>>>>>> We already have a ticket and this predated CEPs, and being an
> >>>>>>>>>> obviously good improvement to have that many have been asking for for
> >>>>>>>>>> some time now, I don't see the need for a CEP here.
> >>>>>>>>>>
> >>>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> >>>>>>>>>> <st...@instaclustr.com>  wrote:
> >>>>>>>>>>> Hi list,
> >>>>>>>>>>>
> >>>>>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >>>>>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >>>>>>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
> >>>>>>>>>>> cosmetic.
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to know if there is a general consensus about me going to
> >>>>>>>>>>> create a CEP for this feature or what is your perception on this. I
> >>>>>>>>>>> know we have it a little bit backwards here as we should first discuss
> >>>>>>>>>>> and then code but I am super glad that we have some POC we can
> >>>>>>>>>>> elaborate further on and CEP would just cement  and summarise the
> >>>>>>>>>>> approach / other implementation aspects of this feature.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that having 9633 merged will fill quite a big operational gap
> >>>>>>>>>>> when it comes to security. There are a lot of enterprises who desire
> >>>>>>>>>>> this feature so much. I can not remember when I last saw a ticket with
> >>>>>>>>>>> 50 watchers which was inactive for such a long time.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards
> >>>>>>>>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
If the same user-chosen key Km is used across all nodes in the same
cluster, the sender only needs to share its SSTable generation GEN with
the receiving side, because the receiving side needs the GEN to reproduce
the KEK used on the source node. The receiving side then unwraps Kr with
that KEK and re-wraps it with a new KEK' derived from its own GEN. GEN is
not considered a secret.

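A minimal sketch of that re-wrap step, assuming HMAC-SHA256 as the KDF and
AES key wrap (RFC 3394) as KW; every name below is illustrative and none of
it is taken from the 9633 branch:

import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public final class SSTableKeyWrapping
{
    // KEK = KDF(UUID+GEN, Km): here simply HMAC-SHA256 keyed with Km over "uuid:gen",
    // truncated to a 128-bit AES key.
    static SecretKey deriveKek(String tableUuid, long generation, SecretKey km) throws Exception
    {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(km);
        byte[] kek = mac.doFinal((tableUuid + ":" + generation).getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(kek, 0, 16, "AES");
    }

    // WKr = KW(Kr, KEK)
    static byte[] wrap(SecretKey kr, SecretKey kek) throws Exception
    {
        Cipher kw = Cipher.getInstance("AESWrap");
        kw.init(Cipher.WRAP_MODE, kek);
        return kw.wrap(kr);
    }

    static SecretKey unwrap(byte[] wkr, SecretKey kek) throws Exception
    {
        Cipher kw = Cipher.getInstance("AESWrap");
        kw.init(Cipher.UNWRAP_MODE, kek);
        return (SecretKey) kw.unwrap(wkr, "AES", Cipher.SECRET_KEY);
    }

    // Receiving side of a stream: reproduce the sender's KEK from the shared GEN
    // (and the shared Km), recover Kr, then re-wrap it under a KEK' for the local GEN.
    static byte[] rewrapForLocalGeneration(byte[] wkr, String tableUuid, long senderGen,
                                           long localGen, SecretKey km) throws Exception
    {
        SecretKey kr = unwrap(wkr, deriveKek(tableUuid, senderGen, km));
        return wrap(kr, deriveKek(tableUuid, localGen, km));
    }
}

Only the small encryption-info file has to be rewritten on the receiving
side; the encrypted Data component can be shipped as-is, so zero-copy
streaming stays on the table.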

On 16/11/2021 12:13, Stefan Miklosovic wrote:
> Thanks everybody for the insights.
>
> I would like to return to Km. If we require that all Km's are the same
> before streaming, is it not true that we do not need to move any
> secrets around at all? TLS would then not be required either, as only
> encrypted tables would ever be streamed. That way Kr would never leave
> the node unwrapped, and the new Km would be rolled over first. To use
> the correct Km, the recipient would look it up by the hash of Km stored
> alongside the received table. This would also avoid the fairly complex
> algorithm in Bowen's last reply, if I got that right.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Thanks everybody for the insights.

I would like to return to Km. If we require that all Km's are the same
before streaming, is it not true that we do not need to move any
secrets around at all? TLS would then not be required either, as only
encrypted tables would ever be streamed. That way Kr would never leave
the node unwrapped, and the new Km would be rolled over first. To use
the correct Km, the recipient would look it up by the hash of Km stored
alongside the received table. This would also avoid the fairly complex
algorithm in Bowen's last reply, if I got that right.

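For the lookup-by-the-hash-of-Km part, a rough sketch of what the receiving
side could do (all names here are hypothetical; nothing below exists in the
9633 branch):

import java.security.MessageDigest;
import java.util.Map;
import javax.crypto.SecretKey;

final class MasterKeyRegistry
{
    private final Map<String, SecretKey> masterKeysById; // hex(SHA-256(Km)) -> Km

    MasterKeyRegistry(Map<String, SecretKey> masterKeysById)
    {
        this.masterKeysById = masterKeysById;
    }

    // The key id shipped alongside an SSTable is assumed to be the SHA-256 hash of Km.
    static String keyId(SecretKey km) throws Exception
    {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(km.getEncoded());
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest)
            hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Returns the matching Km, or null if this node does not hold that master key yet,
    // in which case the incoming stream should be rejected (or the operator alerted).
    SecretKey forStreamedSSTable(String kmHashFromSender)
    {
        return masterKeysById.get(kmHashFromSender);
    }
}
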
On Tue, 16 Nov 2021 at 13:02, benedict@apache.org <be...@apache.org> wrote:
>
> We already have the facility to authenticate peers, I am suggesting we should e.g. refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.
>
> It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "benedict@apache.org" <be...@apache.org>.
I’m not suggesting we enforce network encryption, just that we prohibit unauthenticated connections from peers, so that we do not effectively offer a decrypt-all-the-data endpoint.

If, as an operator, you know that it is impossible for unauthenticated peers to open a connection due to your network configuration, then we can offer some special SafeAllowAllInternodeAuthenticator that permits things to proceed as normal. But we should definitely ensure operators have considered internode authentication when at-rest encryption is in use. It’s far too easy for this to be overlooked otherwise, and for an operator to thereby fail to protect their data.

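To make that concrete, a rough sketch of what such an authenticator could
look like; the class name and the trusted-subnet policy are purely
illustrative, and it assumes Cassandra's existing pluggable
IInternodeAuthenticator hook configured via internode_authenticator in
cassandra.yaml:

package org.apache.cassandra.auth;

import java.net.InetAddress;

// Hypothetical example only: accept peers from a network the operator already
// knows cannot be reached by unauthenticated hosts (10.0.0.0/8 here).
public class SubnetRestrictedInternodeAuthenticator implements IInternodeAuthenticator
{
    @Override
    public boolean authenticate(InetAddress remoteAddress, int remotePort)
    {
        byte[] addr = remoteAddress.getAddress();
        return addr.length == 4 && (addr[0] & 0xFF) == 10;
    }

    @Override
    public void validateConfiguration()
    {
        // Nothing to validate for this fixed example policy.
    }
}

Whatever the policy ends up being, the point stands: with encrypted sstables
on disk, startup could refuse to proceed unless something stricter than
AllowAllInternodeAuthenticator is configured.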

From: Bowen Song <bo...@bso.ng.INVALID>
Date: Tuesday, 16 November 2021 at 12:33
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think a warning message is fine, but Cassandra should not enforce
network encryption when on-disk encryption is enabled. It's definitely a
valid use case to have Cassandra over IPSec without enabling TLS.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
I think a warning message is fine, but Cassandra should not enforce 
network encryption when on-disk encryption is enabled. It's definitely a 
valid use case to have Cassandra over IPSec without enabling TLS.

On 16/11/2021 12:02, benedict@apache.org wrote:
> We already have the facility to authenticate peers, I am suggesting we should e.g. refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.
>
> It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "benedict@apache.org" <be...@apache.org>.
We already have the facility to authenticate peers; I am suggesting we should e.g. refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.

It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.
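
For illustration only, a startup-time guard along those lines might look roughly
like the sketch below. The class, method and flag names are hypothetical (they
are not existing Cassandra APIs); the point is just the two refusal conditions:

    // Hypothetical sketch; names and wiring are invented for illustration.
    public final class SSTableEncryptionStartupCheck
    {
        public static void validate(boolean sstableEncryptionEnabled,
                                    boolean encryptedSSTablesOnDisk,
                                    boolean peerAuthenticationConfigured) // e.g. internode mTLS or an internode authenticator
        {
            // Refuse to enable encryption if replicas cannot be authenticated.
            if (sstableEncryptionEnabled && !peerAuthenticationConfigured)
                throw new IllegalStateException("SSTable encryption is enabled but no peer authentication is configured");

            // Refuse to start if encrypted data is already present and peers are
            // unauthenticated, since streaming would effectively hand decrypted
            // data to any node that can request it.
            if (encryptedSSTablesOnDisk && !peerAuthenticationConfigured)
                throw new IllegalStateException("Encrypted SSTables are present on disk but no peer authentication is configured");
        }
    }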


From: Bowen Song <bo...@bso.ng.INVALID>
Date: Tuesday, 16 November 2021 at 11:56
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think authenticating a receiving node is important, but it is perhaps
not in the scope of this ticket (or CEP if it becomes one). This applies
not only to encrypted SSTables, but also to unencrypted ones. That a
malicious node can join the cluster and send bogus requests to other
nodes is a general problem, not one specific to the on-disk encryption.

On 16/11/2021 10:50, benedict@apache.org wrote:
> I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s).
>
> Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action.
>
>
> From: Stefan Miklosovic <st...@instaclustr.com>
> Date: Tuesday, 16 November 2021 at 10:45
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> Ok but this also means that Km would need to be the same for all nodes right?
>
> If we are rolling in node by node fashion, Km is changed at node 1, we
> change the wrapped key which is stored on disk and we stream this
> table to the other node which is still on the old Km. Would this work?
> I think we would need to rotate first before anything is streamed. Or
> no?
>
> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
>> Yes, that's correct. The actual key used to encrypt the SSTable will
>> stay the same once the SSTable is created. This is a widely used
>> practice in many encrypt-at-rest applications. One good example is the
>> LUKS full disk encryption, which also supports multiple keys to unlock
>> (decrypt) the same data. Multiple unlocking keys is only possible
>> because the actual key used to encrypt the data is randomly generated
>> and then stored encrypted by (a key derived from) a user chosen key.
>>
>> If this approach is adopted, the streaming process can share the Kr
>> without disclosing the Km, therefore enabling zero-copy streaming.
>>
>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
>>> key for the actual sstable encryption would be always the same, it is
>>> just what is wrapped would differ. So if we rotate, we basically only
>>> change Km hence KEK hence the result of wrapping but there would still
>>> be the original Kr key used.
>>>
>>> Jeremiah - I will prepare that branch very soon.
>>>
>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
>>>>>       The second question is about key rotation. If an operator needs to
>>>>>       roll the key because it was compromised or there is some policy around
>>>>>       that, we should be able to provide some way to rotate it. Our idea is
>>>>>       to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>       command or a completely standalone one in tools) which would take the
>>>>>       first, original key, the second, new key and dir with sstables as
>>>>>       input and it would literally took the data and it would rewrite it to
>>>>>       the second set of sstables which would be encrypted with the second
>>>>>       key. What do you think about this?
>>>>       I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>
>>>> There's a much better approach to solve this issue. You can store a
>>>> wrapped key in an encryption info file alongside the SSTable file.
>>>> Here's how it works:
>>>> 1. randomly generate a key Kr
>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
>>>> file on disk
>>>> 3. derive a key encryption key KEK from the SSTable file's information
>>>> (e.g.: table UUID + generation) and the user chosen master key Km, so
>>>> you have KEK = KDF(UUID+GEN, Km)
>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
>>>> 5. hash the Km, the hash will be used as a key ID to identify which master
>>>> key was used to encrypt the key Kr if the server has multiple master
>>>> keys in use
>>>> 6. store the WKr and the hash of Km in a separate file alongside
>>>> the SSTable file
>>>>
>>>> In the read path, the Kr should be kept in memory to help improve
>>>> performance and this will also allow zero-downtime master key rotation.
>>>>
>>>> During a key rotation:
>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
>>>> 2. read the WKr from the encryption information file, and unwrap
>>>> (decrypt) it using the KEK to get the Kr
>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
>>>> 5. hash the new master key Km', and store it together with the WKr' in
>>>> the encryption info file
>>>>
>>>> Since the key rotation only involves rewriting the encryption info file,
>>>> the operation should take only a few milliseconds per SSTable file, it
>>>> will be much faster than decrypting and then re-encrypting the SSTable data.
>>>>
>>>>
>>>>
>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> there are two points we are not completely sure about.
>>>>>>
>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>>>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>>>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>>>>> which has a different key - it would not be able to decrypt that. Our
>>>>>> idea is to actually send data over the wire _decrypted_ however it
>>>>>> would be still secure if internode communication is done via TLS. Is
>>>>>> this approach good with you?
>>>>>>
>>>>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>>>>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>>>>>
>>>>>> The second question is about key rotation. If an operator needs to
>>>>>> roll the key because it was compromised or there is some policy around
>>>>>> that, we should be able to provide some way to rotate it. Our idea is
>>>>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>> command or a completely standalone one in tools) which would take the
>>>>>> first, original key, the second, new key and dir with sstables as
>>>>>> input and it would literally took the data and it would rewrite it to
>>>>>> the second set of sstables which would be encrypted with the second
>>>>>> key. What do you think about this?
>>>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
>>>>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>>>>>>>
>>>>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>>>>>>>
>>>>>>> – Scott
>>>>>>>
>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> We already have a ticket and this predated CEPs, and being an
>>>>>>>> obviously good improvement to have that many have been asking for for
>>>>>>>> some time now, I don't see the need for a CEP here.
>>>>>>>>
>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>>>>>>> <st...@instaclustr.com>  wrote:
>>>>>>>>> Hi list,
>>>>>>>>>
>>>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>>>>>>> cosmetic.
>>>>>>>>>
>>>>>>>>> I would like to know if there is a general consensus about me going to
>>>>>>>>> create a CEP for this feature or what is your perception on this. I
>>>>>>>>> know we have it a little bit backwards here as we should first discuss
>>>>>>>>> and then code but I am super glad that we have some POC we can
>>>>>>>>> elaborate further on and CEP would just cement  and summarise the
>>>>>>>>> approach / other implementation aspects of this feature.
>>>>>>>>>
>>>>>>>>> I think that having 9633 merged will fill quite a big operational gap
>>>>>>>>> when it comes to security. There are a lot of enterprises who desire
>>>>>>>>> this feature so much. I can not remember when I last saw a ticket with
>>>>>>>>> 50 watchers which was inactive for such a long time.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
I think authenticating a receiving node is important, but it is perhaps 
not in the scope of this ticket (or CEP if it becomes one). This applies 
not only to encrypted SSTables, but also to unencrypted ones. That a
malicious node can join the cluster and send bogus requests to other
nodes is a general problem, not one specific to the on-disk encryption.

On 16/11/2021 10:50, benedict@apache.org wrote:
> I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s).
>
> Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action.
>
>
> From: Stefan Miklosovic <st...@instaclustr.com>
> Date: Tuesday, 16 November 2021 at 10:45
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> Ok but this also means that Km would need to be the same for all nodes right?
>
> If we are rolling in node by node fashion, Km is changed at node 1, we
> change the wrapped key which is stored on disk and we stream this
> table to the other node which is still on the old Km. Would this work?
> I think we would need to rotate first before anything is streamed. Or
> no?
>
> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
>> Yes, that's correct. The actual key used to encrypt the SSTable will
>> stay the same once the SSTable is created. This is a widely used
>> practice in many encrypt-at-rest applications. One good example is the
>> LUKS full disk encryption, which also supports multiple keys to unlock
>> (decrypt) the same data. Multiple unlocking keys is only possible
>> because the actual key used to encrypt the data is randomly generated
>> and then stored encrypted by (a key derived from) a user chosen key.
>>
>> If this approach is adopted, the streaming process can share the Kr
>> without disclosing the Km, therefore enabling zero-copy streaming.
>>
>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
>>> key for the actual sstable encryption would be always the same, it is
>>> just what is wrapped would differ. So if we rotate, we basically only
>>> change Km hence KEK hence the result of wrapping but there would still
>>> be the original Kr key used.
>>>
>>> Jeremiah - I will prepare that branch very soon.
>>>
>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
>>>>>       The second question is about key rotation. If an operator needs to
>>>>>       roll the key because it was compromised or there is some policy around
>>>>>       that, we should be able to provide some way to rotate it. Our idea is
>>>>>       to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>       command or a completely standalone one in tools) which would take the
>>>>>       first, original key, the second, new key and dir with sstables as
>>>>>       input and it would literally took the data and it would rewrite it to
>>>>>       the second set of sstables which would be encrypted with the second
>>>>>       key. What do you think about this?
>>>>       I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>
>>>> There's a much better approach to solve this issue. You can store a
>>>> wrapped key in an encryption info file alongside the SSTable file.
>>>> Here's how it works:
>>>> 1. randomly generate a key Kr
>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
>>>> file on disk
>>>> 3. derive a key encryption key KEK from the SSTable file's information
>>>> (e.g.: table UUID + generation) and the user chosen master key Km, so
>>>> you have KEK = KDF(UUID+GEN, Km)
>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
>>>> 5. hash the Km, the hash will be used as a key ID to identify which master
>>>> key was used to encrypt the key Kr if the server has multiple master
>>>> keys in use
>>>> 6. store the WKr and the hash of Km in a separate file alongside
>>>> the SSTable file
>>>>
>>>> In the read path, the Kr should be kept in memory to help improve
>>>> performance and this will also allow zero-downtime master key rotation.
>>>>
>>>> During a key rotation:
>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
>>>> 2. read the WKr from the encryption information file, and unwrap
>>>> (decrypt) it using the KEK to get the Kr
>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
>>>> 5. hash the new master key Km', and store it together with the WKr' in
>>>> the encryption info file
>>>>
>>>> Since the key rotation only involves rewriting the encryption info file,
>>>> the operation should take only a few milliseconds per SSTable file, it
>>>> will be much faster than decrypting and then re-encrypting the SSTable data.
>>>>
>>>>
>>>>
>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> there are two points we are not completely sure about.
>>>>>>
>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>>>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>>>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>>>>> which has a different key - it would not be able to decrypt that. Our
>>>>>> idea is to actually send data over the wire _decrypted_ however it
>>>>>> would be still secure if internode communication is done via TLS. Is
>>>>>> this approach good with you?
>>>>>>
>>>>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>>>>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>>>>>
>>>>>> The second question is about key rotation. If an operator needs to
>>>>>> roll the key because it was compromised or there is some policy around
>>>>>> that, we should be able to provide some way to rotate it. Our idea is
>>>>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>> command or a completely standalone one in tools) which would take the
>>>>>> first, original key, the second, new key and dir with sstables as
>>>>>> input and it would literally took the data and it would rewrite it to
>>>>>> the second set of sstables which would be encrypted with the second
>>>>>> key. What do you think about this?
>>>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
>>>>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>>>>>>>
>>>>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>>>>>>>
>>>>>>> – Scott
>>>>>>>
>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> We already have a ticket and this predated CEPs, and being an
>>>>>>>> obviously good improvement to have that many have been asking for for
>>>>>>>> some time now, I don't see the need for a CEP here.
>>>>>>>>
>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>>>>>>> <st...@instaclustr.com>  wrote:
>>>>>>>>> Hi list,
>>>>>>>>>
>>>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>>>>>>> cosmetic.
>>>>>>>>>
>>>>>>>>> I would like to know if there is a general consensus about me going to
>>>>>>>>> create a CEP for this feature or what is your perception on this. I
>>>>>>>>> know we have it a little bit backwards here as we should first discuss
>>>>>>>>> and then code but I am super glad that we have some POC we can
>>>>>>>>> elaborate further on and CEP would just cement  and summarise the
>>>>>>>>> approach / other implementation aspects of this feature.
>>>>>>>>>
>>>>>>>>> I think that having 9633 merged will fill quite a big operational gap
>>>>>>>>> when it comes to security. There are a lot of enterprises who desire
>>>>>>>>> this feature so much. I can not remember when I last saw a ticket with
>>>>>>>>> 50 watchers which was inactive for such a long time.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "benedict@apache.org" <be...@apache.org>.
I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s).

Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action.
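
To make the public-key variant concrete, a rough sketch of handing the
per-SSTable key to the receiving node by wrapping it under a public key that
node provides could look like the following. Only the JCE calls are real; the
class and method names are made up, and the choice of RSA-OAEP for key
transport is an assumption, not a decided design:

    import javax.crypto.Cipher;
    import javax.crypto.SecretKey;
    import java.security.PrivateKey;
    import java.security.PublicKey;

    // Hypothetical sketch: the per-SSTable key Kr is wrapped under a public key
    // supplied by the receiving node, so the SSTable itself can be streamed
    // still-encrypted (zero-copy) without either side learning the other's master key.
    final class KeyTransportSketch
    {
        static byte[] wrapForRecipient(SecretKey kr, PublicKey recipientKey) throws Exception
        {
            Cipher c = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            c.init(Cipher.WRAP_MODE, recipientKey);
            return c.wrap(kr); // ship this alongside the still-encrypted SSTable
        }

        static SecretKey unwrapOnRecipient(byte[] wrapped, PrivateKey recipientKey) throws Exception
        {
            Cipher c = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            c.init(Cipher.UNWRAP_MODE, recipientKey);
            return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        }
    }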


From: Stefan Miklosovic <st...@instaclustr.com>
Date: Tuesday, 16 November 2021 at 10:45
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Ok but this also means that Km would need to be the same for all nodes right?

If we are rolling in node by node fashion, Km is changed at node 1, we
change the wrapped key which is stored on disk and we stream this
table to the other node which is still on the old Km. Would this work?
I think we would need to rotate first before anything is streamed. Or
no?

On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
>
> Yes, that's correct. The actual key used to encrypt the SSTable will
> stay the same once the SSTable is created. This is a widely used
> practice in many encrypt-at-rest applications. One good example is the
> LUKS full disk encryption, which also supports multiple keys to unlock
> (decrypt) the same data. Multiple unlocking keys is only possible
> because the actual key used to encrypt the data is randomly generated
> and then stored encrypted by (a key derived from) a user chosen key.
>
> If this approach is adopted, the streaming process can share the Kr
> without disclosing the Km, therefore enabling zero-copy streaming.
>
> On 16/11/2021 08:56, Stefan Miklosovic wrote:
> > Hi Bowen, Very interesting idea indeed. So if I got it right, the very
> > key for the actual sstable encryption would be always the same, it is
> > just what is wrapped would differ. So if we rotate, we basically only
> > change Km hence KEK hence the result of wrapping but there would still
> > be the original Kr key used.
> >
> > Jeremiah - I will prepare that branch very soon.
> >
> > On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
> >>>      The second question is about key rotation. If an operator needs to
> >>>      roll the key because it was compromised or there is some policy around
> >>>      that, we should be able to provide some way to rotate it. Our idea is
> >>>      to write a tool (either a subcommand of nodetool (rewritesstables)
> >>>      command or a completely standalone one in tools) which would take the
> >>>      first, original key, the second, new key and dir with sstables as
> >>>      input and it would literally took the data and it would rewrite it to
> >>>      the second set of sstables which would be encrypted with the second
> >>>      key. What do you think about this?
> >>      I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >>
> >> There's a much better approach to solve this issue. You can store a
> >> wrapped key in an encryption info file alongside the SSTable file.
> >> Here's how it works:
> >> 1. randomly generate a key Kr
> >> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
> >> file on disk
> >> 3. derive a key encryption key KEK from the SSTable file's information
> >> (e.g.: table UUID + generation) and the user chosen master key Km, so
> >> you have KEK = KDF(UUID+GEN, Km)
> >> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> >> 5. hash the Km, the hash will be used as a key ID to identify which master
> >> key was used to encrypt the key Kr if the server has multiple master
> >> keys in use
> >> 6. store the WKr and the hash of Km in a separate file alongside
> >> the SSTable file
> >>
> >> In the read path, the Kr should be kept in memory to help improve
> >> performance and this will also allow zero-downtime master key rotation.
> >>
> >> During a key rotation:
> >> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> >> 2. read the WKr from the encryption information file, and unwrap
> >> (decrypt) it using the KEK to get the Kr
> >> 3. derive a new KEK' from the new master key Km' in the same way as above
> >> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> >> 5. hash the new master key Km', and store it together with the WKr' in
> >> the encryption info file
> >>
> >> Since the key rotation only involves rewriting the encryption info file,
> >> the operation should take only a few milliseconds per SSTable file, it
> >> will be much faster than decrypting and then re-encrypting the SSTable data.
> >>
> >>
> >>
> >> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> >>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
> >>>>
> >>>> Hey,
> >>>>
> >>>> there are two points we are not completely sure about.
> >>>>
> >>>> The first one is streaming. If there is a cluster of 5 nodes, each
> >>>> node has its own unique encryption key. Hence, if a SSTable is stored
> >>>> on a disk with the key for node 1 and this is streamed to node 2 -
> >>>> which has a different key - it would not be able to decrypt that. Our
> >>>> idea is to actually send data over the wire _decrypted_ however it
> >>>> would be still secure if internode communication is done via TLS. Is
> >>>> this approach good with you?
> >>>>
> >>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> >>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
> >>>
> >>>> The second question is about key rotation. If an operator needs to
> >>>> roll the key because it was compromised or there is some policy around
> >>>> that, we should be able to provide some way to rotate it. Our idea is
> >>>> to write a tool (either a subcommand of nodetool (rewritesstables)
> >>>> command or a completely standalone one in tools) which would take the
> >>>> first, original key, the second, new key and dir with sstables as
> >>>> input and it would literally took the data and it would rewrite it to
> >>>> the second set of sstables which would be encrypted with the second
> >>>> key. What do you think about this?
> >>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >>>
> >>>> Regards
> >>>>
> >>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
> >>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
> >>>>>
> >>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
> >>>>>
> >>>>> – Scott
> >>>>>
> >>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
> >>>>>>
> >>>>>> We already have a ticket and this predated CEPs, and being an
> >>>>>> obviously good improvement to have that many have been asking for for
> >>>>>> some time now, I don't see the need for a CEP here.
> >>>>>>
> >>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> >>>>>> <st...@instaclustr.com>  wrote:
> >>>>>>> Hi list,
> >>>>>>>
> >>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
> >>>>>>> cosmetic.
> >>>>>>>
> >>>>>>> I would like to know if there is a general consensus about me going to
> >>>>>>> create a CEP for this feature or what is your perception on this. I
> >>>>>>> know we have it a little bit backwards here as we should first discuss
> >>>>>>> and then code but I am super glad that we have some POC we can
> >>>>>>> elaborate further on and CEP would just cement  and summarise the
> >>>>>>> approach / other implementation aspects of this feature.
> >>>>>>>
> >>>>>>> I think that having 9633 merged will fill quite a big operational gap
> >>>>>>> when it comes to security. There are a lot of enterprises who desire
> >>>>>>> this feature so much. I can not remember when I last saw a ticket with
> >>>>>>> 50 watchers which was inactive for such a long time.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
No, the Km does not need to be the same across nodes. Each node can
store its own encryption info file created with its own Km. The
streaming process only requires that the Kr is shared.

A quick description of the streaming process via an insecure connection:

1. the sender unwraps the wrapped key WKr with its Km and gets the key Kr

2. the sender and the receiver use DH key exchange to establish a shared
secret Ks, so that the sender and the receiver both know Ks

3. the sender derives a KEKs from the table info (SSTable gen is not
persisted across nodes) & streaming info (TODO) and the shared secret
Ks, so KEKs = KDF(Table UUID + TBD STREAMING INFO, Ks)

4. the sender wraps the Kr with KEKs to get WKrs = KW(Kr, KEKs)

5. the sender sends WKrs and the (encrypted) SSTable file to the receiver

6. the receiver derives the KEKs in the same way as the sender

7. the receiver unwraps WKrs using the KEKs and gets Kr

8. the receiver wraps the Kr with a KEK' derived from its own Km


This enables zero-copy streaming, and the Kr is never sent in plaintext
over an insecure communication channel. A passive observer cannot learn
anything about the Kr. If the streaming is done over TLS, the Kr can be
sent over a TLS connection without all the additional work. The SSTable
can be sent via an insecure connection to enable zero-copy streaming. An
HMAC of the SSTable should also be sent over TLS to ensure the SSTable
has not been damaged or modified.
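
For anyone who wants to see the shape of steps 2-4 in code, below is a rough,
self-contained sketch using only standard JCE primitives. The class and method
names are invented for illustration, the HMAC-SHA256 derivation is only a
stand-in for a proper KDF such as HKDF, and none of this is an existing
Cassandra API:

    import javax.crypto.Cipher;
    import javax.crypto.KeyAgreement;
    import javax.crypto.Mac;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.PublicKey;

    // Hypothetical sketch of steps 2-4: establish Ks with an ephemeral X25519
    // exchange (Java 11+), derive KEKs from Ks plus per-stream context, and
    // wrap/unwrap Kr for transport. The exchange itself still needs peer
    // authentication, as discussed elsewhere in this thread.
    final class StreamingKeyExchangeSketch
    {
        static KeyPair ephemeralKeyPair() throws Exception
        {
            return KeyPairGenerator.getInstance("X25519").generateKeyPair();
        }

        // KEKs = KDF(table info + streaming info, Ks); HMAC-SHA256 stands in for HKDF here.
        static SecretKey deriveKeks(KeyPair ours, PublicKey theirs, String tableId, String streamInfo) throws Exception
        {
            KeyAgreement ka = KeyAgreement.getInstance("X25519");
            ka.init(ours.getPrivate());
            ka.doPhase(theirs, true);
            byte[] ks = ka.generateSecret();                 // shared secret Ks
            Mac kdf = Mac.getInstance("HmacSHA256");
            kdf.init(new SecretKeySpec(ks, "HmacSHA256"));
            byte[] kek = kdf.doFinal((tableId + '|' + streamInfo).getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(kek, "AES");            // 256-bit KEKs
        }

        // WKrs = KW(Kr, KEKs), RFC 3394 AES key wrap
        static byte[] wrapKr(SecretKey kr, SecretKey keks) throws Exception
        {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, keks);
            return c.wrap(kr);
        }

        static SecretKey unwrapKr(byte[] wkrs, SecretKey keks) throws Exception
        {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, keks);
            return (SecretKey) c.unwrap(wkrs, "AES", Cipher.SECRET_KEY);
        }
    }

The sender would derive KEKs from its ephemeral pair and the receiver's public
value, wrap Kr, and send WKrs next to the still-encrypted SSTable; the receiver
derives the same KEKs, unwraps Kr, and re-wraps it under a KEK' from its own Km.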


On 16/11/2021 10:45, Stefan Miklosovic wrote:
> Ok but this also means that Km would need to be the same for all nodes right?
>
> If we are rolling in node by node fashion, Km is changed at node 1, we
> change the wrapped key which is stored on disk and we stream this
> table to the other node which is still on the old Km. Would this work?
> I think we would need to rotate first before anything is streamed. Or
> no?
>
> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
>> Yes, that's correct. The actual key used to encrypt the SSTable will
>> stay the same once the SSTable is created. This is a widely used
>> practice in many encrypt-at-rest applications. One good example is the
>> LUKS full disk encryption, which also supports multiple keys to unlock
>> (decrypt) the same data. Multiple unlocking keys is only possible
>> because the actual key used to encrypt the data is randomly generated
>> and then stored encrypted by (a key derived from) a user chosen key.
>>
>> If this approach is adopted, the streaming process can share the Kr
>> without disclosing the Km, therefore enabling zero-copy streaming.
>>
>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
>>> key for the actual sstable encryption would be always the same, it is
>>> just what is wrapped would differ. So if we rotate, we basically only
>>> change Km hence KEK hence the result of wrapping but there would still
>>> be the original Kr key used.
>>>
>>> Jeremiah - I will prepare that branch very soon.
>>>
>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
>>>>>       The second question is about key rotation. If an operator needs to
>>>>>       roll the key because it was compromised or there is some policy around
>>>>>       that, we should be able to provide some way to rotate it. Our idea is
>>>>>       to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>       command or a completely standalone one in tools) which would take the
>>>>>       first, original key, the second, new key and dir with sstables as
>>>>>       input and it would literally took the data and it would rewrite it to
>>>>>       the second set of sstables which would be encrypted with the second
>>>>>       key. What do you think about this?
>>>>       I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>
>>>> There's a much better approach to solve this issue. You can store a
>>>> wrapped key in an encryption info file alongside the SSTable file.
>>>> Here's how it works:
>>>> 1. randomly generate a key Kr
>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
>>>> file on disk
>>>> 3. derive a key encryption key KEK from the SSTable file's information
>>>> (e.g.: table UUID + generation) and the user chosen master key Km, so
>>>> you have KEK = KDF(UUID+GEN, Km)
>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
>>>> 5. hash the Km, the hash will be used as a key ID to identify which master
>>>> key was used to encrypt the key Kr if the server has multiple master
>>>> keys in use
>>>> 6. store the WKr and the hash of Km in a separate file alongside
>>>> the SSTable file
>>>>
>>>> In the read path, the Kr should be kept in memory to help improve
>>>> performance and this will also allow zero-downtime master key rotation.
>>>>
>>>> During a key rotation:
>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
>>>> 2. read the WKr from the encryption information file, and unwrap
>>>> (decrypt) it using the KEK to get the Kr
>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
>>>> 5. hash the new master key Km', and store it together with the WKr' in
>>>> the encryption info file
>>>>
>>>> Since the key rotation only involves rewriting the encryption info file,
>>>> the operation should take only a few milliseconds per SSTable file, it
>>>> will be much faster than decrypting and then re-encrypting the SSTable data.
>>>>
>>>>
>>>>
>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> there are two points we are not completely sure about.
>>>>>>
>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>>>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>>>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>>>>> which has a different key - it would not be able to decrypt that. Our
>>>>>> idea is to actually send data over the wire _decrypted_ however it
>>>>>> would be still secure if internode communication is done via TLS. Is
>>>>>> this approach good with you?
>>>>>>
>>>>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>>>>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>>>>>
>>>>>> The second question is about key rotation. If an operator needs to
>>>>>> roll the key because it was compromised or there is some policy around
>>>>>> that, we should be able to provide some way to rotate it. Our idea is
>>>>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>>>>> command or a completely standalone one in tools) which would take the
>>>>>> first, original key, the second, new key and dir with sstables as
>>>>>> input and it would literally took the data and it would rewrite it to
>>>>>> the second set of sstables which would be encrypted with the second
>>>>>> key. What do you think about this?
>>>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
>>>>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>>>>>>>
>>>>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>>>>>>>
>>>>>>> – Scott
>>>>>>>
>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> We already have a ticket and this predated CEPs, and being an
>>>>>>>> obviously good improvement to have that many have been asking for for
>>>>>>>> some time now, I don't see the need for a CEP here.
>>>>>>>>
>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>>>>>>> <st...@instaclustr.com>  wrote:
>>>>>>>>> Hi list,
>>>>>>>>>
>>>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>>>>>>> cosmetic.
>>>>>>>>>
>>>>>>>>> I would like to know if there is a general consensus about me going to
>>>>>>>>> create a CEP for this feature or what is your perception on this. I
>>>>>>>>> know we have it a little bit backwards here as we should first discuss
>>>>>>>>> and then code but I am super glad that we have some POC we can
>>>>>>>>> elaborate further on and CEP would just cement  and summarise the
>>>>>>>>> approach / other implementation aspects of this feature.
>>>>>>>>>
>>>>>>>>> I think that having 9633 merged will fill quite a big operational gap
>>>>>>>>> when it comes to security. There are a lot of enterprises who desire
>>>>>>>>> this feature so much. I can not remember when I last saw a ticket with
>>>>>>>>> 50 watchers which was inactive for such a long time.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Ok, but this also means that Km would need to be the same for all nodes, right?

If we are rolling in a node-by-node fashion, Km is changed at node 1, we
change the wrapped key which is stored on disk, and then we stream this
table to the other node which is still on the old Km. Would this work?
I think we would need to rotate first before anything is streamed. Or
no?
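
For what it's worth, the "change the wrapped key which is stored on disk" step
is small either way: unwrap Kr with the KEK derived from the old Km, re-wrap it
with the KEK' derived from the new Km', and rewrite the encryption info file. A
minimal sketch, assuming the KDF/key-wrap scheme described earlier in the
thread (names are hypothetical, HMAC-SHA256 stands in for a real KDF):

    import javax.crypto.Cipher;
    import javax.crypto.Mac;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch of the per-SSTable rotation rewrite; no SSTable data is touched.
    final class EncryptionInfoRotationSketch
    {
        // KEK = KDF(per-SSTable context, Km); HMAC-SHA256 is a placeholder for a vetted KDF.
        static SecretKey kek(byte[] masterKey, String sstableContext) throws Exception
        {
            Mac kdf = Mac.getInstance("HmacSHA256");
            kdf.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            return new SecretKeySpec(kdf.doFinal(sstableContext.getBytes(StandardCharsets.UTF_8)), "AES");
        }

        // Returns WKr' = KW(Kr, KEK'), ready to be written back to the encryption info file.
        static byte[] rotate(byte[] oldWkr, byte[] oldKm, byte[] newKm, String sstableContext) throws Exception
        {
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, kek(oldKm, sstableContext));
            SecretKey kr = (SecretKey) unwrap.unwrap(oldWkr, "AES", Cipher.SECRET_KEY);

            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, kek(newKm, sstableContext));
            return wrap.wrap(kr);
        }
    }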

On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
>
> Yes, that's correct. The actual key used to encrypt the SSTable will
> stay the same once the SSTable is created. This is a widely used
> practice in many encrypt-at-rest applications. One good example is the
> LUKS full disk encryption, which also supports multiple keys to unlock
> (decrypt) the same data. Multiple unlocking keys is only possible
> because the actual key used to encrypt the data is randomly generated
> and then stored encrypted by (a key derived from) a user chosen key.
>
> If this approach is adopted, the streaming process can share the Kr
> without disclosing the Km, therefore enabling zero-copy streaming.
>
> On 16/11/2021 08:56, Stefan Miklosovic wrote:
> > Hi Bowen, Very interesting idea indeed. So if I got it right, the very
> > key for the actual sstable encryption would be always the same, it is
> > just what is wrapped would differ. So if we rotate, we basically only
> > change Km hence KEK hence the result of wrapping but there would still
> > be the original Kr key used.
> >
> > Jeremiah - I will prepare that branch very soon.
> >
> > On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
> >>>      The second question is about key rotation. If an operator needs to
> >>>      roll the key because it was compromised or there is some policy around
> >>>      that, we should be able to provide some way to rotate it. Our idea is
> >>>      to write a tool (either a subcommand of nodetool (rewritesstables)
> >>>      command or a completely standalone one in tools) which would take the
> >>>      first, original key, the second, new key and dir with sstables as
> >>>      input and it would literally took the data and it would rewrite it to
> >>>      the second set of sstables which would be encrypted with the second
> >>>      key. What do you think about this?
> >>      I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >>
> >> There's a much better approach to solve this issue. You can store a
> >> wrapped key in an encryption info file alongside the SSTable file.
> >> Here's how it works:
> >> 1. randomly generate a key Kr
> >> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
> >> file on disk
> >> 3. derive a key encryption key KEK from the SSTable file's information
> >> (e.g.: table UUID + generation) and the user chosen master key Km, so
> >> you have KEK = KDF(UUID+GEN, Km)
> >> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> >> 5. hash the Km, the hash will be used as a key ID to identify which master
> >> key was used to encrypt the key Kr if the server has multiple master
> >> keys in use
> >> 6. store the WKr and the hash of Km in a separate file alongside
> >> the SSTable file
> >>
> >> In the read path, the Kr should be kept in memory to help improve
> >> performance and this will also allow zero-downtime master key rotation.
> >>
> >> During a key rotation:
> >> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> >> 2. read the WKr from the encryption information file, and unwrap
> >> (decrypt) it using the KEK to get the Kr
> >> 3. derive a new KEK' from the new master key Km' in the same way as above
> >> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> >> 5. hash the new master key Km', and store it together with the WKr' in
> >> the encryption info file
> >>
> >> Since the key rotation only involves rewriting the encryption info file,
> >> the operation should take only a few milliseconds per SSTable file, it
> >> will be much faster than decrypting and then re-encrypting the SSTable data.
> >>
> >>
> >>
> >> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> >>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
> >>>>
> >>>> Hey,
> >>>>
> >>>> there are two points we are not completely sure about.
> >>>>
> >>>> The first one is streaming. If there is a cluster of 5 nodes, each
> >>>> node has its own unique encryption key. Hence, if a SSTable is stored
> >>>> on a disk with the key for node 1 and this is streamed to node 2 -
> >>>> which has a different key - it would not be able to decrypt that. Our
> >>>> idea is to actually send data over the wire _decrypted_ however it
> >>>> would be still secure if internode communication is done via TLS. Is
> >>>> this approach good with you?
> >>>>
> >>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> >>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
> >>>
> >>>> The second question is about key rotation. If an operator needs to
> >>>> roll the key because it was compromised or there is some policy around
> >>>> that, we should be able to provide some way to rotate it. Our idea is
> >>>> to write a tool (either a subcommand of nodetool (rewritesstables)
> >>>> command or a completely standalone one in tools) which would take the
> >>>> first, original key, the second, new key and dir with sstables as
> >>>> input and it would literally took the data and it would rewrite it to
> >>>> the second set of sstables which would be encrypted with the second
> >>>> key. What do you think about this?
> >>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >>>
> >>>> Regards
> >>>>
> >>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
> >>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
> >>>>>
> >>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
> >>>>>
> >>>>> – Scott
> >>>>>
> >>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
> >>>>>>
> >>>>>> We already have a ticket and this predated CEPs, and being an
> >>>>>> obviously good improvement to have that many have been asking for for
> >>>>>> some time now, I don't see the need for a CEP here.
> >>>>>>
> >>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> >>>>>> <st...@instaclustr.com>  wrote:
> >>>>>>> Hi list,
> >>>>>>>
> >>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
> >>>>>>> cosmetic.
> >>>>>>>
> >>>>>>> I would like to know if there is a general consensus about me going to
> >>>>>>> create a CEP for this feature or what is your perception on this. I
> >>>>>>> know we have it a little bit backwards here as we should first discuss
> >>>>>>> and then code but I am super glad that we have some POC we can
> >>>>>>> elaborate further on and CEP would just cement  and summarise the
> >>>>>>> approach / other implementation aspects of this feature.
> >>>>>>>
> >>>>>>> I think that having 9633 merged will fill quite a big operational gap
> >>>>>>> when it comes to security. There are a lot of enterprises who desire
> >>>>>>> this feature so much. I can not remember when I last saw a ticket with
> >>>>>>> 50 watchers which was inactive for such a long time.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
Yes, that's correct. The actual key used to encrypt the SSTable will 
stay the same once the SSTable is created. This is a widely used 
practice in many encrypt-at-rest applications. One good example is the 
LUKS full disk encryption, which also supports multiple keys to unlock 
(decrypt) the same data. Multiple unlocking keys are only possible 
because the actual key used to encrypt the data is randomly generated 
and then stored encrypted by (a key derived from) a user chosen key.

If this approach is adopted, the streaming process can share the Kr 
without disclosing the Km, therefore enabling zero-copy streaming.
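
For anyone who wants to see the idea in code, here is a minimal,
illustrative Java sketch of the envelope-encryption pattern described
above, using only the JCE AESWrap cipher. All class and variable names
are made up for the example and nothing here is taken from the 9633
branch; for brevity the master keys are used directly as wrapping keys,
whereas the proposal first derives a per-SSTable KEK from Km.

import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EnvelopeEncryptionSketch
{
    public static void main(String[] args) throws Exception
    {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);

        // Kr: the randomly generated key that actually encrypts the SSTable bytes.
        SecretKey kr = gen.generateKey();

        // Two independent master keys, standing in for different Km values
        // (e.g. before and after a rotation, or keys held by different nodes).
        SecretKey km1 = gen.generateKey();
        SecretKey km2 = gen.generateKey();

        // The same Kr wrapped under each master key: two "unlocking keys",
        // one encrypted payload -- the same trick LUKS uses with key slots.
        byte[] wkr1 = wrap(kr, km1);
        byte[] wkr2 = wrap(kr, km2);

        SecretKey krAgain = unwrap(wkr2, km2);
        System.out.println(Arrays.equals(kr.getEncoded(), krAgain.getEncoded())); // true
    }

    static byte[] wrap(SecretKey kr, SecretKey kek) throws Exception
    {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.WRAP_MODE, kek);
        return c.wrap(kr);
    }

    static SecretKey unwrap(byte[] wkr, SecretKey kek) throws Exception
    {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.UNWRAP_MODE, kek);
        return (SecretKey) c.unwrap(wkr, "AES", Cipher.SECRET_KEY);
    }
}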

On 16/11/2021 08:56, Stefan Miklosovic wrote:
> Hi Bowen, very interesting idea indeed. So if I got it right, the
> key used for the actual sstable encryption would always stay the same;
> it is just the wrapping that would differ. So if we rotate, we only
> change Km, hence the KEK, hence the result of the wrapping, but the
> original Kr key would still be used.
>
> Jeremiah - I will prepare that branch very soon.
>
> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
>>>      The second question is about key rotation. If an operator needs to
>>>      roll the key because it was compromised or there is some policy around
>>>      that, we should be able to provide some way to rotate it. Our idea is
>>>      to write a tool (either a subcommand of nodetool (rewritesstables)
>>>      command or a completely standalone one in tools) which would take the
>>>      first, original key, the second, new key and dir with sstables as
>>>      input and it would literally took the data and it would rewrite it to
>>>      the second set of sstables which would be encrypted with the second
>>>      key. What do you think about this?
>>      I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>
>> There's a much better approach to solve this issue. You can store a
>> wrapped key in an encryption info file alongside the SSTable file.
>> Here's how it works:
>> 1. randomly generate a key Kr
>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
>> file on disk
>> 3. derive a key encryption key KEK from the SSTable file's information
>> (e.g.: table UUID + generation) and the user chosen master key Km, so
>> you have KEK = KDF(UUID+GEN, Km)
>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
>> 5. hash the Km, the hash will be used as a key ID to identify which master
>> key was used to encrypt the key Kr if the server has multiple master
>> keys in use
>> 6. store the WKr and the hash of Km in a separate file alongside
>> the SSTable file
>>
>> In the read path, the Kr should be kept in memory to help improve
>> performance and this will also allow zero-downtime master key rotation.
>>
>> During a key rotation:
>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
>> 2. read the WKr from the encryption information file, and unwrap
>> (decrypt) it using the KEK to get the Kr
>> 3. derive a new KEK' from the new master key Km' in the same way as above
>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
>> 5. hash the new master key Km', and store it together with the WKr' in
>> the encryption info file
>>
>> Since the key rotation only involves rewriting the encryption info file,
>> the operation should take only a few milliseconds per SSTable file, it
>> will be much faster than decrypting and then re-encrypting the SSTable data.
>>
>>
>>
>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
>>>>
>>>> Hey,
>>>>
>>>> there are two points we are not completely sure about.
>>>>
>>>> The first one is streaming. If there is a cluster of 5 nodes, each
>>>> node has its own unique encryption key. Hence, if a SSTable is stored
>>>> on a disk with the key for node 1 and this is streamed to node 2 -
>>>> which has a different key - it would not be able to decrypt that. Our
>>>> idea is to actually send data over the wire _decrypted_ however it
>>>> would be still secure if internode communication is done via TLS. Is
>>>> this approach good with you?
>>>>
>>> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
>>> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>>>
>>>> The second question is about key rotation. If an operator needs to
>>>> roll the key because it was compromised or there is some policy around
>>>> that, we should be able to provide some way to rotate it. Our idea is
>>>> to write a tool (either a subcommand of nodetool (rewritesstables)
>>>> command or a completely standalone one in tools) which would take the
>>>> first, original key, the second, new key and dir with sstables as
>>>> input and it would literally took the data and it would rewrite it to
>>>> the second set of sstables which would be encrypted with the second
>>>> key. What do you think about this?
>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>
>>>> Regards
>>>>
>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
>>>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>>>>>
>>>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>>>>>
>>>>> – Scott
>>>>>
>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
>>>>>>
>>>>>> We already have a ticket and this predated CEPs, and being an
>>>>>> obviously good improvement to have that many have been asking for for
>>>>>> some time now, I don't see the need for a CEP here.
>>>>>>
>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>>>>> <st...@instaclustr.com>  wrote:
>>>>>>> Hi list,
>>>>>>>
>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>>>>> cosmetic.
>>>>>>>
>>>>>>> I would like to know if there is a general consensus about me going to
>>>>>>> create a CEP for this feature or what is your perception on this. I
>>>>>>> know we have it a little bit backwards here as we should first discuss
>>>>>>> and then code but I am super glad that we have some POC we can
>>>>>>> elaborate further on and CEP would just cement  and summarise the
>>>>>>> approach / other implementation aspects of this feature.
>>>>>>>
>>>>>>> I think that having 9633 merged will fill quite a big operational gap
>>>>>>> when it comes to security. There are a lot of enterprises who desire
>>>>>>> this feature so much. I can not remember when I last saw a ticket with
>>>>>>> 50 watchers which was inactive for such a long time.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
I really believe we likely need a CEP for this. This gets complicated
pretty fast with all the details attached and I do not want to have
endless discussions about this in the ticket.

I can clearly see this is something a broader audience needs to vote
on eventually.

On Tue, 16 Nov 2021 at 09:56, Stefan Miklosovic
<st...@instaclustr.com> wrote:
>
> Hi Bowen, very interesting idea indeed. So if I got it right, the
> key used for the actual sstable encryption would always stay the same;
> it is just the wrapping that would differ. So if we rotate, we only
> change Km, hence the KEK, hence the result of the wrapping, but the
> original Kr key would still be used.
>
> Jeremiah - I will prepare that branch very soon.
>
> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
> >
> > >     The second question is about key rotation. If an operator needs to
> > >     roll the key because it was compromised or there is some policy around
> > >     that, we should be able to provide some way to rotate it. Our idea is
> > >     to write a tool (either a subcommand of nodetool (rewritesstables)
> > >     command or a completely standalone one in tools) which would take the
> > >     first, original key, the second, new key and dir with sstables as
> > >     input and it would literally took the data and it would rewrite it to
> > >     the second set of sstables which would be encrypted with the second
> > >     key. What do you think about this?
> >
> >     I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >
> > There's a much better approach to solve this issue. You can store a
> > wrapped key in an encryption info file alongside the SSTable file.
> > Here's how it works:
> > 1. randomly generate a key Kr
> > 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
> > file on disk
> > 3. derive a key encryption key KEK from the SSTable file's information
> > (e.g.: table UUID + generation) and the user chosen master key Km, so
> > you have KEK = KDF(UUID+GEN, Km)
> > 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> > 5. hash the Km, the hash will be used as a key ID to identify which master
> > key was used to encrypt the key Kr if the server has multiple master
> > keys in use
> > 6. store the WKr and the hash of Km in a separate file alongside
> > the SSTable file
> >
> > In the read path, the Kr should be kept in memory to help improve
> > performance and this will also allow zero-downtime master key rotation.
> >
> > During a key rotation:
> > 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> > 2. read the WKr from the encryption information file, and unwrap
> > (decrypt) it using the KEK to get the Kr
> > 3. derive a new KEK' from the new master key Km' in the same way as above
> > 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> > 5. hash the new master key Km', and store it together with the WKr' in
> > the encryption info file
> >
> > Since the key rotation only involves rewriting the encryption info file,
> > the operation should take only a few milliseconds per SSTable file, it
> > will be much faster than decrypting and then re-encrypting the SSTable data.
> >
> >
> >
> > On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> > >
> > >> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
> > >>
> > >> Hey,
> > >>
> > >> there are two points we are not completely sure about.
> > >>
> > >> The first one is streaming. If there is a cluster of 5 nodes, each
> > >> node has its own unique encryption key. Hence, if a SSTable is stored
> > >> on a disk with the key for node 1 and this is streamed to node 2 -
> > >> which has a different key - it would not be able to decrypt that. Our
> > >> idea is to actually send data over the wire _decrypted_ however it
> > >> would be still secure if internode communication is done via TLS. Is
> > >> this approach good with you?
> > >>
> > > So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> > > Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
> > >
> > >> The second question is about key rotation. If an operator needs to
> > >> roll the key because it was compromised or there is some policy around
> > >> that, we should be able to provide some way to rotate it. Our idea is
> > >> to write a tool (either a subcommand of nodetool (rewritesstables)
> > >> command or a completely standalone one in tools) which would take the
> > >> first, original key, the second, new key and dir with sstables as
> > >> input and it would literally took the data and it would rewrite it to
> > >> the second set of sstables which would be encrypted with the second
> > >> key. What do you think about this?
> > > I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> > >
> > >> Regards
> > >>
> > >> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
> > >>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
> > >>>
> > >>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
> > >>>
> > >>> – Scott
> > >>>
> > >>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
> > >>>>
> > >>>> We already have a ticket and this predated CEPs, and being an
> > >>>> obviously good improvement to have that many have been asking for for
> > >>>> some time now, I don't see the need for a CEP here.
> > >>>>
> > >>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> > >>>> <st...@instaclustr.com>  wrote:
> > >>>>> Hi list,
> > >>>>>
> > >>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
> > >>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
> > >>>>> times of 3.4 to the current trunk with my help here and there, mostly
> > >>>>> cosmetic.
> > >>>>>
> > >>>>> I would like to know if there is a general consensus about me going to
> > >>>>> create a CEP for this feature or what is your perception on this. I
> > >>>>> know we have it a little bit backwards here as we should first discuss
> > >>>>> and then code but I am super glad that we have some POC we can
> > >>>>> elaborate further on and CEP would just cement  and summarise the
> > >>>>> approach / other implementation aspects of this feature.
> > >>>>>
> > >>>>> I think that having 9633 merged will fill quite a big operational gap
> > >>>>> when it comes to security. There are a lot of enterprises who desire
> > >>>>> this feature so much. I can not remember when I last saw a ticket with
> > >>>>> 50 watchers which was inactive for such a long time.
> > >>>>>
> > >>>>> Regards
> > >>>>>
> > >>>>> ---------------------------------------------------------------------
> > >>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > >>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> > >>>>>
> > >>>> ---------------------------------------------------------------------
> > >>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > >>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> > >>>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > >>> For additional commands, e-mail:dev-help@cassandra.apache.org
> > >>>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > >> For additional commands, e-mail:dev-help@cassandra.apache.org
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail:dev-help@cassandra.apache.org
> > >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Bowen, very interesting idea indeed. So if I got it right, the
key used for the actual sstable encryption would always stay the same;
it is just the wrapping that would differ. So if we rotate, we only
change Km, hence the KEK, hence the result of the wrapping, but the
original Kr key would still be used.

Jeremiah - I will prepare that branch very soon.

On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
>
> >     The second question is about key rotation. If an operator needs to
> >     roll the key because it was compromised or there is some policy around
> >     that, we should be able to provide some way to rotate it. Our idea is
> >     to write a tool (either a subcommand of nodetool (rewritesstables)
> >     command or a completely standalone one in tools) which would take the
> >     first, original key, the second, new key and dir with sstables as
> >     input and it would literally took the data and it would rewrite it to
> >     the second set of sstables which would be encrypted with the second
> >     key. What do you think about this?
>
>     I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>
> There's a much better approach to solve this issue. You can store a
> wrapped key in an encryption info file alongside the SSTable file.
> Here's how it works:
> 1. randomly generate a key Kr
> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
> file on disk
> 3. derive a key encryption key KEK from the SSTable file's information
> (e.g.: table UUID + generation) and the user chosen master key Km, so
> you have KEK = KDF(UUID+GEN, Km)
> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> 5. hash the Km, the hash will be used as a key ID to identify which master
> key was used to encrypt the key Kr if the server has multiple master
> keys in use
> 6. store the WKr and the hash of Km in a separate file alongside
> the SSTable file
>
> In the read path, the Kr should be kept in memory to help improve
> performance and this will also allow zero-downtime master key rotation.
>
> During a key rotation:
> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> 2. read the WKr from the encryption information file, and unwrap
> (decrypt) it using the KEK to get the Kr
> 3. derive a new KEK' from the new master key Km' in the same way as above
> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> 5. hash the new master key Km', and store it together with the WKr' in
> the encryption info file
>
> Since the key rotation only involves rewriting the encryption info file,
> the operation should take only a few milliseconds per SSTable file, it
> will be much faster than decrypting and then re-encrypting the SSTable data.
>
>
>
> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> >
> >> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
> >>
> >> Hey,
> >>
> >> there are two points we are not completely sure about.
> >>
> >> The first one is streaming. If there is a cluster of 5 nodes, each
> >> node has its own unique encryption key. Hence, if a SSTable is stored
> >> on a disk with the key for node 1 and this is streamed to node 2 -
> >> which has a different key - it would not be able to decrypt that. Our
> >> idea is to actually send data over the wire _decrypted_ however it
> >> would be still secure if internode communication is done via TLS. Is
> >> this approach good with you?
> >>
> > So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> > Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
> >
> >> The second question is about key rotation. If an operator needs to
> >> roll the key because it was compromised or there is some policy around
> >> that, we should be able to provide some way to rotate it. Our idea is
> >> to write a tool (either a subcommand of nodetool (rewritesstables)
> >> command or a completely standalone one in tools) which would take the
> >> first, original key, the second, new key and dir with sstables as
> >> input and it would literally took the data and it would rewrite it to
> >> the second set of sstables which would be encrypted with the second
> >> key. What do you think about this?
> > I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
> >
> >> Regards
> >>
> >> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
> >>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
> >>>
> >>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
> >>>
> >>> – Scott
> >>>
> >>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
> >>>>
> >>>> We already have a ticket and this predated CEPs, and being an
> >>>> obviously good improvement to have that many have been asking for for
> >>>> some time now, I don't see the need for a CEP here.
> >>>>
> >>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> >>>> <st...@instaclustr.com>  wrote:
> >>>>> Hi list,
> >>>>>
> >>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >>>>> times of 3.4 to the current trunk with my help here and there, mostly
> >>>>> cosmetic.
> >>>>>
> >>>>> I would like to know if there is a general consensus about me going to
> >>>>> create a CEP for this feature or what is your perception on this. I
> >>>>> know we have it a little bit backwards here as we should first discuss
> >>>>> and then code but I am super glad that we have some POC we can
> >>>>> elaborate further on and CEP would just cement  and summarise the
> >>>>> approach / other implementation aspects of this feature.
> >>>>>
> >>>>> I think that having 9633 merged will fill quite a big operational gap
> >>>>> when it comes to security. There are a lot of enterprises who desire
> >>>>> this feature so much. I can not remember when I last saw a ticket with
> >>>>> 50 watchers which was inactive for such a long time.
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail:dev-help@cassandra.apache.org
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
>     The second question is about key rotation. If an operator needs to
>     roll the key because it was compromised or there is some policy around
>     that, we should be able to provide some way to rotate it. Our idea is
>     to write a tool (either a subcommand of nodetool (rewritesstables)
>     command or a completely standalone one in tools) which would take the
>     first, original key, the second, new key and dir with sstables as
>     input and it would literally took the data and it would rewrite it to
>     the second set of sstables which would be encrypted with the second
>     key. What do you think about this?

    I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.

There's a much better approach to solve this issue. You can store a 
wrapped key in an encryption info file alongside the SSTable file. 
Here's how it works:
1. randomly generate a key Kr
2. encrypt the SSTable file with the key Kr, store the encrypted SSTable 
file on disk
3. derive a key encryption key KEK from the SSTable file's information 
(e.g.: table UUID + generation) and the user chosen master key Km, so 
you have KEK = KDF(UUID+GEN, Km)
4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
5. hash the Km, the hash will be used as a key ID to identify which master 
key was used to encrypt the key Kr if the server has multiple master 
keys in use
6. store the WKr and the hash of Km in a separate file alongside 
the SSTable file

In the read path, the Kr should be kept in memory to help improve 
performance and this will also allow zero-downtime master key rotation.

During a key rotation:
1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
2. read the WKr from the encryption information file, and unwrap 
(decrypt) it using the KEK to get the Kr
3. derive a new KEK' from the new master key Km' in the same way as above
4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
5. hash the new master key Km', and store it together with the WKr' in 
the encryption info file

Since the key rotation only involves rewriting the encryption info file, 
the operation should take only a few milliseconds per SSTable file, it 
will be much faster than decrypting and then re-encrypting the SSTable data.
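
A rough Java sketch of what the derive/wrap/rotate cycle above could
look like. The KDF below is just HMAC-SHA256 over the table UUID and
generation, a placeholder to keep the example self-contained rather
than a recommendation, and none of these names come from the actual
patch; the key wrap (KW) step is JCE's AESWrap.

import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class KeyRotationSketch
{
    // KEK = KDF(UUID + GEN, Km); placeholder KDF based on HMAC-SHA256.
    static SecretKey deriveKek(SecretKey km, String tableUuid, int generation) throws Exception
    {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(km.getEncoded(), "HmacSHA256"));
        byte[] kek = mac.doFinal((tableUuid + ':' + generation).getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(kek, "AES"); // 32 bytes -> AES-256 KEK
    }

    // Rotation: unwrap WKr with the old KEK, re-wrap the same Kr with the new KEK'.
    // Only the small encryption-info file is rewritten; the SSTable data is untouched.
    static byte[] rotate(byte[] wkr, SecretKey oldKm, SecretKey newKm,
                         String tableUuid, int generation) throws Exception
    {
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, deriveKek(oldKm, tableUuid, generation));
        SecretKey kr = (SecretKey) unwrap.unwrap(wkr, "AES", Cipher.SECRET_KEY);

        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, deriveKek(newKm, tableUuid, generation));
        return wrap.wrap(kr);
    }
}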



On 15/11/2021 18:42, Jeremiah D Jordan wrote:
>
>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic<st...@instaclustr.com>  wrote:
>>
>> Hey,
>>
>> there are two points we are not completely sure about.
>>
>> The first one is streaming. If there is a cluster of 5 nodes, each
>> node has its own unique encryption key. Hence, if a SSTable is stored
>> on a disk with the key for node 1 and this is streamed to node 2 -
>> which has a different key - it would not be able to decrypt that. Our
>> idea is to actually send data over the wire _decrypted_ however it
>> would be still secure if internode communication is done via TLS. Is
>> this approach good with you?
>>
> So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
> Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
>
>> The second question is about key rotation. If an operator needs to
>> roll the key because it was compromised or there is some policy around
>> that, we should be able to provide some way to rotate it. Our idea is
>> to write a tool (either a subcommand of nodetool (rewritesstables)
>> command or a completely standalone one in tools) which would take the
>> first, original key, the second, new key and dir with sstables as
>> input and it would literally took the data and it would rewrite it to
>> the second set of sstables which would be encrypted with the second
>> key. What do you think about this?
> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>
>> Regards
>>
>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net>  wrote:
>>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>>>
>>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>>>
>>> – Scott
>>>
>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dr...@gmail.com>  wrote:
>>>>
>>>> We already have a ticket and this predated CEPs, and being an
>>>> obviously good improvement to have that many have been asking for for
>>>> some time now, I don't see the need for a CEP here.
>>>>
>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>>> <st...@instaclustr.com>  wrote:
>>>>> Hi list,
>>>>>
>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>>> cosmetic.
>>>>>
>>>>> I would like to know if there is a general consensus about me going to
>>>>> create a CEP for this feature or what is your perception on this. I
>>>>> know we have it a little bit backwards here as we should first discuss
>>>>> and then code but I am super glad that we have some POC we can
>>>>> elaborate further on and CEP would just cement  and summarise the
>>>>> approach / other implementation aspects of this feature.
>>>>>
>>>>> I think that having 9633 merged will fill quite a big operational gap
>>>>> when it comes to security. There are a lot of enterprises who desire
>>>>> this feature so much. I can not remember when I last saw a ticket with
>>>>> 50 watchers which was inactive for such a long time.
>>>>>
>>>>> Regards
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail:dev-help@cassandra.apache.org
>

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Jeremiah D Jordan <je...@gmail.com>.

> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic <st...@instaclustr.com> wrote:
> 
> Hey,
> 
> there are two points we are not completely sure about.
> 
> The first one is streaming. If there is a cluster of 5 nodes, each
> node has its own unique encryption key. Hence, if a SSTable is stored
> on a disk with the key for node 1 and this is streamed to node 2 -
> which has a different key - it would not be able to decrypt that. Our
> idea is to actually send data over the wire _decrypted_ however it
> would be still secure if internode communication is done via TLS. Is
> this approach good with you?
> 

So would you fail startup if someone enabled sstable encryption but did not have TLS for internode communication?  Another concern here is making sure zero copy streaming does not get triggered for this case.
Have you considered having some way to distribute the keys to all nodes such that you don’t need to decrypt on the sending side?  Having to do this will mean a lot more overhead for the sending side of a streaming operation.
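
If the plaintext-over-TLS route were taken, a startup guard along the
following lines could refuse the unsafe combination. This is only a
sketch with invented field names; it does not reflect cassandra.yaml
options or Cassandra's real configuration classes.

// Hypothetical holder for the two settings involved; in reality these
// would come from the parsed server configuration.
public class EncryptionStartupCheck
{
    boolean sstableEncryptionEnabled;
    String internodeEncryption; // e.g. "none", "dc", "rack", "all"

    void validate()
    {
        if (sstableEncryptionEnabled && "none".equals(internodeEncryption))
            throw new IllegalStateException("SSTable encryption is enabled but internode TLS is disabled; " +
                                            "streaming would send data in plaintext and defeat at-rest protection");
        // Zero-copy streaming would likewise need to be disabled, or forced through
        // the decrypting path, whenever per-node keys make raw file transfer unreadable.
    }
}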

> The second question is about key rotation. If an operator needs to
> roll the key because it was compromised or there is some policy around
> that, we should be able to provide some way to rotate it. Our idea is
> to write a tool (either a subcommand of nodetool (rewritesstables)
> command or a completely standalone one in tools) which would take the
> first, original key, the second, new key and dir with sstables as
> input and it would literally took the data and it would rewrite it to
> the second set of sstables which would be encrypted with the second
> key. What do you think about this?

I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system.  This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key.  An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
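
To illustrate the metadata idea, here is a small hand-written sketch of
how a per-SSTable key identifier could be recorded and resolved against
a set of configured keys. The names are invented for the example and do
not correspond to actual sstable metadata classes in the patch.

import java.util.Map;
import javax.crypto.SecretKey;

public class KeyRegistrySketch
{
    // Written into each new SSTable's metadata alongside its other parameters.
    static final class EncryptionParams
    {
        final String keyProviderClass; // pluggable class able to produce the key
        final String keyAlias;         // which key, e.g. "tde-key-2021-11"

        EncryptionParams(String keyProviderClass, String keyAlias)
        {
            this.keyProviderClass = keyProviderClass;
            this.keyAlias = keyAlias;
        }
    }

    private final Map<String, SecretKey> keysByAlias;
    private final String currentAlias;

    KeyRegistrySketch(Map<String, SecretKey> keysByAlias, String currentAlias)
    {
        this.keysByAlias = keysByAlias;
        this.currentAlias = currentAlias;
    }

    // New sstables are always written with the configured "current key"...
    String aliasForNewSSTables()
    {
        return currentAlias;
    }

    // ...while existing sstables are read with whatever key their own metadata
    // names, so an upgradesstables-style rewrite can move them over later.
    SecretKey keyFor(EncryptionParams params)
    {
        SecretKey key = keysByAlias.get(params.keyAlias);
        if (key == null)
            throw new IllegalStateException("No key configured for alias " + params.keyAlias);
        return key;
    }
}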

> 
> Regards
> 
> On Sat, 13 Nov 2021 at 19:35, <sc...@paradoxica.net> wrote:
>> 
>> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>> 
>> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>> 
>> – Scott
>> 
>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams <dr...@gmail.com> wrote:
>>> 
>>> We already have a ticket and this predated CEPs, and being an
>>> obviously good improvement to have that many have been asking for for
>>> some time now, I don't see the need for a CEP here.
>>> 
>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
>>> <st...@instaclustr.com> wrote:
>>>> 
>>>> Hi list,
>>>> 
>>>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>>>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>>>> times of 3.4 to the current trunk with my help here and there, mostly
>>>> cosmetic.
>>>> 
>>>> I would like to know if there is a general consensus about me going to
>>>> create a CEP for this feature or what is your perception on this. I
>>>> know we have it a little bit backwards here as we should first discuss
>>>> and then code but I am super glad that we have some POC we can
>>>> elaborate further on and CEP would just cement  and summarise the
>>>> approach / other implementation aspects of this feature.
>>>> 
>>>> I think that having 9633 merged will fill quite a big operational gap
>>>> when it comes to security. There are a lot of enterprises who desire
>>>> this feature so much. I can not remember when I last saw a ticket with
>>>> 50 watchers which was inactive for such a long time.
>>>> 
>>>> Regards
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hey,

there are two points we are not completely sure about.

The first one is streaming. If there is a cluster of 5 nodes, each
node has its own unique encryption key. Hence, if a SSTable is stored
on a disk with the key for node 1 and this is streamed to node 2 -
which has a different key - it would not be able to decrypt that. Our
idea is to actually send data over the wire _decrypted_ however it
would be still secure if internode communication is done via TLS. Is
this approach good with you?

The second question is about key rotation. If an operator needs to
roll the key because it was compromised or there is some policy around
that, we should be able to provide some way to rotate it. Our idea is
to write a tool (either a subcommand of nodetool (rewritesstables)
command or a completely standalone one in tools) which would take the
first, original key, the second, new key and dir with sstables as
input and it would literally took the data and it would rewrite it to
the second set of sstables which would be encrypted with the second
key. What do you think about this?

Regards

On Sat, 13 Nov 2021 at 19:35, <sc...@paradoxica.net> wrote:
>
> Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.
>
> One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.
>
> – Scott
>
> > On Nov 13, 2021, at 7:53 AM, Brandon Williams <dr...@gmail.com> wrote:
> >
> > We already have a ticket and this predated CEPs, and being an
> > obviously good improvement to have that many have been asking for for
> > some time now, I don't see the need for a CEP here.
> >
> > On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> > <st...@instaclustr.com> wrote:
> >>
> >> Hi list,
> >>
> >> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >> times of 3.4 to the current trunk with my help here and there, mostly
> >> cosmetic.
> >>
> >> I would like to know if there is a general consensus about me going to
> >> create a CEP for this feature or what is your perception on this. I
> >> know we have it a little bit backwards here as we should first discuss
> >> and then code but I am super glad that we have some POC we can
> >> elaborate further on and CEP would just cement  and summarise the
> >> approach / other implementation aspects of this feature.
> >>
> >> I think that having 9633 merged will fill quite a big operational gap
> >> when it comes to security. There are a lot of enterprises who desire
> >> this feature so much. I can not remember when I last saw a ticket with
> >> 50 watchers which was inactive for such a long time.
> >>
> >> Regards
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by sc...@paradoxica.net.
Same reaction here - great to have traction on this ticket. Shylaja, thanks for your work on this and to Stefan as well! It would be wonderful to have the feature complete.

One thing I’d mention is that a lot’s changed about the project’s testing strategy since the original patch was written. I see that the 2016 version adds a couple round-trip unit tests with a small amount of static data. It would be good to see randomized tests fleshed out that exercise more of the read/write path; or which add variants of existing read/write path tests that enable encryption.

– Scott

> On Nov 13, 2021, at 7:53 AM, Brandon Williams <dr...@gmail.com> wrote:
> 
> We already have a ticket and this predated CEPs, and being an
> obviously good improvement to have that many have been asking for for
> some time now, I don't see the need for a CEP here.
> 
> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> <st...@instaclustr.com> wrote:
>> 
>> Hi list,
>> 
>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>> times of 3.4 to the current trunk with my help here and there, mostly
>> cosmetic.
>> 
>> I would like to know if there is a general consensus about me going to
>> create a CEP for this feature or what is your perception on this. I
>> know we have it a little bit backwards here as we should first discuss
>> and then code but I am super glad that we have some POC we can
>> elaborate further on and CEP would just cement  and summarise the
>> approach / other implementation aspects of this feature.
>> 
>> I think that having 9633 merged will fill quite a big operational gap
>> when it comes to security. There are a lot of enterprises who desire
>> this feature so much. I can not remember when I last saw a ticket with
>> 50 watchers which was inactive for such a long time.
>> 
>> Regards
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Brandon Williams <dr...@gmail.com>.
We already have a ticket and this predated CEPs; being an
obviously good improvement that many have been asking for
for some time now, I don't see the need for a CEP here.

On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
<st...@instaclustr.com> wrote:
>
> Hi list,
>
> an engineer from Intel - Shylaja Kokoori (who is watching this list
> closely) has retrofitted the original code from CASSANDRA-9633 work in
> times of 3.4 to the current trunk with my help here and there, mostly
> cosmetic.
>
> I would like to know if there is a general consensus about me going to
> create a CEP for this feature or what is your perception on this. I
> know we have it a little bit backwards here as we should first discuss
> and then code but I am super glad that we have some POC we can
> elaborate further on and CEP would just cement  and summarise the
> approach / other implementation aspects of this feature.
>
> I think that having 9633 merged will fill quite a big operational gap
> when it comes to security. There are a lot of enterprises who desire
> this feature so much. I can not remember when I last saw a ticket with
> 50 watchers which was inactive for such a long time.
>
> Regards
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
On Tue, 16 Nov 2021 at 16:17, Joseph Lynch <jo...@gmail.com> wrote:
>
> > I find it rather strange to offer commit log and hints
> encryption at rest but for some reason sstable encryption would be
> omitted.
>
> I also think file/disk encryption may be superior in those cases

Just for the record, I do not have any particular opinion / I am not
leaning towards any solution as of now when it comes to superiority /
inferiority of file system encryption.

It would be very beneficial if more people expressed their views on this matter.

> but
> I imagine they were easier to implement in that you don't have to
> worry nearly as much about key management since both commit logs and
> hints are short lived files that should never leave the box (except
> maybe for CDC but I feel like that's similar to backup in terms of
> "exfiltration by design").
>
> To be clear, I think in 2015 this feature would have been extremely
> useful, but with operating systems and cloud providers often offering
> full disk encryption by default now and doing it with really good
> (performant and secure) implementations ... I question if it's
> something we want to sink cycles into.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 7:01 AM Stefan Miklosovic
> <st...@instaclustr.com> wrote:
> >
> > I don't object to having the discussion about whether we actually need
> > this feature at all :)
> >
> > Let's hear from people in the field what their perception is on this.
> >
> > Btw, if we should rely on file system encryption, for what reason is
> > there encryption of commit logs and hints already? So this should be
> > removed? I find it rather strange to offer commit log and hints
> > encryption at rest but for some reason sstable encryption would be
> > omitted.
> >
> > On Tue, 16 Nov 2021 at 15:46, Joseph Lynch <jo...@gmail.com> wrote:
> > >
> > > I think a CEP is wise (or a more thorough design document on the
> > > ticket) given how easy it is to do security incorrectly and key
> > > management, rotation and key derivation are not particularly
> > > straightforward.
> > >
> > > I am curious what advantage Cassandra implementing encryption has over
> > > asking the user to use an encrypted filesystem or disks instead where
> > > the kernel or device will undoubtedly be able to do the crypto more
> > > efficiently than we can in the JVM and we wouldn't have to further
> > > complicate the storage engine? I think the state of encrypted
> > > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > > these days than it was in 2015 when that ticket was created.
> > >
> > > If the application has existing exfiltration paths (e.g. backups) it's
> > > probably better to encrypt/decrypt in the backup/restore process via
> > > something extremely fast (and modern) like piping through age [1]
> > > isn't it?
> > >
> > > [1] https://github.com/FiloSottile/age
> > >
> > > -Joey
> > >
> > >
> > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> > > <st...@instaclustr.com> wrote:
> > > >
> > > > Hi list,
> > > >
> > > > an engineer from Intel - Shylaja Kokoori (who is watching this list
> > > > closely) has retrofitted the original code from CASSANDRA-9633 work in
> > > > times of 3.4 to the current trunk with my help here and there, mostly
> > > > cosmetic.
> > > >
> > > > I would like to know if there is a general consensus about me going to
> > > > create a CEP for this feature or what is your perception on this. I
> > > > know we have it a little bit backwards here as we should first discuss
> > > > and then code but I am super glad that we have some POC we can
> > > > elaborate further on and CEP would just cement  and summarise the
> > > > approach / other implementation aspects of this feature.
> > > >
> > > > I think that having 9633 merged will fill quite a big operational gap
> > > > when it comes to security. There are a lot of enterprises who desire
> > > > this feature so much. I can not remember when I last saw a ticket with
> > > > 50 watchers which was inactive for such a long time.
> > > >
> > > > Regards
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
> I find it rather strange to offer commit log and hints
> encryption at rest but for some reason sstable encryption would be
> omitted.

I also think file/disk encryption may be superior in those cases, but
I imagine they were easier to implement in that you don't have to
worry nearly as much about key management since both commit logs and
hints are short lived files that should never leave the box (except
maybe for CDC but I feel like that's similar to backup in terms of
"exfiltration by design").

To be clear, I think in 2015 this feature would have been extremely
useful, but with operating systems and cloud providers often offering
full disk encryption by default now and doing it with really good
(performant and secure) implementations ... I question if it's
something we want to sink cycles into.

-Joey

On Tue, Nov 16, 2021 at 7:01 AM Stefan Miklosovic
<st...@instaclustr.com> wrote:
>
> I don't object to having the discussion about whether we actually need
> this feature at all :)
>
> Let's hear from people in the field what their perception is on this.
>
> Btw, if we should rely on file system encryption, for what reason is
> there encryption of commit logs and hints already? So this should be
> removed? I find it rather strange to offer commit log and hints
> encryption at rest but for some reason sstable encryption would be
> omitted.
>
> On Tue, 16 Nov 2021 at 15:46, Joseph Lynch <jo...@gmail.com> wrote:
> >
> > I think a CEP is wise (or a more thorough design document on the
> > ticket) given how easy it is to do security incorrectly and key
> > management, rotation and key derivation are not particularly
> > straightforward.
> >
> > I am curious what advantage Cassandra implementing encryption has over
> > asking the user to use an encrypted filesystem or disks instead where
> > the kernel or device will undoubtedly be able to do the crypto more
> > efficiently than we can in the JVM and we wouldn't have to further
> > complicate the storage engine? I think the state of encrypted
> > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > these days than it was in 2015 when that ticket was created.
> >
> > If the application has existing exfiltration paths (e.g. backups) it's
> > probably better to encrypt/decrypt in the backup/restore process via
> > something extremely fast (and modern) like piping through age [1]
> > isn't it?
> >
> > [1] https://github.com/FiloSottile/age
> >
> > -Joey
> >
> >
> > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> > <st...@instaclustr.com> wrote:
> > >
> > > Hi list,
> > >
> > > an engineer from Intel - Shylaja Kokoori (who is watching this list
> > > closely) has retrofitted the original code from CASSANDRA-9633 work in
> > > times of 3.4 to the current trunk with my help here and there, mostly
> > > cosmetic.
> > >
> > > I would like to know if there is a general consensus about me going to
> > > create a CEP for this feature or what is your perception on this. I
> > > know we have it a little bit backwards here as we should first discuss
> > > and then code but I am super glad that we have some POC we can
> > > elaborate further on and CEP would just cement  and summarise the
> > > approach / other implementation aspects of this feature.
> > >
> > > I think that having 9633 merged will fill quite a big operational gap
> > > when it comes to security. There are a lot of enterprises who desire
> > > this feature so much. I can not remember when I last saw a ticket with
> > > 50 watchers which was inactive for such a long time.
> > >
> > > Regards
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
I don't object to having the discussion about whether we actually need
this feature at all :)

Let's hear from people in the field what their perception is on this.

Btw, if we should rely on file system encryption, for what reason is
there encryption of commit logs and hints already? So this should be
removed? I find it rather strange to offer commit log and hints
encryption at rest but for some reason sstable encryption would be
omitted.

On Tue, 16 Nov 2021 at 15:46, Joseph Lynch <jo...@gmail.com> wrote:
>
> I think a CEP is wise (or a more thorough design document on the
> ticket) given how easy it is to do security incorrectly and key
> management, rotation and key derivation are not particularly
> straightforward.
>
> I am curious what advantage Cassandra implementing encryption has over
> asking the user to use an encrypted filesystem or disks instead where
> the kernel or device will undoubtedly be able to do the crypto more
> efficiently than we can in the JVM and we wouldn't have to further
> complicate the storage engine? I think the state of encrypted
> filesystems (e.g. LUKS on Linux) is significantly more user friendly
> these days than it was in 2015 when that ticket was created.
>
> If the application has existing exfiltration paths (e.g. backups) it's
> probably better to encrypt/decrypt in the backup/restore process via
> something extremely fast (and modern) like piping through age [1]
> isn't it?
>
> [1] https://github.com/FiloSottile/age
>
> -Joey
>
>
> On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> <st...@instaclustr.com> wrote:
> >
> > Hi list,
> >
> > an engineer from Intel - Shylaja Kokoori (who is watching this list
> > closely) has retrofitted the original code from CASSANDRA-9633 work in
> > times of 3.4 to the current trunk with my help here and there, mostly
> > cosmetic.
> >
> > I would like to know if there is a general consensus about me going to
> > create a CEP for this feature or what is your perception on this. I
> > know we have it a little bit backwards here as we should first discuss
> > and then code but I am super glad that we have some POC we can
> > elaborate further on and CEP would just cement  and summarise the
> > approach / other implementation aspects of this feature.
> >
> > I think that having 9633 merged will fill quite a big operational gap
> > when it comes to security. There are a lot of enterprises who desire
> > this feature so much. I can not remember when I last saw a ticket with
> > 50 watchers which was inactive for such a long time.
> >
> > Regards
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
Sorry, but IMHO setting performance requirements in this regard is
nonsense. As long as it's reasonably usable in the real world, and Cassandra
makes the estimated effects on performance available, it will be up to
the operators to decide whether to turn on the feature. It's a trade-off
between security and performance, and everyone has different needs.

On 19/11/2021 01:50, Joseph Lynch wrote:
> On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <sh...@intel.com>
> wrote:
>
>> To address Joey's concern, the OpenJDK JVM and its derivatives optimize
>> Java crypto based on the underlying HW capabilities. For example, if the
>> underlying HW supports AES-NI, JVM intrinsics will use those for crypto
>> operations. Likewise, the new vector AES available on the latest Intel
>> platform is utilized by the JVM while running on that platform to make
>> crypto operations faster.
>>
> Which JDK version were you running? We have had a number of issues with the
> JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA1,
> and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again I
> think we could get the JVM crypto penalty down to ~2x native if we linked
> in e.g. ACCP by default [1, 2] but even the very best Java crypto I've seen
> (fully utilizing hardware instructions) is still ~2x slower than native
> code. The operating system has a number of advantages here in that they
> don't pay JVM allocation costs or the JNI barrier (in the case of ACCP) and
> the kernel also takes advantage of hardware instructions.
>
>
>>  From our internal experiments, we see single digit % regression when
>> transparent data encryption is enabled.
>>
> Which workloads are you testing and how are you measuring the regression? I
> suspect that compaction, repair (validation compaction), streaming, and
> quorum reads are probably much slower (probably ~10x slower for the
> throughput bound operations and ~2x slower on the read path). As
> compaction/repair/streaming usually take up between 10-20% of available CPU
> cycles making them 2x slower might show up as <10% overall utilization
> increase when you've really regressed 100% or more on key metrics
> (compaction throughput, streaming throughput, memory allocation rate, etc
> ...). For example, if compaction was able to achieve 2 MiBps of throughput
> before encryption and it was only able to achieve 1MiBps of throughput
> afterwards, that would be a huge real world impact to operators as
> compactions now take twice as long.
>
> I think a CEP or details on the ticket that indicate the performance tests
> and workloads that will be run might be wise? Perhaps something like
> "encryption creates no more than a 1% regression of: compaction throughput
> (MiBps), streaming throughput (MiBps), repair validation throughput
> (duration of full repair on the entire cluster), read throughput at 10ms
> p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO of
> 10ms), etc ... while a sustained load is applied to a multi-node cluster"?
> Even a microbenchmark that just sees how long it takes to encrypt and
> decrypt a 500MiB dataset using the proposed JVM implementation versus
> encrypting it with a native implementation might be enough to confirm/deny.
> For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
> AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6 GiBps
> encryption and 1.0 GiBps decryption; from my past experiences with Java
> crypto is it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>
> Cheers,
> -Joey
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
> [2] https://github.com/corretto/amazon-corretto-crypto-provider
> [3] https://github.com/FiloSottile/age
> [4] https://github.com/hashbrowncipher/keypipe#encryption
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
>
> Are you for real here?Nobody will ever guarantee you these %1 numbers
> ... come on. I think we are
> super paranoid about performance when we are not paranoid enough about
> security. This is a two way street.
> People are willing to give up on performance if security is a must.
>

I am for real that we should aspire to test performance (in addition to
correctness) when we implement complex features that impact the performance
of the database. Given that the alternatives (e.g. using your cloud
provider's out-of-the-box encrypted ephemeral drives) have essentially no
performance penalty, I think it's important to document how/if we are worse
(we might not be).

I don't actually think the 1% number is important, and I certainly don't
think we give any kind of guarantee. I'm just trying to say that if we
invest in encryption of the storage engine, I hope we have clear metrics to
measure that implementation by, so we can gauge whether it is worth doing
and maintaining generally.


> You do not need to use it if you do not want to,
> it is not like we are going to turn it on and you have to stick with
> that. Are you just saying that we are going to
> protect people from using some security features because their db
> might be slow? What if they just dont care?
>

Certainly we can add this to the list of features that Cassandra supports
but few if any users can actually use, due to either correctness issues
(e.g. not actually secure), usability issues (e.g. you have to configure 4
different properties to set up the various encryption options and restart
the database every week to rotate keys) or performance issues (e.g.
compaction slows down by 25x). I am just saying that having a bit of design
(either on the ticket or in the CEP) for how we might avoid that situation
might help.

-Joey

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Jeff Jirsa <jj...@gmail.com>.

> On Nov 19, 2021, at 2:53 PM, Joseph Lynch <jo...@gmail.com> wrote:
> 
> 
>> 
>> For better or worse, different threat models mean that it’s not strictly better to do FDE and some use cases definitely want this at the db layer instead of file system.
> 
> Do you mind elaborating which threat models? The only one I can think
> of is users can log onto the database machine and have read access to
> the cassandra data directory but not read access to wherever the keys
> are?

Basically that - one where shell access is more likely (with or without LPE, local privilege escalation, being required to get to the mounted volume). LPE being required in the common case for either makes them effectively the same; one just makes auditors much happier than the other.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
> For better or worse, different threat models mean that it’s not strictly better to do FDE and some use cases definitely want this at the db layer instead of file system.

Do you mind elaborating on which threat models? The only one I can think
of is where users can log onto the database machine and have read access to
the cassandra data directory but not read access to wherever the keys
are?

-Joey

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Derek Chen-Becker <de...@chen-becker.org>.
Thanks, that's really helpful to have some code to look at!

Derek

On Fri, Nov 19, 2021 at 9:35 AM Joseph Lynch <jo...@gmail.com> wrote:

> On Fri, Nov 19, 2021 at 9:52 AM Derek Chen-Becker <de...@chen-becker.org>
> wrote:
> >
> > https://bugs.openjdk.java.net/browse/JDK-7184394 added AES intrinsics in
> > Java 8, in 2012. While it's always possible to have a regression, and
> it's
> > important to understand the performance impact, stories of 2-10x sound
> > apocryphal. If they're all using the same intrinsics, the performance
> > should be roughly the same. I think that the real challenge will be key
> > management, not performance.
> >
> > Derek
>
> > On Fri, Nov 19, 2021 at 7:41 AM Bowen Song <bo...@bso.ng.invalid> wrote:
> >
> > > On the performance note, I copy & pasted a small piece of Java code to
> > > do AES256-CBC on the stdin and write the result to stdout. I then ran
> > > the following two commands on the same machine (with AES-NI) for
> > > comparison:
> > >
> > >     $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
> > >     /usr/lib/jvm/java-11-openjdk/bin/java -jar aes-bench.jar >/dev/null
> > >     36.24s user 5.96s system 100% cpu 41.912 total
> > >     $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
> > >     openssl enc -aes-256-cbc -e -K
> > >     "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
> > >     -iv "0123456789abcdef0123456789abcdef" >/dev/null
> > >     31.09s user 3.92s system 99% cpu 35.043 total
> > >
> > > This is not an accurate test of the AES performance, as the Java test
> > > includes the JVM start up time and the key and IV generation in the
> Java
> > > code. But this gives us a pretty good idea that the total performance
> > > regression is definitely far from the 2x to 10x slower claimed in some
> > > previous emails.
> > >
> > >
>
> I am aware that Java added AES intrinsics support in Java 8, but it is
> still painfully slow doing authenticated AES-GCM and many other forms
> of crypto [1]. Native AES-GCM on my laptop running at 4GHz [2]
> achieves 3.7 GiB/s while Java 8 can manage a mere 289 MiB/s (13x
> slower) and Java 11 manages 768 MiB/s (5x slower) [3]. AWS literally
> funded an entire project [4] to speed up slow Java crypto which has
> sped up basic crypto from 2-10x [5, 6] on real world workloads at
> scale.
>
> I don't think my claims are apocryphal when I and others have spent so
> much time on this project and other JVM projects debugging why they
> are so slow, including most recently determining the root cause to the
> initial serious performance regressions in 4.0's networking code was
> due to native Java 8's TLS stack and specifically AES-GCM
> implementation being painfully slow (the fix we settled on was to use
> tcnative with native AES-GCM) [6] as well as speeding up quorum reads
> by 2x through using faster MD5 crypto [8, 9, 10].
>
> -Joey
>
> [1]
> https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-summary-txt
> [2]
> https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-benchmarkon-sh
> [3]
> https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-authenticated_encryption_perf-txt
> [4] https://github.com/corretto/amazon-corretto-crypto-provider
> [5] https://github.com/corretto/amazon-corretto-crypto-provider/pull/54
> [6] https://github.com/corretto/amazon-corretto-crypto-provider/issues/52
> [7] https://issues.apache.org/jira/browse/CASSANDRA-15175
> [8] https://issues.apache.org/jira/browse/CASSANDRA-14611
> [9] https://issues.apache.org/jira/browse/CASSANDRA-15294
> [10]
> https://github.com/corretto/amazon-corretto-crypto-provider/issues/52#issuecomment-531921577
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
On Fri, Nov 19, 2021 at 9:52 AM Derek Chen-Becker <de...@chen-becker.org> wrote:
>
> https://bugs.openjdk.java.net/browse/JDK-7184394 added AES intrinsics in
> Java 8, in 2012. While it's always possible to have a regression, and it's
> important to understand the performance impact, stories of 2-10x sound
> apocryphal. If they're all using the same intrinsics, the performance
> should be roughly the same. I think that the real challenge will be key
> management, not performance.
>
> Derek

> On Fri, Nov 19, 2021 at 7:41 AM Bowen Song <bo...@bso.ng.invalid> wrote:
>
> > On the performance note, I copy & pasted a small piece of Java code to
> > do AES256-CBC on the stdin and write the result to stdout. I then ran
> > the following two commands on the same machine (with AES-NI) for
> > comparison:
> >
> >     $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
> >     /usr/lib/jvm/java-11-openjdk/bin/java -jar aes-bench.jar >/dev/null
> >     36.24s user 5.96s system 100% cpu 41.912 total
> >     $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
> >     openssl enc -aes-256-cbc -e -K
> >     "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
> >     -iv "0123456789abcdef0123456789abcdef" >/dev/null
> >     31.09s user 3.92s system 99% cpu 35.043 total
> >
> > This is not an accurate test of the AES performance, as the Java test
> > includes the JVM start up time and the key and IV generation in the Java
> > code. But this gives us a pretty good idea that the total performance
> > regression is definitely far from the 2x to 10x slower claimed in some
> > previous emails.
> >
> >

I am aware that Java added AES intrinsics support in Java 8, but it is
still painfully slow doing authenticated AES-GCM and many other forms
of crypto [1]. Native AES-GCM on my laptop running at 4GHz [2]
achieves 3.7 GiB/s while Java 8 can manage a mere 289 MiB/s (13x
slower) and Java 11 manages 768 MiB/s (5x slower) [3]. AWS literally
funded an entire project [4] to speed up slow Java crypto, which has
sped up basic crypto by 2-10x [5, 6] on real-world workloads at
scale.

I don't think my claims are apocryphal when I and others have spent so
much time on this project and other JVM projects debugging why they
are so slow. Most recently that included determining that the root
cause of the initial serious performance regressions in 4.0's
networking code was Java 8's native TLS stack and specifically its
AES-GCM implementation being painfully slow (the fix we settled on was
to use tcnative with native AES-GCM) [7], as well as speeding up
quorum reads by 2x by using faster MD5 crypto [8, 9, 10].

-Joey

[1] https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-summary-txt
[2] https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-benchmarkon-sh
[3] https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-authenticated_encryption_perf-txt
[4] https://github.com/corretto/amazon-corretto-crypto-provider
[5] https://github.com/corretto/amazon-corretto-crypto-provider/pull/54
[6] https://github.com/corretto/amazon-corretto-crypto-provider/issues/52
[7] https://issues.apache.org/jira/browse/CASSANDRA-15175
[8] https://issues.apache.org/jira/browse/CASSANDRA-14611
[9] https://issues.apache.org/jira/browse/CASSANDRA-15294
[10] https://github.com/corretto/amazon-corretto-crypto-provider/issues/52#issuecomment-531921577
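
As a concrete sketch of the ACCP route mentioned above (assuming the
AmazonCorrettoCryptoProvider class and its install() helper as documented
in the project README [4]; this is illustrative only, not code from the
9633 patch), registering the native provider ahead of the built-in JCE
providers looks roughly like:

    import java.security.Provider;
    import java.security.Security;
    import javax.crypto.Cipher;

    import com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider;

    public class AccpSketch {
        public static void main(String[] args) throws Exception {
            // Put ACCP at the front of the provider list so Cipher,
            // MessageDigest, SecureRandom, etc. resolve to the native
            // implementations when they are available.
            AmazonCorrettoCryptoProvider.install();

            // Verify which provider actually backs AES-GCM afterwards.
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            System.out.println("AES/GCM provider: " + gcm.getProvider().getName());

            for (Provider p : Security.getProviders()) {
                System.out.println("registered: " + p.getName());
            }
        }
    }

If ACCP cannot be used on a given platform, getInstance() should fall back
to the next registered provider, which is why printing the selected
provider is a useful sanity check.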

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Derek Chen-Becker <de...@chen-becker.org>.
https://bugs.openjdk.java.net/browse/JDK-7184394 added AES intrinsics in
Java 8, in 2012. While it's always possible to have a regression, and it's
important to understand the performance impact, stories of 2-10x sound
apocryphal. If they're all using the same intrinsics, the performance
should be roughly the same. I think that the real challenge will be key
management, not performance.

Derek
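
On the key-management point, for illustration only (this is not how the
9633 patch is structured), a minimal envelope-encryption sketch using plain
JCA primitives - a hypothetical per-file data-encryption key (DEK) wrapped
by a long-lived key-encryption key (KEK) - looks like:

    import java.security.Key;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    public class EnvelopeSketch {
        public static void main(String[] args) throws Exception {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);

            // Long-lived KEK; in practice this would come from a keystore
            // or an external KMS rather than being generated in place.
            SecretKey kek = gen.generateKey();

            // Fresh DEK, e.g. one per SSTable.
            SecretKey dek = gen.generateKey();

            // Wrap the DEK with the KEK (RFC 3394 AES key wrap); only the
            // wrapped form would need to be stored next to the file.
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, kek);
            byte[] wrappedDek = wrap.wrap(dek);

            // On read, unwrap with the current KEK. Rotating the KEK then
            // means re-wrapping DEKs rather than re-encrypting data files.
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, kek);
            Key recovered = unwrap.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
            System.out.println("recovered " + recovered.getEncoded().length * 8 + "-bit DEK");
        }
    }

The hard parts are everything around this sketch: where the KEK lives, how
it is rotated, and how wrapped DEKs travel with streamed or backed-up files.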

On Fri, Nov 19, 2021 at 7:41 AM Bowen Song <bo...@bso.ng.invalid> wrote:

> On the performance note, I copy & pasted a small piece of Java code to
> do AES256-CBC on the stdin and write the result to stdout. I then ran
> the following two commands on the same machine (with AES-NI) for
> comparison:
>
>     $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
>     /usr/lib/jvm/java-11-openjdk/bin/java -jar aes-bench.jar >/dev/null
>     36.24s user 5.96s system 100% cpu 41.912 total
>     $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
>     openssl enc -aes-256-cbc -e -K
>     "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
>     -iv "0123456789abcdef0123456789abcdef" >/dev/null
>     31.09s user 3.92s system 99% cpu 35.043 total
>
> This is not an accurate test of the AES performance, as the Java test
> includes the JVM start up time and the key and IV generation in the Java
> code. But this gives us a pretty good idea that the total performance
> regression is definitely far from the 2x to 10x slower claimed in some
> previous emails.
>
>
> The Java code I used:
>
>     package com.example.AesBenchmark;
>
>     import java.security.Security;
>     import java.io.File;
>     import java.io.FileInputStream;
>     import java.io.FileOutputStream;
>     import java.security.SecureRandom;
>
>     import javax.crypto.Cipher;
>     import javax.crypto.KeyGenerator;
>     import javax.crypto.SecretKey;
>     import javax.crypto.spec.IvParameterSpec;
>     import javax.crypto.spec.SecretKeySpec;
>
>     public class AesBenchmark {
>          static {
>              try {
>                  Security.setProperty("crypto.policy", "unlimited");
>              } catch (Exception e) {
>                  e.printStackTrace();
>              }
>          }
>
>          static final int BUF_LEN = 4096;
>
>          public static void main(String[] args) throws Exception
>          {
>              KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
>              keyGenerator.init(256);
>
>              // Generate Key
>              SecretKey key = keyGenerator.generateKey();
>
>              // Generating IV.
>              byte[] IV = new byte[16];
>              SecureRandom random = new SecureRandom();
>              random.nextBytes(IV);
>
>              //Get Cipher Instance
>              Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
>
>              //Create SecretKeySpec
>              SecretKeySpec keySpec = new SecretKeySpec(key.getEncoded(),
>     "AES");
>
>              //Create IvParameterSpec
>              IvParameterSpec ivSpec = new IvParameterSpec(IV);
>
>              //Initialize Cipher for ENCRYPT_MODE
>              cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
>
>              byte[] bufInput = new byte[BUF_LEN];
>              FileInputStream fis = new FileInputStream(new
>     File("/dev/stdin"));
>              FileOutputStream fos = new FileOutputStream(new
>     File("/dev/stdout"));
>              int nBytes;
>              while ((nBytes = fis.read(bufInput, 0, BUF_LEN)) != -1)
>              {
>                  fos.write(cipher.update(bufInput, 0, nBytes));
>              }
>              fos.write(cipher.doFinal());
>          }
>     }
>
> On 19/11/2021 13:28, Jeff Jirsa wrote:
> >
> > For better or worse, different threat models mean that it’s not strictly
> better to do FDE and some use cases definitely want this at the db layer
> instead of file system.
> >
> >> On Nov 19, 2021, at 12:54 PM, Joshua McKenzie<jm...@apache.org>
> wrote:
> >>
> >> 
> >>>
> >>> setting performance requirements on this regard is a
> >>> nonsense. As long as it's reasonably usable in real world, and
> Cassandra
> >>> makes the estimated effects on performance available, it will be up to
> >>> the operators to decide whether to turn on the feature
> >> I think Joey's argument, and correct me if I'm wrong, is that
> implementing
> >> a complex feature in Cassandra that we then have to manage that's
> >> essentially worse in every way compared to a built-in full-disk
> encryption
> >> option via LUKS+LVM etc is a poor use of our time and energy.
> >>
> >> i.e. we'd be better off investing our time into documenting how to do
> full
> >> disk encryption in a variety of scenarios + explaining why that is our
> >> recommended approach instead of taking the time and energy to design,
> >> implement, debug, and then maintain an inferior solution.
> >>
> >>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie<jm...@apache.org>
> >>> wrote:
> >>>
> >>> Are you for real here?
> >>>
> >>> Please keep things cordial. Statements like this don't help move the
> >>> conversation along.
> >>>
> >>>
> >>> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
> >>> stefan.miklosovic@instaclustr.com> wrote:
> >>>
> >>>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch<jo...@gmail.com>
> wrote:
> >>>>> On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
> >>>> shylaja.kokoori@intel.com>
> >>>>> wrote:
> >>>>>
> >>>>>> To address Joey's concern, the OpenJDK JVM and its derivatives
> >>>> optimize
> >>>>>> Java crypto based on the underlying HW capabilities. For example, if
> >>>> the
> >>>>>> underlying HW supports AES-NI, JVM intrinsics will use those for
> >>>> crypto
> >>>>>> operations. Likewise, the new vector AES available on the latest
> Intel
> >>>>>> platform is utilized by the JVM while running on that platform to
> make
> >>>>>> crypto operations faster.
> >>>>>>
> >>>>> Which JDK version were you running? We have had a number of issues
> with
> >>>> the
> >>>>> JVM being 2-10x slower than native crypto on Java 8 (especially MD5,
> >>>> SHA1,
> >>>>> and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower).
> Again
> >>>> I
> >>>>> think we could get the JVM crypto penalty down to ~2x native if we
> >>>> linked
> >>>>> in e.g. ACCP by default [1, 2] but even the very best Java crypto
> I've
> >>>> seen
> >>>>> (fully utilizing hardware instructions) is still ~2x slower than
> native
> >>>>> code. The operating system has a number of advantages here in that
> they
> >>>>> don't pay JVM allocation costs or the JNI barrier (in the case of
> ACCP)
> >>>> and
> >>>>> the kernel also takes advantage of hardware instructions.
> >>>>>
> >>>>>
> >>>>>>  From our internal experiments, we see single digit % regression
> when
> >>>>>> transparent data encryption is enabled.
> >>>>>>
> >>>>> Which workloads are you testing and how are you measuring the
> >>>> regression? I
> >>>>> suspect that compaction, repair (validation compaction), streaming,
> and
> >>>>> quorum reads are probably much slower (probably ~10x slower for the
> >>>>> throughput bound operations and ~2x slower on the read path). As
> >>>>> compaction/repair/streaming usually take up between 10-20% of
> available
> >>>> CPU
> >>>>> cycles making them 2x slower might show up as <10% overall
> utilization
> >>>>> increase when you've really regressed 100% or more on key metrics
> >>>>> (compaction throughput, streaming throughput, memory allocation rate,
> >>>> etc
> >>>>> ...). For example, if compaction was able to achieve 2 MiBps of
> >>>> throughput
> >>>>> before encryption and it was only able to achieve 1MiBps of
> throughput
> >>>>> afterwards, that would be a huge real world impact to operators as
> >>>>> compactions now take twice as long.
> >>>>>
> >>>>> I think a CEP or details on the ticket that indicate the performance
> >>>> tests
> >>>>> and workloads that will be run might be wise? Perhaps something like
> >>>>> "encryption creates no more than a 1% regression of: compaction
> >>>> throughput
> >>>>> (MiBps), streaming throughput (MiBps), repair validation throughput
> >>>>> (duration of full repair on the entire cluster), read throughput at
> 10ms
> >>>>> p99 tail at quorum consistency (QPS handled while not exceeding P99
> SLO
> >>>> of
> >>>>> 10ms), etc ... while a sustained load is applied to a multi-node
> >>>> cluster"?
> >>>>
> >>>> Are you for real here?Nobody will ever guarantee you these %1 numbers
> >>>> ... come on. I think we are
> >>>> super paranoid about performance when we are not paranoid enough about
> >>>> security. This is a two way street.
> >>>> People are willing to give up on performance if security is a must.
> >>>> You do not need to use it if you do not want to,
> >>>> it is not like we are going to turn it on and you have to stick with
> >>>> that. Are you just saying that we are going to
> >>>> protect people from using some security features because their db
> >>>> might be slow? What if they just dont care?
> >>>>
> >>>>> Even a microbenchmark that just sees how long it takes to encrypt and
> >>>>> decrypt a 500MiB dataset using the proposed JVM implementation versus
> >>>>> encrypting it with a native implementation might be enough to
> >>>> confirm/deny.
> >>>>> For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
> >>>>> AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6
> >>>> GiBps
> >>>>> encryption and 1.0 GiBps decryption; from my past experiences with
> Java
> >>>>> crypto is it would achieve maybe 200 MiBps of _non-authenticated_
> AES.
> >>>>>
> >>>>> Cheers,
> >>>>> -Joey
> >>>>>
> >>>>> [1]https://issues.apache.org/jira/browse/CASSANDRA-15294
> >>>>> [2]https://github.com/corretto/amazon-corretto-crypto-provider
> >>>>> [3]https://github.com/FiloSottile/age
> >>>>> [4]https://github.com/hashbrowncipher/keypipe#encryption
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail:dev-help@cassandra.apache.org
> >>>>
> >>>>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail:dev-help@cassandra.apache.org
> >



-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
On the performance note, I copy & pasted a small piece of Java code to 
do AES256-CBC on the stdin and write the result to stdout. I then ran 
the following two commands on the same machine (with AES-NI) for comparison:

    $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
    /usr/lib/jvm/java-11-openjdk/bin/java -jar aes-bench.jar >/dev/null
    36.24s user 5.96s system 100% cpu 41.912 total
    $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
    openssl enc -aes-256-cbc -e -K
    "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
    -iv "0123456789abcdef0123456789abcdef" >/dev/null
    31.09s user 3.92s system 99% cpu 35.043 total

This is not an accurate test of the AES performance, as the Java test 
includes the JVM start up time and the key and IV generation in the Java 
code. But this gives us a pretty good idea that the total performance 
regression is definitely far from the 2x to 10x slower claimed in some 
previous emails.


The Java code I used:

    package com.example.AesBenchmark;

    import java.security.Security;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.security.SecureRandom;

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class AesBenchmark {
         static {
             try {
                 Security.setProperty("crypto.policy", "unlimited");
             } catch (Exception e) {
                 e.printStackTrace();
             }
         }

         static final int BUF_LEN = 4096;

         public static void main(String[] args) throws Exception
         {
             KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
             keyGenerator.init(256);

             // Generate Key
             SecretKey key = keyGenerator.generateKey();

             // Generating IV.
             byte[] IV = new byte[16];
             SecureRandom random = new SecureRandom();
             random.nextBytes(IV);

             //Get Cipher Instance
             Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");

             //Create SecretKeySpec
             SecretKeySpec keySpec = new SecretKeySpec(key.getEncoded(),
    "AES");

             //Create IvParameterSpec
             IvParameterSpec ivSpec = new IvParameterSpec(IV);

             //Initialize Cipher for ENCRYPT_MODE
             cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);

             byte[] bufInput = new byte[BUF_LEN];
             FileInputStream fis = new FileInputStream(new
    File("/dev/stdin"));
             FileOutputStream fos = new FileOutputStream(new
    File("/dev/stdout"));
             int nBytes;
             while ((nBytes = fis.read(bufInput, 0, BUF_LEN)) != -1)
             {
                 fos.write(cipher.update(bufInput, 0, nBytes));
             }
             fos.write(cipher.doFinal());
         }
    }
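
For what it's worth, a variant of the above that excludes JVM start-up and
key generation from the measurement by timing only the cipher work over an
in-memory buffer would look roughly like this (a rough sketch, not a
rigorous benchmark - a JMH harness would be the proper tool, and it still
measures AES-CBC only; authenticated modes like AES-GCM can look quite
different):

    package com.example.AesBenchmark;

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class AesThroughput {
        static final int BUF_LEN = 4096;

        public static void main(String[] args) throws Exception {
            KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
            keyGenerator.init(256);
            SecretKey key = keyGenerator.generateKey();

            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                        new SecretKeySpec(key.getEncoded(), "AES"),
                        new IvParameterSpec(iv));

            byte[] input = new byte[BUF_LEN];
            byte[] output = new byte[BUF_LEN + 16]; // room for one extra block

            // Warm up so the timed loop runs JIT-compiled, intrinsified code.
            for (int i = 0; i < 100_000; i++) {
                cipher.update(input, 0, BUF_LEN, output, 0);
            }

            long totalBytes = 16L * 1024 * 1024 * 1024; // 16 GiB, as in the dd test
            long chunks = totalBytes / BUF_LEN;
            long start = System.nanoTime();
            for (long i = 0; i < chunks; i++) {
                cipher.update(input, 0, BUF_LEN, output, 0);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%.0f MiB/s%n", totalBytes / (1024.0 * 1024.0) / seconds);
        }
    }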

On 19/11/2021 13:28, Jeff Jirsa wrote:
>
> For better or worse, different threat models mean that it’s not strictly better to do FDE and some use cases definitely want this at the db layer instead of file system.
>
>> On Nov 19, 2021, at 12:54 PM, Joshua McKenzie<jm...@apache.org>  wrote:
>>
>> 
>>>
>>> setting performance requirements on this regard is a
>>> nonsense. As long as it's reasonably usable in real world, and Cassandra
>>> makes the estimated effects on performance available, it will be up to
>>> the operators to decide whether to turn on the feature
>> I think Joey's argument, and correct me if I'm wrong, is that implementing
>> a complex feature in Cassandra that we then have to manage that's
>> essentially worse in every way compared to a built-in full-disk encryption
>> option via LUKS+LVM etc is a poor use of our time and energy.
>>
>> i.e. we'd be better off investing our time into documenting how to do full
>> disk encryption in a variety of scenarios + explaining why that is our
>> recommended approach instead of taking the time and energy to design,
>> implement, debug, and then maintain an inferior solution.
>>
>>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie<jm...@apache.org>
>>> wrote:
>>>
>>> Are you for real here?
>>>
>>> Please keep things cordial. Statements like this don't help move the
>>> conversation along.
>>>
>>>
>>> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
>>> stefan.miklosovic@instaclustr.com> wrote:
>>>
>>>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch<jo...@gmail.com>  wrote:
>>>>> On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
>>>> shylaja.kokoori@intel.com>
>>>>> wrote:
>>>>>
>>>>>> To address Joey's concern, the OpenJDK JVM and its derivatives
>>>> optimize
>>>>>> Java crypto based on the underlying HW capabilities. For example, if
>>>> the
>>>>>> underlying HW supports AES-NI, JVM intrinsics will use those for
>>>> crypto
>>>>>> operations. Likewise, the new vector AES available on the latest Intel
>>>>>> platform is utilized by the JVM while running on that platform to make
>>>>>> crypto operations faster.
>>>>>>
>>>>> Which JDK version were you running? We have had a number of issues with
>>>> the
>>>>> JVM being 2-10x slower than native crypto on Java 8 (especially MD5,
>>>> SHA1,
>>>>> and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again
>>>> I
>>>>> think we could get the JVM crypto penalty down to ~2x native if we
>>>> linked
>>>>> in e.g. ACCP by default [1, 2] but even the very best Java crypto I've
>>>> seen
>>>>> (fully utilizing hardware instructions) is still ~2x slower than native
>>>>> code. The operating system has a number of advantages here in that they
>>>>> don't pay JVM allocation costs or the JNI barrier (in the case of ACCP)
>>>> and
>>>>> the kernel also takes advantage of hardware instructions.
>>>>>
>>>>>
>>>>>>  From our internal experiments, we see single digit % regression when
>>>>>> transparent data encryption is enabled.
>>>>>>
>>>>> Which workloads are you testing and how are you measuring the
>>>> regression? I
>>>>> suspect that compaction, repair (validation compaction), streaming, and
>>>>> quorum reads are probably much slower (probably ~10x slower for the
>>>>> throughput bound operations and ~2x slower on the read path). As
>>>>> compaction/repair/streaming usually take up between 10-20% of available
>>>> CPU
>>>>> cycles making them 2x slower might show up as <10% overall utilization
>>>>> increase when you've really regressed 100% or more on key metrics
>>>>> (compaction throughput, streaming throughput, memory allocation rate,
>>>> etc
>>>>> ...). For example, if compaction was able to achieve 2 MiBps of
>>>> throughput
>>>>> before encryption and it was only able to achieve 1MiBps of throughput
>>>>> afterwards, that would be a huge real world impact to operators as
>>>>> compactions now take twice as long.
>>>>>
>>>>> I think a CEP or details on the ticket that indicate the performance
>>>> tests
>>>>> and workloads that will be run might be wise? Perhaps something like
>>>>> "encryption creates no more than a 1% regression of: compaction
>>>> throughput
>>>>> (MiBps), streaming throughput (MiBps), repair validation throughput
>>>>> (duration of full repair on the entire cluster), read throughput at 10ms
>>>>> p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO
>>>> of
>>>>> 10ms), etc ... while a sustained load is applied to a multi-node
>>>> cluster"?
>>>>
>>>> Are you for real here?Nobody will ever guarantee you these %1 numbers
>>>> ... come on. I think we are
>>>> super paranoid about performance when we are not paranoid enough about
>>>> security. This is a two way street.
>>>> People are willing to give up on performance if security is a must.
>>>> You do not need to use it if you do not want to,
>>>> it is not like we are going to turn it on and you have to stick with
>>>> that. Are you just saying that we are going to
>>>> protect people from using some security features because their db
>>>> might be slow? What if they just dont care?
>>>>
>>>>> Even a microbenchmark that just sees how long it takes to encrypt and
>>>>> decrypt a 500MiB dataset using the proposed JVM implementation versus
>>>>> encrypting it with a native implementation might be enough to
>>>> confirm/deny.
>>>>> For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
>>>>> AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6
>>>> GiBps
>>>>> encryption and 1.0 GiBps decryption; from my past experiences with Java
>>>>> crypto is it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>>>>>
>>>>> Cheers,
>>>>> -Joey
>>>>>
>>>>> [1]https://issues.apache.org/jira/browse/CASSANDRA-15294
>>>>> [2]https://github.com/corretto/amazon-corretto-crypto-provider
>>>>> [3]https://github.com/FiloSottile/age
>>>>> [4]https://github.com/hashbrowncipher/keypipe#encryption
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail:dev-help@cassandra.apache.org
>>>>
>>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail:dev-help@cassandra.apache.org
>

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Jeff Jirsa <jj...@gmail.com>.

For better or worse, different threat models mean that it’s not strictly better to do FDE and some use cases definitely want this at the db layer instead of file system. 

> On Nov 19, 2021, at 12:54 PM, Joshua McKenzie <jm...@apache.org> wrote:
> 
> 
>> 
>> 
>> setting performance requirements on this regard is a
>> nonsense. As long as it's reasonably usable in real world, and Cassandra
>> makes the estimated effects on performance available, it will be up to
>> the operators to decide whether to turn on the feature
> 
> I think Joey's argument, and correct me if I'm wrong, is that implementing
> a complex feature in Cassandra that we then have to manage that's
> essentially worse in every way compared to a built-in full-disk encryption
> option via LUKS+LVM etc is a poor use of our time and energy.
> 
> i.e. we'd be better off investing our time into documenting how to do full
> disk encryption in a variety of scenarios + explaining why that is our
> recommended approach instead of taking the time and energy to design,
> implement, debug, and then maintain an inferior solution.
> 
>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jm...@apache.org>
>> wrote:
>> 
>> Are you for real here?
>> 
>> Please keep things cordial. Statements like this don't help move the
>> conversation along.
>> 
>> 
>> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
>> stefan.miklosovic@instaclustr.com> wrote:
>> 
>>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <jo...@gmail.com> wrote:
>>>> 
>>>> On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
>>> shylaja.kokoori@intel.com>
>>>> wrote:
>>>> 
>>>>> To address Joey's concern, the OpenJDK JVM and its derivatives
>>> optimize
>>>>> Java crypto based on the underlying HW capabilities. For example, if
>>> the
>>>>> underlying HW supports AES-NI, JVM intrinsics will use those for
>>> crypto
>>>>> operations. Likewise, the new vector AES available on the latest Intel
>>>>> platform is utilized by the JVM while running on that platform to make
>>>>> crypto operations faster.
>>>>> 
>>>> 
>>>> Which JDK version were you running? We have had a number of issues with
>>> the
>>>> JVM being 2-10x slower than native crypto on Java 8 (especially MD5,
>>> SHA1,
>>>> and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again
>>> I
>>>> think we could get the JVM crypto penalty down to ~2x native if we
>>> linked
>>>> in e.g. ACCP by default [1, 2] but even the very best Java crypto I've
>>> seen
>>>> (fully utilizing hardware instructions) is still ~2x slower than native
>>>> code. The operating system has a number of advantages here in that they
>>>> don't pay JVM allocation costs or the JNI barrier (in the case of ACCP)
>>> and
>>>> the kernel also takes advantage of hardware instructions.
>>>> 
>>>> 
>>>>> From our internal experiments, we see single digit % regression when
>>>>> transparent data encryption is enabled.
>>>>> 
>>>> 
>>>> Which workloads are you testing and how are you measuring the
>>> regression? I
>>>> suspect that compaction, repair (validation compaction), streaming, and
>>>> quorum reads are probably much slower (probably ~10x slower for the
>>>> throughput bound operations and ~2x slower on the read path). As
>>>> compaction/repair/streaming usually take up between 10-20% of available
>>> CPU
>>>> cycles making them 2x slower might show up as <10% overall utilization
>>>> increase when you've really regressed 100% or more on key metrics
>>>> (compaction throughput, streaming throughput, memory allocation rate,
>>> etc
>>>> ...). For example, if compaction was able to achieve 2 MiBps of
>>> throughput
>>>> before encryption and it was only able to achieve 1MiBps of throughput
>>>> afterwards, that would be a huge real world impact to operators as
>>>> compactions now take twice as long.
>>>> 
>>>> I think a CEP or details on the ticket that indicate the performance
>>> tests
>>>> and workloads that will be run might be wise? Perhaps something like
>>>> "encryption creates no more than a 1% regression of: compaction
>>> throughput
>>>> (MiBps), streaming throughput (MiBps), repair validation throughput
>>>> (duration of full repair on the entire cluster), read throughput at 10ms
>>>> p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO
>>> of
>>>> 10ms), etc ... while a sustained load is applied to a multi-node
>>> cluster"?
>>> 
>>> Are you for real here?Nobody will ever guarantee you these %1 numbers
>>> ... come on. I think we are
>>> super paranoid about performance when we are not paranoid enough about
>>> security. This is a two way street.
>>> People are willing to give up on performance if security is a must.
>>> You do not need to use it if you do not want to,
>>> it is not like we are going to turn it on and you have to stick with
>>> that. Are you just saying that we are going to
>>> protect people from using some security features because their db
>>> might be slow? What if they just dont care?
>>> 
>>>> Even a microbenchmark that just sees how long it takes to encrypt and
>>>> decrypt a 500MiB dataset using the proposed JVM implementation versus
>>>> encrypting it with a native implementation might be enough to
>>> confirm/deny.
>>>> For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
>>>> AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6
>>> GiBps
>>>> encryption and 1.0 GiBps decryption; from my past experiences with Java
>>>> crypto is it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>>>> 
>>>> Cheers,
>>>> -Joey
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
>>>> [2] https://github.com/corretto/amazon-corretto-crypto-provider
>>>> [3] https://github.com/FiloSottile/age
>>>> [4] https://github.com/hashbrowncipher/keypipe#encryption
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
> I think Joey's argument, and correct me if I'm wrong, is that implementing
> a complex feature in Cassandra that we then have to manage that's
> essentially worse in every way compared to a built-in full-disk encryption
> option via LUKS+LVM etc is a poor use of our time and energy.
>
> i.e. we'd be better off investing our time into documenting how to do full
> disk encryption in a variety of scenarios + explaining why that is our
> recommended approach instead of taking the time and energy to design,
> implement, debug, and then maintain an inferior solution.
>

Yes this is my argument. I also worry we're underestimating how hard
this is to do.

-Joey

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Maulin Vasavada <ma...@gmail.com>.
Basically, we also have to think about how operable these changes will be
for operators in multi-tenant, multi-cluster/dc environments w.r.t. key
rotations, security, key deployments etc.

On Fri, Nov 19, 2021 at 8:03 PM Maulin Vasavada <ma...@gmail.com>
wrote:

> Hi all
>
> Really interesting discussion. I started reading this thread and still
> have to catch-up a lot but based on my experience many big organizations
> ultimately settle on having over-the-wire encryption combined with OS/disk
> encryption to comply with the security requirements for various reasons
> like,
> 1. Potential performance challenges at high scale of data
> movement/mirroring
> 2. Internal security groups/zoning structures and restrictions (like
> restrictions on key sharing between zones etc which makes
> mirroring/replication for cross-zone impossible)
> 3. Management/maintenance of the in-house Key Management System is quite a
> challenging overhead for on-prem installations and when things move to
> cloud, ultimately organizations opt-in for the cloud provider's on-disk
> encryption and having over-the-wire encryption with TLS or using SASL over
> SSL because the whole application migration/adoption becomes multi-year
> challenging program.
>
> We experienced challenges even with AES-NI/JDK9+/Kernel TLS on Linux but
> that was because we were looking at per-message (in Kafka world) encryption
> with asymmetric envelope so it could be off the context here.
>
> None-the-less I will read the thread in more detail just to gain more
> knowledge, it has been really a great technical discussion.
>
> Thanks
> Maulin
>
>
>
>
> On Fri, Nov 19, 2021 at 2:05 PM Kokoori, Shylaja <
> shylaja.kokoori@intel.com> wrote:
>
>> I agree with Joey, kernel also should be able to take advantage of the
>> crypto acceleration.
>>
>> I also want to add, since performance of JDK is a concern here, newer
>> Intel Icelake server platforms supports VAES and SHA-NI which further
>> accelerates AES-GCM perf by 2x and SHA1 perf by ~6x using JDK 11.
>>
>> Some configuration information for the tests I ran.
>>
>>     - JDK version used was JDK14 (should behave similarly with JDK11
>> also).
>>     - Since the tests were done before 4.0 GA'd, Cassandra version used
>> was 4.0-beta3. Dataset size was ~500G
>>     - Workloads tested were 100% reads, 100% updates & 80:20 mix with
>> cassandra-stress. I have not tested streaming yet.
>>
>> I would be happy to provide additional data points or make necessary code
>> changes based on recommendations from folks here.
>>
>> Thanks,
>> Shylaja
>>
>> -----Original Message-----
>> From: Joshua McKenzie <jm...@apache.org>
>> Sent: Friday, November 19, 2021 4:53 AM
>> To: dev@cassandra.apache.org
>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
>>
>> >
>> > setting performance requirements on this regard is a nonsense. As long
>> > as it's reasonably usable in real world, and Cassandra makes the
>> > estimated effects on performance available, it will be up to the
>> > operators to decide whether to turn on the feature
>>
>> I think Joey's argument, and correct me if I'm wrong, is that
>> implementing a complex feature in Cassandra that we then have to manage
>> that's essentially worse in every way compared to a built-in full-disk
>> encryption option via LUKS+LVM etc is a poor use of our time and energy.
>>
>> i.e. we'd be better off investing our time into documenting how to do
>> full disk encryption in a variety of scenarios + explaining why that is our
>> recommended approach instead of taking the time and energy to design,
>> implement, debug, and then maintain an inferior solution.
>>
>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jm...@apache.org>
>> wrote:
>>
>> > Are you for real here?
>> >
>> > Please keep things cordial. Statements like this don't help move the
>> > conversation along.
>> >
>> >
>> > On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
>> > stefan.miklosovic@instaclustr.com> wrote:
>> >
>> >> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <jo...@gmail.com>
>> wrote:
>> >> >
>> >> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
>> >> shylaja.kokoori@intel.com>
>> >> > wrote:
>> >> >
>> >> > > To address Joey's concern, the OpenJDK JVM and its derivatives
>> >> optimize
>> >> > > Java crypto based on the underlying HW capabilities. For example,
>> >> > > if
>> >> the
>> >> > > underlying HW supports AES-NI, JVM intrinsics will use those for
>> >> crypto
>> >> > > operations. Likewise, the new vector AES available on the latest
>> >> > > Intel platform is utilized by the JVM while running on that
>> >> > > platform to make crypto operations faster.
>> >> > >
>> >> >
>> >> > Which JDK version were you running? We have had a number of issues
>> >> > with
>> >> the
>> >> > JVM being 2-10x slower than native crypto on Java 8 (especially
>> >> > MD5,
>> >> SHA1,
>> >> > and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower).
>> >> > Again
>> >> I
>> >> > think we could get the JVM crypto penalty down to ~2x native if we
>> >> linked
>> >> > in e.g. ACCP by default [1, 2] but even the very best Java crypto
>> >> > I've
>> >> seen
>> >> > (fully utilizing hardware instructions) is still ~2x slower than
>> >> > native code. The operating system has a number of advantages here
>> >> > in that they don't pay JVM allocation costs or the JNI barrier (in
>> >> > the case of ACCP)
>> >> and
>> >> > the kernel also takes advantage of hardware instructions.
>> >> >
>> >> >
>> >> > > From our internal experiments, we see single digit % regression
>> >> > > when transparent data encryption is enabled.
>> >> > >
>> >> >
>> >> > Which workloads are you testing and how are you measuring the
>> >> regression? I
>> >> > suspect that compaction, repair (validation compaction), streaming,
>> >> > and quorum reads are probably much slower (probably ~10x slower for
>> >> > the throughput bound operations and ~2x slower on the read path).
>> >> > As compaction/repair/streaming usually take up between 10-20% of
>> >> > available
>> >> CPU
>> >> > cycles making them 2x slower might show up as <10% overall
>> >> > utilization increase when you've really regressed 100% or more on
>> >> > key metrics (compaction throughput, streaming throughput, memory
>> >> > allocation rate,
>> >> etc
>> >> > ...). For example, if compaction was able to achieve 2 MiBps of
>> >> throughput
>> >> > before encryption and it was only able to achieve 1MiBps of
>> >> > throughput afterwards, that would be a huge real world impact to
>> >> > operators as compactions now take twice as long.
>> >> >
>> >> > I think a CEP or details on the ticket that indicate the
>> >> > performance
>> >> tests
>> >> > and workloads that will be run might be wise? Perhaps something
>> >> > like "encryption creates no more than a 1% regression of:
>> >> > compaction
>> >> throughput
>> >> > (MiBps), streaming throughput (MiBps), repair validation throughput
>> >> > (duration of full repair on the entire cluster), read throughput at
>> >> > 10ms
>> >> > p99 tail at quorum consistency (QPS handled while not exceeding P99
>> >> > SLO
>> >> of
>> >> > 10ms), etc ... while a sustained load is applied to a multi-node
>> >> cluster"?
>> >>
>> >> Are you for real here?Nobody will ever guarantee you these %1 numbers
>> >> ... come on. I think we are super paranoid about performance when we
>> >> are not paranoid enough about security. This is a two way street.
>> >> People are willing to give up on performance if security is a must.
>> >> You do not need to use it if you do not want to, it is not like we
>> >> are going to turn it on and you have to stick with that. Are you just
>> >> saying that we are going to protect people from using some security
>> >> features because their db might be slow? What if they just dont care?
>> >>
>> >> > Even a microbenchmark that just sees how long it takes to encrypt
>> >> > and decrypt a 500MiB dataset using the proposed JVM implementation
>> >> > versus encrypting it with a native implementation might be enough
>> >> > to
>> >> confirm/deny.
>> >> > For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric
>> >> > of AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about
>> >> > 1.6
>> >> GiBps
>> >> > encryption and 1.0 GiBps decryption; from my past experiences with
>> >> > Java crypto is it would achieve maybe 200 MiBps of
>> _non-authenticated_ AES.
>> >> >
>> >> > Cheers,
>> >> > -Joey
>> >> >
>> >> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
>> >> > [2] https://github.com/corretto/amazon-corretto-crypto-provider
>> >> > [3] https://github.com/FiloSottile/age
>> >> > [4] https://github.com/hashbrowncipher/keypipe#encryption
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> >> For additional commands, e-mail: dev-help@cassandra.apache.org
>> >>
>> >>
>>
>

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Maulin Vasavada <ma...@gmail.com>.
Hi all

Really interesting discussion. I started reading this thread and still have
to catch up a lot, but based on my experience many big organizations
ultimately settle on over-the-wire encryption combined with OS/disk
encryption to comply with security requirements, for various reasons
like:
1. Potential performance challenges at high scale of data movement/mirroring
2. Internal security groups/zoning structures and restrictions (like
restrictions on key sharing between zones etc., which make
cross-zone mirroring/replication impossible)
3. Management/maintenance of an in-house Key Management System is quite a
challenging overhead for on-prem installations, and when things move to the
cloud, organizations ultimately opt in for the cloud provider's on-disk
encryption plus over-the-wire encryption with TLS or SASL over SSL,
because the whole application migration/adoption becomes a challenging
multi-year program.

We experienced challenges even with AES-NI/JDK9+/kernel TLS on Linux, but
that was because we were looking at per-message (in the Kafka world)
encryption with an asymmetric envelope, so it may be out of context here.

Nonetheless I will read the thread in more detail just to gain more
knowledge; it has really been a great technical discussion.

Thanks
Maulin




On Fri, Nov 19, 2021 at 2:05 PM Kokoori, Shylaja <sh...@intel.com>
wrote:

> I agree with Joey, kernel also should be able to take advantage of the
> crypto acceleration.
>
> I also want to add, since performance of JDK is a concern here, newer
> Intel Icelake server platforms supports VAES and SHA-NI which further
> accelerates AES-GCM perf by 2x and SHA1 perf by ~6x using JDK 11.
>
> Some configuration information for the tests I ran.
>
>     - JDK version used was JDK14 (should behave similarly with JDK11
> also).
>     - Since the tests were done before 4.0 GA'd, Cassandra version used
> was 4.0-beta3. Dataset size was ~500G
>     - Workloads tested were 100% reads, 100% updates & 80:20 mix with
> cassandra-stress. I have not tested streaming yet.
>
> I would be happy to provide additional data points or make necessary code
> changes based on recommendations from folks here.
>
> Thanks,
> Shylaja
>
> -----Original Message-----
> From: Joshua McKenzie <jm...@apache.org>
> Sent: Friday, November 19, 2021 4:53 AM
> To: dev@cassandra.apache.org
> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
>
> >
> > setting performance requirements on this regard is a nonsense. As long
> > as it's reasonably usable in real world, and Cassandra makes the
> > estimated effects on performance available, it will be up to the
> > operators to decide whether to turn on the feature
>
> I think Joey's argument, and correct me if I'm wrong, is that implementing
> a complex feature in Cassandra that we then have to manage that's
> essentially worse in every way compared to a built-in full-disk encryption
> option via LUKS+LVM etc is a poor use of our time and energy.
>
> i.e. we'd be better off investing our time into documenting how to do full
> disk encryption in a variety of scenarios + explaining why that is our
> recommended approach instead of taking the time and energy to design,
> implement, debug, and then maintain an inferior solution.
>
> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jm...@apache.org>
> wrote:
>
> > Are you for real here?
> >
> > Please keep things cordial. Statements like this don't help move the
> > conversation along.
> >
> >
> > On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
> > stefan.miklosovic@instaclustr.com> wrote:
> >
> >> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <jo...@gmail.com>
> wrote:
> >> >
> >> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
> >> shylaja.kokoori@intel.com>
> >> > wrote:
> >> >
> >> > > To address Joey's concern, the OpenJDK JVM and its derivatives
> >> optimize
> >> > > Java crypto based on the underlying HW capabilities. For example,
> >> > > if
> >> the
> >> > > underlying HW supports AES-NI, JVM intrinsics will use those for
> >> crypto
> >> > > operations. Likewise, the new vector AES available on the latest
> >> > > Intel platform is utilized by the JVM while running on that
> >> > > platform to make crypto operations faster.
> >> > >
> >> >
> >> > Which JDK version were you running? We have had a number of issues
> >> > with
> >> the
> >> > JVM being 2-10x slower than native crypto on Java 8 (especially
> >> > MD5,
> >> SHA1,
> >> > and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower).
> >> > Again
> >> I
> >> > think we could get the JVM crypto penalty down to ~2x native if we
> >> linked
> >> > in e.g. ACCP by default [1, 2] but even the very best Java crypto
> >> > I've
> >> seen
> >> > (fully utilizing hardware instructions) is still ~2x slower than
> >> > native code. The operating system has a number of advantages here
> >> > in that they don't pay JVM allocation costs or the JNI barrier (in
> >> > the case of ACCP)
> >> and
> >> > the kernel also takes advantage of hardware instructions.
> >> >
> >> >
> >> > > From our internal experiments, we see single digit % regression
> >> > > when transparent data encryption is enabled.
> >> > >
> >> >
> >> > Which workloads are you testing and how are you measuring the
> >> regression? I
> >> > suspect that compaction, repair (validation compaction), streaming,
> >> > and quorum reads are probably much slower (probably ~10x slower for
> >> > the throughput bound operations and ~2x slower on the read path).
> >> > As compaction/repair/streaming usually take up between 10-20% of
> >> > available
> >> CPU
> >> > cycles making them 2x slower might show up as <10% overall
> >> > utilization increase when you've really regressed 100% or more on
> >> > key metrics (compaction throughput, streaming throughput, memory
> >> > allocation rate,
> >> etc
> >> > ...). For example, if compaction was able to achieve 2 MiBps of
> >> throughput
> >> > before encryption and it was only able to achieve 1MiBps of
> >> > throughput afterwards, that would be a huge real world impact to
> >> > operators as compactions now take twice as long.
> >> >
> >> > I think a CEP or details on the ticket that indicate the
> >> > performance
> >> tests
> >> > and workloads that will be run might be wise? Perhaps something
> >> > like "encryption creates no more than a 1% regression of:
> >> > compaction
> >> throughput
> >> > (MiBps), streaming throughput (MiBps), repair validation throughput
> >> > (duration of full repair on the entire cluster), read throughput at
> >> > 10ms
> >> > p99 tail at quorum consistency (QPS handled while not exceeding P99
> >> > SLO
> >> of
> >> > 10ms), etc ... while a sustained load is applied to a multi-node
> >> cluster"?
> >>
> >> Are you for real here?Nobody will ever guarantee you these %1 numbers
> >> ... come on. I think we are super paranoid about performance when we
> >> are not paranoid enough about security. This is a two way street.
> >> People are willing to give up on performance if security is a must.
> >> You do not need to use it if you do not want to, it is not like we
> >> are going to turn it on and you have to stick with that. Are you just
> >> saying that we are going to protect people from using some security
> >> features because their db might be slow? What if they just dont care?
> >>
> >> > Even a microbenchmark that just sees how long it takes to encrypt
> >> > and decrypt a 500MiB dataset using the proposed JVM implementation
> >> > versus encrypting it with a native implementation might be enough
> >> > to
> >> confirm/deny.
> >> > For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric
> >> > of AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about
> >> > 1.6
> >> GiBps
> >> > encryption and 1.0 GiBps decryption; from my past experiences with
> >> > Java crypto is it would achieve maybe 200 MiBps of
> _non-authenticated_ AES.
> >> >
> >> > Cheers,
> >> > -Joey
> >> >
> >> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
> >> > [2] https://github.com/corretto/amazon-corretto-crypto-provider
> >> > [3] https://github.com/FiloSottile/age
> >> > [4] https://github.com/hashbrowncipher/keypipe#encryption
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>
> >>
>

RE: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "Kokoori, Shylaja" <sh...@intel.com>.
I agree with Joey, the kernel should also be able to take advantage of crypto acceleration.

I also want to add that, since JDK performance is a concern here, newer Intel Icelake server platforms support VAES and SHA-NI, which further accelerate AES-GCM perf by 2x and SHA1 perf by ~6x using JDK 11.
 
Some configuration information for the tests I ran. 

    - JDK version used was JDK14 (should behave similarly with JDK11 also). 
    - Since the tests were done before 4.0 GA'd, Cassandra version used was 4.0-beta3. Dataset size was ~500G
    - Workloads tested were 100% reads, 100% updates & 80:20 mix with cassandra-stress. I have not tested streaming yet.

I would be happy to provide additional data points or make necessary code changes based on recommendations from folks here.

Thanks,
Shylaja

-----Original Message-----
From: Joshua McKenzie <jm...@apache.org> 
Sent: Friday, November 19, 2021 4:53 AM
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

>
> setting performance requirements on this regard is a nonsense. As long 
> as it's reasonably usable in real world, and Cassandra makes the 
> estimated effects on performance available, it will be up to the 
> operators to decide whether to turn on the feature

I think Joey's argument, and correct me if I'm wrong, is that implementing a complex feature in Cassandra that we then have to manage that's essentially worse in every way compared to a built-in full-disk encryption option via LUKS+LVM etc is a poor use of our time and energy.

i.e. we'd be better off investing our time into documenting how to do full disk encryption in a variety of scenarios + explaining why that is our recommended approach instead of taking the time and energy to design, implement, debug, and then maintain an inferior solution.

On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jm...@apache.org>
wrote:

> Are you for real here?
>
> Please keep things cordial. Statements like this don't help move the 
> conversation along.
>
>
> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic < 
> stefan.miklosovic@instaclustr.com> wrote:
>
>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <jo...@gmail.com> wrote:
>> >
>> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
>> shylaja.kokoori@intel.com>
>> > wrote:
>> >
>> > > To address Joey's concern, the OpenJDK JVM and its derivatives
>> optimize
>> > > Java crypto based on the underlying HW capabilities. For example, 
>> > > if
>> the
>> > > underlying HW supports AES-NI, JVM intrinsics will use those for
>> crypto
>> > > operations. Likewise, the new vector AES available on the latest 
>> > > Intel platform is utilized by the JVM while running on that 
>> > > platform to make crypto operations faster.
>> > >
>> >
>> > Which JDK version were you running? We have had a number of issues 
>> > with
>> the
>> > JVM being 2-10x slower than native crypto on Java 8 (especially 
>> > MD5,
>> SHA1,
>> > and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). 
>> > Again
>> I
>> > think we could get the JVM crypto penalty down to ~2x native if we
>> linked
>> > in e.g. ACCP by default [1, 2] but even the very best Java crypto 
>> > I've
>> seen
>> > (fully utilizing hardware instructions) is still ~2x slower than 
>> > native code. The operating system has a number of advantages here 
>> > in that they don't pay JVM allocation costs or the JNI barrier (in 
>> > the case of ACCP)
>> and
>> > the kernel also takes advantage of hardware instructions.
>> >
>> >
>> > > From our internal experiments, we see single digit % regression 
>> > > when transparent data encryption is enabled.
>> > >
>> >
>> > Which workloads are you testing and how are you measuring the
>> regression? I
>> > suspect that compaction, repair (validation compaction), streaming, 
>> > and quorum reads are probably much slower (probably ~10x slower for 
>> > the throughput bound operations and ~2x slower on the read path). 
>> > As compaction/repair/streaming usually take up between 10-20% of 
>> > available
>> CPU
>> > cycles making them 2x slower might show up as <10% overall 
>> > utilization increase when you've really regressed 100% or more on 
>> > key metrics (compaction throughput, streaming throughput, memory 
>> > allocation rate,
>> etc
>> > ...). For example, if compaction was able to achieve 2 MiBps of
>> throughput
>> > before encryption and it was only able to achieve 1MiBps of 
>> > throughput afterwards, that would be a huge real world impact to 
>> > operators as compactions now take twice as long.
>> >
>> > I think a CEP or details on the ticket that indicate the 
>> > performance
>> tests
>> > and workloads that will be run might be wise? Perhaps something 
>> > like "encryption creates no more than a 1% regression of: 
>> > compaction
>> throughput
>> > (MiBps), streaming throughput (MiBps), repair validation throughput 
>> > (duration of full repair on the entire cluster), read throughput at 
>> > 10ms
>> > p99 tail at quorum consistency (QPS handled while not exceeding P99 
>> > SLO
>> of
>> > 10ms), etc ... while a sustained load is applied to a multi-node
>> cluster"?
>>
>> Are you for real here?Nobody will ever guarantee you these %1 numbers 
>> ... come on. I think we are super paranoid about performance when we 
>> are not paranoid enough about security. This is a two way street.
>> People are willing to give up on performance if security is a must.
>> You do not need to use it if you do not want to, it is not like we 
>> are going to turn it on and you have to stick with that. Are you just 
>> saying that we are going to protect people from using some security 
>> features because their db might be slow? What if they just dont care?
>>
>> > Even a microbenchmark that just sees how long it takes to encrypt 
>> > and decrypt a 500MiB dataset using the proposed JVM implementation 
>> > versus encrypting it with a native implementation might be enough 
>> > to
>> confirm/deny.
>> > For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric 
>> > of AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 
>> > 1.6
>> GiBps
>> > encryption and 1.0 GiBps decryption; from my past experiences with 
>> > Java crypto is it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>> >
>> > Cheers,
>> > -Joey
>> >
>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
>> > [2] https://github.com/corretto/amazon-corretto-crypto-provider
>> > [3] https://github.com/FiloSottile/age
>> > [4] https://github.com/hashbrowncipher/keypipe#encryption
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
>>

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joshua McKenzie <jm...@apache.org>.
>
> setting performance requirements on this regard is a
> nonsense. As long as it's reasonably usable in real world, and Cassandra
> makes the estimated effects on performance available, it will be up to
> the operators to decide whether to turn on the feature

I think Joey's argument, and correct me if I'm wrong, is that implementing
a complex feature in Cassandra that we then have to manage, and that is
essentially worse in every way than a built-in full-disk encryption
option via LUKS+LVM etc., is a poor use of our time and energy.

i.e. we'd be better off investing our time into documenting how to do full
disk encryption in a variety of scenarios + explaining why that is our
recommended approach instead of taking the time and energy to design,
implement, debug, and then maintain an inferior solution.

On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jm...@apache.org>
wrote:

> Are you for real here?
>
> Please keep things cordial. Statements like this don't help move the
> conversation along.
>
>
> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com> wrote:
>
>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <jo...@gmail.com> wrote:
>> >
>> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <
>> shylaja.kokoori@intel.com>
>> > wrote:
>> >
>> > > To address Joey's concern, the OpenJDK JVM and its derivatives
>> optimize
>> > > Java crypto based on the underlying HW capabilities. For example, if
>> the
>> > > underlying HW supports AES-NI, JVM intrinsics will use those for
>> crypto
>> > > operations. Likewise, the new vector AES available on the latest Intel
>> > > platform is utilized by the JVM while running on that platform to make
>> > > crypto operations faster.
>> > >
>> >
>> > Which JDK version were you running? We have had a number of issues with
>> the
>> > JVM being 2-10x slower than native crypto on Java 8 (especially MD5,
>> SHA1,
>> > and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again
>> I
>> > think we could get the JVM crypto penalty down to ~2x native if we
>> linked
>> > in e.g. ACCP by default [1, 2] but even the very best Java crypto I've
>> seen
>> > (fully utilizing hardware instructions) is still ~2x slower than native
>> > code. The operating system has a number of advantages here in that they
>> > don't pay JVM allocation costs or the JNI barrier (in the case of ACCP)
>> and
>> > the kernel also takes advantage of hardware instructions.
>> >
>> >
>> > > From our internal experiments, we see single digit % regression when
>> > > transparent data encryption is enabled.
>> > >
>> >
>> > Which workloads are you testing and how are you measuring the
>> regression? I
>> > suspect that compaction, repair (validation compaction), streaming, and
>> > quorum reads are probably much slower (probably ~10x slower for the
>> > throughput bound operations and ~2x slower on the read path). As
>> > compaction/repair/streaming usually take up between 10-20% of available
>> CPU
>> > cycles making them 2x slower might show up as <10% overall utilization
>> > increase when you've really regressed 100% or more on key metrics
>> > (compaction throughput, streaming throughput, memory allocation rate,
>> etc
>> > ...). For example, if compaction was able to achieve 2 MiBps of
>> throughput
>> > before encryption and it was only able to achieve 1MiBps of throughput
>> > afterwards, that would be a huge real world impact to operators as
>> > compactions now take twice as long.
>> >
>> > I think a CEP or details on the ticket that indicate the performance
>> tests
>> > and workloads that will be run might be wise? Perhaps something like
>> > "encryption creates no more than a 1% regression of: compaction
>> throughput
>> > (MiBps), streaming throughput (MiBps), repair validation throughput
>> > (duration of full repair on the entire cluster), read throughput at 10ms
>> > p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO
>> of
>> > 10ms), etc ... while a sustained load is applied to a multi-node
>> cluster"?
>>
>> Are you for real here?Nobody will ever guarantee you these %1 numbers
>> ... come on. I think we are
>> super paranoid about performance when we are not paranoid enough about
>> security. This is a two way street.
>> People are willing to give up on performance if security is a must.
>> You do not need to use it if you do not want to,
>> it is not like we are going to turn it on and you have to stick with
>> that. Are you just saying that we are going to
>> protect people from using some security features because their db
>> might be slow? What if they just dont care?
>>
>> > Even a microbenchmark that just sees how long it takes to encrypt and
>> > decrypt a 500MiB dataset using the proposed JVM implementation versus
>> > encrypting it with a native implementation might be enough to
>> confirm/deny.
>> > For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
>> > AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6
>> GiBps
>> > encryption and 1.0 GiBps decryption; from my past experiences with Java
>> > crypto is it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>> >
>> > Cheers,
>> > -Joey
>> >
>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
>> > [2] https://github.com/corretto/amazon-corretto-crypto-provider
>> > [3] https://github.com/FiloSottile/age
>> > [4] https://github.com/hashbrowncipher/keypipe#encryption
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>
>>

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <jo...@gmail.com> wrote:
>
> On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <sh...@intel.com>
> wrote:
>
> > To address Joey's concern, the OpenJDK JVM and its derivatives optimize
> > Java crypto based on the underlying HW capabilities. For example, if the
> > underlying HW supports AES-NI, JVM intrinsics will use those for crypto
> > operations. Likewise, the new vector AES available on the latest Intel
> > platform is utilized by the JVM while running on that platform to make
> > crypto operations faster.
> >
>
> Which JDK version were you running? We have had a number of issues with the
> JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA1,
> and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again I
> think we could get the JVM crypto penalty down to ~2x native if we linked
> in e.g. ACCP by default [1, 2] but even the very best Java crypto I've seen
> (fully utilizing hardware instructions) is still ~2x slower than native
> code. The operating system has a number of advantages here in that they
> don't pay JVM allocation costs or the JNI barrier (in the case of ACCP) and
> the kernel also takes advantage of hardware instructions.
>
>
> > From our internal experiments, we see single digit % regression when
> > transparent data encryption is enabled.
> >
>
> Which workloads are you testing and how are you measuring the regression? I
> suspect that compaction, repair (validation compaction), streaming, and
> quorum reads are probably much slower (probably ~10x slower for the
> throughput bound operations and ~2x slower on the read path). As
> compaction/repair/streaming usually take up between 10-20% of available CPU
> cycles making them 2x slower might show up as <10% overall utilization
> increase when you've really regressed 100% or more on key metrics
> (compaction throughput, streaming throughput, memory allocation rate, etc
> ...). For example, if compaction was able to achieve 2 MiBps of throughput
> before encryption and it was only able to achieve 1MiBps of throughput
> afterwards, that would be a huge real world impact to operators as
> compactions now take twice as long.
>
> I think a CEP or details on the ticket that indicate the performance tests
> and workloads that will be run might be wise? Perhaps something like
> "encryption creates no more than a 1% regression of: compaction throughput
> (MiBps), streaming throughput (MiBps), repair validation throughput
> (duration of full repair on the entire cluster), read throughput at 10ms
> p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO of
> 10ms), etc ... while a sustained load is applied to a multi-node cluster"?

Are you for real here? Nobody will ever guarantee you these 1% numbers
... come on. I think we are super paranoid about performance when we
are not paranoid enough about security. This is a two-way street.
People are willing to give up performance if security is a must. You
do not need to use it if you do not want to; it is not like we are
going to turn it on and you have to stick with that. Are you just
saying that we are going to protect people from using some security
features because their db might be slow? What if they just don't care?

> Even a microbenchmark that just sees how long it takes to encrypt and
> decrypt a 500MiB dataset using the proposed JVM implementation versus
> encrypting it with a native implementation might be enough to confirm/deny.
> For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
> AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6 GiBps
> encryption and 1.0 GiBps decryption; from my past experiences with Java
> crypto is it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>
> Cheers,
> -Joey
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
> [2] https://github.com/corretto/amazon-corretto-crypto-provider
> [3] https://github.com/FiloSottile/age
> [4] https://github.com/hashbrowncipher/keypipe#encryption

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <sh...@intel.com>
wrote:

> To address Joey's concern, the OpenJDK JVM and its derivatives optimize
> Java crypto based on the underlying HW capabilities. For example, if the
> underlying HW supports AES-NI, JVM intrinsics will use those for crypto
> operations. Likewise, the new vector AES available on the latest Intel
> platform is utilized by the JVM while running on that platform to make
> crypto operations faster.
>

Which JDK version were you running? We have had a number of issues with the
JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA1,
and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again I
think we could get the JVM crypto penalty down to ~2x native if we linked
in e.g. ACCP by default [1, 2] but even the very best Java crypto I've seen
(fully utilizing hardware instructions) is still ~2x slower than native
code. The operating system has a number of advantages here in that they
don't pay JVM allocation costs or the JNI barrier (in the case of ACCP) and
the kernel also takes advantage of hardware instructions.
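
As a rough illustration of what linking in ACCP at the JCE level would
involve (install() and assertHealthy() are the provider's documented
entry points; the class below is just a throwaway check, not part of
any patch):

    import com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider;
    import javax.crypto.Cipher;

    public class AccpCheck
    {
        public static void main(String[] args) throws Exception
        {
            // Registers ACCP as the highest-priority JCE provider; subsequent
            // Cipher/Mac/MessageDigest lookups resolve to its native implementations.
            AmazonCorrettoCryptoProvider.install();
            AmazonCorrettoCryptoProvider.INSTANCE.assertHealthy();
            System.out.println(Cipher.getInstance("AES/GCM/NoPadding").getProvider().getName());
        }
    }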


> From our internal experiments, we see single digit % regression when
> transparent data encryption is enabled.
>

Which workloads are you testing and how are you measuring the regression? I
suspect that compaction, repair (validation compaction), streaming, and
quorum reads are probably much slower (probably ~10x slower for the
throughput bound operations and ~2x slower on the read path). As
compaction/repair/streaming usually take up between 10-20% of available CPU
cycles, making them 2x slower might show up as a <10% overall utilization
increase when you've really regressed 100% or more on key metrics
(compaction throughput, streaming throughput, memory allocation rate, etc
...). For example, if compaction was able to achieve 2 MiBps of throughput
before encryption and it was only able to achieve 1MiBps of throughput
afterwards, that would be a huge real world impact to operators as
compactions now take twice as long.

I think a CEP or details on the ticket that indicate the performance tests
and workloads that will be run might be wise? Perhaps something like
"encryption creates no more than a 1% regression of: compaction throughput
(MiBps), streaming throughput (MiBps), repair validation throughput
(duration of full repair on the entire cluster), read throughput at 10ms
p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO of
10ms), etc ... while a sustained load is applied to a multi-node cluster"?
Even a microbenchmark that just sees how long it takes to encrypt and
decrypt a 500MiB dataset using the proposed JVM implementation versus
encrypting it with a native implementation might be enough to confirm/deny.
For example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6 GiBps
encryption and 1.0 GiBps decryption; from my past experience with Java
crypto, it would achieve maybe 200 MiBps of _non-authenticated_ AES.

Cheers,
-Joey

[1] https://issues.apache.org/jira/browse/CASSANDRA-15294
[2] https://github.com/corretto/amazon-corretto-crypto-provider
[3] https://github.com/FiloSottile/age
[4] https://github.com/hashbrowncipher/keypipe#encryption
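
A minimal, single-JVM sketch of that microbenchmark using only the
built-in JCE provider; the 64 MiB buffer, 256-bit key and iteration
count are arbitrary choices for illustration, and a fair comparison
would also pin the JDK version and run a native baseline (age/keypipe)
over the same data:

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.security.SecureRandom;

    public class JvmAesGcmBench
    {
        public static void main(String[] args) throws Exception
        {
            byte[] plain = new byte[64 * 1024 * 1024];          // 64 MiB per pass
            new SecureRandom().nextBytes(plain);

            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            SecretKey key = gen.generateKey();
            byte[] iv = new byte[12];

            long encNanos = 0, decNanos = 0;
            for (int i = 0; i < 8; i++)                         // ~512 MiB in total
            {
                iv[0] = (byte) i;                               // never reuse an IV with the same key

                Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
                enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
                long t0 = System.nanoTime();
                byte[] ct = enc.doFinal(plain);
                encNanos += System.nanoTime() - t0;

                Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
                dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
                t0 = System.nanoTime();
                dec.doFinal(ct);
                decNanos += System.nanoTime() - t0;
            }
            double mib = 8 * 64.0;
            System.out.printf("encrypt: %.0f MiB/s, decrypt: %.0f MiB/s%n",
                              mib / (encNanos / 1e9), mib / (decNanos / 1e9));
        }
    }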

RE: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "Kokoori, Shylaja" <sh...@intel.com>.
To address Joey's concern, the OpenJDK JVM and its derivatives optimize Java crypto based on the underlying HW capabilities. For example, if the underlying HW supports AES-NI, JVM intrinsics will use those for crypto operations. Likewise, the new vector AES available on the latest Intel platform is utilized by the JVM while running on that platform to make crypto operations faster.
 
From our internal experiments, we see a single-digit % regression when transparent data encryption is enabled.
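
For what it's worth, a quick HotSpot-specific way to confirm that the
AES intrinsics described above are actually enabled on a given JVM
(illustrative only; it relies on the standard HotSpotDiagnosticMXBean
and the UseAES/UseAESIntrinsics flags):

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class AesIntrinsicsCheck
    {
        public static void main(String[] args)
        {
            HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // Both flags default to true on hardware with AES-NI support.
            System.out.println(hs.getVMOption("UseAES"));
            System.out.println(hs.getVMOption("UseAESIntrinsics"));
        }
    }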

-----Original Message-----
From: benedict@apache.org <be...@apache.org> 
Sent: Thursday, November 18, 2021 1:23 AM
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

I agree with Joey that most users may be better served by OS level encryption, but I also think this ticket can likely be delivered fairly easily. If we have a new contributor willing to produce a patch then the overhead for the project in shepherding it shouldn't be that onerous. If we also have known use cases in the community then on balance there's a good chance it will be a net positive investment for the project to enable users that desire in-database encryption. It might even spur further improvements to e.g. streaming performance.

I would scope the work to the minimum viable (but efficient) solution. So, in my view, that would mean encrypting per-sstable encryption keys with per-node master keys that can be rotated cheaply, requiring authentication to receive a stream containing both the unencrypted sstable encryption key and the encrypted sstable, and the receiving node encrypting the encryption key before serializing it to disk.

Since there are already compression hooks, this means only a little bit of special handling, and I _anticipate_ the patch should be quite modest for such a notable feature.


From: Ben Slater <be...@instaclustr.com>
Date: Thursday, 18 November 2021 at 09:07
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption I wanted to provide a bit of background in the interest we've seen in this ticket/feature (at Instaclustr) - essentially it comes down to in-db encryption at rest being a feature that compliance people are used to seeing in databases and having a very hard time believing that operating system level encryption is an equivalent control (whatever the reality may be). I've seen this be a significant obstacle for people who want to adopt Apache Cassandra many times and an insurmountable obstacle on multiple occasions. From what I've seen, I think this is one of the most watched tickets with the most "is this coming soon" comments in the project backlog and it's something we pretty regularly get asked whether we know if/when it's coming.

That said, I completely agree that we don't want to be engaging in security theatre or " introducing something that is either insecure or too slow to be useful." and I think there are some really good suggestions in this thread to come up with a strong solution for what will undoubtedly be a pretty complex and major change.

Cheers
Ben




On Wed, 17 Nov 2021 at 03:34, Joseph Lynch <jo...@gmail.com> wrote:

> For FDE you'd probably have  the key file in a tmpfs pulled from a 
> remote secret manager and when the machine boots it mounts the 
> encrypted partition that contains your data files. I'm not aware of 
> anyone doing FDE with a password in production. If you wanted 
> selective encryption it would make sense to me to support placing 
> keyspaces on different data directories (this may already be possible) 
> but since crypto in the kernel is so cheap I don't know why you'd do 
> selective encryption. Also I think it's worth noting many hosting 
> providers (e.g. AWS) just encrypt the disks for you so you can check 
> the "data is encrypted at rest" box.
>
> I think Cassandra will be pretty handicapped by being in the JVM which 
> generally has very slow crypto. I'm slightly concerned that we're 
> already slow at streaming and compaction, and adding slow JVM crypto 
> will make C* even less competitive. For example, if we have to disable 
> full sstable streaming (zero copy or otherwise) I think that would be 
> very unfortunate (although Bowen's approach of sharing one secret 
> across the cluster and then having files use a key derivation function 
> may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to 
> try to offload to native crypto like how internode networking did with 
> tcnative to fix the perf issues with netty TLS with JVM crypto I'd 
> feel a little less concerned but ... crypto that is both secure and 
> performant in the JVM is a hard problem ...
>
> I guess I'm just concerned we're going to introduce something that is 
> either insecure or too slow to be useful.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 8:10 AM Bowen Song <bo...@bso.ng.invalid> wrote:
> >
> > I don't like the idea that FDE Full Disk Encryption as an 
> > alternative to application managed encryption at rest. Each has 
> > their own advantages and disadvantages.
> >
> > For example, if the encryption key is the same across nodes in the 
> > same cluster, and Cassandra can share the key securely between 
> > authenticated nodes, rolling restart of the servers will be a lot 
> > simpler than if the servers were using FDE - someone will have to 
> > type in the passphrase on each reboot, or have a script to mount the 
> > encrypted device over SSH and then start Cassandra service after a reboot.
> >
> > Another valid use case of encryption implemented in Cassandra is 
> > selectively encrypt some tables, but leave others unencrypted. Doing 
> > this outside Cassandra on the filesystem level is very tedious and 
> > error-prone - a lots of symlinks and pretty hard to handle newly 
> > created tables or keyspaces.
> >
> > However, I don't know if there's enough demand to justify the above 
> > use cases.
> >
> >
> > On 16/11/2021 14:45, Joseph Lynch wrote:
> > > I think a CEP is wise (or a more thorough design document on the
> > > ticket) given how easy it is to do security incorrectly and key 
> > > management, rotation and key derivation are not particularly 
> > > straightforward.
> > >
> > > I am curious what advantage Cassandra implementing encryption has 
> > > over asking the user to use an encrypted filesystem or disks 
> > > instead where the kernel or device will undoubtedly be able to do 
> > > the crypto more efficiently than we can in the JVM and we wouldn't 
> > > have to further complicate the storage engine? I think the state 
> > > of encrypted filesystems (e.g. LUKS on Linux) is significantly 
> > > more user friendly these days than it was in 2015 when that ticket was created.
> > >
> > > If the application has existing exfiltration paths (e.g. backups) 
> > > it's probably better to encrypt/decrypt in the backup/restore 
> > > process via something extremely fast (and modern) like piping 
> > > through age [1] isn't it?
> > >
> > > [1] https://github.com/FiloSottile/age
> > >
> > > -Joey
> > >
> > >
> > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic 
> > > <st...@instaclustr.com> wrote:
> > >> Hi list,
> > >>
> > >> an engineer from Intel - Shylaja Kokoori (who is watching this 
> > >> list
> > >> closely) has retrofitted the original code from CASSANDRA-9633 
> > >> work in times of 3.4 to the current trunk with my help here and 
> > >> there, mostly cosmetic.
> > >>
> > >> I would like to know if there is a general consensus about me 
> > >> going to create a CEP for this feature or what is your perception 
> > >> on this. I know we have it a little bit backwards here as we 
> > >> should first discuss and then code but I am super glad that we 
> > >> have some POC we can elaborate further on and CEP would just 
> > >> cement  and summarise the approach / other implementation aspects of this feature.
> > >>
> > >> I think that having 9633 merged will fill quite a big operational 
> > >> gap when it comes to security. There are a lot of enterprises who 
> > >> desire this feature so much. I can not remember when I last saw a 
> > >> ticket with
> > >> 50 watchers which was inactive for such a long time.
> > >>
> > >> Regards
> > >>
> > >> -----------------------------------------------------------------
> > >> ---- To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > >>
> > > ------------------------------------------------------------------
> > > --- To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > --------------------------------------------------------------------
> > - To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by "benedict@apache.org" <be...@apache.org>.
I agree with Joey that most users may be better served by OS level encryption, but I also think this ticket can likely be delivered fairly easily. If we have a new contributor willing to produce a patch then the overhead for the project in shepherding it shouldn’t be that onerous. If we also have known use cases in the community then on balance there’s a good chance it will be a net positive investment for the project to enable users that desire in-database encryption. It might even spur further improvements to e.g. streaming performance.

I would scope the work to the minimum viable (but efficient) solution. So, in my view, that would mean encrypting per-sstable encryption keys with per-node master keys that can be rotated cheaply, requiring authentication to receive a stream containing both the unencrypted sstable encryption key and the encrypted sstable, and the receiving node encrypting the encryption key before serializing it to disk.

Since there are already compression hooks, this means only a little bit of special handling, and I _anticipate_ the patch should be quite modest for such a notable feature.
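
A minimal sketch of the per-sstable key wrapping described above, using
the stock JCE AESWrap transformation; the method names and the idea of
storing the wrapped key in the sstable's metadata component are
assumptions for illustration, not code from the patch:

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    public class SSTableKeyWrapping
    {
        // Generate a fresh data key for one sstable and wrap it with the node's master key.
        static byte[] wrapNewSSTableKey(SecretKey nodeMasterKey) throws Exception
        {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            SecretKey sstableKey = gen.generateKey();

            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, nodeMasterKey);
            return wrap.wrap(sstableKey);       // stored alongside the sstable's metadata
        }

        // On read (or after receiving a stream), unwrap the sstable key with the local master key.
        static SecretKey unwrapSSTableKey(byte[] wrapped, SecretKey nodeMasterKey) throws Exception
        {
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, nodeMasterKey);
            return (SecretKey) unwrap.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        }
    }

Rotating the per-node master key then only means re-wrapping the small
per-sstable keys, not rewriting the data files themselves.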


From: Ben Slater <be...@instaclustr.com>
Date: Thursday, 18 November 2021 at 09:07
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I wanted to provide a bit of background in the interest we've seen in this
ticket/feature (at Instaclustr) - essentially it comes down to in-db
encryption at rest being a feature that compliance people are used to
seeing in databases and having a very hard time believing that operating
system level encryption is an equivalent control (whatever the reality may
be). I've seen this be a significant obstacle for people who want to adopt
Apache Cassandra many times and an insurmountable obstacle on multiple
occasions. From what I've seen, I think this is one of the most watched
tickets with the most "is this coming soon" comments in the project backlog
and it's something we pretty regularly get asked whether we know if/when
it's coming.

That said, I completely agree that we don't want to be engaging in security
theatre or " introducing something that is either insecure or too slow to
be useful." and I think there are some really good suggestions in this
thread to come up with a strong solution for what will undoubtedly be a
pretty complex and major change.

Cheers
Ben




On Wed, 17 Nov 2021 at 03:34, Joseph Lynch <jo...@gmail.com> wrote:

> For FDE you'd probably have  the key file in a tmpfs pulled from a
> remote secret manager and when the machine boots it mounts the
> encrypted partition that contains your data files. I'm not aware of
> anyone doing FDE with a password in production. If you wanted
> selective encryption it would make sense to me to support placing
> keyspaces on different data directories (this may already be possible)
> but since crypto in the kernel is so cheap I don't know why you'd do
> selective encryption. Also I think it's worth noting many hosting
> providers (e.g. AWS) just encrypt the disks for you so you can check
> the "data is encrypted at rest" box.
>
> I think Cassandra will be pretty handicapped by being in the JVM which
> generally has very slow crypto. I'm slightly concerned that we're
> already slow at streaming and compaction, and adding slow JVM crypto
> will make C* even less competitive. For example, if we have to disable
> full sstable streaming (zero copy or otherwise) I think that would be
> very unfortunate (although Bowen's approach of sharing one secret
> across the cluster and then having files use a key derivation function
> may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to
> try to offload to native crypto like how internode networking did with
> tcnative to fix the perf issues with netty TLS with JVM crypto I'd
> feel a little less concerned but ... crypto that is both secure and
> performant in the JVM is a hard problem ...
>
> I guess I'm just concerned we're going to introduce something that is
> either insecure or too slow to be useful.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 8:10 AM Bowen Song <bo...@bso.ng.invalid> wrote:
> >
> > I don't like the idea that FDE Full Disk Encryption as an alternative to
> > application managed encryption at rest. Each has their own advantages
> > and disadvantages.
> >
> > For example, if the encryption key is the same across nodes in the same
> > cluster, and Cassandra can share the key securely between authenticated
> > nodes, rolling restart of the servers will be a lot simpler than if the
> > servers were using FDE - someone will have to type in the passphrase on
> > each reboot, or have a script to mount the encrypted device over SSH and
> > then start Cassandra service after a reboot.
> >
> > Another valid use case of encryption implemented in Cassandra is
> > selectively encrypt some tables, but leave others unencrypted. Doing
> > this outside Cassandra on the filesystem level is very tedious and
> > error-prone - a lots of symlinks and pretty hard to handle newly created
> > tables or keyspaces.
> >
> > However, I don't know if there's enough demand to justify the above use
> > cases.
> >
> >
> > On 16/11/2021 14:45, Joseph Lynch wrote:
> > > I think a CEP is wise (or a more thorough design document on the
> > > ticket) given how easy it is to do security incorrectly and key
> > > management, rotation and key derivation are not particularly
> > > straightforward.
> > >
> > > I am curious what advantage Cassandra implementing encryption has over
> > > asking the user to use an encrypted filesystem or disks instead where
> > > the kernel or device will undoubtedly be able to do the crypto more
> > > efficiently than we can in the JVM and we wouldn't have to further
> > > complicate the storage engine? I think the state of encrypted
> > > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > > these days than it was in 2015 when that ticket was created.
> > >
> > > If the application has existing exfiltration paths (e.g. backups) it's
> > > probably better to encrypt/decrypt in the backup/restore process via
> > > something extremely fast (and modern) like piping through age [1]
> > > isn't it?
> > >
> > > [1] https://github.com/FiloSottile/age
> > >
> > > -Joey
> > >
> > >
> > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> > > <st...@instaclustr.com> wrote:
> > >> Hi list,
> > >>
> > >> an engineer from Intel - Shylaja Kokoori (who is watching this list
> > >> closely) has retrofitted the original code from CASSANDRA-9633 work in
> > >> times of 3.4 to the current trunk with my help here and there, mostly
> > >> cosmetic.
> > >>
> > >> I would like to know if there is a general consensus about me going to
> > >> create a CEP for this feature or what is your perception on this. I
> > >> know we have it a little bit backwards here as we should first discuss
> > >> and then code but I am super glad that we have some POC we can
> > >> elaborate further on and CEP would just cement  and summarise the
> > >> approach / other implementation aspects of this feature.
> > >>
> > >> I think that having 9633 merged will fill quite a big operational gap
> > >> when it comes to security. There are a lot of enterprises who desire
> > >> this feature so much. I can not remember when I last saw a ticket with
> > >> 50 watchers which was inactive for such a long time.
> > >>
> > >> Regards
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > >> For additional commands, e-mail: dev-help@cassandra.apache.org
> > >>
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>
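
For reference, the shared-secret-plus-key-derivation idea mentioned in
the quoted discussion above (one cluster-wide secret, with per-file keys
derived from it so streamed sstables stay readable on any node) could
look roughly like the following sketch. It is HKDF-Expand (RFC 5869)
with HMAC-SHA256 for a single block, it assumes the cluster secret is
already uniformly random (otherwise an HKDF-Extract step is needed),
and the method and identifier names are illustrative:

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;

    public class FileKeyDerivation
    {
        // Derive a per-file key from a cluster-wide secret and a per-sstable
        // identifier, so a streamed file can be decrypted by any node that
        // holds the shared secret.
        static byte[] deriveFileKey(byte[] clusterSecret, String sstableId) throws Exception
        {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(clusterSecret, "HmacSHA256"));
            hmac.update(sstableId.getBytes(StandardCharsets.UTF_8));
            hmac.update((byte) 0x01);           // first (and only) HKDF-Expand block
            return hmac.doFinal();              // 32 bytes, usable as a 256-bit AES key
        }
    }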

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
>
> Yes, this needs to be done. The credentials for this stuff should be
> just fetched from wherever one wants. 100% agree with that and that
> maybe next iteration on top of that, should be rather easy. This was
> done in CEP-9 already for SSL context creation so we would just copy
> that approach here, more or less.
>
> I do not think you need to put the key in the yaml file. THE KEY? Why?
> Just a reference to it to read it from the beginning, no?
>
> What I do find quite ridiculous is to code up some tooling which would
> decrypt credentials in yaml. I hope we will avoid that approach here,
> that does not solve anything in my opinion.


+1 I think key management will be the main correctness challenge with this,
tooling will be the usability challenge, and the JVM will be the
performance challenge ...

-Joey

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Stefan Miklosovic <st...@instaclustr.com>.
On Fri, 19 Nov 2021 at 03:03, Joseph Lynch <jo...@gmail.com> wrote:
>
> >
> > I've seen this be a significant obstacle for people who want to adopt
> > Apache Cassandra many times and an insurmountable obstacle on multiple
> > occasions. From what I've seen, I think this is one of the most watched
> > tickets with the most "is this coming soon" comments in the project backlog
> > and it's something we pretty regularly get asked whether we know if/when
> > it's coming.
> >
>
> I agree encrypted data at rest is a very important feature, but in the six
> years since the ticket was originally proposed other systems kept getting
> better at a faster rate, especially easy to use full disk and filesystem
> encryption. LUKS+LVM in Linux is genuinely excellent and is relatively easy
> to setup today while that was _not_ true five years ago.
>
>
> > That said, I completely agree that we don't want to be engaging in security
> > theatre or " introducing something that is either insecure or too slow to
> > be useful." and I think there are some really good suggestions in this
> > thread to come up with a strong solution for what will undoubtedly be a
> > pretty complex and major change.
> >
>
> I think it's important to realize that for us to check the "data is
> encrypted at rest" box we have to do a lot more than what's currently been
> implemented. We have to design a pluggable key management system that
> either retrieves the keys from a remote system (e.g. KMS) or gives some way
> to load them directly into the process memory (a virtual table? or maybe
> loading them from a tmpfs-mounted directory?). We can't just put the key in
> the yaml file. This will also affect debuggability, since we would have to
> encrypt every file that is ever produced by Cassandra, including logs
> (which contain primary keys) and heap dumps, which are vital to debugging;
> so we'll have to ship custom tools to decrypt those things so humans can
> actually read them to debug problems.

Yes, this needs to be done. The credentials for this stuff should just
be fetched from wherever one wants. 100% agree with that, and a next
iteration on top of this should be rather easy. This was already done
in CEP-9 for SSL context creation, so we would just copy that approach
here, more or less.

I do not think you need to put the key itself in the yaml file. THE
KEY? Why? Just a reference to it, so it can be read on startup, no?

What I do find quite ridiculous is to code up some tooling which would
decrypt credentials in yaml. I hope we will avoid that approach here;
it does not solve anything in my opinion.

> If our primary goal is facilitating our users in being compliant with
> encryption at rest policies, I believe it is much easier to check that box
> by encrypting the entire disk or filesystem than building partial solutions
> into Cassandra.
>
> -Joey

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
>
> I've seen this be a significant obstacle for people who want to adopt
> Apache Cassandra many times and an insurmountable obstacle on multiple
> occasions. From what I've seen, I think this is one of the most watched
> tickets with the most "is this coming soon" comments in the project backlog
> and it's something we pretty regularly get asked whether we know if/when
> it's coming.
>

I agree encrypted data at rest is a very important feature, but in the six
years since the ticket was originally proposed other systems have kept
getting better at a faster rate, especially easy-to-use full-disk and
filesystem encryption. LUKS+LVM on Linux is genuinely excellent and is
relatively easy to set up today, while that was _not_ true five years ago.


> That said, I completely agree that we don't want to be engaging in security
> theatre or " introducing something that is either insecure or too slow to
> be useful." and I think there are some really good suggestions in this
> thread to come up with a strong solution for what will undoubtedly be a
> pretty complex and major change.
>

I think it's important to realize that for us to check the "data is
encrypted at rest" box we have to do a lot more than what's currently been
implemented. We have to design a pluggable key management system that
either retrieves the keys from a remote system (e.g. KMS) or gives some way
to load them directly into the process memory (a virtual table? or maybe
loading them from a tmpfs-mounted directory?). We can't just put the key in
the yaml file. This will also affect debuggability, since we would have to
encrypt every file that is ever produced by Cassandra, including logs
(which contain primary keys) and heap dumps, which are vital to debugging;
so we'll have to ship custom tools to decrypt those things so humans can
actually read them to debug problems.
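
Just to make "pluggable" concrete - a minimal sketch of what such a
provider could look like; the names below are purely illustrative, not an
existing Cassandra API:

    // Illustrative sketch only - not existing Cassandra code.
    // cassandra.yaml would name an implementation class; the key material
    // itself never appears in the config, it is fetched at runtime
    // (KMS, vault, tmpfs-mounted file, ...).
    public interface SSTableKeyProvider
    {
        javax.crypto.SecretKey getKey(String alias) throws java.io.IOException;
    }

    // Example: read a raw 256-bit key from a tmpfs-mounted directory.
    public class TmpfsKeyProvider implements SSTableKeyProvider
    {
        private final java.nio.file.Path keyDir =
                java.nio.file.Paths.get("/run/cassandra/keys");

        public javax.crypto.SecretKey getKey(String alias) throws java.io.IOException
        {
            byte[] raw = java.nio.file.Files.readAllBytes(keyDir.resolve(alias));
            return new javax.crypto.spec.SecretKeySpec(raw, "AES");
        }
    }

The yaml would then carry only the provider class name and its parameters,
never the key itself.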

If our primary goal is facilitating our users in being compliant with
encryption at rest policies, I believe it is much easier to check that box
by encrypting the entire disk or filesystem than building partial solutions
into Cassandra.

-Joey

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Ben Slater <be...@instaclustr.com>.
I wanted to provide a bit of background on the interest we've seen in this
ticket/feature (at Instaclustr) - essentially it comes down to in-db
encryption at rest being a feature that compliance people are used to
seeing in databases, and they have a very hard time believing that
operating-system-level encryption is an equivalent control (whatever the
reality may be). I've seen this be a significant obstacle for people who want to adopt
Apache Cassandra many times and an insurmountable obstacle on multiple
occasions. From what I've seen, I think this is one of the most watched
tickets with the most "is this coming soon" comments in the project backlog
and it's something we pretty regularly get asked whether we know if/when
it's coming.

That said, I completely agree that we don't want to be engaging in security
theatre or "introducing something that is either insecure or too slow to
be useful." and I think there are some really good suggestions in this
thread to come up with a strong solution for what will undoubtedly be a
pretty complex and major change.

Cheers
Ben




On Wed, 17 Nov 2021 at 03:34, Joseph Lynch <jo...@gmail.com> wrote:

> For FDE you'd probably have the key file in a tmpfs pulled from a
> remote secret manager and when the machine boots it mounts the
> encrypted partition that contains your data files. I'm not aware of
> anyone doing FDE with a password in production. If you wanted
> selective encryption it would make sense to me to support placing
> keyspaces on different data directories (this may already be possible)
> but since crypto in the kernel is so cheap I don't know why you'd do
> selective encryption. Also I think it's worth noting many hosting
> providers (e.g. AWS) just encrypt the disks for you so you can check
> the "data is encrypted at rest" box.
>
> I think Cassandra will be pretty handicapped by being in the JVM which
> generally has very slow crypto. I'm slightly concerned that we're
> already slow at streaming and compaction, and adding slow JVM crypto
> will make C* even less competitive. For example, if we have to disable
> full sstable streaming (zero copy or otherwise) I think that would be
> very unfortunate (although Bowen's approach of sharing one secret
> across the cluster and then having files use a key derivation function
> may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to
> try to offload to native crypto like how internode networking did with
> tcnative to fix the perf issues with netty TLS with JVM crypto I'd
> feel a little less concerned but ... crypto that is both secure and
> performant in the JVM is a hard problem ...
>
> I guess I'm just concerned we're going to introduce something that is
> either insecure or too slow to be useful.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 8:10 AM Bowen Song <bo...@bso.ng.invalid> wrote:
> >
> > I don't like the idea of FDE (Full Disk Encryption) as an alternative to
> > application-managed encryption at rest. Each has its own advantages
> > and disadvantages.
> >
> > For example, if the encryption key is the same across nodes in the same
> > cluster, and Cassandra can share the key securely between authenticated
> > nodes, a rolling restart of the servers will be a lot simpler than if the
> > servers were using FDE - with FDE, someone would have to type in the
> > passphrase on each reboot, or have a script that mounts the encrypted
> > device over SSH and then starts the Cassandra service after a reboot.
> >
> > Another valid use case of encryption implemented in Cassandra is
> > selectively encrypting some tables but leaving others unencrypted. Doing
> > this outside Cassandra at the filesystem level is very tedious and
> > error-prone - lots of symlinks, and it is pretty hard to handle newly
> > created tables or keyspaces.
> >
> > However, I don't know if there's enough demand to justify the above use
> > cases.
> >
> >
> > On 16/11/2021 14:45, Joseph Lynch wrote:
> > > I think a CEP is wise (or a more thorough design document on the
> > > ticket) given how easy it is to do security incorrectly and key
> > > management, rotation and key derivation are not particularly
> > > straightforward.
> > >
> > > I am curious what advantage Cassandra implementing encryption has over
> > > asking the user to use an encrypted filesystem or disks instead where
> > > the kernel or device will undoubtedly be able to do the crypto more
> > > efficiently than we can in the JVM and we wouldn't have to further
> > > complicate the storage engine? I think the state of encrypted
> > > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > > these days than it was in 2015 when that ticket was created.
> > >
> > > If the application has existing exfiltration paths (e.g. backups) it's
> > > probably better to encrypt/decrypt in the backup/restore process via
> > > something extremely fast (and modern) like piping through age [1]
> > > isn't it?
> > >
> > > [1] https://github.com/FiloSottile/age
> > >
> > > -Joey
> > >
> > >
> > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> > > <st...@instaclustr.com> wrote:
> > >> Hi list,
> > >>
> > >> an engineer from Intel - Shylaja Kokoori (who is watching this list
> > >> closely) has retrofitted the original code from CASSANDRA-9633 work in
> > >> times of 3.4 to the current trunk with my help here and there, mostly
> > >> cosmetic.
> > >>
> > >> I would like to know if there is a general consensus about me going to
> > >> create a CEP for this feature or what is your perception on this. I
> > >> know we have it a little bit backwards here as we should first discuss
> > >> and then code but I am super glad that we have some POC we can
> > >> elaborate further on and CEP would just cement  and summarise the
> > >> approach / other implementation aspects of this feature.
> > >>
> > >> I think that having 9633 merged will fill quite a big operational gap
> > >> when it comes to security. There are a lot of enterprises who desire
> > >> this feature so much. I can not remember when I last saw a ticket with
> > >> 50 watchers which was inactive for such a long time.
> > >>
> > >> Regards
> > >>
>

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
For FDE you'd probably have the key file in a tmpfs pulled from a
remote secret manager and when the machine boots it mounts the
encrypted partition that contains your data files. I'm not aware of
anyone doing FDE with a password in production. If you wanted
selective encryption it would make sense to me to support placing
keyspaces on different data directories (this may already be possible)
but since crypto in the kernel is so cheap I don't know why you'd do
selective encryption. Also I think it's worth noting many hosting
providers (e.g. AWS) just encrypt the disks for you so you can check
the "data is encrypted at rest" box.

I think Cassandra will be pretty handicapped by being in the JVM which
generally has very slow crypto. I'm slightly concerned that we're
already slow at streaming and compaction, and adding slow JVM crypto
will make C* even less competitive. For example, if we have to disable
full sstable streaming (zero copy or otherwise) I think that would be
very unfortunate (although Bowen's approach of sharing one secret
across the cluster and then having files use a key derivation function
may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to
try to offload to native crypto like how internode networking did with
tcnative to fix the perf issues with netty TLS with JVM crypto I'd
feel a little less concerned but ... crypto that is both secure and
performant in the JVM is a hard problem ...
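
(For what it's worth, the key-derivation idea is cheap to sketch: derive
each file's key from the shared cluster secret and the sstable identifier,
so nodes holding the same secret can decrypt streamed files without
re-encrypting them. A simplified illustration - not HKDF proper, and not
what the current patch does:)

    import javax.crypto.Mac;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.security.GeneralSecurityException;

    public final class PerFileKeys
    {
        // perFileKey = HMAC-SHA256(clusterSecret, sstableId).
        // A real implementation should use a proper KDF (e.g. HKDF) and
        // record a key version in the sstable metadata, but the shape is
        // the same.
        static SecretKey deriveFileKey(SecretKey clusterSecret, String sstableId)
                throws GeneralSecurityException
        {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(clusterSecret);
            byte[] derived = mac.doFinal(sstableId.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(derived, "AES"); // 32 bytes -> AES-256 key
        }
    }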

I guess I'm just concerned we're going to introduce something that is
either insecure or too slow to be useful.

-Joey

On Tue, Nov 16, 2021 at 8:10 AM Bowen Song <bo...@bso.ng.invalid> wrote:
>
> I don't like the idea of FDE (Full Disk Encryption) as an alternative to
> application-managed encryption at rest. Each has its own advantages
> and disadvantages.
>
> For example, if the encryption key is the same across nodes in the same
> cluster, and Cassandra can share the key securely between authenticated
> nodes, a rolling restart of the servers will be a lot simpler than if the
> servers were using FDE - with FDE, someone would have to type in the
> passphrase on each reboot, or have a script that mounts the encrypted
> device over SSH and then starts the Cassandra service after a reboot.
>
> Another valid use case of encryption implemented in Cassandra is
> selectively encrypting some tables but leaving others unencrypted. Doing
> this outside Cassandra at the filesystem level is very tedious and
> error-prone - lots of symlinks, and it is pretty hard to handle newly
> created tables or keyspaces.
>
> However, I don't know if there's enough demand to justify the above use
> cases.
>
>
> On 16/11/2021 14:45, Joseph Lynch wrote:
> > I think a CEP is wise (or a more thorough design document on the
> > ticket) given how easy it is to do security incorrectly and key
> > management, rotation and key derivation are not particularly
> > straightforward.
> >
> > I am curious what advantage Cassandra implementing encryption has over
> > asking the user to use an encrypted filesystem or disks instead where
> > the kernel or device will undoubtedly be able to do the crypto more
> > efficiently than we can in the JVM and we wouldn't have to further
> > complicate the storage engine? I think the state of encrypted
> > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > these days than it was in 2015 when that ticket was created.
> >
> > If the application has existing exfiltration paths (e.g. backups) it's
> > probably better to encrypt/decrypt in the backup/restore process via
> > something extremely fast (and modern) like piping through age [1]
> > isn't it?
> >
> > [1] https://github.com/FiloSottile/age
> >
> > -Joey
> >
> >
> > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> > <st...@instaclustr.com> wrote:
> >> Hi list,
> >>
> >> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >> times of 3.4 to the current trunk with my help here and there, mostly
> >> cosmetic.
> >>
> >> I would like to know if there is a general consensus about me going to
> >> create a CEP for this feature or what is your perception on this. I
> >> know we have it a little bit backwards here as we should first discuss
> >> and then code but I am super glad that we have some POC we can
> >> elaborate further on and CEP would just cement  and summarise the
> >> approach / other implementation aspects of this feature.
> >>
> >> I think that having 9633 merged will fill quite a big operational gap
> >> when it comes to security. There are a lot of enterprises who desire
> >> this feature so much. I can not remember when I last saw a ticket with
> >> 50 watchers which was inactive for such a long time.
> >>
> >> Regards
> >>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Bowen Song <bo...@bso.ng.INVALID>.
I don't like the idea of FDE (Full Disk Encryption) as an alternative to
application-managed encryption at rest. Each has its own advantages
and disadvantages.

For example, if the encryption key is the same across nodes in the same
cluster, and Cassandra can share the key securely between authenticated
nodes, a rolling restart of the servers will be a lot simpler than if the
servers were using FDE - with FDE, someone would have to type in the
passphrase on each reboot, or have a script that mounts the encrypted
device over SSH and then starts the Cassandra service after a reboot.

Another valid use case of encryption implemented in Cassandra is
selectively encrypting some tables but leaving others unencrypted. Doing
this outside Cassandra at the filesystem level is very tedious and
error-prone - lots of symlinks, and it is pretty hard to handle newly
created tables or keyspaces.

However, I don't know if there's enough demand to justify the above use
cases.


On 16/11/2021 14:45, Joseph Lynch wrote:
> I think a CEP is wise (or a more thorough design document on the
> ticket) given how easy it is to do security incorrectly and key
> management, rotation and key derivation are not particularly
> straightforward.
>
> I am curious what advantage Cassandra implementing encryption has over
> asking the user to use an encrypted filesystem or disks instead where
> the kernel or device will undoubtedly be able to do the crypto more
> efficiently than we can in the JVM and we wouldn't have to further
> complicate the storage engine? I think the state of encrypted
> filesystems (e.g. LUKS on Linux) is significantly more user friendly
> these days than it was in 2015 when that ticket was created.
>
> If the application has existing exfiltration paths (e.g. backups) it's
> probably better to encrypt/decrypt in the backup/restore process via
> something extremely fast (and modern) like piping through age [1]
> isn't it?
>
> [1] https://github.com/FiloSottile/age
>
> -Joey
>
>
> On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> <st...@instaclustr.com> wrote:
>> Hi list,
>>
>> an engineer from Intel - Shylaja Kokoori (who is watching this list
>> closely) has retrofitted the original code from CASSANDRA-9633 work in
>> times of 3.4 to the current trunk with my help here and there, mostly
>> cosmetic.
>>
>> I would like to know if there is a general consensus about me going to
>> create a CEP for this feature or what is your perception on this. I
>> know we have it a little bit backwards here as we should first discuss
>> and then code but I am super glad that we have some POC we can
>> elaborate further on and CEP would just cement  and summarise the
>> approach / other implementation aspects of this feature.
>>
>> I think that having 9633 merged will fill quite a big operational gap
>> when it comes to security. There are a lot of enterprises who desire
>> this feature so much. I can not remember when I last saw a ticket with
>> 50 watchers which was inactive for such a long time.
>>
>> Regards
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

Posted by Joseph Lynch <jo...@gmail.com>.
I think a CEP is wise (or a more thorough design document on the
ticket) given how easy it is to do security incorrectly and key
management, rotation and key derivation are not particularly
straightforward.

I am curious what advantage Cassandra implementing encryption has over
asking the user to use an encrypted filesystem or disks instead where
the kernel or device will undoubtedly be able to do the crypto more
efficiently than we can in the JVM and we wouldn't have to further
complicate the storage engine? I think the state of encrypted
filesystems (e.g. LUKS on Linux) is significantly more user friendly
these days than it was in 2015 when that ticket was created.

If the application has existing exfiltration paths (e.g. backups) it's
probably better to encrypt/decrypt in the backup/restore process via
something extremely fast (and modern) like piping through age [1]
isn't it?

[1] https://github.com/FiloSottile/age
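
(The tooling side of that is tiny - roughly something like the sketch
below, assuming the age binary is on the PATH; the class and method names
are made up for illustration:)

    import java.io.File;
    import java.io.IOException;

    public final class AgeBackup
    {
        // Encrypt one snapshot file with the external `age` CLI before
        // shipping it off-box. `recipient` is an age public key (age1...).
        static void encrypt(File snapshotFile, File encryptedOut, String recipient)
                throws IOException, InterruptedException
        {
            Process p = new ProcessBuilder("age", "-r", recipient,
                                           "-o", encryptedOut.getAbsolutePath(),
                                           snapshotFile.getAbsolutePath())
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0)
                throw new IOException("age failed with exit code " + p.exitValue());
        }
    }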

-Joey


On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
<st...@instaclustr.com> wrote:
>
> Hi list,
>
> an engineer from Intel - Shylaja Kokoori (who is watching this list
> closely) has retrofitted the original code from CASSANDRA-9633 work in
> times of 3.4 to the current trunk with my help here and there, mostly
> cosmetic.
>
> I would like to know if there is a general consensus about me going to
> create a CEP for this feature or what is your perception on this. I
> know we have it a little bit backwards here as we should first discuss
> and then code but I am super glad that we have some POC we can
> elaborate further on and CEP would just cement  and summarise the
> approach / other implementation aspects of this feature.
>
> I think that having 9633 merged will fill quite a big operational gap
> when it comes to security. There are a lot of enterprises who desire
> this feature so much. I can not remember when I last saw a ticket with
> 50 watchers which was inactive for such a long time.
>
> Regards
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org