You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Pavel Pereslegin <xx...@gmail.com> on 2020/08/13 10:48:42 UTC

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello all.

I'm working on TDE cache group key rotation [1] and I have a couple of
questions about partition re-encryption.

As described in the wiki [2], the process of re-encryption at the
moment consists of sequentially marking memory pages as dirty, this
process looks not resource-intensive.
Do you think it is necessary to do this in a multithreaded mode or
single thread is enough?
(We started testing re-encryption on dedicated servers (Xeon E5-2680
2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
single-threaded encryption loaded disk within 30%. At the same time,
the total re-encryption speed was around 60 MB/s, which allows one
node to re-encrypt 1 TB of data in about 5 hours, and it seems that
this performance is enough.)

The second question is about the approach to storing the re-encryption status.
At the moment, the re-encryption status includes two parameters - the
total number of pages in the partition at the time of the start of
re-encryption (int) and the index of the last re-encrypted page (int).
These 8 bytes are stored in the metapage on the checkpoint (which
ensures that if the checkpoint does not happen, we will continue the
process from the last page written to disk).
However, if multithread partition scanning does not make sense, then
it seems that it is possible to change the implementation and don't
change the metapage structure. Store only the "pointer" of the
partition (and the cache group) in the metastore and scan in strict
order.
The approach with storing the status in the metapage of the partition
seems to me more flexible, stable and has a number of advantages over
the "pointer" approach:
1. Since we saving the total number of pages at the re-encryption
startup - we will not scan extra pages that may be added to the
partition later.
2. We can move partitions between nodes and re-encryption should
continue from a certain point on the new node.
3. If a partition is (re)created during cache group re-encryption, it
will not be re-encrypted (since its re-encryption status will be reset
and all data is encrypted with the latest encryption key after
(re)creation.

Do you think single-threaded mode is enough?
Is it better to keep the re-encryption status in the metapage or store
the "pointer" in the metastore?

[1] https://issues.apache.org/jira/browse/IGNITE-12843
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption

пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
>
> Hello,
>
> I'll expand the answer a bit about calculating CRC, the problem is not
> that it is calculated twice, but that now for encrypted pages,
> EncryptedFileIO checks physical integrity, and FilePageStore checks
> the correctness of the encryption key, but from my point of view, it
> should be vice versa - the lower (delegated) FileIO should check the
> physical integrity and EncryptedFileIO should check the correctness of
> the encryption key.
>
> пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> >
> > Hello,
> >
> > > 10. Question - CRC is read in two places encryptionFileIO and
> > > filePageStore - what should we do with this?
> >
> > We need to calculate the CRC of encrypted data, because we may be
> > using the wrong encryption key to decrypt data, in which case we will
> > not understand if the physical integrity is violated or the wrong
> > encryption key is used.
> >
> > > 9. Question - How do we optimize when we can check that this page is
> > > already encrypted by parallel loading? Maybe we should do this in Phase 4?
> >
> > To do this, we need to store the encryption key ID in memory (at
> > least), but this is not easy to do right now without breaking binary
> > compatibility.
> >
> > > 7. Question -the current implementation does not use the throttling that
> > > is implemented in PDS. Users should set the throughput such as 5 MB per
> > > second, but not the timeout, packet size, or stream size.
> >
> > I've added a simple rate limiter for this.
> >
> > > 8. Question - why we add a lot of system properties?
> > >> Can you, please, list system properties that should be moved to the configuration?
> >
> > It's about the background re-encryption properties, for now, it is:
> > - re-encryption speed limit (in megabytes per second)
> > - threads count used for re-encryption
> > - count of pages in batch, processed under checkpoint lock
> > - flag to completely disable background re-encryption
> >
> > > 11. We should remember about complicated test scenarios with failover
> >
> > PR contains tests for re-encryption (and key rotation) on unstable
> > topology (with baseline change and without it). I'll expand them if I
> > missed some cases.
> >
> > > 13. Will re-encryption continue after the cluster is completely stopped?
> >
> > Yes, as I mentioned earlier, we save the re-encryption status in the
> > meta page of each re-encrypted partition and trigger re-encryption on
> > node startup if needed (more detailed description on the wiki).
> >
> > Thanks a lot for your comments, I am still working on PR and expanding
> > wiki documentation. I'll let you know when it will be ready for the
> > review.
> >
> > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <al...@gmail.com>:
> > >
> > > Hello Nikolay,
> > >
> > >
> > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > filePageStore - what should we do with this?
> > > >
> > > > filePageStore checks CRC of the encrypted page. This required to confirm
> > > > the page not corrupted on the disk.
> > > > encryptionFileIO checks CRC of the decrypted page(CRC itself stored in the
> > > > encrypted data).
> > > > This required to be sure the decrypted page contains correct data and not
> > > > replaced with some malicious content.
> > > >
> > >
> > > I still do not see why we need CRC twice, can you please elaborate on this
> > > statement? If an attacker is able to replace the contents of an encrypted
> > > page, it means that they have access to the encryption key. What will
> > > prevent them from calculating the CRC of malicious content and then
> > > encrypting it?

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Andrey Gura <ag...@apache.org>.
continue of previous mail...

The same method rethrows an exception which will lead to failure of an
metrics exporter. The method should return some numeric value which
indicates the failure.

On Wed, Oct 28, 2020 at 3:09 PM Andrey Gura <ag...@apache.org> wrote:
>
> Hi there,
>
> I accidentally stumbled upon a potential performance problem in this commit.
>
> CacheGroupMetricImpls.getPagesLeftForReencryption method contains at
> least two problems:
>
>  - Relatively major: In order to calculate a value for one metric the
> method has O(N) complexity (N is number of partitions). It isn't good.
> Better approach is using some precalculated or estimated value during
> re-encryption process and just return this value.
>  - Major: For each partition in this method PageStore.exists() will be
> called. This invocation leads to N calls to the file system (may be
> cached, may be not, we can't just hope). So with a default affinity
> configuration this method will touch the file system 1024 times per
> one metrics value calculation. Just increase dramatism and multiply
> 1024 on the number of cache groups existing on a node.
>
> Finally, we have auxiliary functionality (metrics) which could affect
> the whole node (and potentially cluster) behavior.
>
> Please, fix this problem and be more careful in the future.
>
> On Fri, Oct 23, 2020 at 12:46 PM Pavel Pereslegin <xx...@gmail.com> wrote:
> >
> > Hello folks,
> >
> > thanks to everyone who joined the review, greatly appreciate your
> > helpful comments.
> >
> > If there is no objection, we will merge this patch [1] shortly.
> >
> > [1] https://github.com/apache/ignite/pull/7941
> >
> > пн, 5 окт. 2020 г. в 15:30, Maksim Stepachev <ma...@gmail.com>:
> > >
> > > Hi,
> > >
> > > I'm going to do it.
> > >
> > > сб, 3 окт. 2020 г. в 21:47, Alex Plehanov <pl...@gmail.com>:
> > >
> > > > Hello guys,
> > > >
> > > > I've finished the review and approved the patch.
> > > > Anybody else would like to review it?
> > > >
> > > > пн, 28 сент. 2020 г. в 11:38, Pavel Pereslegin <xx...@gmail.com>:
> > > >
> > > > > Hello, Maksim!
> > > > >
> > > > > I am currently working on a review notes from Alexey Plekhanov, will
> > > > > let you know when I finish.
> > > > >
> > > > > пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <
> > > > maksim.stepachev@gmail.com
> > > > > >:
> > > > > >
> > > > > > Hi, Pavel.
> > > > > >
> > > > > > As I see, the ticket [
> > > > https://issues.apache.org/jira/browse/IGNITE-12843
> > > > > ]
> > > > > > is "PATCH AVAILABLE". Is this ticket finished?
> > > > > >
> > > > > > чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
> > > > > >
> > > > > > > Hello all.
> > > > > > >
> > > > > > > I'm working on TDE cache group key rotation [1] and I have a couple
> > > > of
> > > > > > > questions about partition re-encryption.
> > > > > > >
> > > > > > > As described in the wiki [2], the process of re-encryption at the
> > > > > > > moment consists of sequentially marking memory pages as dirty, this
> > > > > > > process looks not resource-intensive.
> > > > > > > Do you think it is necessary to do this in a multithreaded mode or
> > > > > > > single thread is enough?
> > > > > > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > > > > > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > > > > > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > > > > > > single-threaded encryption loaded disk within 30%. At the same time,
> > > > > > > the total re-encryption speed was around 60 MB/s, which allows one
> > > > > > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > > > > > this performance is enough.)
> > > > > > >
> > > > > > > The second question is about the approach to storing the
> > > > re-encryption
> > > > > > > status.
> > > > > > > At the moment, the re-encryption status includes two parameters - the
> > > > > > > total number of pages in the partition at the time of the start of
> > > > > > > re-encryption (int) and the index of the last re-encrypted page
> > > > (int).
> > > > > > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > > > > > ensures that if the checkpoint does not happen, we will continue the
> > > > > > > process from the last page written to disk).
> > > > > > > However, if multithread partition scanning does not make sense, then
> > > > > > > it seems that it is possible to change the implementation and don't
> > > > > > > change the metapage structure. Store only the "pointer" of the
> > > > > > > partition (and the cache group) in the metastore and scan in strict
> > > > > > > order.
> > > > > > > The approach with storing the status in the metapage of the partition
> > > > > > > seems to me more flexible, stable and has a number of advantages over
> > > > > > > the "pointer" approach:
> > > > > > > 1. Since we saving the total number of pages at the re-encryption
> > > > > > > startup - we will not scan extra pages that may be added to the
> > > > > > > partition later.
> > > > > > > 2. We can move partitions between nodes and re-encryption should
> > > > > > > continue from a certain point on the new node.
> > > > > > > 3. If a partition is (re)created during cache group re-encryption, it
> > > > > > > will not be re-encrypted (since its re-encryption status will be
> > > > reset
> > > > > > > and all data is encrypted with the latest encryption key after
> > > > > > > (re)creation.
> > > > > > >
> > > > > > > Do you think single-threaded mode is enough?
> > > > > > > Is it better to keep the re-encryption status in the metapage or
> > > > store
> > > > > > > the "pointer" in the metastore?
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > > > > > [2]
> > > > > > >
> > > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> > > > > > >
> > > > > > > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I'll expand the answer a bit about calculating CRC, the problem is
> > > > > not
> > > > > > > > that it is calculated twice, but that now for encrypted pages,
> > > > > > > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > > > > > > the correctness of the encryption key, but from my point of view,
> > > > it
> > > > > > > > should be vice versa - the lower (delegated) FileIO should check
> > > > the
> > > > > > > > physical integrity and EncryptedFileIO should check the correctness
> > > > > of
> > > > > > > > the encryption key.
> > > > > > > >
> > > > > > > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > > > > > filePageStore - what should we do with this?
> > > > > > > > >
> > > > > > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > > > > > using the wrong encryption key to decrypt data, in which case we
> > > > > will
> > > > > > > > > not understand if the physical integrity is violated or the wrong
> > > > > > > > > encryption key is used.
> > > > > > > > >
> > > > > > > > > > 9. Question - How do we optimize when we can check that this
> > > > > page is
> > > > > > > > > > already encrypted by parallel loading? Maybe we should do this
> > > > in
> > > > > > > Phase 4?
> > > > > > > > >
> > > > > > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > > > > > least), but this is not easy to do right now without breaking
> > > > > binary
> > > > > > > > > compatibility.
> > > > > > > > >
> > > > > > > > > > 7. Question -the current implementation does not use the
> > > > > throttling
> > > > > > > that
> > > > > > > > > > is implemented in PDS. Users should set the throughput such as
> > > > 5
> > > > > MB
> > > > > > > per
> > > > > > > > > > second, but not the timeout, packet size, or stream size.
> > > > > > > > >
> > > > > > > > > I've added a simple rate limiter for this.
> > > > > > > > >
> > > > > > > > > > 8. Question - why we add a lot of system properties?
> > > > > > > > > >> Can you, please, list system properties that should be moved
> > > > to
> > > > > the
> > > > > > > configuration?
> > > > > > > > >
> > > > > > > > > It's about the background re-encryption properties, for now, it
> > > > is:
> > > > > > > > > - re-encryption speed limit (in megabytes per second)
> > > > > > > > > - threads count used for re-encryption
> > > > > > > > > - count of pages in batch, processed under checkpoint lock
> > > > > > > > > - flag to completely disable background re-encryption
> > > > > > > > >
> > > > > > > > > > 11. We should remember about complicated test scenarios with
> > > > > failover
> > > > > > > > >
> > > > > > > > > PR contains tests for re-encryption (and key rotation) on
> > > > unstable
> > > > > > > > > topology (with baseline change and without it). I'll expand them
> > > > > if I
> > > > > > > > > missed some cases.
> > > > > > > > >
> > > > > > > > > > 13. Will re-encryption continue after the cluster is completely
> > > > > > > stopped?
> > > > > > > > >
> > > > > > > > > Yes, as I mentioned earlier, we save the re-encryption status in
> > > > > the
> > > > > > > > > meta page of each re-encrypted partition and trigger
> > > > re-encryption
> > > > > on
> > > > > > > > > node startup if needed (more detailed description on the wiki).
> > > > > > > > >
> > > > > > > > > Thanks a lot for your comments, I am still working on PR and
> > > > > expanding
> > > > > > > > > wiki documentation. I'll let you know when it will be ready for
> > > > the
> > > > > > > > > review.
> > > > > > > > >
> > > > > > > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > > > > > > alexey.goncharuk@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > Hello Nikolay,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO
> > > > and
> > > > > > > > > > > filePageStore - what should we do with this?
> > > > > > > > > > >
> > > > > > > > > > > filePageStore checks CRC of the encrypted page. This required
> > > > > to
> > > > > > > confirm
> > > > > > > > > > > the page not corrupted on the disk.
> > > > > > > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > > > > > > stored in the
> > > > > > > > > > > encrypted data).
> > > > > > > > > > > This required to be sure the decrypted page contains correct
> > > > > data
> > > > > > > and not
> > > > > > > > > > > replaced with some malicious content.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I still do not see why we need CRC twice, can you please
> > > > > elaborate
> > > > > > > on this
> > > > > > > > > > statement? If an attacker is able to replace the contents of an
> > > > > > > encrypted
> > > > > > > > > > page, it means that they have access to the encryption key.
> > > > What
> > > > > will
> > > > > > > > > > prevent them from calculating the CRC of malicious content and
> > > > > then
> > > > > > > > > > encrypting it?
> > > > > > >
> > > > >
> > > >

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Pavel Pereslegin <xx...@gmail.com>.
Andrey,
thanks for your comment.

I will fix this problem shortly.

ср, 28 окт. 2020 г. в 15:10, Andrey Gura <ag...@apache.org>:
>
> Hi there,
>
> I accidentally stumbled upon a potential performance problem in this commit.
>
> CacheGroupMetricImpls.getPagesLeftForReencryption method contains at
> least two problems:
>
>  - Relatively major: In order to calculate a value for one metric the
> method has O(N) complexity (N is number of partitions). It isn't good.
> Better approach is using some precalculated or estimated value during
> re-encryption process and just return this value.
>  - Major: For each partition in this method PageStore.exists() will be
> called. This invocation leads to N calls to the file system (may be
> cached, may be not, we can't just hope). So with a default affinity
> configuration this method will touch the file system 1024 times per
> one metrics value calculation. Just increase dramatism and multiply
> 1024 on the number of cache groups existing on a node.
>
> Finally, we have auxiliary functionality (metrics) which could affect
> the whole node (and potentially cluster) behavior.
>
> Please, fix this problem and be more careful in the future.
>
> On Fri, Oct 23, 2020 at 12:46 PM Pavel Pereslegin <xx...@gmail.com> wrote:
> >
> > Hello folks,
> >
> > thanks to everyone who joined the review, greatly appreciate your
> > helpful comments.
> >
> > If there is no objection, we will merge this patch [1] shortly.
> >
> > [1] https://github.com/apache/ignite/pull/7941
> >
> > пн, 5 окт. 2020 г. в 15:30, Maksim Stepachev <ma...@gmail.com>:
> > >
> > > Hi,
> > >
> > > I'm going to do it.
> > >
> > > сб, 3 окт. 2020 г. в 21:47, Alex Plehanov <pl...@gmail.com>:
> > >
> > > > Hello guys,
> > > >
> > > > I've finished the review and approved the patch.
> > > > Anybody else would like to review it?
> > > >
> > > > пн, 28 сент. 2020 г. в 11:38, Pavel Pereslegin <xx...@gmail.com>:
> > > >
> > > > > Hello, Maksim!
> > > > >
> > > > > I am currently working on a review notes from Alexey Plekhanov, will
> > > > > let you know when I finish.
> > > > >
> > > > > пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <
> > > > maksim.stepachev@gmail.com
> > > > > >:
> > > > > >
> > > > > > Hi, Pavel.
> > > > > >
> > > > > > As I see, the ticket [
> > > > https://issues.apache.org/jira/browse/IGNITE-12843
> > > > > ]
> > > > > > is "PATCH AVAILABLE". Is this ticket finished?
> > > > > >
> > > > > > чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
> > > > > >
> > > > > > > Hello all.
> > > > > > >
> > > > > > > I'm working on TDE cache group key rotation [1] and I have a couple
> > > > of
> > > > > > > questions about partition re-encryption.
> > > > > > >
> > > > > > > As described in the wiki [2], the process of re-encryption at the
> > > > > > > moment consists of sequentially marking memory pages as dirty, this
> > > > > > > process looks not resource-intensive.
> > > > > > > Do you think it is necessary to do this in a multithreaded mode or
> > > > > > > single thread is enough?
> > > > > > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > > > > > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > > > > > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > > > > > > single-threaded encryption loaded disk within 30%. At the same time,
> > > > > > > the total re-encryption speed was around 60 MB/s, which allows one
> > > > > > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > > > > > this performance is enough.)
> > > > > > >
> > > > > > > The second question is about the approach to storing the
> > > > re-encryption
> > > > > > > status.
> > > > > > > At the moment, the re-encryption status includes two parameters - the
> > > > > > > total number of pages in the partition at the time of the start of
> > > > > > > re-encryption (int) and the index of the last re-encrypted page
> > > > (int).
> > > > > > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > > > > > ensures that if the checkpoint does not happen, we will continue the
> > > > > > > process from the last page written to disk).
> > > > > > > However, if multithread partition scanning does not make sense, then
> > > > > > > it seems that it is possible to change the implementation and don't
> > > > > > > change the metapage structure. Store only the "pointer" of the
> > > > > > > partition (and the cache group) in the metastore and scan in strict
> > > > > > > order.
> > > > > > > The approach with storing the status in the metapage of the partition
> > > > > > > seems to me more flexible, stable and has a number of advantages over
> > > > > > > the "pointer" approach:
> > > > > > > 1. Since we saving the total number of pages at the re-encryption
> > > > > > > startup - we will not scan extra pages that may be added to the
> > > > > > > partition later.
> > > > > > > 2. We can move partitions between nodes and re-encryption should
> > > > > > > continue from a certain point on the new node.
> > > > > > > 3. If a partition is (re)created during cache group re-encryption, it
> > > > > > > will not be re-encrypted (since its re-encryption status will be
> > > > reset
> > > > > > > and all data is encrypted with the latest encryption key after
> > > > > > > (re)creation.
> > > > > > >
> > > > > > > Do you think single-threaded mode is enough?
> > > > > > > Is it better to keep the re-encryption status in the metapage or
> > > > store
> > > > > > > the "pointer" in the metastore?
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > > > > > [2]
> > > > > > >
> > > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> > > > > > >
> > > > > > > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I'll expand the answer a bit about calculating CRC, the problem is
> > > > > not
> > > > > > > > that it is calculated twice, but that now for encrypted pages,
> > > > > > > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > > > > > > the correctness of the encryption key, but from my point of view,
> > > > it
> > > > > > > > should be vice versa - the lower (delegated) FileIO should check
> > > > the
> > > > > > > > physical integrity and EncryptedFileIO should check the correctness
> > > > > of
> > > > > > > > the encryption key.
> > > > > > > >
> > > > > > > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > > > > > filePageStore - what should we do with this?
> > > > > > > > >
> > > > > > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > > > > > using the wrong encryption key to decrypt data, in which case we
> > > > > will
> > > > > > > > > not understand if the physical integrity is violated or the wrong
> > > > > > > > > encryption key is used.
> > > > > > > > >
> > > > > > > > > > 9. Question - How do we optimize when we can check that this
> > > > > page is
> > > > > > > > > > already encrypted by parallel loading? Maybe we should do this
> > > > in
> > > > > > > Phase 4?
> > > > > > > > >
> > > > > > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > > > > > least), but this is not easy to do right now without breaking
> > > > > binary
> > > > > > > > > compatibility.
> > > > > > > > >
> > > > > > > > > > 7. Question -the current implementation does not use the
> > > > > throttling
> > > > > > > that
> > > > > > > > > > is implemented in PDS. Users should set the throughput such as
> > > > 5
> > > > > MB
> > > > > > > per
> > > > > > > > > > second, but not the timeout, packet size, or stream size.
> > > > > > > > >
> > > > > > > > > I've added a simple rate limiter for this.
> > > > > > > > >
> > > > > > > > > > 8. Question - why we add a lot of system properties?
> > > > > > > > > >> Can you, please, list system properties that should be moved
> > > > to
> > > > > the
> > > > > > > configuration?
> > > > > > > > >
> > > > > > > > > It's about the background re-encryption properties, for now, it
> > > > is:
> > > > > > > > > - re-encryption speed limit (in megabytes per second)
> > > > > > > > > - threads count used for re-encryption
> > > > > > > > > - count of pages in batch, processed under checkpoint lock
> > > > > > > > > - flag to completely disable background re-encryption
> > > > > > > > >
> > > > > > > > > > 11. We should remember about complicated test scenarios with
> > > > > failover
> > > > > > > > >
> > > > > > > > > PR contains tests for re-encryption (and key rotation) on
> > > > unstable
> > > > > > > > > topology (with baseline change and without it). I'll expand them
> > > > > if I
> > > > > > > > > missed some cases.
> > > > > > > > >
> > > > > > > > > > 13. Will re-encryption continue after the cluster is completely
> > > > > > > stopped?
> > > > > > > > >
> > > > > > > > > Yes, as I mentioned earlier, we save the re-encryption status in
> > > > > the
> > > > > > > > > meta page of each re-encrypted partition and trigger
> > > > re-encryption
> > > > > on
> > > > > > > > > node startup if needed (more detailed description on the wiki).
> > > > > > > > >
> > > > > > > > > Thanks a lot for your comments, I am still working on PR and
> > > > > expanding
> > > > > > > > > wiki documentation. I'll let you know when it will be ready for
> > > > the
> > > > > > > > > review.
> > > > > > > > >
> > > > > > > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > > > > > > alexey.goncharuk@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > Hello Nikolay,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO
> > > > and
> > > > > > > > > > > filePageStore - what should we do with this?
> > > > > > > > > > >
> > > > > > > > > > > filePageStore checks CRC of the encrypted page. This required
> > > > > to
> > > > > > > confirm
> > > > > > > > > > > the page not corrupted on the disk.
> > > > > > > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > > > > > > stored in the
> > > > > > > > > > > encrypted data).
> > > > > > > > > > > This required to be sure the decrypted page contains correct
> > > > > data
> > > > > > > and not
> > > > > > > > > > > replaced with some malicious content.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I still do not see why we need CRC twice, can you please
> > > > > elaborate
> > > > > > > on this
> > > > > > > > > > statement? If an attacker is able to replace the contents of an
> > > > > > > encrypted
> > > > > > > > > > page, it means that they have access to the encryption key.
> > > > What
> > > > > will
> > > > > > > > > > prevent them from calculating the CRC of malicious content and
> > > > > then
> > > > > > > > > > encrypting it?
> > > > > > >
> > > > >
> > > >

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Andrey Gura <ag...@apache.org>.
Hi there,

I accidentally stumbled upon a potential performance problem in this commit.

CacheGroupMetricImpls.getPagesLeftForReencryption method contains at
least two problems:

 - Relatively major: In order to calculate a value for one metric the
method has O(N) complexity (N is number of partitions). It isn't good.
Better approach is using some precalculated or estimated value during
re-encryption process and just return this value.
 - Major: For each partition in this method PageStore.exists() will be
called. This invocation leads to N calls to the file system (may be
cached, may be not, we can't just hope). So with a default affinity
configuration this method will touch the file system 1024 times per
one metrics value calculation. Just increase dramatism and multiply
1024 on the number of cache groups existing on a node.

Finally, we have auxiliary functionality (metrics) which could affect
the whole node (and potentially cluster) behavior.

Please, fix this problem and be more careful in the future.

On Fri, Oct 23, 2020 at 12:46 PM Pavel Pereslegin <xx...@gmail.com> wrote:
>
> Hello folks,
>
> thanks to everyone who joined the review, greatly appreciate your
> helpful comments.
>
> If there is no objection, we will merge this patch [1] shortly.
>
> [1] https://github.com/apache/ignite/pull/7941
>
> пн, 5 окт. 2020 г. в 15:30, Maksim Stepachev <ma...@gmail.com>:
> >
> > Hi,
> >
> > I'm going to do it.
> >
> > сб, 3 окт. 2020 г. в 21:47, Alex Plehanov <pl...@gmail.com>:
> >
> > > Hello guys,
> > >
> > > I've finished the review and approved the patch.
> > > Anybody else would like to review it?
> > >
> > > пн, 28 сент. 2020 г. в 11:38, Pavel Pereslegin <xx...@gmail.com>:
> > >
> > > > Hello, Maksim!
> > > >
> > > > I am currently working on a review notes from Alexey Plekhanov, will
> > > > let you know when I finish.
> > > >
> > > > пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <
> > > maksim.stepachev@gmail.com
> > > > >:
> > > > >
> > > > > Hi, Pavel.
> > > > >
> > > > > As I see, the ticket [
> > > https://issues.apache.org/jira/browse/IGNITE-12843
> > > > ]
> > > > > is "PATCH AVAILABLE". Is this ticket finished?
> > > > >
> > > > > чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
> > > > >
> > > > > > Hello all.
> > > > > >
> > > > > > I'm working on TDE cache group key rotation [1] and I have a couple
> > > of
> > > > > > questions about partition re-encryption.
> > > > > >
> > > > > > As described in the wiki [2], the process of re-encryption at the
> > > > > > moment consists of sequentially marking memory pages as dirty, this
> > > > > > process looks not resource-intensive.
> > > > > > Do you think it is necessary to do this in a multithreaded mode or
> > > > > > single thread is enough?
> > > > > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > > > > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > > > > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > > > > > single-threaded encryption loaded disk within 30%. At the same time,
> > > > > > the total re-encryption speed was around 60 MB/s, which allows one
> > > > > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > > > > this performance is enough.)
> > > > > >
> > > > > > The second question is about the approach to storing the
> > > re-encryption
> > > > > > status.
> > > > > > At the moment, the re-encryption status includes two parameters - the
> > > > > > total number of pages in the partition at the time of the start of
> > > > > > re-encryption (int) and the index of the last re-encrypted page
> > > (int).
> > > > > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > > > > ensures that if the checkpoint does not happen, we will continue the
> > > > > > process from the last page written to disk).
> > > > > > However, if multithread partition scanning does not make sense, then
> > > > > > it seems that it is possible to change the implementation and don't
> > > > > > change the metapage structure. Store only the "pointer" of the
> > > > > > partition (and the cache group) in the metastore and scan in strict
> > > > > > order.
> > > > > > The approach with storing the status in the metapage of the partition
> > > > > > seems to me more flexible, stable and has a number of advantages over
> > > > > > the "pointer" approach:
> > > > > > 1. Since we saving the total number of pages at the re-encryption
> > > > > > startup - we will not scan extra pages that may be added to the
> > > > > > partition later.
> > > > > > 2. We can move partitions between nodes and re-encryption should
> > > > > > continue from a certain point on the new node.
> > > > > > 3. If a partition is (re)created during cache group re-encryption, it
> > > > > > will not be re-encrypted (since its re-encryption status will be
> > > reset
> > > > > > and all data is encrypted with the latest encryption key after
> > > > > > (re)creation.
> > > > > >
> > > > > > Do you think single-threaded mode is enough?
> > > > > > Is it better to keep the re-encryption status in the metapage or
> > > store
> > > > > > the "pointer" in the metastore?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > > > > [2]
> > > > > >
> > > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> > > > > >
> > > > > > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'll expand the answer a bit about calculating CRC, the problem is
> > > > not
> > > > > > > that it is calculated twice, but that now for encrypted pages,
> > > > > > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > > > > > the correctness of the encryption key, but from my point of view,
> > > it
> > > > > > > should be vice versa - the lower (delegated) FileIO should check
> > > the
> > > > > > > physical integrity and EncryptedFileIO should check the correctness
> > > > of
> > > > > > > the encryption key.
> > > > > > >
> > > > > > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > > > > filePageStore - what should we do with this?
> > > > > > > >
> > > > > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > > > > using the wrong encryption key to decrypt data, in which case we
> > > > will
> > > > > > > > not understand if the physical integrity is violated or the wrong
> > > > > > > > encryption key is used.
> > > > > > > >
> > > > > > > > > 9. Question - How do we optimize when we can check that this
> > > > page is
> > > > > > > > > already encrypted by parallel loading? Maybe we should do this
> > > in
> > > > > > Phase 4?
> > > > > > > >
> > > > > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > > > > least), but this is not easy to do right now without breaking
> > > > binary
> > > > > > > > compatibility.
> > > > > > > >
> > > > > > > > > 7. Question -the current implementation does not use the
> > > > throttling
> > > > > > that
> > > > > > > > > is implemented in PDS. Users should set the throughput such as
> > > 5
> > > > MB
> > > > > > per
> > > > > > > > > second, but not the timeout, packet size, or stream size.
> > > > > > > >
> > > > > > > > I've added a simple rate limiter for this.
> > > > > > > >
> > > > > > > > > 8. Question - why we add a lot of system properties?
> > > > > > > > >> Can you, please, list system properties that should be moved
> > > to
> > > > the
> > > > > > configuration?
> > > > > > > >
> > > > > > > > It's about the background re-encryption properties, for now, it
> > > is:
> > > > > > > > - re-encryption speed limit (in megabytes per second)
> > > > > > > > - threads count used for re-encryption
> > > > > > > > - count of pages in batch, processed under checkpoint lock
> > > > > > > > - flag to completely disable background re-encryption
> > > > > > > >
> > > > > > > > > 11. We should remember about complicated test scenarios with
> > > > failover
> > > > > > > >
> > > > > > > > PR contains tests for re-encryption (and key rotation) on
> > > unstable
> > > > > > > > topology (with baseline change and without it). I'll expand them
> > > > if I
> > > > > > > > missed some cases.
> > > > > > > >
> > > > > > > > > 13. Will re-encryption continue after the cluster is completely
> > > > > > stopped?
> > > > > > > >
> > > > > > > > Yes, as I mentioned earlier, we save the re-encryption status in
> > > > the
> > > > > > > > meta page of each re-encrypted partition and trigger
> > > re-encryption
> > > > on
> > > > > > > > node startup if needed (more detailed description on the wiki).
> > > > > > > >
> > > > > > > > Thanks a lot for your comments, I am still working on PR and
> > > > expanding
> > > > > > > > wiki documentation. I'll let you know when it will be ready for
> > > the
> > > > > > > > review.
> > > > > > > >
> > > > > > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > > > > > alexey.goncharuk@gmail.com>:
> > > > > > > > >
> > > > > > > > > Hello Nikolay,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO
> > > and
> > > > > > > > > > filePageStore - what should we do with this?
> > > > > > > > > >
> > > > > > > > > > filePageStore checks CRC of the encrypted page. This required
> > > > to
> > > > > > confirm
> > > > > > > > > > the page not corrupted on the disk.
> > > > > > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > > > > > stored in the
> > > > > > > > > > encrypted data).
> > > > > > > > > > This required to be sure the decrypted page contains correct
> > > > data
> > > > > > and not
> > > > > > > > > > replaced with some malicious content.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I still do not see why we need CRC twice, can you please
> > > > elaborate
> > > > > > on this
> > > > > > > > > statement? If an attacker is able to replace the contents of an
> > > > > > encrypted
> > > > > > > > > page, it means that they have access to the encryption key.
> > > What
> > > > will
> > > > > > > > > prevent them from calculating the CRC of malicious content and
> > > > then
> > > > > > > > > encrypting it?
> > > > > >
> > > >
> > >

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Pavel Pereslegin <xx...@gmail.com>.
Hello folks,

thanks to everyone who joined the review, greatly appreciate your
helpful comments.

If there is no objection, we will merge this patch [1] shortly.

[1] https://github.com/apache/ignite/pull/7941

пн, 5 окт. 2020 г. в 15:30, Maksim Stepachev <ma...@gmail.com>:
>
> Hi,
>
> I'm going to do it.
>
> сб, 3 окт. 2020 г. в 21:47, Alex Plehanov <pl...@gmail.com>:
>
> > Hello guys,
> >
> > I've finished the review and approved the patch.
> > Anybody else would like to review it?
> >
> > пн, 28 сент. 2020 г. в 11:38, Pavel Pereslegin <xx...@gmail.com>:
> >
> > > Hello, Maksim!
> > >
> > > I am currently working on a review notes from Alexey Plekhanov, will
> > > let you know when I finish.
> > >
> > > пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <
> > maksim.stepachev@gmail.com
> > > >:
> > > >
> > > > Hi, Pavel.
> > > >
> > > > As I see, the ticket [
> > https://issues.apache.org/jira/browse/IGNITE-12843
> > > ]
> > > > is "PATCH AVAILABLE". Is this ticket finished?
> > > >
> > > > чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
> > > >
> > > > > Hello all.
> > > > >
> > > > > I'm working on TDE cache group key rotation [1] and I have a couple
> > of
> > > > > questions about partition re-encryption.
> > > > >
> > > > > As described in the wiki [2], the process of re-encryption at the
> > > > > moment consists of sequentially marking memory pages as dirty, this
> > > > > process looks not resource-intensive.
> > > > > Do you think it is necessary to do this in a multithreaded mode or
> > > > > single thread is enough?
> > > > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > > > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > > > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > > > > single-threaded encryption loaded disk within 30%. At the same time,
> > > > > the total re-encryption speed was around 60 MB/s, which allows one
> > > > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > > > this performance is enough.)
> > > > >
> > > > > The second question is about the approach to storing the
> > re-encryption
> > > > > status.
> > > > > At the moment, the re-encryption status includes two parameters - the
> > > > > total number of pages in the partition at the time of the start of
> > > > > re-encryption (int) and the index of the last re-encrypted page
> > (int).
> > > > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > > > ensures that if the checkpoint does not happen, we will continue the
> > > > > process from the last page written to disk).
> > > > > However, if multithread partition scanning does not make sense, then
> > > > > it seems that it is possible to change the implementation and don't
> > > > > change the metapage structure. Store only the "pointer" of the
> > > > > partition (and the cache group) in the metastore and scan in strict
> > > > > order.
> > > > > The approach with storing the status in the metapage of the partition
> > > > > seems to me more flexible, stable and has a number of advantages over
> > > > > the "pointer" approach:
> > > > > 1. Since we saving the total number of pages at the re-encryption
> > > > > startup - we will not scan extra pages that may be added to the
> > > > > partition later.
> > > > > 2. We can move partitions between nodes and re-encryption should
> > > > > continue from a certain point on the new node.
> > > > > 3. If a partition is (re)created during cache group re-encryption, it
> > > > > will not be re-encrypted (since its re-encryption status will be
> > reset
> > > > > and all data is encrypted with the latest encryption key after
> > > > > (re)creation.
> > > > >
> > > > > Do you think single-threaded mode is enough?
> > > > > Is it better to keep the re-encryption status in the metapage or
> > store
> > > > > the "pointer" in the metastore?
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > > > [2]
> > > > >
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> > > > >
> > > > > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'll expand the answer a bit about calculating CRC, the problem is
> > > not
> > > > > > that it is calculated twice, but that now for encrypted pages,
> > > > > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > > > > the correctness of the encryption key, but from my point of view,
> > it
> > > > > > should be vice versa - the lower (delegated) FileIO should check
> > the
> > > > > > physical integrity and EncryptedFileIO should check the correctness
> > > of
> > > > > > the encryption key.
> > > > > >
> > > > > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > > > filePageStore - what should we do with this?
> > > > > > >
> > > > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > > > using the wrong encryption key to decrypt data, in which case we
> > > will
> > > > > > > not understand if the physical integrity is violated or the wrong
> > > > > > > encryption key is used.
> > > > > > >
> > > > > > > > 9. Question - How do we optimize when we can check that this
> > > page is
> > > > > > > > already encrypted by parallel loading? Maybe we should do this
> > in
> > > > > Phase 4?
> > > > > > >
> > > > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > > > least), but this is not easy to do right now without breaking
> > > binary
> > > > > > > compatibility.
> > > > > > >
> > > > > > > > 7. Question -the current implementation does not use the
> > > throttling
> > > > > that
> > > > > > > > is implemented in PDS. Users should set the throughput such as
> > 5
> > > MB
> > > > > per
> > > > > > > > second, but not the timeout, packet size, or stream size.
> > > > > > >
> > > > > > > I've added a simple rate limiter for this.
> > > > > > >
> > > > > > > > 8. Question - why we add a lot of system properties?
> > > > > > > >> Can you, please, list system properties that should be moved
> > to
> > > the
> > > > > configuration?
> > > > > > >
> > > > > > > It's about the background re-encryption properties, for now, it
> > is:
> > > > > > > - re-encryption speed limit (in megabytes per second)
> > > > > > > - threads count used for re-encryption
> > > > > > > - count of pages in batch, processed under checkpoint lock
> > > > > > > - flag to completely disable background re-encryption
> > > > > > >
> > > > > > > > 11. We should remember about complicated test scenarios with
> > > failover
> > > > > > >
> > > > > > > PR contains tests for re-encryption (and key rotation) on
> > unstable
> > > > > > > topology (with baseline change and without it). I'll expand them
> > > if I
> > > > > > > missed some cases.
> > > > > > >
> > > > > > > > 13. Will re-encryption continue after the cluster is completely
> > > > > stopped?
> > > > > > >
> > > > > > > Yes, as I mentioned earlier, we save the re-encryption status in
> > > the
> > > > > > > meta page of each re-encrypted partition and trigger
> > re-encryption
> > > on
> > > > > > > node startup if needed (more detailed description on the wiki).
> > > > > > >
> > > > > > > Thanks a lot for your comments, I am still working on PR and
> > > expanding
> > > > > > > wiki documentation. I'll let you know when it will be ready for
> > the
> > > > > > > review.
> > > > > > >
> > > > > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > > > > alexey.goncharuk@gmail.com>:
> > > > > > > >
> > > > > > > > Hello Nikolay,
> > > > > > > >
> > > > > > > >
> > > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO
> > and
> > > > > > > > > filePageStore - what should we do with this?
> > > > > > > > >
> > > > > > > > > filePageStore checks CRC of the encrypted page. This required
> > > to
> > > > > confirm
> > > > > > > > > the page not corrupted on the disk.
> > > > > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > > > > stored in the
> > > > > > > > > encrypted data).
> > > > > > > > > This required to be sure the decrypted page contains correct
> > > data
> > > > > and not
> > > > > > > > > replaced with some malicious content.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I still do not see why we need CRC twice, can you please
> > > elaborate
> > > > > on this
> > > > > > > > statement? If an attacker is able to replace the contents of an
> > > > > encrypted
> > > > > > > > page, it means that they have access to the encryption key.
> > What
> > > will
> > > > > > > > prevent them from calculating the CRC of malicious content and
> > > then
> > > > > > > > encrypting it?
> > > > >
> > >
> >

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Maksim Stepachev <ma...@gmail.com>.
Hi,

I'm going to do it.

сб, 3 окт. 2020 г. в 21:47, Alex Plehanov <pl...@gmail.com>:

> Hello guys,
>
> I've finished the review and approved the patch.
> Anybody else would like to review it?
>
> пн, 28 сент. 2020 г. в 11:38, Pavel Pereslegin <xx...@gmail.com>:
>
> > Hello, Maksim!
> >
> > I am currently working on a review notes from Alexey Plekhanov, will
> > let you know when I finish.
> >
> > пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <
> maksim.stepachev@gmail.com
> > >:
> > >
> > > Hi, Pavel.
> > >
> > > As I see, the ticket [
> https://issues.apache.org/jira/browse/IGNITE-12843
> > ]
> > > is "PATCH AVAILABLE". Is this ticket finished?
> > >
> > > чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
> > >
> > > > Hello all.
> > > >
> > > > I'm working on TDE cache group key rotation [1] and I have a couple
> of
> > > > questions about partition re-encryption.
> > > >
> > > > As described in the wiki [2], the process of re-encryption at the
> > > > moment consists of sequentially marking memory pages as dirty, this
> > > > process looks not resource-intensive.
> > > > Do you think it is necessary to do this in a multithreaded mode or
> > > > single thread is enough?
> > > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > > > single-threaded encryption loaded disk within 30%. At the same time,
> > > > the total re-encryption speed was around 60 MB/s, which allows one
> > > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > > this performance is enough.)
> > > >
> > > > The second question is about the approach to storing the
> re-encryption
> > > > status.
> > > > At the moment, the re-encryption status includes two parameters - the
> > > > total number of pages in the partition at the time of the start of
> > > > re-encryption (int) and the index of the last re-encrypted page
> (int).
> > > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > > ensures that if the checkpoint does not happen, we will continue the
> > > > process from the last page written to disk).
> > > > However, if multithread partition scanning does not make sense, then
> > > > it seems that it is possible to change the implementation and don't
> > > > change the metapage structure. Store only the "pointer" of the
> > > > partition (and the cache group) in the metastore and scan in strict
> > > > order.
> > > > The approach with storing the status in the metapage of the partition
> > > > seems to me more flexible, stable and has a number of advantages over
> > > > the "pointer" approach:
> > > > 1. Since we saving the total number of pages at the re-encryption
> > > > startup - we will not scan extra pages that may be added to the
> > > > partition later.
> > > > 2. We can move partitions between nodes and re-encryption should
> > > > continue from a certain point on the new node.
> > > > 3. If a partition is (re)created during cache group re-encryption, it
> > > > will not be re-encrypted (since its re-encryption status will be
> reset
> > > > and all data is encrypted with the latest encryption key after
> > > > (re)creation.
> > > >
> > > > Do you think single-threaded mode is enough?
> > > > Is it better to keep the re-encryption status in the metapage or
> store
> > > > the "pointer" in the metastore?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > > [2]
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> > > >
> > > > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I'll expand the answer a bit about calculating CRC, the problem is
> > not
> > > > > that it is calculated twice, but that now for encrypted pages,
> > > > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > > > the correctness of the encryption key, but from my point of view,
> it
> > > > > should be vice versa - the lower (delegated) FileIO should check
> the
> > > > > physical integrity and EncryptedFileIO should check the correctness
> > of
> > > > > the encryption key.
> > > > >
> > > > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > > filePageStore - what should we do with this?
> > > > > >
> > > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > > using the wrong encryption key to decrypt data, in which case we
> > will
> > > > > > not understand if the physical integrity is violated or the wrong
> > > > > > encryption key is used.
> > > > > >
> > > > > > > 9. Question - How do we optimize when we can check that this
> > page is
> > > > > > > already encrypted by parallel loading? Maybe we should do this
> in
> > > > Phase 4?
> > > > > >
> > > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > > least), but this is not easy to do right now without breaking
> > binary
> > > > > > compatibility.
> > > > > >
> > > > > > > 7. Question -the current implementation does not use the
> > throttling
> > > > that
> > > > > > > is implemented in PDS. Users should set the throughput such as
> 5
> > MB
> > > > per
> > > > > > > second, but not the timeout, packet size, or stream size.
> > > > > >
> > > > > > I've added a simple rate limiter for this.
> > > > > >
> > > > > > > 8. Question - why we add a lot of system properties?
> > > > > > >> Can you, please, list system properties that should be moved
> to
> > the
> > > > configuration?
> > > > > >
> > > > > > It's about the background re-encryption properties, for now, it
> is:
> > > > > > - re-encryption speed limit (in megabytes per second)
> > > > > > - threads count used for re-encryption
> > > > > > - count of pages in batch, processed under checkpoint lock
> > > > > > - flag to completely disable background re-encryption
> > > > > >
> > > > > > > 11. We should remember about complicated test scenarios with
> > failover
> > > > > >
> > > > > > PR contains tests for re-encryption (and key rotation) on
> unstable
> > > > > > topology (with baseline change and without it). I'll expand them
> > if I
> > > > > > missed some cases.
> > > > > >
> > > > > > > 13. Will re-encryption continue after the cluster is completely
> > > > stopped?
> > > > > >
> > > > > > Yes, as I mentioned earlier, we save the re-encryption status in
> > the
> > > > > > meta page of each re-encrypted partition and trigger
> re-encryption
> > on
> > > > > > node startup if needed (more detailed description on the wiki).
> > > > > >
> > > > > > Thanks a lot for your comments, I am still working on PR and
> > expanding
> > > > > > wiki documentation. I'll let you know when it will be ready for
> the
> > > > > > review.
> > > > > >
> > > > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > > > alexey.goncharuk@gmail.com>:
> > > > > > >
> > > > > > > Hello Nikolay,
> > > > > > >
> > > > > > >
> > > > > > > > > 10. Question - CRC is read in two places encryptionFileIO
> and
> > > > > > > > filePageStore - what should we do with this?
> > > > > > > >
> > > > > > > > filePageStore checks CRC of the encrypted page. This required
> > to
> > > > confirm
> > > > > > > > the page not corrupted on the disk.
> > > > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > > > stored in the
> > > > > > > > encrypted data).
> > > > > > > > This required to be sure the decrypted page contains correct
> > data
> > > > and not
> > > > > > > > replaced with some malicious content.
> > > > > > > >
> > > > > > >
> > > > > > > I still do not see why we need CRC twice, can you please
> > elaborate
> > > > on this
> > > > > > > statement? If an attacker is able to replace the contents of an
> > > > encrypted
> > > > > > > page, it means that they have access to the encryption key.
> What
> > will
> > > > > > > prevent them from calculating the CRC of malicious content and
> > then
> > > > > > > encrypting it?
> > > >
> >
>

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Alex Plehanov <pl...@gmail.com>.
Hello guys,

I've finished the review and approved the patch.
Anybody else would like to review it?

пн, 28 сент. 2020 г. в 11:38, Pavel Pereslegin <xx...@gmail.com>:

> Hello, Maksim!
>
> I am currently working on a review notes from Alexey Plekhanov, will
> let you know when I finish.
>
> пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <maksim.stepachev@gmail.com
> >:
> >
> > Hi, Pavel.
> >
> > As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843
> ]
> > is "PATCH AVAILABLE". Is this ticket finished?
> >
> > чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
> >
> > > Hello all.
> > >
> > > I'm working on TDE cache group key rotation [1] and I have a couple of
> > > questions about partition re-encryption.
> > >
> > > As described in the wiki [2], the process of re-encryption at the
> > > moment consists of sequentially marking memory pages as dirty, this
> > > process looks not resource-intensive.
> > > Do you think it is necessary to do this in a multithreaded mode or
> > > single thread is enough?
> > > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > > single-threaded encryption loaded disk within 30%. At the same time,
> > > the total re-encryption speed was around 60 MB/s, which allows one
> > > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > > this performance is enough.)
> > >
> > > The second question is about the approach to storing the re-encryption
> > > status.
> > > At the moment, the re-encryption status includes two parameters - the
> > > total number of pages in the partition at the time of the start of
> > > re-encryption (int) and the index of the last re-encrypted page (int).
> > > These 8 bytes are stored in the metapage on the checkpoint (which
> > > ensures that if the checkpoint does not happen, we will continue the
> > > process from the last page written to disk).
> > > However, if multithread partition scanning does not make sense, then
> > > it seems that it is possible to change the implementation and don't
> > > change the metapage structure. Store only the "pointer" of the
> > > partition (and the cache group) in the metastore and scan in strict
> > > order.
> > > The approach with storing the status in the metapage of the partition
> > > seems to me more flexible, stable and has a number of advantages over
> > > the "pointer" approach:
> > > 1. Since we saving the total number of pages at the re-encryption
> > > startup - we will not scan extra pages that may be added to the
> > > partition later.
> > > 2. We can move partitions between nodes and re-encryption should
> > > continue from a certain point on the new node.
> > > 3. If a partition is (re)created during cache group re-encryption, it
> > > will not be re-encrypted (since its re-encryption status will be reset
> > > and all data is encrypted with the latest encryption key after
> > > (re)creation.
> > >
> > > Do you think single-threaded mode is enough?
> > > Is it better to keep the re-encryption status in the metapage or store
> > > the "pointer" in the metastore?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > > [2]
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> > >
> > > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > > >
> > > > Hello,
> > > >
> > > > I'll expand the answer a bit about calculating CRC, the problem is
> not
> > > > that it is calculated twice, but that now for encrypted pages,
> > > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > > the correctness of the encryption key, but from my point of view, it
> > > > should be vice versa - the lower (delegated) FileIO should check the
> > > > physical integrity and EncryptedFileIO should check the correctness
> of
> > > > the encryption key.
> > > >
> > > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > > >
> > > > > Hello,
> > > > >
> > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > filePageStore - what should we do with this?
> > > > >
> > > > > We need to calculate the CRC of encrypted data, because we may be
> > > > > using the wrong encryption key to decrypt data, in which case we
> will
> > > > > not understand if the physical integrity is violated or the wrong
> > > > > encryption key is used.
> > > > >
> > > > > > 9. Question - How do we optimize when we can check that this
> page is
> > > > > > already encrypted by parallel loading? Maybe we should do this in
> > > Phase 4?
> > > > >
> > > > > To do this, we need to store the encryption key ID in memory (at
> > > > > least), but this is not easy to do right now without breaking
> binary
> > > > > compatibility.
> > > > >
> > > > > > 7. Question -the current implementation does not use the
> throttling
> > > that
> > > > > > is implemented in PDS. Users should set the throughput such as 5
> MB
> > > per
> > > > > > second, but not the timeout, packet size, or stream size.
> > > > >
> > > > > I've added a simple rate limiter for this.
> > > > >
> > > > > > 8. Question - why we add a lot of system properties?
> > > > > >> Can you, please, list system properties that should be moved to
> the
> > > configuration?
> > > > >
> > > > > It's about the background re-encryption properties, for now, it is:
> > > > > - re-encryption speed limit (in megabytes per second)
> > > > > - threads count used for re-encryption
> > > > > - count of pages in batch, processed under checkpoint lock
> > > > > - flag to completely disable background re-encryption
> > > > >
> > > > > > 11. We should remember about complicated test scenarios with
> failover
> > > > >
> > > > > PR contains tests for re-encryption (and key rotation) on unstable
> > > > > topology (with baseline change and without it). I'll expand them
> if I
> > > > > missed some cases.
> > > > >
> > > > > > 13. Will re-encryption continue after the cluster is completely
> > > stopped?
> > > > >
> > > > > Yes, as I mentioned earlier, we save the re-encryption status in
> the
> > > > > meta page of each re-encrypted partition and trigger re-encryption
> on
> > > > > node startup if needed (more detailed description on the wiki).
> > > > >
> > > > > Thanks a lot for your comments, I am still working on PR and
> expanding
> > > > > wiki documentation. I'll let you know when it will be ready for the
> > > > > review.
> > > > >
> > > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > > alexey.goncharuk@gmail.com>:
> > > > > >
> > > > > > Hello Nikolay,
> > > > > >
> > > > > >
> > > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > > filePageStore - what should we do with this?
> > > > > > >
> > > > > > > filePageStore checks CRC of the encrypted page. This required
> to
> > > confirm
> > > > > > > the page not corrupted on the disk.
> > > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > > stored in the
> > > > > > > encrypted data).
> > > > > > > This required to be sure the decrypted page contains correct
> data
> > > and not
> > > > > > > replaced with some malicious content.
> > > > > > >
> > > > > >
> > > > > > I still do not see why we need CRC twice, can you please
> elaborate
> > > on this
> > > > > > statement? If an attacker is able to replace the contents of an
> > > encrypted
> > > > > > page, it means that they have access to the encryption key. What
> will
> > > > > > prevent them from calculating the CRC of malicious content and
> then
> > > > > > encrypting it?
> > >
>

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Pavel Pereslegin <xx...@gmail.com>.
Hello, Maksim!

I am currently working on a review notes from Alexey Plekhanov, will
let you know when I finish.

пн, 28 сент. 2020 г. в 11:04, Maksim Stepachev <ma...@gmail.com>:
>
> Hi, Pavel.
>
> As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
> is "PATCH AVAILABLE". Is this ticket finished?
>
> чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:
>
> > Hello all.
> >
> > I'm working on TDE cache group key rotation [1] and I have a couple of
> > questions about partition re-encryption.
> >
> > As described in the wiki [2], the process of re-encryption at the
> > moment consists of sequentially marking memory pages as dirty, this
> > process looks not resource-intensive.
> > Do you think it is necessary to do this in a multithreaded mode or
> > single thread is enough?
> > (We started testing re-encryption on dedicated servers (Xeon E5-2680
> > 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> > CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> > single-threaded encryption loaded disk within 30%. At the same time,
> > the total re-encryption speed was around 60 MB/s, which allows one
> > node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> > this performance is enough.)
> >
> > The second question is about the approach to storing the re-encryption
> > status.
> > At the moment, the re-encryption status includes two parameters - the
> > total number of pages in the partition at the time of the start of
> > re-encryption (int) and the index of the last re-encrypted page (int).
> > These 8 bytes are stored in the metapage on the checkpoint (which
> > ensures that if the checkpoint does not happen, we will continue the
> > process from the last page written to disk).
> > However, if multithread partition scanning does not make sense, then
> > it seems that it is possible to change the implementation and don't
> > change the metapage structure. Store only the "pointer" of the
> > partition (and the cache group) in the metastore and scan in strict
> > order.
> > The approach with storing the status in the metapage of the partition
> > seems to me more flexible, stable and has a number of advantages over
> > the "pointer" approach:
> > 1. Since we saving the total number of pages at the re-encryption
> > startup - we will not scan extra pages that may be added to the
> > partition later.
> > 2. We can move partitions between nodes and re-encryption should
> > continue from a certain point on the new node.
> > 3. If a partition is (re)created during cache group re-encryption, it
> > will not be re-encrypted (since its re-encryption status will be reset
> > and all data is encrypted with the latest encryption key after
> > (re)creation.
> >
> > Do you think single-threaded mode is enough?
> > Is it better to keep the re-encryption status in the metapage or store
> > the "pointer" in the metastore?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-12843
> > [2]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
> >
> > пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> > >
> > > Hello,
> > >
> > > I'll expand the answer a bit about calculating CRC, the problem is not
> > > that it is calculated twice, but that now for encrypted pages,
> > > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > > the correctness of the encryption key, but from my point of view, it
> > > should be vice versa - the lower (delegated) FileIO should check the
> > > physical integrity and EncryptedFileIO should check the correctness of
> > > the encryption key.
> > >
> > > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > > >
> > > > Hello,
> > > >
> > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > filePageStore - what should we do with this?
> > > >
> > > > We need to calculate the CRC of encrypted data, because we may be
> > > > using the wrong encryption key to decrypt data, in which case we will
> > > > not understand if the physical integrity is violated or the wrong
> > > > encryption key is used.
> > > >
> > > > > 9. Question - How do we optimize when we can check that this page is
> > > > > already encrypted by parallel loading? Maybe we should do this in
> > Phase 4?
> > > >
> > > > To do this, we need to store the encryption key ID in memory (at
> > > > least), but this is not easy to do right now without breaking binary
> > > > compatibility.
> > > >
> > > > > 7. Question -the current implementation does not use the throttling
> > that
> > > > > is implemented in PDS. Users should set the throughput such as 5 MB
> > per
> > > > > second, but not the timeout, packet size, or stream size.
> > > >
> > > > I've added a simple rate limiter for this.
> > > >
> > > > > 8. Question - why we add a lot of system properties?
> > > > >> Can you, please, list system properties that should be moved to the
> > configuration?
> > > >
> > > > It's about the background re-encryption properties, for now, it is:
> > > > - re-encryption speed limit (in megabytes per second)
> > > > - threads count used for re-encryption
> > > > - count of pages in batch, processed under checkpoint lock
> > > > - flag to completely disable background re-encryption
> > > >
> > > > > 11. We should remember about complicated test scenarios with failover
> > > >
> > > > PR contains tests for re-encryption (and key rotation) on unstable
> > > > topology (with baseline change and without it). I'll expand them if I
> > > > missed some cases.
> > > >
> > > > > 13. Will re-encryption continue after the cluster is completely
> > stopped?
> > > >
> > > > Yes, as I mentioned earlier, we save the re-encryption status in the
> > > > meta page of each re-encrypted partition and trigger re-encryption on
> > > > node startup if needed (more detailed description on the wiki).
> > > >
> > > > Thanks a lot for your comments, I am still working on PR and expanding
> > > > wiki documentation. I'll let you know when it will be ready for the
> > > > review.
> > > >
> > > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> > alexey.goncharuk@gmail.com>:
> > > > >
> > > > > Hello Nikolay,
> > > > >
> > > > >
> > > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > > filePageStore - what should we do with this?
> > > > > >
> > > > > > filePageStore checks CRC of the encrypted page. This required to
> > confirm
> > > > > > the page not corrupted on the disk.
> > > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> > stored in the
> > > > > > encrypted data).
> > > > > > This required to be sure the decrypted page contains correct data
> > and not
> > > > > > replaced with some malicious content.
> > > > > >
> > > > >
> > > > > I still do not see why we need CRC twice, can you please elaborate
> > on this
> > > > > statement? If an attacker is able to replace the contents of an
> > encrypted
> > > > > page, it means that they have access to the encryption key. What will
> > > > > prevent them from calculating the CRC of malicious content and then
> > > > > encrypting it?
> >

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Posted by Maksim Stepachev <ma...@gmail.com>.
Hi, Pavel.

As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
is "PATCH AVAILABLE". Is this ticket finished?

чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <xx...@gmail.com>:

> Hello all.
>
> I'm working on TDE cache group key rotation [1] and I have a couple of
> questions about partition re-encryption.
>
> As described in the wiki [2], the process of re-encryption at the
> moment consists of sequentially marking memory pages as dirty, this
> process looks not resource-intensive.
> Do you think it is necessary to do this in a multithreaded mode or
> single thread is enough?
> (We started testing re-encryption on dedicated servers (Xeon E5-2680
> 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> single-threaded encryption loaded disk within 30%. At the same time,
> the total re-encryption speed was around 60 MB/s, which allows one
> node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> this performance is enough.)
>
> The second question is about the approach to storing the re-encryption
> status.
> At the moment, the re-encryption status includes two parameters - the
> total number of pages in the partition at the time of the start of
> re-encryption (int) and the index of the last re-encrypted page (int).
> These 8 bytes are stored in the metapage on the checkpoint (which
> ensures that if the checkpoint does not happen, we will continue the
> process from the last page written to disk).
> However, if multithread partition scanning does not make sense, then
> it seems that it is possible to change the implementation and don't
> change the metapage structure. Store only the "pointer" of the
> partition (and the cache group) in the metastore and scan in strict
> order.
> The approach with storing the status in the metapage of the partition
> seems to me more flexible, stable and has a number of advantages over
> the "pointer" approach:
> 1. Since we saving the total number of pages at the re-encryption
> startup - we will not scan extra pages that may be added to the
> partition later.
> 2. We can move partitions between nodes and re-encryption should
> continue from a certain point on the new node.
> 3. If a partition is (re)created during cache group re-encryption, it
> will not be re-encrypted (since its re-encryption status will be reset
> and all data is encrypted with the latest encryption key after
> (re)creation.
>
> Do you think single-threaded mode is enough?
> Is it better to keep the re-encryption status in the metapage or store
> the "pointer" in the metastore?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12843
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
>
> пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <xx...@gmail.com>:
> >
> > Hello,
> >
> > I'll expand the answer a bit about calculating CRC, the problem is not
> > that it is calculated twice, but that now for encrypted pages,
> > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > the correctness of the encryption key, but from my point of view, it
> > should be vice versa - the lower (delegated) FileIO should check the
> > physical integrity and EncryptedFileIO should check the correctness of
> > the encryption key.
> >
> > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <xx...@gmail.com>:
> > >
> > > Hello,
> > >
> > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > filePageStore - what should we do with this?
> > >
> > > We need to calculate the CRC of encrypted data, because we may be
> > > using the wrong encryption key to decrypt data, in which case we will
> > > not understand if the physical integrity is violated or the wrong
> > > encryption key is used.
> > >
> > > > 9. Question - How do we optimize when we can check that this page is
> > > > already encrypted by parallel loading? Maybe we should do this in
> Phase 4?
> > >
> > > To do this, we need to store the encryption key ID in memory (at
> > > least), but this is not easy to do right now without breaking binary
> > > compatibility.
> > >
> > > > 7. Question -the current implementation does not use the throttling
> that
> > > > is implemented in PDS. Users should set the throughput such as 5 MB
> per
> > > > second, but not the timeout, packet size, or stream size.
> > >
> > > I've added a simple rate limiter for this.
> > >
> > > > 8. Question - why we add a lot of system properties?
> > > >> Can you, please, list system properties that should be moved to the
> configuration?
> > >
> > > It's about the background re-encryption properties, for now, it is:
> > > - re-encryption speed limit (in megabytes per second)
> > > - threads count used for re-encryption
> > > - count of pages in batch, processed under checkpoint lock
> > > - flag to completely disable background re-encryption
> > >
> > > > 11. We should remember about complicated test scenarios with failover
> > >
> > > PR contains tests for re-encryption (and key rotation) on unstable
> > > topology (with baseline change and without it). I'll expand them if I
> > > missed some cases.
> > >
> > > > 13. Will re-encryption continue after the cluster is completely
> stopped?
> > >
> > > Yes, as I mentioned earlier, we save the re-encryption status in the
> > > meta page of each re-encrypted partition and trigger re-encryption on
> > > node startup if needed (more detailed description on the wiki).
> > >
> > > Thanks a lot for your comments, I am still working on PR and expanding
> > > wiki documentation. I'll let you know when it will be ready for the
> > > review.
> > >
> > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> alexey.goncharuk@gmail.com>:
> > > >
> > > > Hello Nikolay,
> > > >
> > > >
> > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > filePageStore - what should we do with this?
> > > > >
> > > > > filePageStore checks CRC of the encrypted page. This required to
> confirm
> > > > > the page not corrupted on the disk.
> > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> stored in the
> > > > > encrypted data).
> > > > > This required to be sure the decrypted page contains correct data
> and not
> > > > > replaced with some malicious content.
> > > > >
> > > >
> > > > I still do not see why we need CRC twice, can you please elaborate
> on this
> > > > statement? If an attacker is able to replace the contents of an
> encrypted
> > > > page, it means that they have access to the encryption key. What will
> > > > prevent them from calculating the CRC of malicious content and then
> > > > encrypting it?
>