Posted to common-dev@hadoop.apache.org by "Dong0829 (Jira)" <ji...@apache.org> on 2021/10/13 04:59:00 UTC

[jira] [Created] (HADOOP-17966) S3A SSE-KMS inconsistency issue during COPY

Dong0829 created HADOOP-17966:
---------------------------------

             Summary: S3A SSE-KMS inconsistency issue during COPY
                 Key: HADOOP-17966
                 URL: https://issues.apache.org/jira/browse/HADOOP-17966
             Project: Hadoop Common
          Issue Type: Bug
          Components: common
    Affects Versions: 3.1.2
            Reporter: Dong0829


According to the documentation:
[https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/encryption.html#S3_Default_Encryption]
"Organizations may define a default key in the Amazon KMS; if a default key is set, then it will be used whenever SSE-KMS encryption is chosen and the value of fs.s3a.server-side-encryption.key is empty."

So two conditions must hold for an object to be encrypted with the default KMS key: 1. SSE-KMS encryption is selected. 2. fs.s3a.server-side-encryption.key is not set.

But there is another, confusing scenario:

1. The user wants to rely on S3 bucket-side encryption with their own customer KMS key (kms-keyA, for example), so they set neither fs.s3a.server-side-encryption-algorithm nor fs.s3a.server-side-encryption.key. Files uploaded to this bucket are encrypted with the bucket's custom KMS key, kms-keyA.
2. Next, the user copies the file to another file through s3a. This invokes copyFile() in S3AFileSystem, which clones the source object's metadata in cloneObjectMetadata(). The clone copies the SSE algorithm but not the KMS key ID used for SSE-KMS. As a result, the destination is written with SSE-KMS and no key ID, so the final file uses the account-level default key under aws/s3 (https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object.html); let's say it is kms-keyB.
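
The two steps above can be modelled in a few lines. This is only a toy sketch of the behaviour as I understand it, not Hadoop or AWS SDK code: SseCopyModel, its method names, and the map keys are all hypothetical, and only the "no SSE header" and "aws:kms" cases are modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the copy path described above; all names here are hypothetical. */
final class SseCopyModel {
    static final String ALGORITHM = "sse-algorithm";
    static final String KMS_KEY_ID = "sse-kms-key-id";
    // Stand-in for the account-level aws/s3 key, called kms-keyB in this report.
    static final String ACCOUNT_DEFAULT_KEY = "kms-keyB";

    /** Models the reported clone: the algorithm is copied, the key ID is not. */
    static Map<String, String> cloneAlgorithmOnly(Map<String, String> source) {
        Map<String, String> dest = new HashMap<>();
        if (source.containsKey(ALGORITHM)) {
            dest.put(ALGORITHM, source.get(ALGORITHM));
        }
        return dest;
    }

    /** Models which key S3 ends up using for a request with these settings. */
    static String effectiveKey(Map<String, String> request, String bucketDefaultKey) {
        String algorithm = request.get(ALGORITHM);
        if (algorithm == null) {
            // No SSE settings on the request: the bucket default applies.
            return bucketDefaultKey;
        }
        if (!"aws:kms".equals(algorithm)) {
            throw new IllegalArgumentException("not modelled: " + algorithm);
        }
        // SSE-KMS without a key ID falls back to the account-level key.
        String keyId = request.get(KMS_KEY_ID);
        return keyId != null ? keyId : ACCOUNT_DEFAULT_KEY;
    }
}
```

With a source object carrying aws:kms and kms-keyA, effectiveKey(cloneAlgorithmOnly(source), "kms-keyA") resolves to kms-keyB, while a request with no SSE settings at all resolves to the bucket key kms-keyA.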

This means that whenever there is a copy, the KMS key changes from the customer key kms-keyA to kms-keyB, which causes inconsistency. For example:

hdfs dfs -put test s3://ssetest/

During this put, test.__COPYING__ is renamed to test. S3A performs the rename as a copy, so the final test file is encrypted with the account default key kms-keyB instead of the expected S3 bucket customer key kms-keyA. The same issue can also occur during some Spark commit processes.

Should we also clone the KMS key ID to keep this consistent?
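
One possible shape of that change, as a minimal sketch and not an actual patch: when the source uses SSE-KMS, carry its key ID over together with the algorithm. SseSettings and cloneWithKmsKey are hypothetical names used only for illustration; they are not Hadoop or AWS SDK types.

```java
/** Hypothetical holder for the two SSE values discussed in this issue. */
final class SseSettings {
    final String algorithm; // e.g. "aws:kms", or null when not set
    final String kmsKeyId;  // e.g. "kms-keyA", or null when not set

    SseSettings(String algorithm, String kmsKeyId) {
        this.algorithm = algorithm;
        this.kmsKeyId = kmsKeyId;
    }

    /** Proposed behaviour: clone the KMS key ID along with the algorithm. */
    static SseSettings cloneWithKmsKey(SseSettings source) {
        // A key ID is only meaningful for SSE-KMS, so drop it otherwise.
        boolean isKms = "aws:kms".equals(source.algorithm);
        return new SseSettings(source.algorithm, isKms ? source.kmsKeyId : null);
    }
}
```

Under this sketch, a source with aws:kms and kms-keyA yields a destination that still names kms-keyA, so the copy would no longer fall back to the account default key.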



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
