Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/07/13 20:58:49 UTC

[GitHub] [hadoop] steveloughran commented on a change in pull request #2706: HADOOP-13887. Support S3 client side encryption (S3-CSE) using AWS-SDK

steveloughran commented on a change in pull request #2706:
URL: https://github.com/apache/hadoop/pull/2706#discussion_r669093714



##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -2029,3 +2029,51 @@ signer for all services.
 For a specific service, the service specific signer is looked up first.
 If that is not specified, the common signer is looked up. If this is
 not specified as well, SDK settings are used.
+
+### <a name="cse"></a> Amazon S3 Client Side Encryption
+Amazon S3 Client Side Encryption(CSE), uses `AmazonS3EncryptionClientV2.java
+` AmazonS3 client. The encryption and decryption is done in AWS SDK side. Atm

Review comment:
       "atm?"
   
   Recommend an intro, features and limitations section:
   
   
   A key reason this feature (HADOOP-13887) has been unavailable for a long time is that the AWS S3 encryption client pads uploaded objects with a 16 byte footer. This meant that files were shorter when read than when listed through any of the list API calls/getFileStatus(), which broke many applications, including anything seeking near the end of a file to read a footer, as ORC and Parquet do.
   
   There is now a workaround: compensate for the footer in listings when CSE is enabled.
   
   * when listing files and directories, 16 bytes are subtracted from the length of all non-empty objects.
   * directory markers MAY be longer than 0 bytes.
   
   This "appears" to work, and it does in the testing to date. However, the length of files when listed through the S3A client is now going to be shorter than the length of files listed with other clients, including S3A clients where S3-CSE has not been enabled.
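   
   To make the workaround concrete, a minimal sketch of the length compensation; the constant and helper names here are hypothetical, not the PR's actual code:
   
```java
// Hypothetical sketch of the listing compensation; the constant and
// helper names are illustrative, not lifted from the PR.
public final class CsePaddingExample {

  /** Length of the footer the AWS encryption client appends. */
  public static final int CSE_PADDING_LENGTH = 16;

  /**
   * Deduct the CSE footer from a length returned by a LIST or HEAD call.
   * Empty objects carry no footer, so there is nothing to deduct.
   */
  public static long deductPadding(long reportedLength, boolean cseEnabled) {
    if (cseEnabled && reportedLength >= CSE_PADDING_LENGTH) {
      return reportedLength - CSE_PADDING_LENGTH;
    }
    return reportedLength;
  }

  private CsePaddingExample() {
  }
}
```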
   
   Features
   * supports client-side encryption with keys managed in AWS KMS
   * encryption settings are propagated into jobs through any issued delegation tokens (see the configuration sketch after this list)
   * encryption information is stored as headers in the uploaded object
   
   Limitations
   * performance will be reduced; all encryption/decryption is now done on the client
   * writing files may be slower, as only a single block can be encrypted and uploaded at a time
   * the Multipart Uploader API is disabled
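   
   How a client would opt in, as a sketch: it assumes the `CLIENT_SIDE_ENCRYPTION_METHOD` key referenced in `DefaultS3ClientFactory` lives in `Constants`, and the "CSE-KMS" value is an assumption; check the PR for the method names it actually supports.
   
```java
import org.apache.hadoop.conf.Configuration;

import static org.apache.hadoop.fs.s3a.Constants.*;

// Sketch only: enable S3-CSE through the CLIENT_SIDE_ENCRYPTION_METHOD
// key referenced in DefaultS3ClientFactory. Both the constant's home in
// Constants and the "CSE-KMS" value are assumptions, not the PR's code.
public class EnableCseExample {
  public static Configuration cseConfiguration() {
    Configuration conf = new Configuration();
    conf.set(CLIENT_SIDE_ENCRYPTION_METHOD, "CSE-KMS");
    return conf;
  }
}
```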
   
   
   
   
   

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
##########
@@ -2029,3 +2029,51 @@ signer for all services.
 For a specific service, the service specific signer is looked up first.
 If that is not specified, the common signer is looked up. If this is
 not specified as well, SDK settings are used.
+
+### <a name="cse"></a> Amazon S3 Client Side Encryption

Review comment:
       move to encryption.md

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
##########
@@ -112,15 +124,73 @@ public AmazonS3 createS3Client(
     }
 
     try {
-      return buildAmazonS3Client(
-          awsConf,
-          parameters);
+      if (conf.get(CLIENT_SIDE_ENCRYPTION_METHOD) == null) {
+        return buildAmazonS3Client(
+            awsConf,
+            parameters);
+      } else {
+        return buildAmazonS3EncryptionClient(
+            awsConf,
+            parameters);
+      }
     } catch (SdkClientException e) {
       // SDK refused to build.
       throw translateException("creating AWS S3 client", uri.toString(), e);
     }
   }
 
+  /**
+   * Create an {@link AmazonS3} client of type
+   * {@link AmazonS3EncryptionV2} if CSE is enabled.
+   *
+   * @param awsConf    AWS configuration.
+   * @param parameters parameters
+   *
+   * @return new AmazonS3 client.
+   */
+  protected AmazonS3 buildAmazonS3EncryptionClient(

Review comment:
       InconsistentS3Client doesn't do this; I can see tests skip it. Should we have that client throw an UnsupportedOperationException here?
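   
   Something like this, sketched against the method in the diff above; the parameter types are assumed from the javadoc, and the override itself is hypothetical, not code from the PR:
   
```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;

// Sketch of the suggestion: have the fault-injecting factory fail fast
// when asked to build an encryption client. The parameter types are
// assumed; this override is illustrative, not code from the PR.
public class InconsistentS3ClientFactory extends DefaultS3ClientFactory {

  @Override
  protected AmazonS3 buildAmazonS3EncryptionClient(
      final ClientConfiguration awsConf,
      final S3ClientCreationParameters parameters) {
    throw new UnsupportedOperationException(
        "S3-CSE is not supported by the inconsistent client");
  }
}
```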

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractCommitITest.java
##########
@@ -179,11 +178,9 @@ public void setup() throws Exception {
     if (useInconsistentClient()) {
       AmazonS3 client = getFileSystem()
           .getAmazonS3ClientForTesting("fault injection");
-      Assert.assertTrue(
-          "AWS client is not inconsistent, even though the test requirees it "
-          + client,
-          client instanceof InconsistentAmazonS3Client);
-      inconsistentClient = (InconsistentAmazonS3Client) client;
+      if(client instanceof InconsistentAmazonS3Client) {

Review comment:
       nit: add a space after `if`: `if (client instanceof InconsistentAmazonS3Client) {`

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
##########
@@ -43,9 +52,12 @@
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.s3a.statistics.impl.AwsStatisticsCollector;
 import org.apache.hadoop.fs.store.LogExactlyOnce;
+import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;

Review comment:
       can you move up to L44? Yes, it is a PITA.

##########
File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
##########
@@ -242,6 +242,7 @@ protected void logTimePerIOP(String operation,
 
   @Test
   public void testTimeToOpenAndReadWholeFileBlocks() throws Throwable {
+    skipIfClientSideEncryption();

Review comment:
       why are we turning this off? Performance? seek() perf?

##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md
##########
@@ -1682,3 +1682,105 @@ com.amazonaws.SdkClientException: Unable to execute HTTP request:
 
 When this happens, try to set `fs.s3a.connection.request.timeout` to a larger value or disable it
 completely by setting it to `0`.
+
+### <a name="client-side-encryption"></a> S3 Client Side Encryption

Review comment:
       The troubleshooting documents need to assume that the person reading them, or googling for them, has just seen a specific error message and is trying to work out what's wrong/what to do.
   
   This means the phrasing here needs to be tweaked slightly:
   
   * move these into the "encryption" section (ahead of those which clearly shouldn't be there :)
   * add a ### subtitle for every stack trace
   * with the error text as the title, so it's the first thing people see
   * interpretation after the stack trace
   * and ideally a recommendation, if there's a simple solution (including "don't")
   
   ### "Instruction file not found for S3 object"
   
   Reading a file fails when the client is configured with S3-CSE.
   
   
   (insert <stack trace>)
     
   The file is not encrypted with S3-CSE.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


