Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/11/24 17:32:40 UTC

[GitHub] [hadoop] steveloughran commented on a diff in pull request #5163: HADOOP-18073. Upgrade AWS SDK to v2 in S3A [work in progress]

steveloughran commented on code in PR #5163:
URL: https://github.com/apache/hadoop/pull/5163#discussion_r1031695967


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/adapter/V1ToV2AwsCredentialProviderAdapter.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.adapter;
+
+import com.amazonaws.auth.AWSCredentials;
+import com.amazonaws.auth.AWSCredentialsProvider;
+import com.amazonaws.auth.AWSSessionCredentials;
+import com.amazonaws.auth.AnonymousAWSCredentials;
+

Review Comment:
   should all be in the same "not java/not apache import block"
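
For context, a sketch of the import layout the comment is asking for, assuming the usual Hadoop grouping of java/javax first, then a single third-party block, then org.apache:

    // java/javax imports first
    import java.io.IOException;

    // one "not java/not apache" block for all third-party imports;
    // both the v1 com.amazonaws and the v2 software.amazon.awssdk
    // imports belong here
    import com.amazonaws.auth.AWSCredentialsProvider;
    import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;

    // org.apache imports last
    import org.apache.hadoop.conf.Configuration;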



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AbstractSessionCredentialsProvider.java:
##########
@@ -23,29 +23,26 @@
 import java.io.IOException;
 import java.util.concurrent.atomic.AtomicBoolean;
 
-import com.amazonaws.SdkBaseException;
-import com.amazonaws.auth.AWSCredentials;
 import org.apache.hadoop.classification.VisibleForTesting;
-
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.s3a.CredentialInitializationException;
 import org.apache.hadoop.fs.s3a.Invoker;
 import org.apache.hadoop.fs.s3a.Retries;
 
+import software.amazon.awssdk.auth.credentials.AwsCredentials;

Review Comment:
   move back up



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/SDKStreamDrainer.java:
##########
@@ -58,17 +51,9 @@ public class SDKStreamDrainer implements CallableRaisingIOE<Boolean> {
   private final String uri;
 
   /**
-   * Request object; usually S3Object

Review Comment:
   are you confident we don't need to hold on to any outer class now?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MultiObjectDeleteException.java:
##########
@@ -16,39 +16,53 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.fs.s3a.impl;
+package org.apache.hadoop.fs.s3a;
 
 import java.io.IOException;
 import java.nio.file.AccessDeniedException;
 import java.util.List;
 
-import com.amazonaws.services.s3.model.MultiObjectDeleteException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.hadoop.fs.s3a.AWSS3IOException;
+import software.amazon.awssdk.services.s3.model.S3Error;
+import software.amazon.awssdk.services.s3.model.S3Exception;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+
+import static org.apache.hadoop.fs.s3a.impl.InternalConstants.SC_200_OK;
 
 /**
- * Support for Multi Object Deletion.
- * This is used to be a complex piece of code as it was required to
- * update s3guard.
- * Now all that is left is the exception extraction for better
- * reporting,
+ * Exception raised in {@link S3AFileSystem#deleteObjects} when
+ * one or more of the keys could not be deleted.
+ *
+ * Used to reproduce the behaviour of SDK v1 for partial failures
+ * on DeleteObjects. In SDK v2, the errors are returned as part of
+ * the response objects.
  */
-public final class MultiObjectDeleteSupport {
+@InterfaceAudience.Public

Review Comment:
   is it public? otherwise, move to s3a impl package
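
On the mechanism the new javadoc describes, a sketch of how a caller might surface v2 partial-delete failures; the client/request names and the exception constructor are assumptions:

    // SDK v2 DeleteObjects does not throw on partial failure; the
    // per-key errors come back inside the response object.
    DeleteObjectsResponse response = s3Client.deleteObjects(deleteRequest);
    if (!response.errors().isEmpty()) {
      // hypothetical constructor taking the response's List<S3Error>
      throw new MultiObjectDeleteException(response.errors());
    }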



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -75,19 +75,20 @@
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.Set;
+import java.util.concurrent.CompletionException;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeUnit;
 
 import static org.apache.commons.lang3.StringUtils.isEmpty;
 import static org.apache.hadoop.fs.s3a.Constants.*;
 import static org.apache.hadoop.fs.s3a.impl.ErrorTranslation.isUnknownBucket;
-import static org.apache.hadoop.fs.s3a.impl.InternalConstants.CSE_PADDING_LENGTH;
-import static org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport.translateDeleteException;
+import static org.apache.hadoop.fs.s3a.impl.InternalConstants.*;

Review Comment:
   restore the single import



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AbstractAWSCredentialProvider.java:
##########
@@ -65,10 +62,4 @@ public URI getUri() {
     return binding;
   }
 
-  /**
-   * Refresh is a no-op by default.
-   */
-  @Override
-  public void refresh() {
-  }
 }

Review Comment:
   has refresh() gone?
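
For context, the v2 provider interface has no refresh() to override; roughly, the two contracts are:

    // v1: com.amazonaws.auth.AWSCredentialsProvider
    public interface AWSCredentialsProvider {
      AWSCredentials getCredentials();
      void refresh();                        // the no-op override was for this
    }

    // v2: software.amazon.awssdk.auth.credentials.AwsCredentialsProvider
    public interface AwsCredentialsProvider {
      AwsCredentials resolveCredentials();   // no refresh() in the contract
    }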



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/MarshalledCredentialBinding.java:
##########
@@ -194,25 +193,26 @@ public static AWSCredentials toAWSCredentials(
    */
   @Retries.RetryTranslated
   public static MarshalledCredentials requestSessionCredentials(
-      final AWSCredentialsProvider parentCredentials,
-      final ClientConfiguration awsConf,
+      final AwsCredentialsProvider parentCredentials,
+      final Configuration configuration,
       final String stsEndpoint,
       final String stsRegion,
       final int duration,
-      final Invoker invoker) throws IOException {
+      final Invoker invoker,
+      final String bucket) throws IOException {

Review Comment:
   update javadocs



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/api/RequestFactory.java:
##########
@@ -19,39 +19,33 @@
 package org.apache.hadoop.fs.s3a.api;
 
 import javax.annotation.Nullable;
-import java.io.File;
-import java.io.InputStream;
 import java.util.List;
-import java.util.Optional;
-
-import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
-import com.amazonaws.services.s3.model.CannedAccessControlList;
-import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
-import com.amazonaws.services.s3.model.CopyObjectRequest;
-import com.amazonaws.services.s3.model.DeleteObjectRequest;
-import com.amazonaws.services.s3.model.DeleteObjectsRequest;
-import com.amazonaws.services.s3.model.GetObjectMetadataRequest;
-import com.amazonaws.services.s3.model.GetObjectRequest;
-import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
-import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
-import com.amazonaws.services.s3.model.ListNextBatchOfObjectsRequest;
-import com.amazonaws.services.s3.model.ListObjectsRequest;
-import com.amazonaws.services.s3.model.ListObjectsV2Request;
-import com.amazonaws.services.s3.model.ObjectListing;
-import com.amazonaws.services.s3.model.ObjectMetadata;
-import com.amazonaws.services.s3.model.PartETag;
-import com.amazonaws.services.s3.model.PutObjectRequest;
-import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;
-import com.amazonaws.services.s3.model.SSECustomerKey;
-import com.amazonaws.services.s3.model.SelectObjectContentRequest;
-import com.amazonaws.services.s3.model.StorageClass;
-import com.amazonaws.services.s3.model.UploadPartRequest;
 
 import org.apache.hadoop.fs.PathIOException;
 import org.apache.hadoop.fs.s3a.S3AEncryptionMethods;
 import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;
 import org.apache.hadoop.fs.s3a.impl.PutObjectOptions;
 
+import software.amazon.awssdk.services.s3.model.AbortMultipartUploadRequest;

Review Comment:
   can move these imports back to where the others were; the new ordering will make backporting/cherrypicking even harder. same everywhere else too



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ChangeDetectionPolicy.java:
##########
@@ -398,17 +418,24 @@ static class VersionIdChangeDetectionPolicy extends
     }
 
     @Override
-    public String getRevisionId(ObjectMetadata objectMetadata, String uri) {
-      String versionId = objectMetadata.getVersionId();
+    public String getRevisionId(HeadObjectResponse objectMetadata, String uri) {
+      return logIfNull(objectMetadata.versionId(), uri);
+    }
+
+    @Override
+    public String getRevisionId(GetObjectResponse getObjectResponse, String uri) {
+      return logIfNull(getObjectResponse.versionId(), uri);
+    }
+
+    private String logIfNull(String versionId, String uri) {
       if (versionId == null) {
         // this policy doesn't work if the bucket doesn't have object versioning
         // enabled (which isn't by default)
         getLogNoVersionSupport().warn(
             CHANGE_DETECT_MODE + " set to " + Source.VersionId
                 + " but no versionId available while reading {}. "
                 + "Ensure your bucket has object versioning enabled. "
-                + "You may see inconsistent reads.",
-            uri);
+                + "You may see inconsistent reads.", uri);

Review Comment:
   superfluous change



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/MarshalledCredentialBinding.java:
##########
@@ -194,25 +193,26 @@ public static AWSCredentials toAWSCredentials(
    */
   @Retries.RetryTranslated
   public static MarshalledCredentials requestSessionCredentials(
-      final AWSCredentialsProvider parentCredentials,
-      final ClientConfiguration awsConf,
+      final AwsCredentialsProvider parentCredentials,
+      final Configuration configuration,
       final String stsEndpoint,
       final String stsRegion,
       final int duration,
-      final Invoker invoker) throws IOException {
+      final Invoker invoker,
+      final String bucket) throws IOException {
     try {
-      final AWSSecurityTokenService tokenService =
+      final StsClient tokenService =
           STSClientFactory.builder(parentCredentials,
-              awsConf,
+              configuration,
               stsEndpoint.isEmpty() ? null : stsEndpoint,
-              stsRegion)
+              stsRegion, bucket)

Review Comment:
   put on new line



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/InternalConstants.java:
##########
@@ -110,11 +110,50 @@ private InternalConstants() {
     S3A_OPENFILE_KEYS = Collections.unmodifiableSet(keys);
   }
 
-  /** 403 error code. */
-  public static final int SC_403 = 403;
+  /** 200 status code: OK. */

Review Comment:
   were these copied from somewhere or did you write them yourself? if somewhere, was it an ASF project like httpclient (license good) or from somewhere else....



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/select/BlockingEnumeration.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.select;
+
+import java.util.Enumeration;
+import java.util.NoSuchElementException;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.LinkedBlockingQueue;
+
+import org.reactivestreams.Subscriber;
+import org.reactivestreams.Subscription;
+
+import software.amazon.awssdk.core.async.SdkPublisher;
+import software.amazon.awssdk.core.exception.SdkException;
+
+/**
+ * Implements the {@link Enumeration} interface by subscribing to a
+ * {@link SdkPublisher} instance. The enumeration will buffer a fixed
+ * number of elements and only request new ones from the publisher
+ * when they are consumed. Calls to {@link #hasMoreElements()} and
+ * {@link #nextElement()} may block while waiting for new elements.
+ * @param <T> the type of element.
+ */
+public final class BlockingEnumeration<T> implements Enumeration<T> {

Review Comment:
   expecting to see some unit tests for this, as it is complex enough to need them
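
A sketch of the kind of test that could cover the basic path; the (publisher, buffer size) constructor is an assumption from the class description:

    import java.util.Arrays;
    import java.util.Iterator;
    import org.junit.Test;
    import org.reactivestreams.Publisher;
    import org.reactivestreams.Subscription;
    import software.amazon.awssdk.core.async.SdkPublisher;
    import static org.junit.Assert.*;

    public class TestBlockingEnumeration {
      @Test
      public void testDrainsPublisherInOrder() {
        // trivial Publisher emitting a fixed list, honouring request(n)
        Publisher<String> source = subscriber -> {
          Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
          subscriber.onSubscribe(new Subscription() {
            private boolean done;
            @Override public void request(long n) {
              for (long i = 0; i < n && it.hasNext(); i++) {
                subscriber.onNext(it.next());
              }
              if (!it.hasNext() && !done) {
                done = true;
                subscriber.onComplete();
              }
            }
            @Override public void cancel() { }
          });
        };
        // assumed constructor: (SdkPublisher<T>, buffer size)
        BlockingEnumeration<String> e =
            new BlockingEnumeration<>(SdkPublisher.adapt(source), 1);
        assertEquals("a", e.nextElement());
        assertEquals("b", e.nextElement());
        assertEquals("c", e.nextElement());
        assertFalse(e.hasMoreElements());
      }
    }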



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java:
##########
@@ -18,42 +18,41 @@
 
 package org.apache.hadoop.fs.s3a.impl;
 
-import java.io.File;
-import java.io.IOException;
-import java.io.InputStream;
-import java.util.ArrayList;
+import java.util.Base64;
+import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
-import java.util.Optional;
 import javax.annotation.Nullable;
 
-import com.amazonaws.AmazonWebServiceRequest;
-import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
-import com.amazonaws.services.s3.model.CannedAccessControlList;
-import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
-import com.amazonaws.services.s3.model.CopyObjectRequest;
-import com.amazonaws.services.s3.model.DeleteObjectRequest;
-import com.amazonaws.services.s3.model.DeleteObjectsRequest;
-import com.amazonaws.services.s3.model.GetObjectMetadataRequest;
-import com.amazonaws.services.s3.model.GetObjectRequest;
-import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
-import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
-import com.amazonaws.services.s3.model.ListNextBatchOfObjectsRequest;
-import com.amazonaws.services.s3.model.ListObjectsRequest;
-import com.amazonaws.services.s3.model.ListObjectsV2Request;
-import com.amazonaws.services.s3.model.ObjectListing;
-import com.amazonaws.services.s3.model.ObjectMetadata;
-import com.amazonaws.services.s3.model.PartETag;
-import com.amazonaws.services.s3.model.PutObjectRequest;
-import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;
-import com.amazonaws.services.s3.model.SSECustomerKey;
-import com.amazonaws.services.s3.model.SelectObjectContentRequest;
-import com.amazonaws.services.s3.model.StorageClass;
-import com.amazonaws.services.s3.model.UploadPartRequest;
 import org.apache.hadoop.util.Preconditions;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import software.amazon.awssdk.core.SdkRequest;

Review Comment:
   delete the line above to merge the import blocks



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/tools/MarkerTool.java:
##########
@@ -32,10 +32,8 @@
 import java.util.Map;
 import java.util.stream.Collectors;
 
-import com.amazonaws.AmazonClientException;
-import com.amazonaws.services.s3.model.DeleteObjectsRequest;
-import com.amazonaws.services.s3.model.MultiObjectDeleteException;
 import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.fs.s3a.MultiObjectDeleteException;

Review Comment:
   move into the "real" hadoop block; L35 is some guava migration artifact



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/InternalConstants.java:
##########
@@ -110,11 +110,50 @@ private InternalConstants() {
     S3A_OPENFILE_KEYS = Collections.unmodifiableSet(keys);
   }
 
-  /** 403 error code. */
-  public static final int SC_403 = 403;
+  /** 200 status code: OK. */
+  public static final int SC_200_OK = 200;
 
-  /** 404 error code. */

Review Comment:
   reinstate it and have it just refer to SC_404_NOT_FOUND; mark it @Deprecated.
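
i.e. something like this sketch, with the new constant name taken from the comment above:

    /** 404 error code.
     * @deprecated use {@link #SC_404_NOT_FOUND}. */
    @Deprecated
    public static final int SC_404 = SC_404_NOT_FOUND;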



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/AWSCannedACL.java:
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+/**
+ * Enum to map AWS SDK V1 Acl values to SDK V2.
+ */
+public enum AWSCannedACL {

Review Comment:
   move to s3a.impl package



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentS3ClientFactory.java:
##########
@@ -18,37 +18,93 @@
 
 package org.apache.hadoop.fs.s3a;
 
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.services.s3.AmazonS3;
+import java.util.concurrent.atomic.AtomicLong;
+
+import software.amazon.awssdk.awscore.exception.AwsServiceException;
+import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
+import software.amazon.awssdk.core.exception.SdkException;
+import software.amazon.awssdk.core.interceptor.Context;
+import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
+import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
 
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
 
 /**
  * S3 Client factory used for testing with eventual consistency fault injection.
  * This client is for testing <i>only</i>; it is in the production
  * {@code hadoop-aws} module to enable integration tests to use this
  * just by editing the Hadoop configuration used to bring up the client.
  *
- * The factory uses the older constructor-based instantiation/configuration
- * of the client, so does not wire up metrics, handlers etc.
+ * The factory injects an {@link ExecutionInterceptor} to inject failures.
  */
 @InterfaceAudience.Private
 @InterfaceStability.Unstable
 public class InconsistentS3ClientFactory extends DefaultS3ClientFactory {

Review Comment:
   maybe we should cut this class, unless we are aware of any use



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/AWSClientConfig.java:
##########
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.time.Duration;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
+import software.amazon.awssdk.core.client.config.SdkAdvancedClientOption;
+import software.amazon.awssdk.core.retry.RetryPolicy;
+import software.amazon.awssdk.http.apache.ApacheHttpClient;
+import software.amazon.awssdk.http.apache.ProxyConfiguration;
+import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient;
+import software.amazon.awssdk.thirdparty.org.apache.http.client.utils.URIBuilder;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.util.VersionInfo;
+
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ESTABLISH_TIMEOUT;
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_MAXIMUM_CONNECTIONS;
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_MAX_ERROR_RETRIES;
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_REQUEST_TIMEOUT;
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_SECURE_CONNECTIONS;
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_SOCKET_TIMEOUT;
+import static org.apache.hadoop.fs.s3a.Constants.ESTABLISH_TIMEOUT;
+import static org.apache.hadoop.fs.s3a.Constants.MAXIMUM_CONNECTIONS;
+import static org.apache.hadoop.fs.s3a.Constants.MAX_ERROR_RETRIES;
+import static org.apache.hadoop.fs.s3a.Constants.PROXY_DOMAIN;
+import static org.apache.hadoop.fs.s3a.Constants.PROXY_HOST;
+import static org.apache.hadoop.fs.s3a.Constants.PROXY_PASSWORD;
+import static org.apache.hadoop.fs.s3a.Constants.PROXY_PORT;
+import static org.apache.hadoop.fs.s3a.Constants.PROXY_USERNAME;
+import static org.apache.hadoop.fs.s3a.Constants.PROXY_WORKSTATION;
+import static org.apache.hadoop.fs.s3a.Constants.REQUEST_TIMEOUT;
+import static org.apache.hadoop.fs.s3a.Constants.SECURE_CONNECTIONS;
+import static org.apache.hadoop.fs.s3a.Constants.SOCKET_TIMEOUT;
+import static org.apache.hadoop.fs.s3a.Constants.USER_AGENT_PREFIX;
+
+/**
+ * Methods for configuring the S3 client.
+ * These methods are used when creating and configuring
+ * {@link software.amazon.awssdk.services.s3.S3Client} which communicates with the S3 service.
+ */
+public final class AWSClientConfig {

Review Comment:
   1. move to s3a.impl package
   2. needs to also work with versions of the sdk which don't have the shaded httpclient classes; see HADOOP-17337



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java:
##########
@@ -575,24 +578,30 @@ private int putObject() throws IOException {
     final PutObjectRequest putObjectRequest = uploadData.hasFile() ?
         writeOperationHelper.createPutObjectRequest(
             key,
-            uploadData.getFile(),
-            builder.putOptions)
+            uploadData.getFile().length(),
+            builder.putOptions,
+            true)
         : writeOperationHelper.createPutObjectRequest(
             key,
-            uploadData.getUploadStream(),
             size,
-            builder.putOptions);
-    BlockUploadProgress callback =
-        new BlockUploadProgress(
-            block, progressListener,  now());
-    putObjectRequest.setGeneralProgressListener(callback);
+            builder.putOptions,
+        false);
+
+          // TODO: You cannot currently add progress listeners to requests not via the TM.
+          // There is an open ticket for this with the SDK team. But need to check how important

Review Comment:
   it is critical to send heartbeats back from processes like distcp workers up to the distcp manager... if things time out then the worker process is killed.
   it's most important for close()



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -41,54 +43,65 @@
 import java.util.Objects;
 import java.util.TreeSet;
 import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionException;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.ThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicBoolean;
 import javax.annotation.Nullable;
 
-import com.amazonaws.AmazonClientException;
-import com.amazonaws.AmazonServiceException;
-import com.amazonaws.SdkBaseException;
-import com.amazonaws.services.s3.AmazonS3;
 import com.amazonaws.services.s3.Headers;
-import com.amazonaws.services.s3.model.CannedAccessControlList;
-import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
-import com.amazonaws.services.s3.model.CompleteMultipartUploadResult;
-import com.amazonaws.services.s3.model.CopyObjectRequest;
-import com.amazonaws.services.s3.model.DeleteObjectsRequest;
-import com.amazonaws.services.s3.model.DeleteObjectsResult;
-import com.amazonaws.services.s3.model.GetObjectMetadataRequest;
-import com.amazonaws.services.s3.model.GetObjectRequest;
-import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
-import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
-import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
-import com.amazonaws.services.s3.model.ListObjectsRequest;
-import com.amazonaws.services.s3.model.ListObjectsV2Request;
-import com.amazonaws.services.s3.model.MultiObjectDeleteException;
-import com.amazonaws.services.s3.model.MultipartUpload;
-import com.amazonaws.services.s3.model.ObjectMetadata;
-import com.amazonaws.services.s3.model.PutObjectRequest;
-import com.amazonaws.services.s3.model.PutObjectResult;
-import com.amazonaws.services.s3.model.S3Object;
-import com.amazonaws.services.s3.model.SelectObjectContentRequest;
-import com.amazonaws.services.s3.model.SelectObjectContentResult;
-import com.amazonaws.services.s3.model.StorageClass;
-import com.amazonaws.services.s3.model.UploadPartRequest;
-import com.amazonaws.services.s3.model.UploadPartResult;
-import com.amazonaws.services.s3.transfer.Copy;
-import com.amazonaws.services.s3.transfer.TransferManager;
-import com.amazonaws.services.s3.transfer.TransferManagerConfiguration;
-import com.amazonaws.services.s3.transfer.Upload;
-import com.amazonaws.services.s3.transfer.model.CopyResult;
-import com.amazonaws.services.s3.transfer.model.UploadResult;
-import com.amazonaws.event.ProgressListener;
 
 import org.apache.hadoop.fs.impl.prefetch.ExecutorServiceFuturePool;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import software.amazon.awssdk.core.ResponseInputStream;

Review Comment:
   remove the line above to keep in same block



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -676,7 +712,7 @@ public static AWSCredentialProviderList buildAWSProviderList(
    * @return the instantiated class
    * @throws IOException on any instantiation failure.
    */
-  private static AWSCredentialsProvider createAWSCredentialProvider(
+  private static AWSCredentialsProvider createAWSV1CredentialProvider(

Review Comment:
   this is something which could be moved into the .auth module; while it is built with the v1 sdk, it should be something we don't need on the classpath except when there's a (probably) v1 credential provider listed



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -810,8 +820,8 @@ protected static S3AStorageStatistics createStorageStatistics(
   }
 
   /**
-   * Verify that the bucket exists. This does not check permissions,
-   * not even read access.
+   * Verify that the bucket exists.

Review Comment:
   this was the original probe, retained because 3rd party stores don't always handle v2.
   now that we have changed the default probe to "0", we could think about cutting v1 entirely
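
The probe in question is controlled by a configuration key; assuming the existing fs.s3a.bucket.probe option, skipping the check would look like:

    // "0" skips the bucket existence probe on filesystem creation
    Configuration conf = new Configuration();
    conf.setInt("fs.s3a.bucket.probe", 0);
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);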



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -750,6 +786,105 @@ private static AWSCredentialsProvider createAWSCredentialProvider(
     }
   }
 
+  /**
+   * Create an AWS credential provider from its class by using reflection.  The
+   * class must implement one of the following means of construction, which are
+   * attempted in order:
+   *
+   * <ol>
+   * <li>a public constructor accepting java.net.URI and
+   *     org.apache.hadoop.conf.Configuration</li>
+   * <li>a public constructor accepting
+   *    org.apache.hadoop.conf.Configuration</li>
+   * <li>a public static method named getInstance that accepts no
+   *    arguments and returns an instance of
+   *    software.amazon.awssdk.auth.credentials.AwsCredentialsProvider, or</li>
+   * <li>a public default constructor.</li>
+   * </ol>
+   *
+   * @param conf configuration
+   * @param credClass credential class
+   * @param uri URI of the FS
+   * @return the instantiated class
+   * @throws IOException on any instantiation failure.
+   */
+  private static AwsCredentialsProvider createAWSV2CredentialProvider(

Review Comment:
   i'm going to propose that all the logic to create credential providers is moved to something like fs.s3.auth; now is the time to clean it up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

