Posted to common-issues@hadoop.apache.org by "Aaron Fabbri (JIRA)" <ji...@apache.org> on 2017/09/01 23:36:00 UTC

[jira] [Commented] (HADOOP-13421) Switch to v2 of the S3 List Objects API in S3A

    [ https://issues.apache.org/jira/browse/HADOOP-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151302#comment-16151302 ] 

Aaron Fabbri commented on HADOOP-13421:
---------------------------------------

Whoever added the forced list response paging to ITestS3AContractGetFileStatus, thank you.  I was going to add that and saw it is already there.

It also explains why that test was timing out with the v2 list; it was not just slow home internet.  I needed to change this bit:

{noformat}
diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
index e8b739432d1..eb80d37a12f 100644
--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
@@ -1113,7 +1113,7 @@ protected ListObjectsV2Result continueListObjects(ListObjectsV2Request req,
       ListObjectsV2Result objects) {
     incrementStatistic(OBJECT_CONTINUE_LIST_REQUESTS);
     incrementReadOperations();
-    req.setContinuationToken(objects.getContinuationToken());
+    req.setContinuationToken(objects.getNextContinuationToken());
     return s3.listObjectsV2(req);
   }
{noformat}

So, the v2 response has two continuation token fields, {{ContinuationToken}} and {{NextContinuationToken}}.  The former just echoes the token that was sent with the request; the latter is the one that points at the next page.  Turns out I was using the former and retrieving the same 2 results over and over.  Gave me a giggle, had to share.  V2 patch coming soon.
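
For anyone who wants to see the failure mode in isolation, here is a rough toy model of it.  Nothing in it is the AWS SDK or S3A code: {{Page}}, {{list}}, {{listAll}} and {{listAllWithEchoedToken}} are hypothetical stand-ins, and the token is assumed to be just a start index.  It only illustrates why resending the echoed token keeps returning the first page, while following the next token walks the whole listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of v2 list paging. None of these types belong to the AWS SDK. */
public class ListV2PagingSketch {

  /** Hypothetical stand-in for one page of a v2 list response. */
  static final class Page {
    final List<String> keys;
    final String continuationToken;      // echo of the token sent with the request
    final String nextContinuationToken;  // token for the following page; null on the last page

    Page(List<String> keys, String continuationToken, String nextContinuationToken) {
      this.keys = keys;
      this.continuationToken = continuationToken;
      this.nextContinuationToken = nextContinuationToken;
    }
  }

  /** Hypothetical stand-in for the store. Assumption: the token is the start index as a string. */
  static Page list(List<String> all, String token, int maxKeys) {
    int start = token == null ? 0 : Integer.parseInt(token);
    int end = Math.min(start + maxKeys, all.size());
    String next = end < all.size() ? Integer.toString(end) : null;
    return new Page(new ArrayList<>(all.subList(start, end)), token, next);
  }

  /** Follows nextContinuationToken until the listing is exhausted. */
  static List<String> listAll(List<String> all, int maxKeys) {
    List<String> out = new ArrayList<>();
    String token = null;
    do {
      Page page = list(all, token, maxKeys);
      out.addAll(page.keys);
      token = page.nextContinuationToken;
    } while (token != null);
    return out;
  }

  /** The mistake: resends the echoed token. Capped at maxRequests so the demo terminates. */
  static List<String> listAllWithEchoedToken(List<String> all, int maxKeys, int maxRequests) {
    List<String> out = new ArrayList<>();
    String token = null;
    for (int i = 0; i < maxRequests; i++) {
      Page page = list(all, token, maxKeys);
      out.addAll(page.keys);
      token = page.continuationToken; // this is what we just sent, so the next request is identical
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
    System.out.println(listAll(keys, 2));                   // prints [a, b, c, d, e]
    System.out.println(listAllWithEchoedToken(keys, 2, 3)); // prints [a, b, a, b, a, b]
  }
}
```

With a page size of 2 the echoed-token variant returns the same 2 keys on every request, which matches what the test was seeing before the one-line fix above.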

> Switch to v2 of the S3 List Objects API in S3A
> ----------------------------------------------
>
>                 Key: HADOOP-13421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13421
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steven K. Wong
>            Assignee: Aaron Fabbri
>            Priority: Minor
>         Attachments: HADOOP-13421-HADOOP-13345.001.patch
>
>
> Unlike [version 1|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html] of the S3 List Objects API, [version 2|http://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html] by default does not fetch object owner information, which S3A doesn't need anyway. By switching to v2, there will be less data to transfer/process. Also, it should be more robust when listing a versioned bucket with "a large number of delete markers" ([according to AWS|https://aws.amazon.com/releasenotes/Java/0735652458007581]).
> Methods in S3AFileSystem that use this API include:
> * getFileStatus(Path)
> * innerDelete(Path, boolean)
> * innerListStatus(Path)
> * innerRename(Path, Path)
> Requires AWS SDK 1.10.75 or later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
