You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/08/08 16:56:28 UTC

[GitHub] [hadoop] bgaborg commented on a change in pull request #2149: HADOOP-13230. S3A to optionally retain directory markers

bgaborg commented on a change in pull request #2149:
URL: https://github.com/apache/hadoop/pull/2149#discussion_r467481408



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
##########
@@ -142,12 +142,27 @@ public FileStatusListingIterator createFileStatusListingIterator(
       Listing.FileStatusAcceptor acceptor,
       RemoteIterator<S3AFileStatus> providedStatus) throws IOException {
     return new FileStatusListingIterator(
-        new ObjectListingIterator(listPath, request),
+        createObjectListingIterator(listPath, request),
         filter,
         acceptor,
         providedStatus);
   }
 
+  /**
+   * Create an object listing iterator against a path, with a given
+   * list object request.
+   * @param listPath path of the listing
+   * @param request initial request to make
+   * @return the iterator
+   * @throws IOException IO Problems
+   */
+  @Retries.RetryRaw
+  public ObjectListingIterator createObjectListingIterator(

Review comment:
       why is this required, will there be some logic in the future other than creating the new object?

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentAmazonS3Client.java
##########
@@ -330,7 +330,9 @@ private boolean isDescendant(String parent, String child, boolean recursive) {
     } else {
       Path actualParentPath = new Path(child).getParent();
       Path expectedParentPath = new Path(parent);
-      return actualParentPath.equals(expectedParentPath);
+      // children which are directory markers are excluded here
+      return actualParentPath.equals(expectedParentPath)

Review comment:
       It would be better to have a `boolean` var here and return the value - I prefer that for the sake of readability but this will do as well

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########
@@ -2890,72 +2942,114 @@ S3AFileStatus innerGetFileStatus(final Path f,
       // Skip going to s3 if the file checked is a directory. Because if the
       // dest is also a directory, there's no difference.
 
-      if (!pm.getFileStatus().isDirectory() &&
+      if (!msStatus.isDirectory() &&
           !allowAuthoritative &&
           probes.contains(StatusProbeEnum.Head)) {
         // a file has been found in a non-auth path and the caller has not said
         // they only care about directories
         LOG.debug("Metadata for {} found in the non-auth metastore.", path);
-        final long msModTime = pm.getFileStatus().getModificationTime();
-
-        S3AFileStatus s3AFileStatus;
-        try {
-          s3AFileStatus = s3GetFileStatus(path, key, probes, tombstones);
-        } catch (FileNotFoundException fne) {
-          s3AFileStatus = null;
-        }
-        if (s3AFileStatus == null) {
-          LOG.warn("Failed to find file {}. Either it is not yet visible, or "
-              + "it has been deleted.", path);
-        } else {
-          final long s3ModTime = s3AFileStatus.getModificationTime();
-
-          if(s3ModTime > msModTime) {
-            LOG.debug("S3Guard metadata for {} is outdated;"
-                + " s3modtime={}; msModTime={} updating metastore",
-                path, s3ModTime, msModTime);
-            return S3Guard.putAndReturn(metadataStore, s3AFileStatus,
-                ttlTimeProvider);
+        // If the timestamp of the pm is close to "now", we don't need to

Review comment:
       This logic should be located in S3Guard imho, not in the fs code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org