You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/01/21 13:03:45 UTC

[GitHub] [hadoop] bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode

bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode
URL: https://github.com/apache/hadoop/pull/1810#discussion_r368971623
 
 

 ##########
 File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
 ##########
 @@ -299,29 +313,72 @@ public static BulkOperationState initiateBulkWrite(
     // Since the authoritative case is already handled outside this function,
     // we will basically start with the set of directory entries in the
     // DirListingMetadata, and add any that only exist in the backingStatuses.
+    //
+    // We try to avoid writing any more child entries than need be to :-
+    //  (a) save time and money.
+    //  (b) avoid overwriting the authoritative bit of children (HADOOP-16746).
+    // For auth mode updates, we supply the full listing and a list of which
+    // child entries have not been changed; the store gets to optimize its
+    // update however it chooses.
+    //
+    // for non-auth-mode S3Guard, we just build a list of entries to add and
+    // submit them in a batch; this is more efficient than trickling out the
+    // updates one-by-one.
+
+    // track all unchanged entries; used so the metastore can identify entries
+    // it doesn't need to update
+    List<Path> unchangedEntries = new ArrayList<>(dirMeta.getListing().size());
+    List<PathMetadata> nonAuthEntriesToAdd = new ArrayList<>(backingStatuses.size());
     boolean changed = false;
-    final Map<Path, FileStatus> dirMetaMap = dirMeta.getListing().stream()
-        .collect(Collectors.toMap(
-            pm -> pm.getFileStatus().getPath(), PathMetadata::getFileStatus)
-        );
+    final Map<Path, PathMetadata> dirMetaMap = dirMeta.getListing().stream()
+        .collect(Collectors.toMap(pm -> pm.getFileStatus().getPath(), pm -> pm));
     BulkOperationState operationState = ms.initiateBulkWrite(
         BulkOperationState.OperationType.Listing,
         path);
     for (S3AFileStatus s : backingStatuses) {
-      if (deleted.contains(s.getPath())) {
+      final Path statusPath = s.getPath();
+      if (deleted.contains(statusPath)) {
         continue;
       }
 
-      final PathMetadata pathMetadata = new PathMetadata(s);
-
-      if (!isAuthoritative){
-        FileStatus status = dirMetaMap.get(s.getPath());
-        if (status != null
-            && s.getModificationTime() > status.getModificationTime()) {
-          LOG.debug("Update ms with newer metadata of: {}", status);
-          S3Guard.putWithTtl(ms, pathMetadata, timeProvider, operationState);
+      final PathMetadata originalMD = dirMetaMap.get(statusPath);
+
+      // this is built up to be whatever entry is added to the dirMeta
+      // collection
+      PathMetadata pathMetadata = originalMD;
+
+      if (!isAuthoritative) {
+        // in non-auth listings, we compare the file status of the metastore
+        // list with those in the FS, and overwrite the MS entry if
+        // either of two conditions are met
+        // - there is no entry in the metadata and
 
 Review comment:
   nit: there is no entry in the _metastore_ and

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org