Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/11/17 06:33:01 UTC

[GitHub] [hive] ayushtkn commented on a change in pull request #2793: HIVE-25609: Preserve XAttrs in normal file copy case.

ayushtkn commented on a change in pull request #2793:
URL: https://github.com/apache/hive/pull/2793#discussion_r750211816



##########
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##########
@@ -662,10 +663,27 @@ static boolean copy(FileSystem srcFS, Path src,
       // wherein if distcp fails, there is good reason to not plod along with a trivial
       // implementation, and fail instead.
       copied = FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf);
+      if (copied && !deleteSource

Review comment:
       The XAttrs won't be preserved if `deleteSource` is true. Isn't that different from what happens in the distcp case?
   If the behaviour differs, it is better to get the XAttrs before the source is deleted and then apply them to the destination if the copy succeeds.
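
   To make the ordering concrete, here is a minimal sketch. It uses a hypothetical in-memory map as a stand-in for the file system (it is not the Hadoop `FileSystem` API), and the class and method names are invented for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in: path -> (xattr name -> value).
// Not the Hadoop FileSystem API; it only illustrates the order of the steps.
class XAttrSnapshotSketch {
  final Map<String, Map<String, String>> files = new HashMap<>();

  // Copies src to dst, optionally deleting src, and preserves XAttrs either way.
  boolean copy(String src, String dst, boolean deleteSource) {
    if (!files.containsKey(src)) {
      return false;
    }
    // 1. Snapshot the XAttrs while the source still exists.
    Map<String, String> snapshot = new HashMap<>(files.get(src));
    // 2. Plain copy that, like the copy under review, does not carry XAttrs over.
    files.put(dst, new HashMap<>());
    if (deleteSource) {
      files.remove(src);
    }
    // 3. Apply the snapshot to the destination only after the copy succeeded.
    files.get(dst).putAll(snapshot);
    return true;
  }

  public static void main(String[] args) {
    XAttrSnapshotSketch fs = new XAttrSnapshotSketch();
    Map<String, String> attrs = new HashMap<>();
    attrs.put("user.owner", "hive");
    fs.files.put("/src/f", attrs);
    fs.copy("/src/f", "/dst/f", true);
    System.out.println(fs.files.get("/dst/f"));
  }
}
```

   Because the snapshot is taken before the deletion, the `!deleteSource` condition is no longer needed in this sketch.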

##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##########
@@ -988,6 +990,52 @@ public void testReadOperationsNotCapturedInNotificationLog() throws Throwable {
     }
   }
 
+  @Test
+  public void testXAttrsPreserved() throws Throwable {
+    String tbName = "dummyTable";
+    primary.run("use " + primaryDbName)
+            .run("CREATE TABLE " + tbName + "(a int) STORED AS TEXTFILE")
+            .run("INSERT into " + tbName + " values(1)");
+    Table srcTb = primary.getTable(primaryDbName, tbName);
+    Path tablePath = new Path(srcTb.getSd().getLocation());
+    FileSystem fs = tablePath.getFileSystem(conf);
+    setXAttrsRecursive(fs, fs.getFileStatus(tablePath));
+    primary.dump(primaryDbName);
+    replica.load(replicatedDbName, primaryDbName);
+    Table dstTb = primary.getTable(replicatedDbName, tbName);
+    verifyXAttrsPreserved(fs, fs.getFileStatus(tablePath), fs.getFileStatus(new Path(dstTb.getSd().getLocation())));
+  }
+
+  private void setXAttrsRecursive(FileSystem srcFS, FileStatus srcStatus) throws Exception {
+    Path src = srcStatus.getPath();
+    if (srcStatus.isDirectory()) {

Review comment:
       Set the XAttrs on the directory itself too.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -111,8 +111,17 @@ String checkSumFor(Path srcFile, FileSystem fs) throws IOException {
   @VisibleForTesting
   void copyFilesBetweenFS(FileSystem sourceFs, Path[] paths, FileSystem destinationFs,
                                   Path finalDestination, boolean deleteSource, boolean overwrite) throws IOException {
-    retryableFxn(() -> FileUtil
-            .copy(sourceFs, paths, destinationFs, finalDestination, deleteSource, overwrite, hiveConf));
+    retryableFxn(() -> {
+      boolean copied = FileUtil
+              .copy(sourceFs, paths, destinationFs, finalDestination, deleteSource, overwrite, hiveConf);
+      if (copied && !deleteSource

Review comment:
       Same as above: `deleteSource` shouldn't matter here. Also check the distcp options.

##########
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##########
@@ -662,10 +663,27 @@ static boolean copy(FileSystem srcFS, Path src,
       // wherein if distcp fails, there is good reason to not plod along with a trivial
       // implementation, and fail instead.
       copied = FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf);
+      if (copied && !deleteSource
+              && Utils.checkFileSystemXAttrSupport(srcFS) && Utils.checkFileSystemXAttrSupport(dstFS)) {
+        copyXAttrs(srcFS, srcFS.getFileStatus(src), dstFS, dst);
+      }
     }
     return copied;
   }
 
+  public static void copyXAttrs(FileSystem srcFS, FileStatus srcStatus, FileSystem dstFS, Path dst) throws IOException {
+    Path src = srcStatus.getPath();
+    if (srcStatus.isDirectory()) {
+      for(FileStatus content: srcFS.listStatus(src)) {
+        copyXAttrs(srcFS, content, dstFS, new Path(dst, content.getPath().getName()));
+      }

Review comment:
       For a directory, this only moves on to preserving XAttrs on the files inside it. We need to preserve the XAttrs on the directory itself too.
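
       A rough sketch of the traversal order being asked for, under stated assumptions: `Node` is a hypothetical in-memory tree (not `FileStatus` or any Hadoop type), and the destination is assumed to already mirror the source after the data copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class XAttrTreeSketch {
  // Hypothetical node type: a directory when children is non-null, else a file.
  static final class Node {
    final Map<String, String> xattrs = new LinkedHashMap<>();
    final Map<String, Node> children;

    Node(boolean isDirectory) {
      this.children = isDirectory ? new LinkedHashMap<>() : null;
    }

    boolean isDirectory() {
      return children != null;
    }
  }

  // Copies XAttrs from src to dst for the node itself, then for its children.
  static void copyXAttrs(Node src, Node dst) {
    // Apply to the current path first, whether it is a directory or a file.
    dst.xattrs.putAll(src.xattrs);
    if (src.isDirectory() && dst.isDirectory()) {
      for (Map.Entry<String, Node> entry : src.children.entrySet()) {
        Node dstChild = dst.children.get(entry.getKey());
        if (dstChild != null) {
          copyXAttrs(entry.getValue(), dstChild);
        }
      }
    }
  }

  public static void main(String[] args) {
    Node srcDir = new Node(true);
    srcDir.xattrs.put("user.tag", "dir");
    Node srcFile = new Node(false);
    srcFile.xattrs.put("user.tag", "file");
    srcDir.children.put("f", srcFile);

    Node dstDir = new Node(true);
    dstDir.children.put("f", new Node(false));

    copyXAttrs(srcDir, dstDir);
    System.out.println(dstDir.xattrs + " " + dstDir.children.get("f").xattrs);
  }
}
```

       The point is only that the attributes are applied before the `isDirectory` branch, so directories and files are handled by the same statement.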

##########
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##########
@@ -662,10 +663,27 @@ static boolean copy(FileSystem srcFS, Path src,
       // wherein if distcp fails, there is good reason to not plod along with a trivial
       // implementation, and fail instead.
       copied = FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf);
+      if (copied && !deleteSource
+              && Utils.checkFileSystemXAttrSupport(srcFS) && Utils.checkFileSystemXAttrSupport(dstFS)) {

Review comment:
       If the user explicitly passes distcp options such as `-pb`, distcp won't preserve XAttrs. Or do we still preserve them in the distcp case? If not, the normal copy should behave the same way, so that the two code paths stay in sync.
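
       One possible way to keep the two paths in sync is to gate the normal-copy XAttr step on the same preserve letters that distcp would receive. The helper below is a hypothetical sketch, not an existing Hive or Hadoop method. It assumes the options arrive as a list of distcp-style arguments and that the letter `x` in an explicit `-p<letters>` argument requests XAttr preservation. A bare `-p` is deliberately not modelled; its defaults should be checked against the DistCp documentation.

```java
import java.util.Arrays;
import java.util.List;

class PreserveFlagSketch {
  // Returns true if any explicit -p<letters> argument includes 'x' (XAttr).
  // Assumption: no other distcp-style argument in the list starts with "-p".
  static boolean wantsXAttrs(List<String> distCpArgs) {
    for (String arg : distCpArgs) {
      if (arg.startsWith("-p") && arg.length() > 2 && arg.indexOf('x', 2) >= 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(wantsXAttrs(Arrays.asList("-update", "-pb")));
    System.out.println(wantsXAttrs(Arrays.asList("-update", "-pbx")));
  }
}
```

       With something like this, the normal copy would skip the XAttr step for `-pb` and perform it for `-pbx`, matching what the user asked distcp to do.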

##########
File path: shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##########
@@ -272,21 +272,6 @@ public void refreshDefaultQueue(Configuration conf, String userName) throws IOEx
     //no op
   }
 
-  private boolean isFairScheduler (Configuration conf) {
-    return "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler".
-        equalsIgnoreCase(conf.get(YarnConfiguration.RM_SCHEDULER));
-  }

Review comment:
       This change is unrelated to the PR and should be removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


