Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/10/05 15:19:27 UTC

[GitHub] [hive] ArkoSharma commented on a change in pull request #2539: HIVE-25397: snapshot support for controlled failover

ArkoSharma commented on a change in pull request #2539:
URL: https://github.com/apache/hive/pull/2539#discussion_r722350281



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##########
@@ -192,64 +196,135 @@ private void dirLocationToCopy(String tableName, FileList fileList, Path sourceP
     fileList.add(new DirCopyWork(tableName, sourcePath, targetPath, copyMode, snapshotPrefix).convertToString());
   }
 
-  private SnapshotUtils.SnapshotCopyMode createSnapshotsAtSource(Path sourcePath, String snapshotPrefix,
-      boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
-      ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+  private Map<String, SnapshotUtils.SnapshotCopyMode> createSnapshotsAtSource(Path sourcePath, Path targetPath, String snapshotPrefix,
+                                                                              boolean isSnapshotEnabled, HiveConf conf, SnapshotUtils.ReplSnapshotCount replSnapshotCount, FileList snapPathFileList,
+                                                                              ArrayList<String> prevSnaps, boolean isBootstrap) throws IOException {
+    Map<String, SnapshotUtils.SnapshotCopyMode> ret = new HashMap<>();
+    ret.put(snapshotPrefix, FALLBACK_COPY);
     if (!isSnapshotEnabled) {
       LOG.info("Snapshot copy not enabled for path {} Will use normal distCp for copying data.", sourcePath);
-      return FALLBACK_COPY;
+      return ret;
     }
+    String prefix = snapshotPrefix;
+    SnapshotUtils.SnapshotCopyMode copyMode = FALLBACK_COPY;
     DistributedFileSystem sourceDfs = SnapshotUtils.getDFS(sourcePath, conf);
     try {
-      if(isBootstrap) {
+      if(conf.getBoolVar(HiveConf.ConfVars.REPL_REUSE_SNAPSHOTS)) {
+        try {
+          FileStatus[] listing = sourceDfs.listStatus(new Path(sourcePath, ".snapshot"));
+          for (FileStatus elem : listing) {
+            String snapShotName = elem.getPath().getName();
+            if (snapShotName.contains(OLD_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(OLD_SNAPSHOT));
+              break;
+            }
+            if (snapShotName.contains(NEW_SNAPSHOT)) {
+              prefix = snapShotName.substring(0, snapShotName.lastIndexOf(NEW_SNAPSHOT));
+              break;
+            }
+          }
+          ret.clear();
+          ret.put(prefix, copyMode);
+          snapshotPrefix = prefix;
+        } catch (SnapshotException e) {
+          //dir not snapshottable, continue
+        }
+      }
+      boolean isFirstSnapshotAvl =
+              SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, OLD_SNAPSHOT, conf);
+      boolean isSecondSnapAvl =
+              SnapshotUtils.isSnapshotAvailable(sourceDfs, sourcePath, snapshotPrefix, NEW_SNAPSHOT, conf);
+      //for bootstrap and non - failback case, use initial_copy
+      if(isBootstrap && !(!isSecondSnapAvl && isFirstSnapshotAvl)) {

Review comment:
       I considered making this change, but realised it would then require a similar listing and name check for snapshots on the target side. That situation can arise during reverse replication after failover, when the src and tgt dbs have different names.
   Hence I decided to keep the current implementation, since it allows identifying which snapshots are being re-used. It also makes sense to do this work during dump because, in general, dump should take less time than load (except possibly when the db contains only external tables and the data copy runs on src).
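
   To make the name-checking step concrete, here is a minimal, self-contained sketch of how a reusable prefix could be derived from the names listed under `.snapshot`. It mirrors the loop in the hunk above but is not the PR code: the class `SnapshotPrefixSketch`, the method `findReusablePrefix`, the suffix values and the sample snapshot names are all assumptions made for illustration, and the real constants live in `SnapshotUtils`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SnapshotPrefixSketch {
    // Assumed suffix values, for illustration only; the PR takes them from SnapshotUtils.
    static final String OLD_SNAPSHOT = "replOld";
    static final String NEW_SNAPSHOT = "replNew";

    /**
     * Returns the prefix of the first snapshot name that contains one of the
     * replication suffixes, or empty if no name matches. Within a name, the
     * old suffix is checked before the new one, as in the hunk above.
     */
    public static Optional<String> findReusablePrefix(List<String> snapshotNames) {
        for (String name : snapshotNames) {
            for (String suffix : Arrays.asList(OLD_SNAPSHOT, NEW_SNAPSHOT)) {
                int idx = name.lastIndexOf(suffix);
                if (idx >= 0) {
                    // Everything before the suffix is treated as the reusable prefix.
                    return Optional.of(name.substring(0, idx));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical listing of a source directory's .snapshot folder.
        List<String> names = Arrays.asList("manual-backup", "srcdb-tbl1-replOld", "srcdb-tbl1-replNew");
        System.out.println(findReusablePrefix(names).orElse("<none>"));
    }
}
```

   Returning an `Optional` keeps the "no matching snapshot" case explicit, so the caller can fall back to the configured `snapshotPrefix`. Doing the same on the target side would need an equivalent listing there, which is the extra work mentioned above.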




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


