Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/04/13 06:40:02 UTC

[GitHub] [hive] ayushtkn commented on a diff in pull request #2993: Hive 25921: Overwrite table metadata for bootstraped tables.

ayushtkn commented on code in PR #2993:
URL: https://github.com/apache/hive/pull/2993#discussion_r849128393


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java:
##########
@@ -157,10 +159,20 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
       Path incBootstrapDir = new Path(dumpDirectory, ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME);
       if (fs.exists(incBootstrapDir)) {
         if (isSecondFailover) {
-          String[] tableList = getBootstrapTableList(dumpDirParent, hiveConf);
-          tablesToBootstrap = Arrays.asList(tableList);
-          LOG.info("Optimised bootstrap for database {} with load with bootstrap table list as {}", dbNameToLoadIn,
+          String[] bootstrappedTables = getBootstrapTableList(new Path(dumpDirectory).getParent(), hiveConf);
+          tablesToBootstrap = new ArrayList<String>(Arrays.asList(bootstrappedTables));
+          LOG.info("Optimised bootstrap for database {} with load with bootstrapped table list as {}", dbNameToLoadIn,
               tablesToBootstrap);
+          ArrayList<String> tableList = new ArrayList<String>(Arrays.asList(bootstrappedTables));
+          // Get list of tables bootstrapped.
+          Path tableMetaPath = new Path(incBootstrapDir, EximUtil.METADATA_PATH_NAME + "/" + sourceDbName);
+          FileStatus[] listing = fs.listStatus(tableMetaPath);
+          for (FileStatus tablePath : listing) {
+            tableList.remove(tablePath.getPath().getName());
+          }
+          tablesToDrop = tableList;

Review Comment:
   The two lists serve different purposes:
   tablesToBootstrap: the tables that need to be bootstrapped, i.e. the tables that were modified during a disaster recovery (DR) scenario. They exist on both the source and target clusters but have diverged, so we overwrite them on the target cluster using bootstrap.
   
   tablesToDrop: the tables that exist on the target cluster but not on the source cluster. This also happens only in a DR scenario. Since such a table no longer exists on the source cluster, we need to drop it on the target cluster so that the source and target clusters are in sync in terms of the tables available.
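   A minimal sketch of how tablesToDrop is derived in the diff above, assuming plain Java collections stand in for the Hadoop `fs.listStatus(tableMetaPath)` call: the table directory names found under the dump's metadata path are modelled as a `List<String>`. The class and method names (`TablesToDropSketch`, `computeTablesToDrop`) and the sample table names are hypothetical, not part of Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Hive code): mirrors the removal loop in the diff,
 * using plain lists instead of Hadoop FileSystem listings.
 */
public class TablesToDropSketch {

  /**
   * @param bootstrapTableList tables recorded for the optimised bootstrap
   * @param dumpedTableDirs    table directory names found under the dump's
   *                           metadata path, i.e. tables still present on source
   * @return tables from the bootstrap list that were not dumped, i.e. tables
   *         that no longer exist on the source cluster
   */
  public static List<String> computeTablesToDrop(List<String> bootstrapTableList,
                                                 List<String> dumpedTableDirs) {
    // Mutable copy, like new ArrayList<String>(Arrays.asList(...)) in the diff.
    List<String> tablesToDrop = new ArrayList<>(bootstrapTableList);
    // A table that was dumped still exists on source, so it is overwritten via
    // bootstrap rather than dropped; remove(Object) drops its first occurrence.
    for (String dumped : dumpedTableDirs) {
      tablesToDrop.remove(dumped);
    }
    return tablesToDrop;
  }

  public static void main(String[] args) {
    List<String> bootstrapList = Arrays.asList("orders", "customers", "tmp_audit");
    List<String> dumped = Arrays.asList("orders", "customers");
    // Prints: Tables to drop: [tmp_audit]
    System.out.println("Tables to drop: " + computeTablesToDrop(bootstrapList, dumped));
  }
}
```

   The loop uses `List.remove(Object)` to match the diff; if the bootstrap list could ever contain duplicate names, `removeAll` would be the safer choice.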



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: gitbox-help@hive.apache.org